Fused Attention #724

Answered by DDEle
bearn01d asked this question in Q&A
Nov 20, 2023 · 1 comment · 1 reply

Thank you for your interest in the Fused Attention optimization. We are planning to support it. However, it will take some time: Fused Attention operates only on activations, which are more prone to accuracy loss under quantization. In addition, we need to be more careful about performance, since unlike weight quantization there is no zero-cost way to quantize activations.
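
To make the contrast concrete, below is a minimal, hypothetical NumPy sketch (not this project's kernel) of int8 attention. Both matmuls in attention (Q·Kᵀ and P·V) combine activations with activations, so their quantization scales can only be computed at runtime, whereas a weight matrix can be quantized once ahead of time. The names `quantize_int8` and `fused_attention_int8` are illustrative, not part of any API.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor int8 quantization. The scale is derived from the
    # data itself, so for activations this has to run on every forward pass.
    scale = np.abs(x).max() / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def fused_attention_int8(q_act, k_act, v_act):
    # Toy int8 attention over runtime activations. Every operand here is an
    # activation, so every quantization step is paid at inference time.
    d = q_act.shape[-1]
    q_q, q_s = quantize_int8(q_act)            # runtime cost
    k_q, k_s = quantize_int8(k_act)            # runtime cost
    scores = (q_q.astype(np.int32) @ k_q.T.astype(np.int32)) * (q_s * k_s)
    scores = scores / np.sqrt(d)
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs = probs / probs.sum(axis=-1, keepdims=True)
    p_q, p_s = quantize_int8(probs)            # runtime cost again
    v_q, v_s = quantize_int8(v_act)            # runtime cost again
    return (p_q.astype(np.int32) @ v_q.astype(np.int32)) * (p_s * v_s)

# A weight matrix, by contrast, can be quantized once when the model loads,
# which is why weight quantization is effectively "zero-cost" at inference.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
w_q, w_s = quantize_int8(w)                    # one-time, offline

x = rng.standard_normal((8, 64)).astype(np.float32)
qkv = x @ (w_q.astype(np.float32) * w_s)       # activations only exist at runtime
out = fused_attention_int8(qkv, qkv, qkv)
print(out.shape)  # (8, 64)
```

The sketch is only meant to show where the quantization work lands: the calls inside `fused_attention_int8` run on every inference, while the weight quantization outside it runs once.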

Replies: 1 comment 1 reply

1 reply
@bearn01d

Answer selected by DDEle
Category: Q&A · Labels: none yet · 2 participants