Num Heads & Num Medusa Layers #44
santhosh97
started this conversation in
General
Replies: 1 comment
We report the training accuracy of 5 Medusa heads for Vicuna. Generation efficiency depends on two factors: the acceleration ratio and the overhead. On one hand, adding more prediction paths to the Medusa tree raises the acceleration ratio. On the other hand, lengthening the input (processing k candidate tokens per step instead of 1) adds notable overhead, especially when k exceeds 64, which corresponds to the GPU warp of the A100; this threshold may vary on other devices. You can refer to the Appendix of our blog for a plot showing that most of the configurations with the best wall time are centered around 64.
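To make the trade-off in this reply concrete, here is a minimal sketch of how one might pick a tree size from one's own measurements. It assumes that the acceleration ratio means the average number of tokens accepted per decoding step, that the overhead means per-step latency relative to plain decoding, and that wall-time speedup is their quotient. The function names and the numbers are hypothetical placeholders, not measured results and not part of the Medusa codebase.

```python
def effective_speedup(acceleration_ratio: float, overhead: float) -> float:
    """Wall-time speedup, assumed to be tokens per step / relative step cost."""
    if overhead <= 0:
        raise ValueError("overhead must be positive")
    return acceleration_ratio / overhead


def best_tree_size(measurements: dict[int, tuple[float, float]]) -> int:
    """Return the tree size k whose (acceleration_ratio, overhead) pair
    gives the highest effective speedup."""
    return max(measurements, key=lambda k: effective_speedup(*measurements[k]))


# Hypothetical placeholder values: {k: (acceleration_ratio, overhead)}.
# Replace these with numbers profiled on your own model and GPU.
example = {
    16: (2.0, 1.05),
    64: (3.0, 1.10),
    256: (3.4, 1.60),
}

if __name__ == "__main__":
    for k, (ratio, cost) in sorted(example.items()):
        print(f"k={k:>3}  speedup={effective_speedup(ratio, cost):.2f}")
    print("best k:", best_tree_size(example))
```

With these made-up inputs, the largest tree is not the winner: its extra accepted tokens are outweighed by the higher per-step cost, which is the effect described above.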
-
Hey All!
I had a couple of questions regarding Medusa. Has your team run any experiments or ablation studies on increasing the number of Medusa heads / Medusa layers? For instance, the default right now is 3 heads and 1 Medusa layer. If we changed that to, say, 6 and 2, do you foresee a further increase in generation efficiency (say from 3x to 4x), with a drawback in accuracy / quality score?
Best,
Santhosh Subramanian