
Medusa Speculative Decoding #423

Open
someone13574 opened this issue Sep 11, 2023 · 1 comment

Comments

@someone13574

Recently a project called Medusa was released. It trains additional lm_heads that, instead of predicting the next token, predict tokens n+2, n+3, and n+4. It then builds a tree of possible combinations of each head's top-k candidates and evaluates them all at once in a single forward pass using clever attention masking, accepting one of the best candidates. The authors report a ~2x speedup, and they appear to be planning a llama.cpp integration, so I thought it would be a good fit for this project as well.
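To make the candidate-tree idea concrete, here is a minimal, hypothetical sketch in plain Python. It is not the Medusa implementation: the names (`candidate_tree`, `verify`, `oracle_next`) are illustrative, and the single batched tree-masked forward pass is stood in for by a per-token oracle call.

```python
from itertools import product

def candidate_tree(head_topk):
    """Build every candidate continuation from the heads' proposals.

    head_topk[i] is the list of top-k token ids proposed by the
    (hypothetical) Medusa head that predicts position n+1+i, so the
    tree has prod(len(k) for k in head_topk) leaves.
    """
    return [list(c) for c in product(*head_topk)]

def verify(candidates, oracle_next):
    """Accept the longest candidate prefix the base model agrees with.

    `oracle_next(extra_ctx)` stands in for what Medusa does with one
    batched forward pass over the whole tree using attention masking:
    it returns the token the base model would emit after the already
    accepted extra tokens.
    """
    best = []
    for cand in candidates:
        accepted = []
        for tok in cand:
            if oracle_next(accepted) != tok:
                break
            accepted = accepted + [tok]
        if len(accepted) > len(best):
            best = accepted
    return best
```

For example, with two heads each proposing top-2 tokens, the tree has four candidates; if the base model would itself produce tokens 5 then 9, `verify` accepts both in one step instead of two sequential decodes:

```python
target = [5, 9]
oracle_next = lambda ctx: target[len(ctx)] if len(ctx) < len(target) else -1
cands = candidate_tree([[5, 3], [9, 7]])   # 4 candidates
accepted = verify(cands, oracle_next)      # [5, 9]
```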

Links: Blog, Implementation, Models

@someone13574
Author

Ref to llama.cpp issue ggerganov/llama.cpp#3137
