Does a Medusa-1 model generate token-for-token the same output as the base model without Medusa heads?
I found that changing the Medusa choices changes the output.
We solved this by shrinking the Medusa choices to top-1 predictions only, i.e., [(0), (0,0), (0,0,0), (0,0,0,0), (0,0,0,0,0)].
This way, the MHCA computation produces logits that are bitwise identical to the baseline without Medusa decoding.
Hope this helps others who need bitwise-identical decoding.
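For anyone who wants to generate or check such a list programmatically, here is a minimal sketch. It assumes each choice is a tuple of per-head candidate indices, where 0 means the top-1 prediction of that head. The helper names `top1_medusa_choices` and `is_top1_only` are hypothetical and not part of any Medusa library, and the container type your framework expects (tuples vs. lists) may differ.

```python
from typing import List, Sequence, Tuple


def top1_medusa_choices(num_heads: int) -> List[Tuple[int, ...]]:
    """Build a single-path tree that keeps only the top-1 candidate per head.

    Hypothetical helper: depth d contributes one path of d zeros.
    """
    return [(0,) * depth for depth in range(1, num_heads + 1)]


def is_top1_only(choices: Sequence[Sequence[int]]) -> bool:
    """Return True if every path in the tree uses only index 0 (top-1)."""
    return all(all(index == 0 for index in path) for path in choices)


choices = top1_medusa_choices(5)
print(choices)
# [(0,), (0, 0), (0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0, 0)]
print(is_top1_only(choices))                   # True
print(is_top1_only([(0,), (1,), (0, 0)]))      # False: (1,) is a top-2 branch
```

Note that in Python `(0)` is just the integer 0, so the first entry is written `(0,)` here. Whether a top-1-only tree yields bitwise-identical logits in your setup still depends on the kernels and framework in use, as reported above.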