Feat/cheick/speculative sampling #424

Open
wants to merge 2 commits into
base: main

@Bouscout Bouscout commented Dec 21, 2023

Adds a new file, generation_algorithm.py, containing an implementation of speculative sampling. The algorithm is integrated into the BaseOnsiteLLM class through a new speculative_sampling attribute.

The algorithm's parameters are passed through generation_kw_args when the BaseOnsiteLLM class is initialized. They consist of the draft_model_uri and two optional hyperparameters, k and scheduler, which influence the number of tokens generated per iteration.
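To make the parameter handling concrete, here is a minimal sketch of how these keyword arguments could be read. It is not the code in this PR: `SpeculativeSamplingConfig`, `parse_speculative_kwargs`, `simple_scheduler`, the default `k = 4`, and the scheduler signature `(k, accepted) -> new_k` are all hypothetical names and assumptions chosen for illustration. Only `draft_model_uri`, `k`, `scheduler`, and `generation_kw_args` come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpeculativeSamplingConfig:
    """Hypothetical container for the parameters named in the PR description."""
    draft_model_uri: str
    k: int = 4  # assumed default: number of draft tokens proposed per iteration
    scheduler: Optional[Callable[[int, int], int]] = None  # assumed signature


def parse_speculative_kwargs(generation_kw_args: dict) -> Optional[SpeculativeSamplingConfig]:
    """Return a config when a draft model is given, otherwise None."""
    uri = generation_kw_args.get("draft_model_uri")
    if uri is None:
        return None  # no draft model: speculative sampling stays disabled
    k = int(generation_kw_args.get("k", 4))
    if k < 1:
        raise ValueError("k must be at least 1")
    return SpeculativeSamplingConfig(uri, k, generation_kw_args.get("scheduler"))


def simple_scheduler(k: int, accepted: int) -> int:
    """Assumed policy: grow k after a fully accepted draft, shrink it otherwise."""
    return k + 1 if accepted == k else max(1, k - 1)
```

Under these assumptions, `parse_speculative_kwargs({"draft_model_uri": "some/draft-model", "k": 3})` would yield a config with `k == 3` and no scheduler, while an empty dict would leave speculative sampling disabled.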

When the speculative_sampling attribute is present, the complete method of the BaseOnsiteLLM class runs the algorithm and returns the newly generated token IDs. The method also takes an optional parameter, "alignment", which controls the degree of similarity between the probabilities of the draft tokens and those of the target tokens.

When alignment is set to 1 (perfect alignment, the default), the algorithm aims to produce exactly the same answers as the target model would. The implementation handles a batch size of 1, consistent with how the generate method in the BaseOnsiteLLM class currently works.
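For readers unfamiliar with the technique, the following is a sketch of the standard accept/reject step from the speculative sampling literature, for a single sequence (batch size 1). It is an illustration under stated assumptions, not the code in generation_algorithm.py: the function name `verify_draft` and its arguments are hypothetical, probabilities are plain Python lists over a toy vocabulary, and the "alignment" parameter is not modeled. The sketch corresponds to the exact-matching case, in which accepted tokens follow the target model's distribution.

```python
import random
from typing import List, Sequence


def _sample(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one index from a categorical distribution (weights need not sum to 1)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def verify_draft(
    draft_tokens: Sequence[int],
    draft_probs: Sequence[Sequence[float]],   # q_i over the vocab, one row per draft token
    target_probs: Sequence[Sequence[float]],  # p_i over the vocab, len(draft_tokens) + 1 rows
    rng: random.Random,
) -> List[int]:
    """Return the newly generated token IDs for one speculative iteration."""
    new_tokens: List[int] = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i], draft_probs[i]
        # Accept the draft token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            new_tokens.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q); choices() renormalizes.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        new_tokens.append(_sample(residual, rng))
        return new_tokens  # everything after a rejection is discarded
    # All drafts accepted: take one extra token from the target's next distribution.
    new_tokens.append(_sample(target_probs[len(draft_tokens)], rng))
    return new_tokens
```

Each call yields between 1 and k + 1 tokens, where k is the number of draft tokens. That is why the choice of k (and any scheduler that adjusts it) affects how many tokens are produced per iteration.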

fixes #367

…ithm parameters and implemented the generation in the complete method when conditions are met
Development

Successfully merging this pull request may close these issues.

Mixed-model speculative sampling