Configs in inference.py necessary for context length expansion in model serving? #157

spring1915 opened this issue Dec 13, 2023

In inference.py, there are two settings:

    # Read the base model's original context window from its config.
    orig_ctx_len = getattr(config, "max_position_embeddings", None)
    # If the requested context exceeds it, enable linear RoPE scaling.
    if orig_ctx_len and args.context_size > orig_ctx_len:
        scaling_factor = float(math.ceil(args.context_size / orig_ctx_len))
        config.rope_scaling = {"type": "linear", "factor": scaling_factor}

and

    # Resize embeddings, presumably for one pad token added to Llama's 32000-token vocab.
    model.resize_token_embeddings(32001)

Are these settings needed for a fine-tuned model with an extended context length to work properly at inference time? For example, if I fine-tuned the original Llama 2 model to a new context length of 16k, do I still need them when serving the model? This matters because it would save us the hassle of writing custom inference code for certain model-serving frameworks: we could just point the framework at the model's save location.
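
As context for the question: if these settings do turn out to be required, one way to avoid custom inference code might be to write them into the saved checkpoint's config.json, so that any serving framework loading the model through vanilla transformers picks them up automatically. Below is a minimal sketch of that idea; the save location path/to/llama2-16k and the 16,384-token target context are hypothetical, not taken from inference.py.

    import math

    from transformers import AutoConfig, AutoModelForCausalLM

    MODEL_PATH = "path/to/llama2-16k"  # hypothetical save location
    TARGET_CTX = 16384                 # illustrative fine-tuned context length

    config = AutoConfig.from_pretrained(MODEL_PATH)
    orig_ctx_len = getattr(config, "max_position_embeddings", None)
    if orig_ctx_len and TARGET_CTX > orig_ctx_len:
        # Same linear RoPE scaling rule as inference.py, but persisted
        # into the config instead of applied ad hoc at load time.
        config.rope_scaling = {
            "type": "linear",
            "factor": float(math.ceil(TARGET_CTX / orig_ctx_len)),
        }
        config.max_position_embeddings = TARGET_CTX

    model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, config=config)
    # No-op if the fine-tuned checkpoint already has 32001 embedding rows.
    model.resize_token_embeddings(32001)
    # save_pretrained writes config.json alongside the weights, so a
    # framework pointed at MODEL_PATH sees rope_scaling directly.
    model.save_pretrained(MODEL_PATH)

If that works, frameworks that load checkpoints via from_pretrained should apply the scaling from config.json without any custom loading code, which is exactly the serving setup described above.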
