…max_tokens (#457)
- **[Breaking]** Rename `max_gen_len` to `max_tokens` in
`ChatCompletionRequest` (see the sketch after this list)
- Remove reading `max_gen_len` from `mlc-chat-config.json`, hence remove
it from `ChatOptions`
- Remove reading `mean_gen_len` and `shift_fill_factor` from
`mlc-chat-config.json`, hence:
- Remove these two fields from `ChatOptions`
- Throw an error when the input prompt's tokens exceed `context_window_size`,
prompting the user to truncate the input, increase `context_window_size`,
or use a sliding window
- Finish with reason `length` when decoding exceeds `context_window_size`
(i.e. when `prompt_size + max_tokens > context_window_size`)
- Hence this error is never shown to the user:
#385
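
A minimal sketch of the renamed request field, assuming the OpenAI-style `engine.chat.completions.create` API in web-llm 0.2.42+; the model ID below is a placeholder:

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Placeholder model ID; substitute any model from the prebuilt list.
const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC");

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Summarize this article: ..." }],
  max_tokens: 256, // previously `max_gen_len`; renamed in this PR
});
console.log(reply.choices[0].message.content);
```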
This change is needed in order to be compatible with the new
`mlc-chat-config.json` on Hugging Face due to
mlc-ai/mlc-llm#2493
Thanks for reporting the issue. This error should no longer be displayed after #457, which was integrated into 0.2.42 and later.
If the request's prompt length exceeds the KVCache's context window size, we throw an error. If, during decoding, the number of generated tokens exceeds the context window size, we end the response with finish reason `"length"`.
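
A minimal sketch of how a caller might handle both behaviors, assuming an `engine` created as in the earlier example and a hypothetical `longArticle` string:

```typescript
try {
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: longArticle }],
    max_tokens: 512,
  });
  if (reply.choices[0].finish_reason === "length") {
    // Decoding hit context_window_size (prompt_size + max_tokens
    // exceeded it), so the output may be cut off mid-answer.
  }
} catch (err) {
  // The prompt alone exceeded context_window_size: truncate the
  // input, increase context_window_size, or enable sliding window.
}
```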
The error occurs when prompting the model to summarize a long article or a long chat history.