…max_tokens (#457)
- **[Breaking]** Rename `max_gen_len` to `max_tokens` in
`ChatCompletionRequest` (see the sketch after this list)
- Remove reading `max_gen_len` from `mlc-chat-config.json`, hence remove
it from `ChatOptions`
- Remove reading `mean_gen_len` and `shift_fill_factor` from
`mlc-chat-config.json`, hence:
- Remove these two fields from `ChatOptions`
- Throw an error when the input prompt's tokens exceed `context_window_size`,
prompting the user to truncate the input, increase `context_window_size`,
or use a sliding window
- Finish with reason `length` when decoding exceeds `context_window_size`
(i.e. when `prompt_size + max_tokens > context_window_size`)
- Hence this error is never shown to the user:
#385
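
A minimal sketch of the renamed request field, assuming the OpenAI-style `engine.chat.completions.create` API in web-llm 0.2.42+; the model ID below is a placeholder:

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Placeholder model ID; substitute any model from the prebuilt list.
const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC");

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Summarize this article: ..." }],
  max_tokens: 256, // previously `max_gen_len`; renamed in this PR
});
console.log(reply.choices[0].message.content);
```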
This change is needed in order to be compatible with the new
`mlc-chat-config.json` on Hugging Face due to
mlc-ai/mlc-llm#2493
Thanks for reporting the issue. This error should no longer be displayed after #457, which was integrated into 0.2.42 and later.
If the request's prompt length exceeds the KVCache's context window size, we throw an error. If, during decoding, the number of generated tokens exceeds the context window size, we end the response with finish reason `"length"`.
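
A minimal sketch of how a caller might handle both behaviors, assuming an `engine` created as in the earlier example and a hypothetical `longArticle` string:

```typescript
try {
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: longArticle }],
    max_tokens: 512,
  });
  if (reply.choices[0].finish_reason === "length") {
    // Decoding hit context_window_size (prompt_size + max_tokens
    // exceeded it), so the output may be cut off mid-answer.
  }
} catch (err) {
  // The prompt alone exceeded context_window_size: truncate the
  // input, increase context_window_size, or enable sliding window.
}
```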
The error occurs when prompting the model to summarize a long article or a long chat history.