
In the Llama-2-7b-chat-hf-q4f32_1-1k model, the number of tokens in the prefill is 36 when inputting 'hello'. #396

Closed
137591 opened this issue May 14, 2024 · 2 comments

Comments


137591 commented May 14, 2024

Why is the number of prompt tokens reported by the project different from the number of tokens the tokenizer actually produces?
I used the LLaMA 2 tokenizer, and the prompt 'hello' is split into only 2 tokens. However, the prefill count reported by the project is 36 tokens, and in my experiments every prefill count is 34 tokens higher than the prompt's own token count. Please explain why.

Contributor

CharlieFRuan commented May 14, 2024

The extra tokens come from the system prompt defined in mlc-chat-config.json: https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC/blob/main/mlc-chat-config.json#L33-L34, which follows the specification of the official model release.
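
For a rough picture of where the extra tokens come from, here is an illustrative sketch of how a Llama-2-chat request gets wrapped before prefill. The actual system prompt string and wrapping come from the conversation template in the config linked above; the values below are placeholders, not copied from the config:

  // Illustrative only: Llama-2-chat wraps each request roughly like this,
  // so the prefill counts the system prompt and the wrapper tokens on top
  // of the user's "hello".
  const systemPrompt = "...";  // placeholder for the default system prompt in mlc-chat-config.json
  const userPrompt = "hello";
  const wrapped =
    `[INST] <<SYS>>\n${systemPrompt}\n<</SYS>>\n\n${userPrompt} [/INST]`;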

If you'd prefer not to use a system prompt, try overriding it with an empty string:

  import * as webllm from "@mlc-ai/web-llm";

  // Override the default system prompt with an empty one.
  const request: webllm.ChatCompletionRequest = {
    messages: [
      { role: "system", content: "" },
      { role: "user", content: "Hello" },
    ],
  };
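
A minimal follow-up sketch to send the request above and check how many tokens are actually prefilled. It assumes the engine-creation and OpenAI-style completion APIs in web-llm (CreateMLCEngine, engine.chat.completions.create, and the usage field on the response) and uses the model id from the issue title; run it inside an async context:

  // Sketch only: run the request above and inspect the reported prefill size.
  const engine = await webllm.CreateMLCEngine("Llama-2-7b-chat-hf-q4f32_1-1k");
  const reply = await engine.chat.completions.create(request);
  // With the empty system prompt, prompt_tokens should drop to roughly the
  // tokenizer's count for "Hello" plus the chat-template wrapper tokens.
  console.log(reply.usage?.prompt_tokens);
  console.log(reply.choices[0].message.content);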

Author

137591 commented May 14, 2024


Got it! Thank you very much!
