The absence or presence of a system token results in different outputs. #203
Comments
So, did Meta just change the model card page after my GitHub issue, completely ignoring this issue? :) https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/
Are you referring to a case where you pass the system header but no system_prompt, i.e.
Getting a different output is expected behavior because the template is sensitive to the header; the model is expecting a system message, but it is getting an empty string. If you don't have a system message, it is better not to include the system header. This is how we encode dialogs (see tokenizer.py, line 202 in cc44ca2).
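For context, here is a minimal sketch of the dialog encoding being discussed (simplified; the real implementation is the referenced tokenizer.py, and the special-token strings follow the published Llama 3 prompt format):

```python
# Minimal sketch of how a Llama 3 dialog is encoded (simplified; the
# real logic lives in the referenced tokenizer.py). The special-token
# strings follow the published Meta Llama 3 prompt format.
def encode_dialog(messages):
    """messages: list of {"role": ..., "content": ...} dicts."""
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += msg["content"].strip() + "<|eot_id|>"
    # Prime the model to answer as the assistant.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

with_system = encode_dialog([
    {"role": "system", "content": ""},    # empty system message, header kept
    {"role": "user", "content": "Hello"},
])
without_system = encode_dialog([
    {"role": "user", "content": "Hello"},
])
# The two prompts differ by an entire (empty) system turn, so the model
# sees different token sequences and can produce different outputs.
```

Even with an empty system string, the extra header and `<|eot_id|>` tokens change the sequence the model conditions on, which is exactly the difference this issue is about.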
I don't think the changes to the model card are related to this issue, but we'd appreciate your suggestions to improve its clarity :) cc @carljparker
Thanks for your response. Yes, that's what I'm referring to.
It is indeed expected behavior: if the input is different, the output will be different. However, the question is which output is the one intended by the model's authors and the training process. As per my findings, if the model has been trained with system headers present (in my case, fine-tuned):
and is later run at inference time as encoded by the tokenizer.py you referenced. Conclusion:
1: Why would it not be included if it was trained with a system header? Wouldn't it be logical to assume that the outputs seen during training are the ones we should expect during inference, and therefore to keep the system headers as-is, regardless of whether the system message is empty?
2: What makes you conclude that it is better to leave out the system message? We have two different outputs; how do we conclude that one output (without system headers) is better than the other (with system headers)? In my tests, the opposite is true: especially during tuning and training, leaving out tokens that were present during training breaks the expected output.
I'm grateful for your clarification and response! :)
As for the model card page, one can only speculate; only the author of the page knows the reason for the changes. It is peculiar, however, that the wording I quoted was completely removed just a day after I opened this issue, with no clarification posted in this thread. But let's leave that aside and focus on the issue at hand.
My response is based on the assumption that the model was NOT fine-tuned with a system header and a null system prompt, i.e.
So I would not expect it to give good results. If you are getting better results with a null prompt, that's interesting; if you can share it, please DM me on Twitter (same handle as my GitHub username).
No no, you are correct: the better result comes when the model was trained with system headers and inference is later run with the system headers present too, regardless of a null system message. My second question is about the official Meta instruct model: should the system headers be present or not, regardless of a null system prompt?
Just leaving this here: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/sample_finetune.py
Edit:
Describe the bug
As per the official documentation:
https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/
It is stated:
However, in the follow-up examples given in the documentation, the system token is only present when a system message is present:
1: Single message example
2: System prompt message added to a single user message
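A rough reconstruction of the two documented layouts, following the public Llama 3 prompt format (the message text here is illustrative, not a verbatim quote of the model card):

```python
# Rough reconstruction of the two documented prompt layouts, per the
# public Llama 3 prompt format. Message contents are placeholders.
single_message = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "What is France's capital?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
with_system_prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "What is France's capital?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
# Note: in the documented examples, the system header block appears
# only when a system message exists.
```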
However, having no system message string present but still including the system token results in a completely different output compared to having no system token at all.
This can be seen here in my findings:
ggerganov/llama.cpp#7062 (comment)
Fine-tuning the instruct model:
Fine-tuning the instruct models with the system token present, and then running inference without it, breaks the fine-tuning.
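One way to avoid this mismatch is to route both the fine-tuning data builder and the inference path through a single shared formatting helper. The helper below is hypothetical (its name and signature are not from any Meta code), sketched against the public Llama 3 prompt format:

```python
# Hypothetical shared formatter: using the same function when building
# fine-tuning data and when building inference prompts guarantees the
# token layout matches between the two.
def format_example(system, user):
    parts = ["<|begin_of_text|>"]
    if system is not None:  # keep the header even for "" if trained that way
        parts.append(f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>")
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>")
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

train_prompt = format_example(system="", user="Hi")  # as seen during fine-tuning
infer_prompt = format_example(system="", user="Hi")  # identical at inference
assert train_prompt == infer_prompt

# Dropping the system header at inference changes the token sequence:
mismatched = format_example(system=None, user="Hi")
assert mismatched != train_prompt
```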
Inference on original instruct model:
Since the outputs differ based on the presence of system tokens, the question arises: which output is better for the instruct models? Which method produces the expected output, given the instruct tuning that Meta performed internally?