FSDP Finetuned Model-optimizer and tokenizer #476
Comments
Hi @waterluck Q1: What looks a bit weird to me is that the `__0_X.distcp` files get bigger when you store the optimizer as well. I will need to look into this to confirm whether this is right or an error. Hope that helps.
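A rough back-of-envelope calculation suggests why the shards grow when optimizer state is saved. This is a sketch under stated assumptions (bf16/fp16 weights at 2 bytes per parameter, AdamW keeping two fp32 states per parameter), not confirmed from the actual checkpoint contents:

```python
# Rough checkpoint-size arithmetic for a 7B model sharded over 4 FSDP ranks.
# Assumptions (for illustration only): 2-byte weights, two 4-byte AdamW
# states (exp_avg, exp_avg_sq) per parameter.
PARAMS = 7e9
RANKS = 4
GB = 1024 ** 3

model_shard_gb = PARAMS * 2 / RANKS / GB          # weights only per rank
optim_shard_gb = PARAMS * (4 + 4) / RANKS / GB    # fp32 AdamW states per rank

print(f"model shard:     {model_shard_gb:.2f} GB")  # ~3.26 GB, close to the 3.14 GB observed
print(f"optimizer shard: {optim_shard_gb:.2f} GB")
```

Under these assumptions a weights-only shard lands near the observed 3.14 GB, and optimizer state adds several times that, which is consistent with the files growing substantially once the optimizer is stored.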
Hi @mreso, thanks for the confirmation! Also, regarding the whole finetuning process: I noticed that when I run it several times with all the same parameter settings, the loss at each epoch differs a lot. I checked that all the parameters are the same, and I didn't change the random seed (which I think is fixed to 42). Is this expected, or are there other steps in the code that can introduce randomness?
Some ops use non-deterministic algorithms, so some fluctuation is expected. See https://pytorch.org/docs/stable/notes/randomness.html for how to disable non-deterministic behavior, but beware that this will have an impact on your training performance.
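For reference, a minimal sketch of pinning down randomness along the lines of the linked notes (the helper name here is made up for illustration; on CUDA, additional steps such as the `CUBLAS_WORKSPACE_CONFIG` environment variable may also be needed):

```python
import torch

def make_deterministic(seed: int = 42):
    # Seed the RNG and force deterministic kernel implementations.
    # Deterministic algorithms can be noticeably slower.
    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True)

# Two runs from the same seed now produce identical tensors.
make_deterministic()
a = torch.randn(3)
make_deterministic()
b = torch.randn(3)
```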
Great! Thanks for your answer, it helps a lot.
Thanks for the tutorials! I have several small questions about model fine-tuning and usage.

When doing full-parameter finetuning using FSDP only:

Q1: Should we set `save_optimizer` to True or not? I first set it to True, and I found the saved model became very large. I fine-tuned on 10K Pawsx data samples and got `__0_0.distcp` ~ `__3_0.distcp` files of 9.4GB each, plus 2 extra `optimizer-xxx.pt` files like `optimizer-llama-2-7b-0.pt` of 25GB each. When I set it to False, I got 4 `__0_0.distcp`-style files (0 to 4) of 3.14GB each. I'm unsure whether it's normal for them to be that large, and whether `save_optimizer` is necessary.

Q2: Do the `llama2-xB-hf` and `llama2-xB-hf-chat` series of models use the same tokenizer? There's no `tokenizer.model` file in the fine-tuned model output, and I noticed the sizes of the two models' tokenizer files look the same in the official repository. I want to know whether their tokenizers are the same, especially the `tokenizer.model` in the model files. Also, can we use a `fast_tokenizer` with llama2?

Q3: When doing SFT with llama on a classification task with a single target label, is there any influence if we do not train on the input, which sets the input labels to `-100`?

Thanks if you can take a look at these questions.