Problem with the fine-tuning loss of bge-reranker-v2-minicpm-layerwise being 1 #792
Comments
Is the loss 0 from the very beginning, or does it become 0 during training?
It is 0 from the very beginning. After I changed the learning rate to 2e-7, the loss became very large, starting from several hundred and gradually decreasing. After changing the learning rate to 2e-4, the same thing happened.
Add the parameter --finetune_type when training: from_raw_model or from_finetuned_model.
I added this parameter and still have the same problem; the loss is still very large.
Does it still start dropping from around 500?
File "/media/ai/HDD/Teamwork/wangenzhi/FlagEmbedding-master/official/FlagEmbedding/FlagEmbedding/llm_reranker/finetune_for_layerwise/run.py", line 23, in main
It is a choice between the two; from_finetuned_model will do. The final loss is the sum of the losses of all layers, so it looks relatively large.
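For context, here is a minimal sketch of how summing per-layer losses inflates the reported value. This is not the actual FlagEmbedding implementation; the `logits_per_layer` variable and the tensor shapes are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def layerwise_loss(logits_per_layer, target):
    # logits_per_layer: one [batch, train_group_size] score tensor per
    # scored layer (from --start_layer up to the last layer).
    # target: [batch] index of the positive passage within each group.
    total = torch.zeros((), device=target.device)
    for logits in logits_per_layer:
        total = total + F.cross_entropy(logits, target)
    return total  # summed over layers, not averaged
```

Because the per-layer losses are summed rather than averaged, the reported loss scales with the number of scored layers, which is consistent with the explanation above that a starting value in the hundreds can still be normal.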
Hello, this is my loss curve when fine-tuning from the raw model: [screenshot] And this is my loss curve when starting from the fine-tuned model: [screenshot] There is about a 10x gap between the two; is this normal?
Yes.
```bash
CUDA_VISIBLE_DEVICES=6,7 torchrun --nproc_per_node 2 \
  -m FlagEmbedding.llm_reranker.finetune_for_layerwise.run \
  --output_dir ./results/reranker/bge-reranker-v2-minicpm-layerwise \
  --model_name_or_path /media/ai/HDD/Teamwork/LLM_Embedding_model/Embedding/Embedding/bge-reranker-v2-minicpm-layerwise \
  --train_data /media/ai/HDD/Teamwork/wangenzhi/FlagEmbedding-master/official/FlagEmbedding/fine_data/layer_reranker.jsonl \
  --learning_rate 6e-5 \
  --fp16 \
  --num_train_epochs 1 \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 4 \
  --dataloader_drop_last True \
  --query_max_len 64 \
  --passage_max_len 256 \
  --train_group_size 2 \
  --logging_steps 10 \
  --save_steps 10 \
  --save_total_limit 10 \
  --warmup_ratio 0.1 \
  --use_lora True \
  --lora_rank 32 \
  --lora_alpha 64 \
  --use_flash_attn False \
  --target_modules q_proj k_proj v_proj o_proj \
  --start_layer 8 \
  --head_multi True \
  --head_type simple \
  --lora_extra_parameters linear_head
```
When using the above command for fine-tuning, if the training log becomes `{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 0.0, ...}`, you can try the method discussed above: add --finetune_type from_finetuned_model (or from_raw_model when starting from the raw model), as shown in the sketch below.
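As an illustration, the adjusted launch command would look like the following sketch; only the added --finetune_type flag differs from the command above, and the paths are the same user-specific paths:

```bash
CUDA_VISIBLE_DEVICES=6,7 torchrun --nproc_per_node 2 \
  -m FlagEmbedding.llm_reranker.finetune_for_layerwise.run \
  --finetune_type from_finetuned_model \
  --output_dir ./results/reranker/bge-reranker-v2-minicpm-layerwise \
  --model_name_or_path /media/ai/HDD/Teamwork/LLM_Embedding_model/Embedding/Embedding/bge-reranker-v2-minicpm-layerwise
  # ... remaining arguments unchanged from the command above
```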