
[Usage] About finetuning llama 2 with liuhaotian/llava-pretrain-llama-2-7b-chat #1504

Open
llv22 opened this issue May 15, 2024 · 2 comments


llv22 commented May 15, 2024

Describe the issue

Issue: I am trying to do visual instruction tuning using the pretrained projector liuhaotian/llava-pretrain-llama-2-7b-chat, but I ran into the error below. I downloaded the projector from https://huggingface.co/liuhaotian/llava-pretrain-llama-2-7b-chat to ./checkpoints/llava-pretrain-llama-2-7b-chat. According to https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/finetune.sh and https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md, I think I should use meta-llama/Llama-2-7b-chat-hf during fine-tuning, but the run fails; please see the details in the log section.

Command:

deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
    --version v1 \
    --data_path ./playground/data/llava_v1_5_mix665k.json \
    --image_folder ./playground/data \
    --vision_tower openai/clip-vit-large-patch14 \
    --pretrain_mm_mlp_adapter ./checkpoints/llava-pretrain-llama-2-7b-chat/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/llava-llama2-7b-finetune \
    --num_train_epochs 1 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb

Log:

2024-05-15 11:48:42.708 ERROR train - global_exception_handler: Uncaught exception Error(s) in loading state_dict for Sequential:
	Missing key(s) in state_dict: "0.weight", "0.bias", "2.weight", "2.bias". 
	Unexpected key(s) in state_dict: "weight", "bias". 
NoneType: None
2024-05-15 11:48:42.708 ERROR train - global_exception_handler: <class 'RuntimeError'>
2024-05-15 11:48:42.708 ERROR train - global_exception_handler: <class 'RuntimeError'>
2024-05-15 11:48:42.709 ERROR train - global_exception_handler: 
	  File "/data/orlando/workspace/AndroidAgentModelZoo/models/LLaVA_forward/llava/train/train_mem.py", line 4, in <module>
    train(attn_implementation="flash_attention_2")
  File "/data/orlando/workspace/AndroidAgentModelZoo/models/LLaVA_forward/llava/train/train.py", line 1302, in train
    model.get_model().initialize_vision_modules(
  File "/data/orlando/workspace/AndroidAgentModelZoo/models/LLaVA_forward/llava/model/llava_arch.py", line 97, in initialize_vision_modules
    self.mm_projector.load_state_dict(get_w(mm_projector_weights, 'mm_projector'))
  File "/usr/local/anaconda3/envs/agentbackend/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(

My guess is that this is caused by an inconsistency between --model_name_or_path and the model the projector was trained with. However, in the projector's config the only model name I can see is ./checkpoints/llama_2/llama-2-7b-chat (https://huggingface.co/liuhaotian/llava-pretrain-llama-2-7b-chat/blob/main/config.json). Could you clarify which Llama 2 model I should use for --model_name_or_path?
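
For reference, this is how I would check which key names the checkpoint actually contains (a minimal inspection sketch; the path is the one from my command above):

import torch

# Load the pretrained projector checkpoint on CPU and list its key names.
# A single linear projector stores keys ending in "weight" / "bias", while an
# mlp2x_gelu projector (an nn.Sequential) stores "0.weight", "0.bias", "2.weight", "2.bias".
ckpt = torch.load(
    "./checkpoints/llava-pretrain-llama-2-7b-chat/mm_projector.bin",
    map_location="cpu",
)
for name, tensor in ckpt.items():
    print(name, tuple(tensor.shape))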

PS: To my understanding, the pretraining phase focuses on language-image alignment (feature alignment), so its goal is to train a projector that maps image features into the language embedding space. With that projector in hand, we can then fine-tune both the language and vision sides to improve task performance. My guess is that meta-llama/Llama-2-7b-chat-hf should be fine (it is the Hugging Face conversion of Meta's official Llama 2 release); alternatively, according to https://github.com/haotian-liu/LLaVA/blob/main/docs/LLaVA_from_LLaMA2.md, I should download the original Llama 2 checkpoints and use those (I tried this but it failed, because that format cannot be loaded by the Hugging Face API).
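
To illustrate what I mean by feature alignment, here is a rough sketch (not LLaVA's actual code; dimensions are illustrative): the projector maps CLIP image features into the LLM's hidden size so the projected image tokens can be concatenated with the text embeddings.

import torch
import torch.nn as nn

# Rough sketch of the alignment idea; dimensions are illustrative
# (1024 = CLIP ViT-L/14 feature size, 4096 = Llama-2-7b hidden size).
clip_dim, llm_dim = 1024, 4096
projector = nn.Linear(clip_dim, llm_dim)  # this is what the pretraining (alignment) phase trains

image_features = torch.randn(1, 256, clip_dim)   # e.g. 16x16 patch tokens from CLIP ViT-L/14
text_embeddings = torch.randn(1, 32, llm_dim)    # embedded prompt tokens

image_tokens = projector(image_features)                           # (1, 256, 4096)
inputs_embeds = torch.cat([image_tokens, text_embeddings], dim=1)  # fed to the LLM
print(inputs_embeds.shape)                                         # torch.Size([1, 288, 4096])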

Current follow-up:
I am now trying to use meta-llama/Llama-2-7b-chat-hf to pretrain a projector myself, and then run the fine-tuning step with it.

Could you clarify which language model I should use together with llava-pretrain-llama-2-7b-chat/mm_projector.bin? Please correct me if anything in my description is wrong.

Really appreciate your help

Orlando


aybora commented May 17, 2024

You need to change mm_projector_type to linear. mlp2x_gelu is for Vicuna.
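
For context, the error in the traceback corresponds exactly to the different module structures behind the two projector types (a rough sketch of how they are built):

import torch.nn as nn

hidden_size, mm_hidden_size = 4096, 1024  # illustrative: Llama-2-7b / CLIP ViT-L dims

# --mm_projector_type linear: a single Linear layer; its state_dict keys are
# "weight" / "bias", which matches the "Unexpected key(s)" in the log above.
linear_proj = nn.Linear(mm_hidden_size, hidden_size)
print(list(linear_proj.state_dict().keys()))   # ['weight', 'bias']

# --mm_projector_type mlp2x_gelu: a two-layer MLP wrapped in nn.Sequential; its keys are
# "0.weight", "0.bias", "2.weight", "2.bias", which matches the "Missing key(s)".
mlp_proj = nn.Sequential(
    nn.Linear(mm_hidden_size, hidden_size),
    nn.GELU(),
    nn.Linear(hidden_size, hidden_size),
)
print(list(mlp_proj.state_dict().keys()))      # ['0.weight', '0.bias', '2.weight', '2.bias']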


llv22 commented May 17, 2024

@aybora So if I want to use Llama 2 with the mlp2x_gelu projector, I need to run the first (pretraining) phase myself and get my own projector?
