How to deploy trained llava model? #586

Open

zodiacg opened this issue Apr 19, 2024 · 12 comments

@zodiacg

zodiacg commented Apr 19, 2024

Currently, the trained llava model can only be used via the CLI (without the ability to supply new images) or tested with the benchmark tools.
How can we deploy it behind an API or a WebUI for a more user-friendly interface?

@LZHgrla
Collaborator

LZHgrla commented Apr 24, 2024

@zodiacg
lmdeploy v0.4.0 supports the deployment of llava-llama-3-8b models. You can try it following https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf#chat-by-lmdeploy

In the meantime, we will provide a script as soon as possible to convert xtuner-trained models (such as the llava-internlm2 models) to the official llava format.
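
For reference, a minimal serving sketch, assuming lmdeploy >= 0.4.0 is installed and using the HF-format model from the card linked above. Whether the API server and the Gradio UI accept image input for this model depends on the lmdeploy version, so treat this as an assumption rather than the officially documented path (the linked card shows the Python pipeline API):

pip install 'lmdeploy>=0.4.0'

# OpenAI-compatible REST API server
lmdeploy serve api_server xtuner/llava-llama-3-8b-v1_1-hf

# or an interactive Gradio WebUI
lmdeploy serve gradio xtuner/llava-llama-3-8b-v1_1-hf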

@zodiacg
Author

zodiacg commented Apr 24, 2024

That would be very helpful, since we have trained some llava models and hope to test them interactively.

@flotos

flotos commented Apr 26, 2024

From your replies, do I understand correctly that the merge doesn't add the llava features to the model?

Here are the steps I followed (from this page and the root README): https://github.com/InternLM/xtuner/blob/main/xtuner/configs/llava/llama3_8b_instruct_clip_vit_large_p14_336/README.md#model-convert-and-merge

I tried to convert my fine-tuned result to HF using the above guide, then merged it into the existing xtuner llava like this:

xtuner convert merge \
    "xtuner/llava-llama-3-8b-v1_1" \
    "mytrainedmodel/visual_encoder_adapter" \
    ${SAVE_PATH} \
    --max-shard-size 2GB

However, writing this out, I suppose the second parameter is an LLM QLoRA adapter and probably unrelated to the LLaVA adapter?

@LZHgrla
Collaborator

LZHgrla commented Apr 26, 2024

@zodiacg
@flotos
Please follow these new docs: https://github.com/LZHgrla/xtuner/tree/lzh/llama3_convert/xtuner/configs/llava/llama3_8b_instruct_clip_vit_large_p14_336.
They introduce the commands for model conversion and chat.

We have also released the related LLaVA-Llama-3-8B models, which can be found in the above docs.

@flotos

flotos commented May 7, 2024

Hi, thanks for your reply. I have tried to follow the steps, but my folders do not match the ones from the examples, as I am using the QLoRA finetune config. In my pth-to-LLaVA (xtuner format) output I have two folders, llm_adapter and projector, as well as an xtuner_config.py, but no other files such as the visual_encoder_adapter shown in the README.

Thus, when trying to convert to HF, I did
python ./convert_to_hf.py --text_model_id ./output/merged_mymodel/ --vision_model_id ./output/merged_mymodel/ --projector_weight ./output/merged_mymodel/projector/model.safetensors --save_path ./output/merged_mymodel_hf

This did not work, failing with the following error: OSError: ./output/merged_mymodel/ does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/./output/merged_mymodel//tree/main' for available files.

I haven't re-run the training since my comment two weeks ago; maybe there has been an update to the library since then that now includes that folder?

Also, when I try to replace --vision_model_id with openai/clip-vit-large-patch14-336, I get AttributeError: 'CLIPConfig' object has no attribute 'hidden_size'.

@zodiacg
Author

zodiacg commented May 8, 2024

> @zodiacg @flotos Please follow these new docs: https://github.com/LZHgrla/xtuner/tree/lzh/llama3_convert/xtuner/configs/llava/llama3_8b_instruct_clip_vit_large_p14_336. They introduce the commands for model conversion and chat.
>
> We have also released the related LLaVA-Llama-3-8B models, which can be found in the above docs.

The scripts introduced are specifically tailored for LLaMA as the LLM. The primary appeal of xtuner, at least from my perspective, is the flexibility it offers to use other LLMs as the base. I hope that the xtuner-llava structure will also be supported.

@pppppM
Collaborator

pppppM commented May 8, 2024

@zodiacg Yes, we are developing this feature in other PRs.
It will no longer require cumbersome model conversion; xtuner-llava models will connect directly to the inference backend.

@LZHgrla
Collaborator

LZHgrla commented May 8, 2024

> Hi, thanks for your reply. I have tried to follow the steps, but my folders do not match the ones from the examples, as I am using the QLoRA finetune config. In my pth-to-LLaVA (xtuner format) output I have two folders, llm_adapter and projector, as well as an xtuner_config.py, but no other files such as the visual_encoder_adapter shown in the README.
>
> Thus, when trying to convert to HF, I ran
>
> python ./convert_to_hf.py --text_model_id ./output/merged_mymodel/ --vision_model_id ./output/merged_mymodel/ --projector_weight ./output/merged_mymodel/projector/model.safetensors --save_path ./output/merged_mymodel_hf
>
> This did not work, failing with the following error: OSError: ./output/merged_mymodel/ does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/./output/merged_mymodel//tree/main' for available files.
>
> I haven't re-run the training since my comment two weeks ago; maybe there has been an update to the library since then that now includes that folder?
>
> Also, when I try to replace --vision_model_id with openai/clip-vit-large-patch14-336, I get AttributeError: 'CLIPConfig' object has no attribute 'hidden_size'.

Hi! @flotos
You should first merge your LLM LoRA into the base LLM with

xtuner convert merge $LLM $LORA_ADAPTER $SAVE_PATH

Then, please use the LLM saved above as the value of --text_model_id.

For the value of --vision_model_id, since the config you used freezes all parameters of the ViT, you can directly use openai/clip-vit-large-patch14-336; the AttributeError can be solved by #661.
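
Putting the steps from this thread together, a sketch of the full sequence for a QLoRA run with the visual encoder frozen; $CONFIG, $PTH_FILE, $LLM and the ./output/... paths are placeholder names, and convert_to_hf.py is the script from the docs branch linked above:

# 1. Convert the training checkpoint into xtuner's LLaVA layout (llm_adapter/, projector/, xtuner_config.py)
xtuner convert pth_to_hf $CONFIG $PTH_FILE ./output/mymodel_xtuner

# 2. Merge the LLM LoRA adapter into the base LLM
xtuner convert merge $LLM ./output/mymodel_xtuner/llm_adapter ./output/merged_llm

# 3. Assemble the HF-format LLaVA model from the merged LLM, the frozen CLIP ViT, and the trained projector
python ./convert_to_hf.py \
    --text_model_id ./output/merged_llm \
    --vision_model_id openai/clip-vit-large-patch14-336 \
    --projector_weight ./output/mymodel_xtuner/projector/model.safetensors \
    --save_path ./output/mymodel_hf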

@flotos

flotos commented May 8, 2024

Thanks, this worked well for me.
I have a question, however; the config script reads

    freeze_llm=True,
    freeze_visual_encoder=True,

Why, if the LLM is frozen, do I need to merge a QLoRA into the base LLM? Shouldn't only the projection layer be trained here?
Lastly, should the steps above work if I simply change freeze_visual_encoder to False in the provided gpu1 script (and then follow the README to merge/convert)?

Thanks for the help above and for your responsiveness to previous questions 🙏

@LZHgrla
Collaborator

LZHgrla commented May 8, 2024

@flotos
The freeze_llm setting only freezes the base LLM; it does not freeze the LoRA weights. So, with the default setting, you should merge the LoRA into the base LLM after training.

As for freeze_visual_encoder, if you set it to False, you will get a visual_encoder folder in the exported output (since it is trained), and you should use this ViT to build the llava model.
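
In that case, a sketch of the adjusted conversion, assuming the exported visual_encoder folder is a loadable CLIP model directory as described above; the ./output/... paths are placeholder names carried over from the earlier sketch:

# With freeze_visual_encoder=False, point --vision_model_id at the exported (trained) ViT
# instead of the original openai/clip-vit-large-patch14-336
python ./convert_to_hf.py \
    --text_model_id ./output/merged_llm \
    --vision_model_id ./output/mymodel_xtuner/visual_encoder \
    --projector_weight ./output/mymodel_xtuner/projector/model.safetensors \
    --save_path ./output/mymodel_hf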

@LZHgrla
Collaborator

LZHgrla commented May 8, 2024

@flotos
Overall, --text_model_id should be the LLM of the llava model and --vision_model_id should be the CLIP ViT of the llava model.

So, do not forget to merge your LoRA.

@flotos

flotos commented May 8, 2024

Thanks very much for your time, this is very clear.
