How to deploy trained llava model? #586

Open

zodiacg opened this issue Apr 19, 2024 · 12 comments

@zodiacg

zodiacg commented Apr 19, 2024

Currently, the trained llava model can only be used via the CLI (without the ability to supply new images) or tested with the benchmark tools.
How can we deploy it behind an API or a WebUI for a more user-friendly interface?

@LZHgrla
Collaborator

LZHgrla commented Apr 24, 2024

@zodiacg
lmdeploy v0.4.0 supports the deployment of llava-llama-3-8b models. You can try it following https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf#chat-by-lmdeploy

In the meantime, we will provide a script as soon as possible to convert xtuner-trained models (such as the llava-internlm2 models) to the official llava format.
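
For reference, a minimal serving sketch, assuming lmdeploy >= 0.4.0 is installed and using the HF-format model from the card linked above. Whether the API server and the Gradio UI accept image input for this model depends on the lmdeploy version, so treat this as an assumption rather than the officially documented path (the linked card shows the Python pipeline API):

pip install 'lmdeploy>=0.4.0'

# OpenAI-compatible REST API server
lmdeploy serve api_server xtuner/llava-llama-3-8b-v1_1-hf

# or an interactive Gradio WebUI
lmdeploy serve gradio xtuner/llava-llama-3-8b-v1_1-hf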

@zodiacg
Author

zodiacg commented Apr 24, 2024

That would be very helpful, since we have trained some llava models and hope to test them interactively.

@flotos

flotos commented Apr 26, 2024

From your replies, do I understand correctly that the merge doesn't add the llava features to the model?

Here are the steps I followed (from this page and the root README): https://github.com/InternLM/xtuner/blob/main/xtuner/configs/llava/llama3_8b_instruct_clip_vit_large_p14_336/README.md#model-convert-and-merge

I tried to convert my fine-tuned result to HF using the above guide, then merged it into the existing xtuner llava like this:

xtuner convert merge \
    "xtuner/llava-llama-3-8b-v1_1" \
    "mytrainedmodel/visual_encoder_adapter" \
    ${SAVE_PATH} \
    --max-shard-size 2GB

However, writing this out, I suppose the second parameter is an LLM QLoRA adapter and probably unrelated to the LLaVA adapter?

@LZHgrla
Collaborator

LZHgrla commented Apr 26, 2024

@zodiacg
@flotos
Please follow these new docs: https://github.com/LZHgrla/xtuner/tree/lzh/llama3_convert/xtuner/configs/llava/llama3_8b_instruct_clip_vit_large_p14_336.
They introduce the commands for model conversion and chat.

We have also released the related LLaVA-Llama-3-8B models, which can be found in the above docs.

@flotos

flotos commented May 7, 2024

Hi, thanks for your reply. I have tried to follow the steps, but my folders do not match the ones from the examples, as I am using the QLoRA finetune config. In my pth-to-LLaVA (xtuner format) output I have two folders, llm_adapter and projector, as well as an xtuner_config.py, but no other files such as the visual_encoder_adapter shown in the README.

Thus, when trying to convert to HF, I did
python ./convert_to_hf.py --text_model_id ./output/merged_mymodel/ --vision_model_id ./output/merged_mymodel/ --projector_weight ./output/merged_mymodel/projector/model.safetensors --save_path ./output/merged_mymodel_hf

This did not work, failing with the following error: OSError: ./output/merged_mymodel/ does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/./output/merged_mymodel//tree/main' for available files.

I haven't re-run the training since my comment two weeks ago; maybe there has been an update to the library since then that now includes that folder?

Also, when I try to replace --vision_model_id with openai/clip-vit-large-patch14-336, I get AttributeError: 'CLIPConfig' object has no attribute 'hidden_size'.

@zodiacg
Author

zodiacg commented May 8, 2024

> @zodiacg @flotos Please follow these new docs: https://github.com/LZHgrla/xtuner/tree/lzh/llama3_convert/xtuner/configs/llava/llama3_8b_instruct_clip_vit_large_p14_336. They introduce the commands for model conversion and chat.
>
> We have also released the related LLaVA-Llama-3-8B models, which can be found in the above docs.

The scripts introduced are specifically tailored for LLaMA as the LLM. The primary appeal of xtuner, at least from my perspective, is the flexibility it offers to use other LLMs as the base. I hope that the xtuner-llava structure will also be supported.

@pppppM
Collaborator

pppppM commented May 8, 2024

@zodiacg Yes, we are developing this feature in other PRs.
It will no longer require cumbersome model conversion; xtuner-llava models will connect directly to the inference backend.

@LZHgrla
Collaborator

LZHgrla commented May 8, 2024

> Hi, thanks for your reply. I have tried to follow the steps, but my folders do not match the ones from the examples, as I am using the QLoRA finetune config. In my pth-to-LLaVA (xtuner format) output I have two folders, llm_adapter and projector, as well as an xtuner_config.py, but no other files such as the visual_encoder_adapter shown in the README.
>
> Thus, when trying to convert to HF, I ran
>
> python ./convert_to_hf.py --text_model_id ./output/merged_mymodel/ --vision_model_id ./output/merged_mymodel/ --projector_weight ./output/merged_mymodel/projector/model.safetensors --save_path ./output/merged_mymodel_hf
>
> This did not work, failing with the following error: OSError: ./output/merged_mymodel/ does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/./output/merged_mymodel//tree/main' for available files.
>
> I haven't re-run the training since my comment two weeks ago; maybe there has been an update to the library since then that now includes that folder?
>
> Also, when I try to replace --vision_model_id with openai/clip-vit-large-patch14-336, I get AttributeError: 'CLIPConfig' object has no attribute 'hidden_size'.

Hi! @flotos
You should first merge your LLM LoRA into the base LLM with

xtuner convert merge $LLM $LORA_ADAPTER $SAVE_PATH

Then, please use the LLM saved above as the value of --text_model_id.

For the value of --vision_model_id, since the config you used freezes all parameters of the ViT, you can directly use openai/clip-vit-large-patch14-336; the AttributeError can be solved by #661.
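
Putting the steps from this thread together, a sketch of the full sequence for a QLoRA run with the visual encoder frozen; $CONFIG, $PTH_FILE, $LLM and the ./output/... paths are placeholder names, and convert_to_hf.py is the script from the docs branch linked above:

# 1. Convert the training checkpoint into xtuner's LLaVA layout (llm_adapter/, projector/, xtuner_config.py)
xtuner convert pth_to_hf $CONFIG $PTH_FILE ./output/mymodel_xtuner

# 2. Merge the LLM LoRA adapter into the base LLM
xtuner convert merge $LLM ./output/mymodel_xtuner/llm_adapter ./output/merged_llm

# 3. Assemble the HF-format LLaVA model from the merged LLM, the frozen CLIP ViT, and the trained projector
python ./convert_to_hf.py \
    --text_model_id ./output/merged_llm \
    --vision_model_id openai/clip-vit-large-patch14-336 \
    --projector_weight ./output/mymodel_xtuner/projector/model.safetensors \
    --save_path ./output/mymodel_hf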

@flotos

flotos commented May 8, 2024

Thanks, this worked well for me.
I have a question, however; the config script reads

    freeze_llm=True,
    freeze_visual_encoder=True,

Why, if the LLM is frozen, do I need to merge a QLoRA into the base LLM? Shouldn't only the projection layer be trained here?
Lastly, should the steps above work if I simply change freeze_visual_encoder to False in the provided gpu1 script (and then follow the README to merge/convert)?

Thanks for the help above and for your responsiveness to previous questions 🙏

@LZHgrla
Collaborator

LZHgrla commented May 8, 2024

@flotos
The freeze_llm setting only freezes the base LLM; it does not freeze the LoRA weights. So, with the default setting, you should merge the LoRA into the base LLM after training.

As for freeze_visual_encoder, if you set it to False, you will get a visual_encoder folder in the exported output (since it is trained), and you should use this ViT to build the llava model.
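
In that case, a sketch of the adjusted conversion, assuming the exported visual_encoder folder is a loadable CLIP model directory as described above; the ./output/... paths are placeholder names carried over from the earlier sketch:

# With freeze_visual_encoder=False, point --vision_model_id at the exported (trained) ViT
# instead of the original openai/clip-vit-large-patch14-336
python ./convert_to_hf.py \
    --text_model_id ./output/merged_llm \
    --vision_model_id ./output/mymodel_xtuner/visual_encoder \
    --projector_weight ./output/mymodel_xtuner/projector/model.safetensors \
    --save_path ./output/mymodel_hf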

@LZHgrla
Collaborator

LZHgrla commented May 8, 2024

@flotos
Overall, --text_model_id should be the LLM of the llava model and --vision_model_id should be the CLIP ViT of the llava model.

So, do not forget to merge your LoRA.

@flotos

flotos commented May 8, 2024

Thanks very much for your time, this is very clear.
