Add docs on supporting new VL models #1332

Open

irexyc wants to merge 1 commit into base: main
Conversation

@irexyc (Collaborator) commented Mar 22, 2024

Motivation

Add a doc on supporting new VL models (support-new-vl-model).

@@ -0,0 +1,131 @@
# lmdeploy.vl new model support
Collaborator:

How to add a multimodal vision model (VLM)

@@ -0,0 +1,131 @@
# lmdeploy.vl new model support

Currently, a number of VLM models adopt the architecture shown in the figure below. An image goes through a Vision Encoder to produce image features, which a Projection module then maps into the text feature space. Finally, the image features are concatenated with the text features and fed into the LLM for inference. A characteristic of this kind of VLM is that the concatenated features are not treated differently by type when they enter the LLM, and no interaction is computed between the two kinds of features beforehand.
@lvhan028 (Collaborator) commented Mar 25, 2024:

Currently, LMDeploy supports multimodal vision models with a LLaVA-like architecture. As shown in the figure below, in this architecture the image passes through ...


For models with this architecture, support for a new model can be added very conveniently with LMDeploy.
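
A minimal sketch of the feature flow described above. All callables here (`vision_encoder`, `projector`, `embed_tokens`, `llm`) are placeholders for illustration, not LMDeploy APIs:

```python
import torch


def vlm_prefill(image, input_ids, vision_encoder, projector, embed_tokens, llm):
    """Illustrative only: image features are projected into the text embedding
    space and concatenated with text embeddings before entering the LLM."""
    vision_feats = vision_encoder(image)        # [n_patches, vision_dim]
    image_embeds = projector(vision_feats)      # [n_patches, hidden_dim]
    text_embeds = embed_tokens(input_ids)       # [n_tokens, hidden_dim]
    # The LLM does not distinguish between the two kinds of embeddings.
    inputs_embeds = torch.cat([image_embeds, text_embeds], dim=0)
    return llm(inputs_embeds=inputs_embeds)
```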

## Model support
@lvhan028 (Collaborator) commented Mar 25, 2024:

I feel the text immediately following "Model support" could be merged into the preceding section. That way, VisonModel and VLChatTemplateWrapper could be used as H2 headings.


> [!NOTE]
>
> A VLM usually has a corresponding LLM that takes no image input, e.g. Qwen-VL-Chat and Qwen-7B-Chat. Please first make sure this LLM can be served by the TurboMind engine, or that its model structure is the same as one already supported by TurboMind.
Collaborator:

Delete "or that its model structure is the same as one already supported by TurboMind."
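
One way to check the prerequisite in the note above is to run the text-only counterpart through LMDeploy first. A minimal sketch, assuming the `pipeline` API with a `TurbomindEngineConfig` backend config (the model name is only an example):

```python
from lmdeploy import TurbomindEngineConfig, pipeline

# If the text-only counterpart (here Qwen-7B-Chat) runs under the TurboMind
# backend, the LLM part of Qwen-VL-Chat is already covered.
pipe = pipeline('Qwen/Qwen-7B-Chat',
                backend_config=TurbomindEngineConfig(session_len=4096))
print(pipe(['Hi, please introduce yourself.']))
```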


```python
def build_model(self):
    # init an empty model
    with init_empty_weights():
```
Collaborator:

Where does init_empty_weights() come from?
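
For context, `init_empty_weights()` usually refers to the context manager from Hugging Face `accelerate`, which builds a module on the meta device without allocating weight memory. A brief sketch of that pattern (the model name is only an example):

```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained('Qwen/Qwen-VL-Chat', trust_remote_code=True)
with init_empty_weights():
    # Parameters are created on the meta device; no real memory is allocated yet.
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
```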


```python
# move model to cpu and load weight
model.to_empty(device='cpu')
load_model_from_weight_files(model, self.model_path)
```
Collaborator:

Is load_model_from_weight_files general-purpose enough to be reused across models?
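
Whether `load_model_from_weight_files` is general-purpose depends on its implementation. A hypothetical, model-agnostic version (not LMDeploy's actual helper) could simply iterate over the standard checkpoint shards in the model directory:

```python
import glob
import os

import torch
from safetensors.torch import load_file


def load_weights_from_dir(model: torch.nn.Module, model_path: str):
    """Hypothetical sketch: fill a model materialized with
    model.to_empty(device='cpu') from checkpoint shards in model_path."""
    shards = sorted(glob.glob(os.path.join(model_path, '*.safetensors')))
    if shards:
        for shard in shards:
            model.load_state_dict(load_file(shard), strict=False)
    else:
        for shard in sorted(glob.glob(os.path.join(model_path, '*.bin'))):
            model.load_state_dict(torch.load(shard, map_location='cpu'),
                                  strict=False)
```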


Below, we take the Qwen/Qwen-VL-Chat model as an example to show how to add support for this kind of model with LMDeploy.

### VisonModel
Collaborator:

Implement VisonModel

To add a new vision model, two places mainly need to be modified (a sketch of step 1 follows the review comment below):

1. Extract the vision model corresponding to the VLM and implement its `forward` feature-extraction function
2. Modify the `load_vl_model` function so that the matching VisionModel can be located when the VLM is loaded
Collaborator:

The example below does not contain a load_vl_model function.
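
A rough sketch of step 1, assuming the vision tower and its preprocessing function have already been split out of the VLM checkpoint. All names here are placeholders for illustration, not the exact LMDeploy `VisonModel` interface:

```python
from typing import Callable, List

import torch
from PIL import Image


class MyVisionModel:
    """Illustrative only: wraps a vision tower and exposes `forward` for
    feature extraction, returning one feature tensor per input image."""

    def __init__(self, vision_tower: torch.nn.Module,
                 preprocess: Callable[[Image.Image], torch.Tensor]):
        self.vision_tower = vision_tower.eval()
        self.preprocess = preprocess

    @torch.no_grad()
    def forward(self, images: List[Image.Image]) -> List[torch.Tensor]:
        # Preprocess and batch the images, run the vision tower, and return
        # per-image features ready to be concatenated with text embeddings.
        pixels = torch.stack(
            [self.preprocess(img.convert('RGB')) for img in images])
        features = self.vision_tower(pixels)   # [batch, n_patches, hidden_dim]
        return list(features)
```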

@lvhan028 added the documentation label on Mar 25, 2024