Add docs of support new vl model #1332
base: main
Conversation
@@ -0,0 +1,131 @@
# lmdeploy.vl New Model Support
> **Review comment:** How about "How to Add a Multimodal Vision Model (VLM)" as the title?
Currently, a number of VLM models adopt the architecture shown in the figure below. The image is passed through a Vision Encoder to obtain image features, which are then mapped into the text feature space by a Projection layer. Finally, the image features and text features are concatenated and fed into the LLM for inference. A characteristic of these VLM models is that when the concatenated features are fed into the LLM, the feature types are not distinguished and no interaction is computed between the two kinds of features.
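The flow described in this paragraph can be sketched in a few lines of plain Python. All shapes and helper names below are hypothetical stand-ins chosen purely to illustrate the concatenation; this is not LMDeploy code:

```python
# Hypothetical dimensions for illustration only
IMG_FEAT_DIM = 4      # vision encoder output width
TEXT_HIDDEN_DIM = 6   # LLM hidden width

def vision_encoder(image):
    # stand-in: one IMG_FEAT_DIM-wide feature vector per image patch
    return [[float(p)] * IMG_FEAT_DIM for p in image["patches"]]

def projection(img_feats):
    # stand-in linear map from IMG_FEAT_DIM into the text feature space
    return [[sum(f) / IMG_FEAT_DIM] * TEXT_HIDDEN_DIM for f in img_feats]

def embed_text(token_ids):
    # stand-in text embedding lookup
    return [[float(t)] * TEXT_HIDDEN_DIM for t in token_ids]

def build_llm_input(image, token_ids):
    img = projection(vision_encoder(image))
    txt = embed_text(token_ids)
    # plain concatenation: the LLM does not distinguish the two feature types
    return img + txt

inputs = build_llm_input({"patches": [1, 2, 3]}, [10, 11])
# 3 projected image-feature vectors followed by 2 text embeddings,
# all TEXT_HIDDEN_DIM wide
```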
> **Review comment:** Suggest rewording: "Currently, LMDeploy supports multimodal vision models with a LLaVA-like architecture. As shown in the figure below, in this architecture the image passes through ..."
For models with this architecture, support for a new model can be added quite easily with LMDeploy.
## Model Support
> **Review comment:** I feel the text immediately following "Model Support" could be merged into the preceding section. That way, VisonModel and VLChatTemplateWrapper could become H2 headings.
> \[!NOTE\]
>
> A VLM model usually has a corresponding LLM model that takes no image input, such as Qwen-VL-Chat and Qwen-7B-Chat. Please first make sure that this LLM model can run inference with the TurboMind engine, or that its model structure is the same as one already supported by TurboMind.
> **Review comment:** Delete "or that its model structure ... supported by TurboMind."
    def build_model(self):
        # init an empty model
        with init_empty_weights():
> **Review comment:** Where does init_empty_weights() come from?
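On this question: in the Hugging Face ecosystem, `init_empty_weights` is typically provided by the `accelerate` package (`from accelerate import init_empty_weights`), which places parameters on the meta device so no real storage is allocated. That provenance is an assumption here, since the excerpt does not show the import. A rough equivalent of the same pattern using only plain PyTorch's meta device:

```python
# Sketch of the empty-init pattern using PyTorch's meta device as a
# stand-in for accelerate's init_empty_weights (assumption: same idea).
import torch
import torch.nn as nn

# 1. Build the module under the meta device: shapes only, no weight storage.
with torch.device("meta"):
    model = nn.Linear(4, 2)

# 2. Materialize uninitialized storage on CPU, then load real weights
#    (a dummy state dict stands in for checkpoint files here).
model.to_empty(device="cpu")
state = {"weight": torch.zeros(2, 4), "bias": torch.zeros(2)}
model.load_state_dict(state)
```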
        # move model to cpu and load weight
        model.to_empty(device='cpu')
        load_model_from_weight_files(model, self.model_path)
> **Review comment:** Can load_model_from_weight_files be used generically across models?
Below, the Qwen/Qwen-VL-Chat model is used as an example to show how to add support for this kind of model with LMDeploy.
### VisonModel
> **Review comment:** Suggest the heading "Implement VisonModel".
Adding a new vision model mainly requires changes in two places:
1. Extract the vision model corresponding to the VLM model and implement the `forward` feature-extraction function
2. Modify the `load_vl_model` function so that the VLM model can find its corresponding VisionModel when it is loaded
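As a rough illustration of these two steps, the class below extracts features per image and a registry-style lookup stands in for the dispatch. All names and shapes here are hypothetical stand-ins, not the actual LMDeploy API:

```python
# Hypothetical sketch of the two steps: a vision model class with a
# forward() feature extractor, and a registry lookup standing in for
# the load_vl_model dispatch.

class QwenVisonModel:
    """Illustrative stub for the extracted vision tower of Qwen-VL-Chat."""

    def __init__(self, model_path):
        self.model_path = model_path

    def forward(self, images):
        # real code would run the vision encoder + projection here;
        # return one (stub) feature vector per input image
        return [[0.0] * 8 for _ in images]

# step 2: map an architecture name to its VisonModel class
_VL_MODELS = {"qwen-vl": QwenVisonModel}

def load_vl_model(model_path, arch="qwen-vl"):
    return _VL_MODELS[arch](model_path)

model = load_vl_model("Qwen/Qwen-VL-Chat")
feats = model.forward(["img0", "img1"])
```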
> **Review comment:** The example below does not contain a load_vl_model function.
Motivation
add docs of support-new-vl-model