Are there plans to support long-sequence training for multimodal models such as LLaVA 1.5? #629

dyyoungg · 2024-04-30T05:15:40Z

As the title says.

hhaAndroid · 2024-04-30T05:16:22Z

Does LLaVA 1.5 support long-sequence training?

dyyoungg · 2024-04-30T05:48:32Z

> Does LLaVA 1.5 support long-sequence training?

It doesn't look like it at the moment.

hhaAndroid · 2024-04-30T07:32:48Z

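Long-sequence training itself is not a problem; xtuner already supports it. The main issue is that it requires a long-sequence multimodal dataset.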

HIT-cwh · 2024-04-30T07:32:55Z

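May I ask what your long-sequence training scenario is? From what I have seen, the sequence lengths used for LLaVA training are generally not long.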

dyyoungg · 2024-04-30T08:36:11Z

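> May I ask what your long-sequence training scenario is? From what I have seen, the sequence lengths used for LLaVA training are generally not long.

Many video understanding models are currently built on LLaVA, but the length of video they can handle is short. Understanding long videos requires many more image tokens.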
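For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes LLaVA 1.5's default vision tower (CLIP ViT-L/14 at 336 px), which yields 24 × 24 = 576 visual tokens per image, and that every video frame is encoded as a full image with no token pooling. The `text_tokens` default and the frame counts are illustrative values, not taken from any dataset.

```python
# Rough estimate of the LLM sequence length for multi-frame (video) input.
# Assumption: CLIP ViT-L/14 at 336 px, so each image gives (336 // 14) ** 2 patches.
IMAGE_SIZE = 336
PATCH_SIZE = 14
TOKENS_PER_IMAGE = (IMAGE_SIZE // PATCH_SIZE) ** 2  # 576


def sequence_length(num_frames: int, text_tokens: int = 256) -> int:
    """Total tokens seen by the LLM if every frame keeps all of its patch tokens."""
    return num_frames * TOKENS_PER_IMAGE + text_tokens


for frames in (1, 8, 64, 512):
    print(f"{frames:>4} frames -> {sequence_length(frames):>7} tokens")
```

Under these assumptions, 64 frames already come to roughly 37k tokens, which is why long-video inputs quickly run into long-sequence territory.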

dyyoungg · 2024-04-30T08:55:17Z

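> Long-sequence training itself is not a problem; xtuner already supports it. The main issue is that it requires a long-sequence multimodal dataset.

What confuses me is the processing of multimodal data, which has to pass through the vision encoder and the projector. With many images, say hundreds or thousands, you surely cannot wait until the whole LLM sequence has been concatenated before splitting it; that seems inefficient. With a vision encoder in the loop, it feels like the training pipeline itself would have to change.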

HIT-cwh · 2024-05-08T04:30:39Z

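> > Long-sequence training itself is not a problem; xtuner already supports it. The main issue is that it requires a long-sequence multimodal dataset.

> What confuses me is the processing of multimodal data, which has to pass through the vision encoder and the projector. With many images, say hundreds or thousands, you surely cannot wait until the whole LLM sequence has been concatenated before splitting it; that seems inefficient. With a vision encoder in the loop, it feels like the training pipeline itself would have to change.

If redundant computation of the vision encoder is acceptable, xtuner's existing sequence parallel method should be able to support this fairly easily. If redundant computation within the sequence parallel group is not acceptable, it will probably be much more complicated.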
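A minimal sketch of the first option, simulated in a single process without any distributed library. This is not xtuner's implementation: `fake_vision_encoder`, `build_sequence`, and `local_chunk` are hypothetical helpers, and the "embeddings" are stand-in strings. Every rank in the sequence parallel group encodes all images (the redundant computation), builds the same full sequence, and then keeps only its own contiguous slice for the LLM.

```python
from typing import List

TOKENS_PER_IMAGE = 4  # tiny stand-in value; LLaVA 1.5 would produce 576 per image


def fake_vision_encoder(image_id: int) -> List[str]:
    """Hypothetical stand-in for the vision encoder plus projector."""
    return [f"img{image_id}_tok{t}" for t in range(TOKENS_PER_IMAGE)]


def build_sequence(image_ids: List[int], text: List[str]) -> List[str]:
    """Concatenate the embeddings of all images, followed by the text tokens."""
    seq: List[str] = []
    for image_id in image_ids:
        seq.extend(fake_vision_encoder(image_id))
    return seq + text


def local_chunk(seq: List[str], rank: int, world_size: int) -> List[str]:
    """Return the contiguous slice of the sequence owned by `rank`."""
    assert len(seq) % world_size == 0, "pad the sequence to a multiple of world_size"
    chunk = len(seq) // world_size
    return seq[rank * chunk:(rank + 1) * chunk]


world_size = 4  # size of the simulated sequence parallel group
image_ids = [0, 1, 2]
text = ["q0", "q1", "q2", "q3"]

# Each simulated rank repeats the full vision pass, then slices out its share.
chunks = [
    local_chunk(build_sequence(image_ids, text), rank, world_size)
    for rank in range(world_size)
]
for rank, chunk in enumerate(chunks):
    print(rank, chunk)
```

The cost of this scheme is that the vision encoder runs `world_size` times on the same images. Avoiding that would presumably mean giving each rank only a subset of the images and exchanging embeddings before slicing, which is where the extra complexity comes from.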
