
Question about llama_flash_attn_monkey_patch #35

Closed
mmmwhy opened this issue Apr 25, 2024 · 1 comment

Comments


mmmwhy commented Apr 25, 2024

https://github.com/PKU-YuanGroup/Chat-UniVi/blob/main/ChatUniVi/train/llama_flash_attn_monkey_patch.py differs from https://github.com/haotian-liu/LLaVA/blob/main/llava/train/llama_flash_attn_monkey_patch.py

For example:

Chat-UniVi: https://github.com/PKU-YuanGroup/Chat-UniVi/blob/main/ChatUniVi/train/llama_flash_attn_monkey_patch.py
[screenshot of the relevant code]

LLaVA: https://github.com/haotian-liu/LLaVA/blob/main/llava/train/llama_flash_attn_monkey_patch.py
[screenshot of the relevant code]

It seems Chat-UniVi changed some code in llama_flash_attn_monkey_patch. Can you help explain the reason for the modification? ♥️
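For context, both files appear to follow the usual monkey-patch pattern: a replacement forward is assigned over the original transformers implementation at import time. Below is a minimal sketch of that general pattern only, not the exact code from either repository; the attention body is deliberately omitted.

```python
# Minimal sketch of the monkey-patch pattern such files typically follow
# (not the exact code from either repository).
from transformers.models.llama import modeling_llama


def flash_attn_forward(self, hidden_states, attention_mask=None,
                       position_ids=None, past_key_value=None,
                       output_attentions=False, use_cache=False, **kwargs):
    # The real patch computes the q/k/v projections and calls the
    # flash-attention kernels here; the body is omitted in this sketch.
    raise NotImplementedError


def replace_llama_attn_with_flash_attn():
    # Overwrite the method on the class so every LlamaAttention instance
    # created afterwards uses the patched forward.
    modeling_llama.LlamaAttention.forward = flash_attn_forward
```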

jpthu17 (Member) commented Apr 28, 2024

We use standard multi-head attention. Since LLaMA 3 uses grouped-query attention, we guess that LLaVA made these changes to follow LLaMA 3. (The main purpose of grouped-query attention is to reduce the size of the KV cache.)
[attached figure]
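The practical difference between the two variants comes down to whether the key/value heads are repeated to match the number of query heads before the attention call. A minimal sketch of that idea, assuming a `repeat_kv` helper like the one in transformers' LLaMA implementation (the shapes and head counts below are illustrative, not taken from either repo):

```python
# Standard multi-head attention has one K/V head per query head (n_rep == 1),
# while grouped-query attention shares each K/V head across a group of query
# heads and repeats it before computing attention.
import torch


def repeat_kv(hidden: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand (batch, num_kv_heads, seq, head_dim) to
    (batch, num_kv_heads * n_rep, seq, head_dim)."""
    if n_rep == 1:  # standard MHA: nothing to repeat
        return hidden
    bsz, num_kv_heads, seq_len, head_dim = hidden.shape
    hidden = hidden[:, :, None, :, :].expand(bsz, num_kv_heads, n_rep, seq_len, head_dim)
    return hidden.reshape(bsz, num_kv_heads * n_rep, seq_len, head_dim)


# Illustrative shapes: 32 query heads; 8 KV heads (GQA) vs 32 KV heads (MHA).
q = torch.randn(1, 32, 16, 128)
k_gqa = torch.randn(1, 8, 16, 128)
k_mha = torch.randn(1, 32, 16, 128)

k_gqa_expanded = repeat_kv(k_gqa, n_rep=32 // 8)  # -> (1, 32, 16, 128)
k_mha_expanded = repeat_kv(k_mha, n_rep=1)        # unchanged

assert k_gqa_expanded.shape == q.shape == k_mha_expanded.shape
```

With GQA only 8 K/V heads need to be cached per layer instead of 32, which is where the KV-cache saving mentioned above comes from.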

jpthu17 closed this as completed May 22, 2024