
Question about llama_flash_attn_monkey_patch #35

Closed
mmmwhy opened this issue Apr 25, 2024 · 1 comment

Comments


mmmwhy commented Apr 25, 2024

https://github.com/PKU-YuanGroup/Chat-UniVi/blob/main/ChatUniVi/train/llama_flash_attn_monkey_patch.py differs from https://github.com/haotian-liu/LLaVA/blob/main/llava/train/llama_flash_attn_monkey_patch.py

For example:

Chat-UniVi: https://github.com/PKU-YuanGroup/Chat-UniVi/blob/main/ChatUniVi/train/llama_flash_attn_monkey_patch.py
[screenshot of the relevant code]

LLaVA: https://github.com/haotian-liu/LLaVA/blob/main/llava/train/llama_flash_attn_monkey_patch.py
[screenshot of the relevant code]

It seems Chat-UniVi changed some code in llama_flash_attn_monkey_patch. Can you help explain the reason for the modification? ♥️
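For context, both files appear to follow the usual monkey-patch pattern: a replacement forward is assigned over the original transformers implementation at import time. Below is a minimal sketch of that general pattern only, not the exact code from either repository; the attention body is deliberately omitted.

```python
# Minimal sketch of the monkey-patch pattern such files typically follow
# (not the exact code from either repository).
from transformers.models.llama import modeling_llama


def flash_attn_forward(self, hidden_states, attention_mask=None,
                       position_ids=None, past_key_value=None,
                       output_attentions=False, use_cache=False, **kwargs):
    # The real patch computes the q/k/v projections and calls the
    # flash-attention kernels here; the body is omitted in this sketch.
    raise NotImplementedError


def replace_llama_attn_with_flash_attn():
    # Overwrite the method on the class so every LlamaAttention instance
    # created afterwards uses the patched forward.
    modeling_llama.LlamaAttention.forward = flash_attn_forward
```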

jpthu17 (Member) commented Apr 28, 2024

We use standard multi-head attention. Since LLaMA 3 uses grouped-query attention, we guess that LLaVA made these changes to follow LLaMA 3. (The main purpose of grouped-query attention is to reduce the size of the KV cache.)
[attached figure]
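The practical difference between the two variants comes down to whether the key/value heads are repeated to match the number of query heads before the attention call. A minimal sketch of that idea, assuming a `repeat_kv` helper like the one in transformers' LLaMA implementation (the shapes and head counts below are illustrative, not taken from either repo):

```python
# Standard multi-head attention has one K/V head per query head (n_rep == 1),
# while grouped-query attention shares each K/V head across a group of query
# heads and repeats it before computing attention.
import torch


def repeat_kv(hidden: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand (batch, num_kv_heads, seq, head_dim) to
    (batch, num_kv_heads * n_rep, seq, head_dim)."""
    if n_rep == 1:  # standard MHA: nothing to repeat
        return hidden
    bsz, num_kv_heads, seq_len, head_dim = hidden.shape
    hidden = hidden[:, :, None, :, :].expand(bsz, num_kv_heads, n_rep, seq_len, head_dim)
    return hidden.reshape(bsz, num_kv_heads * n_rep, seq_len, head_dim)


# Illustrative shapes: 32 query heads; 8 KV heads (GQA) vs 32 KV heads (MHA).
q = torch.randn(1, 32, 16, 128)
k_gqa = torch.randn(1, 8, 16, 128)
k_mha = torch.randn(1, 32, 16, 128)

k_gqa_expanded = repeat_kv(k_gqa, n_rep=32 // 8)  # -> (1, 32, 16, 128)
k_mha_expanded = repeat_kv(k_mha, n_rep=1)        # unchanged

assert k_gqa_expanded.shape == q.shape == k_mha_expanded.shape
```

With GQA only 8 K/V heads need to be cached per layer instead of 32, which is where the KV-cache saving mentioned above comes from.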

jpthu17 closed this as completed May 22, 2024