High RAM usage while loading Llama3 #817
Comments
Actually, according to your logs above, the Mistral model is using more memory than the Llama 3 model. 15 GB seems a bit much for a Q3 8B.
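For a rough sanity check on why 15 GB looks high: a minimal sketch estimating the weight footprint of a quantized 8B model. The ~3.9 bits/weight figure for Q3_K_M is an approximation, not an exact llama.cpp constant.

```python
def quant_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# Llama 3 8B at roughly Q3_K_M precision (~3.9 bpw is an assumption):
estimate = quant_size_gb(8.0e9, 3.9)
print(f"{estimate:.1f} GB")  # around 4 GB of weights
```

By that estimate, a 15 GB spike is roughly four times the expected weight footprint, which points at the loader rather than the model size itself.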
That Reddit comment was by me. 1.65 will have the incoherency issues I was referencing fixed, but the Llama 3 memory error was only discovered recently, so that will remain a thing until occam finds a solution for it separately.
I can confirm that even 1.65 has this RAM spike problem with the Llama 3 model on Vulkan. I will wait for a fix and report back whether it works again.
Yeah, I'm having exactly the same problem with any version above 1.61.2.
KoboldCpp 1.64, Hardware: Steam Deck
I was using KoboldCpp on a Steam Deck LCD with Vulkan before, and it worked fast and great. After Llama 3 came out, I downloaded the latest KoboldCpp, and even though I use a smaller Llama 3 model, it seems to take about twice as much system RAM as, for example, a Mistral model.
I will provide both logs in case you can find something. I can load a Mistral 7B Q6 model without problems, but I have trouble running Llama 3 8B Q3_K_M. At its spike it takes around 15 GB.
So is it loading Llama 3 models differently, or is Vulkan just not optimized for Llama 3 yet? Or something else?
https://wormhole.app/KvKBP#HPyt1TRWVKTwYT_uvQVKkw