
High RAM usage while loading Llama3 #817

Open
FrostyMisa opened this issue May 1, 2024 · 6 comments

Comments

FrostyMisa commented May 1, 2024

KoboldCpp 1.64, Hardware: Steam Deck

I was using KoboldCpp on the Steam Deck LCD with Vulkan before, and it worked fast and great. After Llama3 came out I downloaded the latest KoboldCpp, and even though I'm using a smaller Llama3 model, it seems to take about twice as much system RAM as, for example, a Mistral model.

I'll provide both logs so you can see if you find something. I can load a Mistral 7B Q6 model without problems, but I have trouble running Llama3 8B Q3_K_M. At its spike it takes around 15 GB.

So is it loading Llama3 models differently, or is Vulkan just not optimized for Llama3 yet? Or is it something else?

https://wormhole.app/KvKBP#HPyt1TRWVKTwYT_uvQVKkw
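(For anyone trying to reproduce this: a minimal sketch of how the loading process's peak resident memory can be sampled from a second terminal, assuming the third-party psutil package is available; the script name and helper below are hypothetical and not part of KoboldCpp.)

```python
#!/usr/bin/env python3
# peak_rss.py -- hypothetical helper: sample a process's resident set size
# (RSS) until it exits and report the peak. Not part of KoboldCpp.
# Usage: python3 peak_rss.py <pid-of-koboldcpp>
import sys
import time

import psutil  # third-party: pip install psutil


def watch_peak_rss(pid: int, interval: float = 0.1) -> int:
    """Poll RSS every `interval` seconds; return the peak in bytes."""
    proc = psutil.Process(pid)
    peak = 0
    try:
        while proc.is_running():
            peak = max(peak, proc.memory_info().rss)
            time.sleep(interval)
    except psutil.NoSuchProcess:
        pass  # the process exited between samples
    return peak


if __name__ == "__main__":
    peak_bytes = watch_peak_rss(int(sys.argv[1]))
    print(f"peak RSS: {peak_bytes / 2**30:.2f} GiB")
```

Running this against the koboldcpp PID during model load would show whether the spike is resident in the process itself, as opposed to OS page cache, which RSS does not count.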

LostRuins (Owner) commented

Actually, according to your logs above, the Mistral model is using more memory than the Llama 3 model. 15 GB seems a bit much for a Q3 8B.

FrostyMisa commented May 2, 2024

According to the log, yes, but you can see the system monitor in the photos here: Llama3 (the smaller model) spikes into the swap file. OpenBLAS doesn't have this problem, and there the RAM usage is in line with the model size. So it must be something with Vulkan. [two system monitor screenshots]

FrostyMisa commented May 3, 2024

I found someone writing about Vulkan on Reddit, so I tried version 1.61.2, and there it works as expected, without the spikes.

Someone in the thread mentioned this: "We know 1.61 is the last version Vulkan works correctly on; it's because of a regression in Vulkan upstream that Occam hasn't had time to submit his fixes for yet, since it's tied to MoE support. It will eventually be fixed; for now it's better to keep using 1.61 until you notice that we support MoE for Vulkan."

But it's definitely something between that version and 1.63/1.64; those are the two I tested and had problems loading the Llama3 model with.

Here is a photo of RAM usage in 1.61. As you can see, the RAM usage is normal, with none of the spikes shown in the photos I provided earlier for the latest version. [screenshot]

henk717 commented May 11, 2024

That Reddit comment was by me. 1.65 will have the incoherency issues I was referencing fixed, but we only discovered the Llama3 memory error recently, so that will remain a thing until Occam finds a solution for it separately.

FrostyMisa (Author) commented

I can confirm that even 1.65 has this RAM spike problem with a Llama3 model on Vulkan. I'll wait for a fix and report back once it works again.
Thanks, guys, for your hard work making KoboldCpp great!

MadLightTheDoggo commented May 24, 2024

Yeah, I'm having exactly the same problem with any version above 1.61.2.
In fact, on that version I can easily launch Mixtral 8x7B Q4_K_M with 8192 context within my 32 GB of memory, but on anything newer it fills all the memory and starts spilling out onto the disk. Because of that I can't tell exactly how much more memory it eats, but if the disk usage is any indicator, I'd say at least 10 GB more.
I thought that maybe it was fixed in recent builds, but nope.
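(A rough sanity check, using the published Mixtral 8x7B configuration and an approximate file size rather than anything from these logs: the Q4_K_M GGUF is about 26 GB, and an fp16 KV cache at 8192 context adds only around 1 GB because Mixtral uses grouped-query attention with 8 KV heads, so a healthy load should land near 27-28 GB and just fit in 32 GB.)

```python
# Back-of-the-envelope estimate of what a healthy Mixtral 8x7B Q4_K_M load
# at 8192 context should cost. Architecture numbers are the published
# Mixtral config; the file size is an approximation, not from these logs.
n_layers   = 32     # transformer layers
n_kv_heads = 8      # grouped-query attention KV heads
head_dim   = 128    # dimension per attention head
n_ctx      = 8192   # context length
fp16_bytes = 2      # bytes per fp16 K/V entry

# K and V caches together: 2 * layers * kv_heads * head_dim * ctx * bytes
kv_cache_gb = 2 * n_layers * n_kv_heads * head_dim * n_ctx * fp16_bytes / 1e9
print(f"KV cache: ~{kv_cache_gb:.1f} GB")  # ~1.1 GB

model_file_gb = 26.4  # approx. Q4_K_M GGUF size for Mixtral 8x7B
print(f"expected total: ~{model_file_gb + kv_cache_gb:.1f} GB")  # ~27.5 GB
```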
