I'm running out of memory running commandR (llm_load_print_meta: model type = 35B). Why is device 0 running out of memory? It's using about 22 GB of RAM and the device has 24. If I load less on it, then it fails on device 1, etc. How much does the KV cache actually use? How do I calculate the usage given the model size and context size? I want to know if I'm doing something wrong or if it's broken. Thanks.
Replies: 1 comment
I'm running out of memory too. After lots of experiments, I observed that besides the model weights using up RAM, we need space for the KV cache and especially plenty for the compute buffers. commandR+ also seems to be on the high side with its demands. It looks like I would need 200+ GB of VRAM to get the full 128k context.
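On the calculation question: a reasonable first-order estimate for the f16 KV cache is 2 (one K and one V tensor per layer) × n_layer × n_ctx × n_head_kv × head_dim × 2 bytes. Here is a minimal sketch; the commandR hyperparameters below are my assumptions, so check them against the n_layer / n_head_kv / n_embd_head lines in your own llm_load_print_meta output:

```python
def kv_cache_bytes(n_layer: int, n_ctx: int, n_head_kv: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # 2x for the separate K and V tensors per layer;
    # bytes_per_elem = 2 for the default f16 cache.
    return 2 * n_layer * n_ctx * n_head_kv * head_dim * bytes_per_elem

# Hypothetical ballpark for commandR 35B (no GQA, so n_head_kv == n_head).
# Verify every number against your own llm_load_print_meta log.
gib = kv_cache_bytes(n_layer=40, n_ctx=8192, n_head_kv=64, head_dim=128) / 2**30
print(f"{gib:.1f} GiB")  # 10.0 GiB at 8k context in f16
```

If those hyperparameters are right, scaling linearly to 128k context gives ~160 GiB for the KV cache alone, which is consistent with needing 200+ GB once you add weights and compute buffers. Note also that the compute buffers grow with context and are allocated per device, which, as far as I can tell, is why device 0 tips over first even when the weight split looks balanced.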