docs: llama.cpp/GGUF CPU offloading no longer present? #2859
Comments
hi @mr-september,
cc: @cahyosubroto @aindrajaya @irfanpena to update the 2 points mentioned above in our docs.
@Van-QA Can the parameters in https://jan.ai/docs/built-in/llama-cpp be applied to model.json?
hi @irfanpena, all 5 parameters here can be applied to model.json, where some have more impact than the rest.
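For reference, a minimal sketch of what such a model.json settings block might look like. This is an assumption about the schema, not an authoritative example: `ngl` (llama.cpp's number-of-GPU-layers parameter, mentioned later in this thread as `ngl: 100`) and `ctx_len` are shown as illustrative keys, and field placement may differ between Jan versions.

```json
{
  "settings": {
    "ctx_len": 4096,
    "ngl": 100
  }
}
```

Lowering `ngl` keeps more layers on the CPU, which is what allows running models larger than the GPU's VRAM.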
Thanks @Van-QA, that seems to be working on my end. Feel free to close this issue whenever the team deems the docs updated. If I may suggest, could this be added to the GUI as well? Ideally with some kind of general estimation (e.g. jan.ai detects my system has 8GB VRAM and 32GB RAM, and the model size is 12GB, so it suggests a default 50% offload, etc.)
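The estimation suggested above could be sketched roughly as follows. This is a hypothetical heuristic, not Jan's actual logic; the function name, the `reserve_gb` parameter, and the choice of reserving VRAM headroom for the KV cache are all assumptions made for illustration:

```python
def suggest_offload_fraction(model_size_gb: float, vram_gb: float,
                             reserve_gb: float = 1.0) -> float:
    """Suggest what fraction of a model to offload to the GPU.

    Hypothetical heuristic (not Jan's implementation): offload as much
    of the model as fits in VRAM, keeping `reserve_gb` free for the
    KV cache and other runtime buffers.
    """
    if model_size_gb <= 0:
        raise ValueError("model size must be positive")
    usable = max(vram_gb - reserve_gb, 0.0)
    return min(usable / model_size_gb, 1.0)

# Scenario from the comment above: 8 GB VRAM, 12 GB model
print(round(suggest_offload_fraction(12, 8), 2))  # 0.58
```

A fraction like this could then be mapped onto a concrete `ngl` value by multiplying by the model's total layer count.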
Linking the issue to #2208, related to RAM/VRAM utilization.
My `~/jan/engines/` folder doesn't have a `nitro.json` at all. It only has `groq.json` and `openai.json`. My model.json doesn't have an `ngl: 100` line at all.

Additional context
Is this feature still in jan.ai? I am trying to run models bigger than my GPU's VRAM limits.