GPU version build not using GPU #114
I am facing the same issue. Can anyone please guide us on this?
I ended up using just llama.cpp. It works very well on the GPU. You can write a simple wrapper in Node.js without Rust. I can share the code if you want.
@deonis1 it will be a great help. Please share the code. |
@shaileshminsnapsys no problem the code is here https://github.com/deonis1/llcui |
Thank you @deonis1 , I'll check with the code. Thank you for your help. |
Let me know if you have any issues |
@deonis1 Thank you so much, your code help me alot to achieve my target. Many Thanks !! |
@shaileshminsnapsys no problem, there is a new version if you are interested |
@deonis1 would love to see the new version. Thank you |
@shaileshminsnapsys The new version that supports embedding (mongodb or text document) is released. You can find it under the new url: |
@deonis1 |
Hi everyone,
I am trying to build llama-node for GPU. I followed the guide in the readme (https://llama-node.vercel.app/docs/cuda), but the llama.cpp library I get from a manual build uses the CPU, not the GPU. When I build llama.cpp directly in the llama-sys folder using the following command:
make clean && LLAMA_CUBLAS=1 make -j
it produces a perfectly fine GPU executable that works with no problem.
Am I missing something?
Here are my full build commands:
git clone https://github.com/Atome-FE/llama-node.git
cd llama-node/
rustup target add x86_64-unknown-linux-musl
git submodule update --init --recursive
pnpm install --ignore-scripts
cd packages/llama-cpp/
pnpm build:cuda
Then I get a libllama.so file in ~/.llama-node which, when used, does not use the GPU. Here is my script to run it:
import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";
const model = path.resolve(process.cwd(), "~/CODE/models/vicuna-7b-v1.3.ggmlv3.q4_0.bin");
const llama = new LLM(LLamaCpp);
const config = {
modelPath: model,
enableLogging: true,
nCtx: 1024,
seed: 0,
f16Kv: false,
logitsAll: false,
vocabOnly: false,
useMlock: false,
embedding: false,
useMmap: true,
nGpuLayers: 40
};
const template = `How do I train you to read my documents?`;
const prompt = `A chat between a user and an assistant. USER: ${template} ASSISTANT:`;
const params = {
nThreads: 4,
nTokPredict: 2048,
topK: 40,
topP: 0.1,
temp: 0.2,
repeatPenalty: 1,
prompt,
};
const run = async () => {
await llama.load(config);
await llama.createCompletion(params, (response) => {
process.stdout.write(response.token);
});
};
run();
Any help is appreciated.