Llama2 quantized q5_1 #108

Open
HolmesDomain opened this issue Jul 25, 2023 · 1 comment
HolmesDomain commented Jul 25, 2023

I am getting this error:

llama.cpp: loading model from /Documents/Proj/delta/llama-2-7b-chat/ggml-model-q5_1.bin
error loading model: unrecognized tensor type 14

llama_init_from_file: failed to load model
node:internal/process/promises:289
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[Error: Failed to initialize LLama context from file: /Documents/Proj/delta/llama-2-7b-chat/ggml-model-q5_1.bin] {
  code: 'GenericFailure'
}

My index.js:

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = path.resolve(process.cwd(), "./llama-2-7b-chat/ggml-model-q5_1.bin");
const llama = new LLM(LLamaCpp);
const config = {
    modelPath: model,
    enableLogging: false,
    nCtx: 1024,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
    nGpuLayers: 0
};

const run = async () => {
    await llama.load(config);

    await llama.createCompletion({
        prompt: "My favorite movie is",
        nThreads: 4,
        nTokPredict: 1024,
        topK: 40,
        topP: 0.1,
        temp: 0.3,
        repeatPenalty: 1,
    }, (response) => {
        process.stdout.write(response.token);
    });
};

run();

It worked before I quantized the model. I was hoping quantization would speed things up, because inference is very slow right now (I assumed this would fix the speed).

HolmesDomain (Author) commented Jul 25, 2023

Got it running by using the .bin file from here: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/main

I had no luck generating the q5_1 file myself by following the instructions here: https://github.com/ggerganov/llama.cpp#prepare-data--run

If this is a common problem, maybe you could point people toward downloading a prebuilt .bin directly from TheBloke instead.
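For anyone else hitting the same "unrecognized tensor type" error, here is a minimal sketch of what worked for me: keep the same index.js and only point modelPath at the .bin downloaded from TheBloke's repo. The filename below is just an example; substitute whichever q5_1 file you actually downloaded.

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

// Example path: point this at the q5_1 .bin you downloaded from TheBloke's
// Hugging Face repo, instead of the locally quantized file that failed to load.
const model = path.resolve(process.cwd(), "./llama-2-7b-chat/llama-2-7b-chat.ggmlv3.q5_1.bin");

const llama = new LLM(LLamaCpp);

// Same config as before; only modelPath changes.
await llama.load({
    modelPath: model,
    enableLogging: false,
    nCtx: 1024,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
    nGpuLayers: 0,
});

If loading succeeds, the rest of the original script (createCompletion with the same sampling parameters) runs unchanged.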
