Behavior when missing quantization version #447

Open
Reichenbachian opened this issue Dec 20, 2023 · 1 comment

Comments

@Reichenbachian

The problem is reproduced below. It turns out the converted file didn't include the "general.quantization_version" metadata. When llama.cpp reads a file without a version, it assumes 2 (grep for the line gguf_set_val_u32(ctx_out, "general.quantization_version", GGML_QNT_VERSION);), so this model works with llama.cpp but fails with rustformers/llm.

import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

local_dir = "models/"  # same directory passed to convert.py below
model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained(local_dir)
torch.save(model.state_dict(), os.path.join(local_dir, "pytorch_model.bin"))
python llm/crates/ggml/sys/llama-cpp/convert.py models/ --vocab-dir models/ --ctx 4096 --outtype q8_0
let model = llm::load(
    path,
    llm::TokenizerSource::Embedded,
    parameters,
    llm::load_progress_callback_stdout,
)
.unwrap_or_else(|err| panic!("Failed to load model: {err}"));

thread '<unnamed>' panicked at llm/inference/src/llms/local/llama2.rs:45:35:
Failed to load model: quantization version was missing, despite model containing quantized tensors

My solution was just to get rid of this whole block:

    let any_quantized = gguf
        .tensor_infos
        .values()
        .any(|t| t.element_type.is_quantized());
    // if any_quantized {
    //     match quantization_version {
    //         Some(MetadataValue::UInt32(2)) => {
    //             // Currently supported version
    //         }
    //         Some(quantization_version) => {
    //             return Err(LoadError::UnsupportedQuantizationVersion {
    //                 quantization_version: quantization_version.clone(),
    //             })
    //         }
    //         None => return Err(LoadError::MissingQuantizationVersion),
    //     }
    // }

Unsure how you want to handle this since it does remove a check.
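
An alternative that keeps the check (only a sketch, not tested against the crate): instead of deleting the block, treat a missing version as version 2 so the behaviour matches llama.cpp's default described above. It reuses the names from the block above (gguf, quantization_version, MetadataValue, LoadError); the log::warn! call is my assumption and would need to be swapped for whatever logging this crate actually uses.

    let any_quantized = gguf
        .tensor_infos
        .values()
        .any(|t| t.element_type.is_quantized());
    if any_quantized {
        match quantization_version {
            // Currently supported version.
            Some(MetadataValue::UInt32(2)) => {}
            // An explicitly unsupported version is still rejected.
            Some(quantization_version) => {
                return Err(LoadError::UnsupportedQuantizationVersion {
                    quantization_version: quantization_version.clone(),
                })
            }
            // Missing version: assume 2, mirroring llama.cpp's default
            // (gguf_set_val_u32(ctx_out, "general.quantization_version", GGML_QNT_VERSION)).
            None => {
                // Assumption: a logging facility is in scope here.
                log::warn!(
                    "general.quantization_version is missing; assuming version 2 like llama.cpp"
                );
            }
        }
    }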

@Reichenbachian
Author

a670aae

Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant