Newbie here.
The 7b-it model can be loaded on a low-memory device via a quantization config, without needing a pre-quantized version of the model, by passing a BitsAndBytes config to Hugging Face's AutoModelForCausalLM, like below.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute and nested (double) quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "/kaggle/input/gemma/transformers/7b-it/2",
    device_map="auto",
    trust_remote_code=True,
    quantization_config=quantization_config,
)
Is this type of loading feasible in your current package?
Unfortunately, the current code doesn't support reading a quantization config specified in the HuggingFace format.
It would require some amount of code changes to make it work. If you are in the mood, we definitely welcome such changes and will help you land them.
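To make "reading the HuggingFace-format quantization config" concrete: a model saved with transformers stores its quantization settings under a "quantization_config" key in the model directory's config.json. A minimal sketch of parsing that section with only the standard library (the sample config below is illustrative, not a real Gemma config):

```python
import json

# Illustrative stand-in for a model directory's config.json; a model saved
# with a BitsAndBytes config carries these fields under "quantization_config".
sample = {
    "model_type": "gemma",
    "quantization_config": {
        "quant_method": "bitsandbytes",
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    },
}
with open("config.json", "w") as f:
    json.dump(sample, f)

# A loader that wants to honor this format reads the section back and
# dispatches on quant_method before building its own quantized weights.
with open("config.json") as f:
    qcfg = json.load(f).get("quantization_config", {})

print(qcfg.get("quant_method"))          # bitsandbytes
print(qcfg.get("bnb_4bit_quant_type"))   # nf4
```

The actual change would then map these fields onto the package's own quantization machinery; the field names above follow the transformers serialization format, but the dispatch logic is an assumption about how such a patch might be structured.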