Newbie here.
The 7b-it model can be loaded on a low-memory device via a quantization config, without needing a pre-quantized version of the model, by passing a BitsAndBytes config to Hugging Face's AutoModelForCausalLM, like below.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute and nested (double) quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "/kaggle/input/gemma/transformers/7b-it/2",
    device_map="auto",
    trust_remote_code=True,
    quantization_config=quantization_config,
)
Is this type of loading feasible in your current package?
Unfortunately, the current code doesn't support reading a quantization config specified in the HuggingFace format.
It would require some amount of code changes to make it work. If you are in the mood, we definitely welcome such changes and will help you land them.
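To make "reading the HuggingFace-format quantization config" concrete: a model saved with transformers stores its quantization settings under a "quantization_config" key in the model directory's config.json. A minimal sketch of parsing that section with only the standard library (the sample config below is illustrative, not a real Gemma config):

```python
import json

# Illustrative stand-in for a model directory's config.json; a model saved
# with a BitsAndBytes config carries these fields under "quantization_config".
sample = {
    "model_type": "gemma",
    "quantization_config": {
        "quant_method": "bitsandbytes",
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    },
}
with open("config.json", "w") as f:
    json.dump(sample, f)

# A loader that wants to honor this format reads the section back and
# dispatches on quant_method before building its own quantized weights.
with open("config.json") as f:
    qcfg = json.load(f).get("quantization_config", {})

print(qcfg.get("quant_method"))          # bitsandbytes
print(qcfg.get("bnb_4bit_quant_type"))   # nf4
```

The actual change would then map these fields onto the package's own quantization machinery; the field names above follow the transformers serialization format, but the dispatch logic is an assumption about how such a patch might be structured.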