Is it possible to load 7b-it using quantization config #48

Open
aliasneo1 opened this issue Mar 18, 2024 · 1 comment
Labels
enhancement New feature or request

Comments


aliasneo1 commented Mar 18, 2024

Newbie here.
The 7b-it model can be loaded on a low-memory device via a quantization config, without using the pre-quantized version of the model, by passing a BitsAndBytes config to Hugging Face's AutoModelForCausalLM, like below.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "/kaggle/input/gemma/transformers/7b-it/2",
    device_map="auto",
    trust_remote_code=True,
    quantization_config=quantization_config,
)
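
For reference, a minimal generation sketch with a model loaded this way (assuming the load above succeeded; the tokenizer living at the same path as the model is an assumption):

from transformers import AutoTokenizer

# Load the tokenizer from the same directory as the model weights.
tokenizer = AutoTokenizer.from_pretrained("/kaggle/input/gemma/transformers/7b-it/2")

# Tokenize a prompt and move it to the device the quantized model was placed on.
inputs = tokenizer("Explain 4-bit quantization in one sentence.", return_tensors="pt").to(model.device)

# Generate and decode a short completion.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))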
Is this type of loading feasible with your current package?

@pengchongjin (Collaborator) commented

Unfortunately, the current code doesn't support reading a quantization config specified in the HuggingFace format.

It would require some amount of code changes to make it work. If you are in the mood, we definitely welcome such changes and will help you land them.
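
As a rough sketch of what such a change might start with (hypothetical; the field names follow the transformers config.json layout, and how this would hook into this repo's loader is an assumption), the loader could first look for a quantization_config block in the model directory:

import json
from pathlib import Path

def read_hf_quant_config(model_dir):
    # Return the "quantization_config" block from config.json, if present.
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return config.get("quantization_config")

# Hypothetical path reused from the question above.
quant_cfg = read_hf_quant_config("/kaggle/input/gemma/transformers/7b-it/2")
if quant_cfg is None:
    print("No quantization_config found; load full-precision weights.")
else:
    print("Quantization settings:", quant_cfg)  # e.g. {"load_in_4bit": true, "bnb_4bit_quant_type": "nf4", ...}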

@tilakrayal tilakrayal added the enhancement New feature or request label Apr 24, 2024