
[Bug] Unrecognized configuration class when quantizing llava #1601

Closed
2 tasks done
zjysteven opened this issue May 16, 2024 · 5 comments
zjysteven commented May 16, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

When running w4a16 quantization for llava models, transformers==4.42.0 raises an Unrecognized configuration class error, i.e., the llava configuration class has no AutoModelForCausalLM mapping and thus the model cannot be loaded. I know this is not really a bug in lmdeploy itself. I've seen the exact same issue reported in other repos (e.g. here), where the suggestion was to use transformers==4.31.0, but that didn't help.

I was also surprised that no one else has raised this issue, and there seem to be plenty of people who have succeeded in quantizing llava models. So by opening this issue I want to check whether there's anything wrong on my side.

Note that below I was trying to quantize lmms-lab/llama3-llava-next-8b, but the same error also occurs when switching to liuhaotian/llava-v1.5-7b.
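
For reference, the same failure can be reproduced outside lmdeploy with a minimal snippet (a sketch on my side, assuming the checkpoint's config resolves to transformers' LlavaConfig, as it does in the traceback below):

from transformers import AutoModelForCausalLM

# Raises ValueError: Unrecognized configuration class LlavaConfig for this kind
# of AutoModel: AutoModelForCausalLM, because LlavaConfig has no entry in the
# AutoModelForCausalLM mapping.
model = AutoModelForCausalLM.from_pretrained("lmms-lab/llama3-llava-next-8b")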

What I've tried

I tried switching the transformers version between 4.31.0, the latest 4.42.0, and the one pinned by the llava authors (transformers@ git+https://github.com/huggingface/transformers.git@1c39974a4c4036fd641bc1191cc32799f85715a4); none of them worked. This is somewhat expected, because regardless of the transformers version I'd expect some manual registration to be performed, like here?
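
To be concrete, by "manual registration" I mean something along these lines (an illustrative sketch; the import path and class names are modeled on the upstream LLaVA codebase and are assumptions, not lmdeploy code):

from transformers import AutoConfig, AutoModelForCausalLM
# Assumed import from the upstream LLaVA repo, not from transformers itself:
from llava.model.language_model.llava_llama import LlavaConfig, LlavaLlamaForCausalLM

# Register the custom config/model pair so AutoModelForCausalLM can resolve it.
AutoConfig.register("llava_llama", LlavaConfig)
AutoModelForCausalLM.register(LlavaConfig, LlavaLlamaForCausalLM)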

Reproduction

export HF_MODEL=lmms-lab/llama3-llava-next-8b
export WORK_DIR=awq/llama3-llava-next-8b-4bit

lmdeploy lite auto_awq \
    $HF_MODEL \
    --calib-dataset 'c4' \
    --calib-samples 512 \
    --calib-seqlen 1024 \
    --w-bits 4 \
    --w-group-size 128 \
    --work-dir $WORK_DIR

Environment

sys.platform: linux
Python: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA L40S
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.2.2+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

LMDeploy: 0.4.1+
transformers: 4.40.2
gradio: 3.50.2
fastapi: 0.111.0
pydantic: 2.7.1
triton: 2.2.0

Error traceback

Traceback (most recent call last):
  File "/home/jz288/anaconda3/envs/lmd/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
  File "/home/jz288/anaconda3/envs/lmd/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 37, in run
    args.run(args)
  File "/home/jz288/anaconda3/envs/lmd/lib/python3.10/site-packages/lmdeploy/cli/lite.py", line 131, in auto_awq
    auto_awq(**kwargs)
  File "/home/jz288/anaconda3/envs/lmd/lib/python3.10/site-packages/lmdeploy/lite/apis/auto_awq.py", line 55, in auto_awq
    model, tokenizer, work_dir = calibrate(model, calib_dataset, calib_samples,
  File "/home/jz288/anaconda3/envs/lmd/lib/python3.10/site-packages/lmdeploy/lite/apis/calibrate.py", line 152, in calibrate
    model = load_hf_from_pretrained(model,
  File "/home/jz288/anaconda3/envs/lmd/lib/python3.10/site-packages/lmdeploy/lite/utils/load.py", line 31, in load_hf_from_pretrained
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/jz288/anaconda3/envs/lmd/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.llava.configuration_llava.LlavaConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, ElectraConfig, ErnieConfig, FalconConfig, FuyuConfig, GemmaConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, JambaConfig, LlamaConfig, MambaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, OlmoConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.
AllentDan (Collaborator) commented May 16, 2024

Quantization of VL models is not supported until #1553 gets merged. You may try the PR directly if you are in a hurry.

zjysteven (Author) commented

@AllentDan Thanks for your reply. I have two follow-up questions and would appreciate further confirmation.

  1. I'm not sure whether I can build from source on my server to include the PR, so I'm wondering if there is a rough expected date for the next release that will support VLM quantization?
  2. I saw several issues mentioning that people have successfully quantized llava models (or other VLMs); see for example [Bug] Error when trying to load awq llava 1.5 13b model #1511, from about 3 weeks ago. I'm wondering whether using an older pre-built version of lmdeploy might work?

AllentDan (Collaborator) commented

  1. The next version of lmdeploy will be released in two weeks.
  2. Yes, you may try the steps in the other issues. Alternatively, you can modify your locally installed lmdeploy package according to my PR. Since you only need llava, there should be a limited number of files to change. For example, the error log above indicates that you should modify the loading logic in /home/jz288/anaconda3/envs/lmd/lib/python3.10/site-packages/lmdeploy/lite/utils/load.py (see the rough sketch below).
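
A rough sketch of the kind of change meant in point 2 above (this is not the actual change from #1553; whether falling back to LlavaForConditionalGeneration and quantizing only its language_model is sufficient is an assumption):

import torch
from transformers import AutoConfig, AutoModelForCausalLM

def load_hf_from_pretrained(pretrained_model_name_or_path, **kwargs):
    # Simplified signature; the real lmdeploy helper takes more arguments.
    config = AutoConfig.from_pretrained(pretrained_model_name_or_path,
                                        trust_remote_code=True)
    if getattr(config, 'model_type', None) == 'llava':
        # LlavaConfig has no AutoModelForCausalLM mapping (the ValueError above),
        # so load the vision-language class and hand its language model to the
        # calibration/quantization code.
        from transformers import LlavaForConditionalGeneration
        vl_model = LlavaForConditionalGeneration.from_pretrained(
            pretrained_model_name_or_path, torch_dtype=torch.float16)
        return vl_model.language_model
    return AutoModelForCausalLM.from_pretrained(
        pretrained_model_name_or_path, torch_dtype=torch.float16,
        trust_remote_code=True, **kwargs)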

zjysteven (Author) commented

Thank you very much, I will try that. If it's OK, I'd like to keep this issue open for now.

AllentDan (Collaborator) commented

Supported in the latest main.
