
lm-eval for llama.cpp enhancement. #1543

Open

wants to merge 4 commits into main
Conversation

@lkk12014402 (Collaborator) commented May 12, 2024

Type of Change

enable lm-eval for llama.cpp models

API not changed

Description

Refers to the official lm-eval code and llama-cpp-python.

Improvements:

  1. Load the llama.cpp model directly when running lm-eval (the official code requires launching a llama.cpp server).
  2. For Qwen models, revise the detokenize function because errors occur during evaluation, and force adding bos_id because llama-cpp-python does not add it successfully. Even with these changes, I still find that the tokenizer results differ between llama.cpp and huggingface/transformers; I will verify this further.
  3. As described in the comments in llama-cpp-python, I implement this with a custom class, which accelerates post-processing (see the sketch below).
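
For reference, here is a minimal sketch of what such a custom lm-eval model class can look like, assuming lm-eval 0.4.x and llama-cpp-python. The class name GGUFCustomLM and the stubbed scoring methods are illustrative only, not the PR's actual code:

```python
# Illustrative sketch: load a GGUF model in-process via llama-cpp-python
# instead of launching a llama.cpp server, and force BOS for Qwen models.
from llama_cpp import Llama
from lm_eval.api.model import LM
from lm_eval.api.registry import register_model


@register_model("gguf-custom")
class GGUFCustomLM(LM):  # hypothetical name; the PR's class may differ
    def __init__(self, pretrained, ftype="*q4_0.gguf", device="cpu", **kwargs):
        super().__init__()
        # Download the GGUF file matching the `ftype` glob from the Hub
        # and load it directly; no server round-trips.
        self.model = Llama.from_pretrained(
            repo_id=pretrained,
            filename=ftype,
            n_gpu_layers=-1 if device == "cuda" else 0,  # offload all layers on CUDA
            logits_all=True,  # keep per-position logits for loglikelihood scoring
            verbose=False,
        )
        # Workaround from the description: llama-cpp-python does not add
        # bos_id for Qwen models, so force it during tokenization.
        self.force_bos = "qwen" in pretrained.lower()

    def tok_encode(self, string: str):
        return self.model.tokenize(string.encode("utf-8"), add_bos=self.force_bos)

    # The real implementation scores these requests against the in-process
    # model; stubbed out here to keep the sketch short.
    def loglikelihood(self, requests):
        raise NotImplementedError

    def loglikelihood_rolling(self, requests):
        raise NotImplementedError

    def generate_until(self, requests):
        raise NotImplementedError
```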


github-actions bot commented May 12, 2024

⛈️ Required checks status: Has failure 🔴

Warning
If you do not have access to re-run the CI-Summary bot, please contact VincyZhang for help. If you push a new commit, all of the workflows will be re-triggered.

Groups summary

All of these checks are required after the changes to intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/__init__.py and intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/llama_cpp_lm.py.

🔴 Format Scan Tests workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| format-scan (pylint) | failure | download |
| format-scan (bandit) | success | |
| format-scan (cloc) | success | |
| format-scan (cpplint) | success | |

🔴 Optimize Unit Test workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| optimize-unit-test-baseline | success | |
| optimize-unit-test-PR-test | failure | download |
| Genreate-OptimizeUT-Report | skipped | |

🟢 NeuralChat Unit Test

| Check ID | Status | Error details |
| --- | --- | --- |
| neuralchat-unit-test-baseline | success | |
| neuralchat-unit-test-PR-test | success | |
| Generate-NeuralChat-Report | success | |

🟢 Engine Unit Test workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| engine-unit-test-baseline | success | |
| engine-unit-test-PR-test | success | |
| Genreate-Engine-Report | success | |

🟢 Chat Bot Test workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| call-inference-llama-2-7b-chat-hf / inference test | success | |
| call-inference-mpt-7b-chat / inference test | success | |


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

@lkk12014402 (Collaborator, Author) commented May 12, 2024

Usage:

CPU

```python
from intel_extension_for_transformers.transformers.llm.evaluation.lm_eval import evaluate, LMEvalParser

model_name = "Qwen/Qwen1.5-0.5B-Chat-GGUF"
eval_args = LMEvalParser(
    model="gguf-custom",
    model_args=f"pretrained={model_name},ftype=*q4_0.gguf",
    device="cpu",
    tasks="hellaswag",
    batch_size=2,
    limit=10,
)
results = evaluate(eval_args)
print(results["results"])
```

GPU

```python
from intel_extension_for_transformers.transformers.llm.evaluation.lm_eval import evaluate, LMEvalParser

model_name = "Qwen/Qwen1.5-0.5B-Chat-GGUF"
eval_args = LMEvalParser(
    model="gguf-custom",
    model_args=f"pretrained={model_name},ftype=*q4_0.gguf",
    device="cuda",
    tasks="hellaswag",
    batch_size=2,
    limit=10,
)
results = evaluate(eval_args)
print(results["results"])
```
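
To chase the tokenizer discrepancy mentioned in the description, a quick parity check along these lines (an assumed workflow, not code from this PR) compares llama-cpp-python's tokenization of the Qwen GGUF against the huggingface/transformers tokenizer:

```python
from llama_cpp import Llama
from transformers import AutoTokenizer

text = "Hello, world!"

# GGUF tokenizer via llama-cpp-python (add_bos=False for a fair comparison).
gguf = Llama.from_pretrained(
    repo_id="Qwen/Qwen1.5-0.5B-Chat-GGUF",
    filename="*q4_0.gguf",
    verbose=False,
)
gguf_ids = gguf.tokenize(text.encode("utf-8"), add_bos=False)

# Reference tokenizer from the original (non-GGUF) checkpoint.
hf = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")
hf_ids = hf(text, add_special_tokens=False)["input_ids"]

print("llama.cpp:   ", gguf_ids)
print("transformers:", hf_ids)
print("match:", gguf_ids == hf_ids)
```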
