Model keeps outputting repeated results after deploying the OpenAI-compatible server

Describe the bug

After deploying the OpenAI-compatible server, the model keeps outputting repeated results.

Reproduction

Server launch command:
CUDA_VISIBLE_DEVICES=0,3,4,5 NCCL_SHM_DISABLE=1 lmdeploy serve api_server /data/workspace/models/llava-v1.6-34b --server-port 23333 --tp 4 --cache-max-entry-count 0.5 --chat-template template.json
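As a quick sanity check that the server came up, one can query the OpenAI-compatible model-list route. A minimal sketch, assuming the server is reachable at the same address the client code below uses:

import requests

# /v1/models is the standard OpenAI-compatible route that the client code
# below also relies on via client.models.list(); once the server is ready
# it should return the served model id.
resp = requests.get('http://0.0.0.0:23333/v1/models', timeout=10)
print(resp.json())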
Contents of template.json:
{ "model_name": "llava-chatml", "system": "system\n", "meta_instruction": "You are a robot developed by lmdeploy.", "eosys": "\n", "user": "user\n", "eoh": "\n", "assistant": "assistant\n", "eoa": "", "separator": "\n", "capability": "chat", "stop_words": [""] }
Client code:
from openai import OpenAI
import sys

client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id

stream = True
responses = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [
            {
                'type': 'text',
                'text': '描述一下这张图片',  # "Describe this image"
            },
            {
                'type': 'image_url',
                'image_url': {
                    'url': 'http://yanshi.jxgh.vip:8000/000000-zg119/upload/20240123/6f3a0494cc268f34b146031d73eaaaf8.jpeg',
                },
            },
        ],
    }],
    temperature=0.1,
    stream=stream,
    # top_p=0.8
)

if stream:
    for response in responses:
        result = response.choices[0].delta.content
        if result:  # the final streamed chunk may carry no content
            sys.stdout.write(result)
            sys.stdout.flush()
else:
    print(responses.choices[0].message.content)
Program output:

Environment

sys.platform: linux
Python: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5: NVIDIA A100-PCIE-40GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.2, V12.2.140
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.2.1+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF
TorchVision: 0.17.1+cu121
LMDeploy: 0.4.1+398c2aa
transformers: 4.40.2
gradio: 3.50.2
fastapi: 0.111.0
pydantic: 2.7.1
triton: 2.2.0
Error traceback

No response
I suspect it's the generation parameters: with temperature=0.1 commented out, I ran it many times and never saw the repetition.

Also, I see you passed a custom template.json and changed meta_instruction, but why did you strip the <|im_start|> and <|im_end|> tokens from it?
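For context on the temperature observation: sampling draws the next token from softmax(logits / T), so as T shrinks toward 0 the distribution collapses onto the argmax token and decoding becomes effectively greedy, the regime most prone to repetition loops. A minimal toy sketch with made-up logits:

import numpy as np

def sampling_probs(logits, temperature):
    # Next-token distribution under temperature scaling: softmax(logits / T).
    scaled = logits / temperature
    scaled = scaled - scaled.max()   # subtract the max for numerical stability
    p = np.exp(scaled)
    return p / p.sum()

logits = np.array([2.0, 1.5, 0.5])        # made-up logits for three tokens
print(sampling_probs(logits, 1.0))   # ~[0.55 0.33 0.12], still diverse
print(sampling_probs(logits, 0.1))   # ~[0.99 0.01 0.00], almost deterministic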
How are you running it with ollama? Also with the same custom template and the same generation parameters?
I also tested with the <|im_start|> and <|im_end|> tokens added back, and got the same error. With temperature=0.1 commented out, the repeated output no longer appears, which is strange: why would setting temperature=0.1 cause repetitive output?

ollama uses the same prompt, image, and parameters, without a custom template, and its output is normal.
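A hedged sketch of the workaround discussed above: leave temperature at the server default and, optionally, pass a repetition penalty through the OpenAI client's extra_body escape hatch. Whether lmdeploy's api_server accepts a repetition_penalty field depends on the version, so treat that field as an assumption to verify against your deployment; messages stands for the same multimodal message list as in the reproduction code.

responses = client.chat.completions.create(
    model=model_name,
    messages=messages,      # same multimodal messages as in the repro script
    stream=True,
    # temperature=0.1,      # left out, per the finding above
    extra_body={'repetition_penalty': 1.05},  # assumed server-side field
)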