
support "stop" in api chat/completions #3114

Closed
davidyao opened this issue Apr 3, 2024 · 2 comments · Fixed by #3527
Labels
solved (This problem has been already solved.)

Comments

davidyao commented Apr 3, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

CUDA_VISIBLE_DEVICES=0 USE_MODELSCOPE_HUB=1 API_PORT=7860 python src/api_demo.py \
    --model_name_or_path qwen/Qwen-72B-Chat-Int4 \
    --template qwen

Expected behavior

OpenAI's chat completion API supports a "stop" parameter, which can be used for early stopping. The current API does not seem to support it. Please add support for it to avoid unnecessary inference.

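For reference, the kind of request the issue asks for would look roughly like the one below. This is a minimal sketch, assuming the api_demo server exposes an OpenAI-compatible /v1/chat/completions endpoint on the API_PORT set above (7860); the model name and message content are placeholders.

# Minimal sketch: a chat completion request carrying a "stop" string, which the
# server should use to end generation early (endpoint and payload values assumed).
curl http://localhost:7860/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen/Qwen-72B-Chat-Int4",
        "messages": [{"role": "user", "content": "List three colors."}],
        "max_tokens": 128,
        "stop": ["<|endoftext|>"]
      }'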

System Info

No response

Others

No response

hiyouga added the pending (This problem is yet to be addressed.) label on Apr 3, 2024
JieShenAI commented

"do_sample": false,
  "temperature": 0.0,
  "top_p": 0,
  "n": 1,
  "max_tokens": 128,
  "stream": false,
  "stop": "<|endoftext|>"

I set "stop" in the API request, but it did not take effect; generation only stopped after reaching the model's maximum generation length.
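For completeness, a full request carrying that body might look like the sketch below; the endpoint, port, model name, and message are assumptions for illustration. As reported, the "stop" field had no effect and generation ran until max_tokens was reached.

# Sketch of the reported request (endpoint, model, and message are assumed):
curl http://localhost:7860/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen/Qwen-72B-Chat-Int4",
        "messages": [{"role": "user", "content": "Write a short story."}],
        "do_sample": false,
        "temperature": 0.0,
        "top_p": 0,
        "n": 1,
        "max_tokens": 128,
        "stream": false,
        "stop": "<|endoftext|>"
      }'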

hiyouga commented Apr 12, 2024

@JieShenAI 还没支持。

hiyouga added the solved (This problem has been already solved.) label and removed the pending (This problem is yet to be addressed.) label on May 6, 2024