
422 Unprocessable Entity using Neural Chat via OpenAI interface with meta-llama/Llama-2-7b-chat-hf #1288

Open
brent-elliott opened this issue Feb 19, 2024 · 2 comments

@brent-elliott
Is there a specific version of openai that is aligned with the OpenAI interfaces offered by neural_chat? I am testing with openai 1.12.0 but am encountering a 422 Unprocessable Entity error.

I saw that meta-llama/Llama-2-7b-chat-hf is a supported model and appears to be small enough to fit into my Intel Data Center Flex 170 XPU.

I can successfully run this model locally with the code outlined in deploy_chatbot_on_xpu.

However, when I attempt to use the OpenAI interface per the instructions at https://github.com/intel/intel-extension-for-transformers/tree/main/intel_extension_for_transformers/neural_chat, the server logs 422 Unprocessable Entity and the client raises an error about a missing required field. I assume this stems from a mismatch between the fields the OpenAI client sends and those the neural_chat server requires. I have also included the text extracted from a tcpdump packet capture below.

Following along from the notebook examples, I have prepared textbot.yaml and server.py as below.

Starting the server

$ grep -v "^#" textbot.yaml | grep -v "^$"
host: 0.0.0.0
port: 8000
model_name_or_path: "meta-llama/Llama-2-7b-chat-hf"
device: "xpu"
tasks_list: ['textchat']
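
For completeness, the same YAML can presumably also be served with the bundled CLI instead of a wrapper script (command name per the neural_chat docs at the time; worth verifying against your installed version):

$ neuralchat_server start --config_file textbot.yaml --log_file neuralchat.log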

$ cat server.py
#!/usr/bin/env python

import os
import time
import multiprocessing
from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor
import nest_asyncio

nest_asyncio.apply()

def start_service():
    # Launch NeuralChat with the YAML config shown above; logs go to neuralchat.log.
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="textbot.yaml", log_file="neuralchat.log")

# Run the server in a background process so the calling interpreter is not blocked.
multiprocessing.Process(target=start_service).start()

$ ./server.py
/home/REDACTED/miniconda3/envs/jupyter2/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
/home/REDACTED/miniconda3/envs/jupyter2/lib/python3.9/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Loading config settings from the environment...
2024-02-19 14:11:22.837584: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-19 14:11:22.841047: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-19 14:11:22.887207: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-19 14:11:22.887246: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-19 14:11:22.888669: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-19 14:11:22.896900: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-19 14:11:22.897194: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-19 14:11:23.782914: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-19 14:11:27,327 - datasets - INFO - PyTorch version 2.1.0a0+cxx11.abi available.
2024-02-19 14:11:27,328 - datasets - INFO - TensorFlow version 2.15.0.post1 available.
Loading model meta-llama/Llama-2-7b-chat-hf
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.25it/s]
2024-02-19 14:11:31,912 - root - INFO - Model loaded.
INFO:     Started server process [2913373]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Additional logs after starting the TextChatClientExecutor client - successful inference

[2024-02-19 14:32:57,683] [    INFO] - Checking parameters of completion request...
[2024-02-19 14:32:57,683] [    INFO] - Predicting chat completion using prompt 'Tell me about Intel Xeon Scalable Processors.'
[2024-02-19 14:33:07,119] [    INFO] - Chat completion finished.
INFO:     127.0.0.1:60734 - "POST /v1/chat/completions HTTP/1.1" 200 OK

Additional logs after connecting via the OpenAI client - failed request

INFO:     127.0.0.1:39368 - "POST /v1/chat/completions HTTP/1.1" 422 Unprocessable Entity

OpenAI client contents

Aside from the shebang and the modified model string, this should be identical to the example on the page linked above.

$ cat openai-client.py
#!/usr/bin/env python

import openai
openai.api_key = "EMPTY"
openai.base_url = 'http://127.0.0.1:8000/v1/'

response = openai.chat.completions.create(
      model="meta-llama/Llama-2-7b-chat-hf",
      messages=[
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."},
      ],
)
print(response.choices[0].message.content)

$ ./openai-client.py
Traceback (most recent call last):
  File "/home/REDACTED/jupyter/./openai-client.py", line 7, in <module>
    response = openai.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/_utils/_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 663, in create
    return self._post(
           ^^^^^^^^^^^
  File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/_base_client.py", line 1200, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/_base_client.py", line 889, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/_base_client.py", line 980, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.UnprocessableEntityError: Error code: 422 - {'detail': [{'loc': ['body', 'prompt'], 'msg': 'field required', 'type': 'value_error.missing'}]}
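
Since the 422 detail points at a missing top-level prompt field, one possible client-side probe (not a fix) is the openai 1.x extra_body parameter, which merges extra keys into the request JSON. A hypothetical, untested sketch:

#!/usr/bin/env python

# Hypothetical probe, not a fix: the 422 detail says the server wants a
# top-level "prompt" field, so use the openai 1.x extra_body parameter to
# merge one into the request JSON and see whether validation then passes.
import openai

openai.api_key = "EMPTY"
openai.base_url = "http://127.0.0.1:8000/v1/"

response = openai.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."},
    ],
    extra_body={"prompt": "Tell me about Intel Xeon Scalable Processors."},
)
print(response.choices[0].message.content)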

Text from packet capture of exchange

POST /v1/chat/completions HTTP/1.1
Host: REDACTED:8000
Accept-Encoding: gzip, deflate
Connection: keep-alive
Accept: application/json
Content-Type: application/json
User-Agent: _ModuleClient/Python 1.12.0
X-Stainless-Lang: python
X-Stainless-Package-Version: 1.12.0
X-Stainless-OS: Linux
X-Stainless-Arch: x64
X-Stainless-Runtime: CPython
X-Stainless-Runtime-Version: 3.11.7
Authorization: Bearer EMPTY
X-Stainless-Async: false
Content-Length: 197

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."}], "model": "meta-llama/Llama-2-7b-chat-hf"}

HTTP/1.1 422 Unprocessable Entity
date: Mon, 19 Feb 2024 23:02:02 GMT
server: uvicorn
content-length: 90
content-type: application/json

{"detail":[{"loc":["body","prompt"],"msg":"field required","type":"value_error.missing"}]}

Thank you!

@hshen14
Contributor

hshen14 commented Feb 19, 2024

Thanks @brent-elliott for reporting the issue. @huiyan2021 @lvliang-intel please take a look.

@huiyan2021
Contributor

Hi @brent-elliott, this sample needs to be run using the latest main branch of intel-extension-for-transformers, since there have been some message-format changes in the OpenAI API. See also #1289 and #1294.
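
For anyone hitting this, a typical way to pick up the main branch with pip (standard pip VCS syntax; the package builds native extensions, so a from-source install may take a while and need extra build dependencies):

$ pip install git+https://github.com/intel/intel-extension-for-transformers.git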
