MLXServer doesn't appear to actually support OpenAI style API calls as suggested in readme #3

Open
sejmann opened this issue Mar 23, 2024 · 7 comments
Labels: bug, enhancement

@sejmann commented Mar 23, 2024

It's possible I'm very confused, or something with my install went awry, but contrary to the readme.md for PicoMLXServer, when I launch an MLXServer endpoint (via PicoMLXServer) and try to connect an OpenAI client to it, the server logs show a 404 response.

127.0.0.1 - - [23/Mar/2024 11:54:32] "POST /v1/chat/completions HTTP/1.1" 404 -

However, the curl request you suggest against http://127.0.0.1:8080/generate?prompt works fine.
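
For what it's worth, the same mismatch shows up without curl. Here's a rough Python sketch, assuming the default 127.0.0.1:8080 endpoint and the prompt query parameter from the readme's curl example:

import json
import urllib.error
import urllib.parse
import urllib.request

base = "http://127.0.0.1:8080"

# OpenAI-style chat request: MLXServer answers 404
body = json.dumps({
    "messages": [{"role": "user", "content": "Hello"}],
}).encode()
req = urllib.request.Request(
    f"{base}/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
try:
    print(urllib.request.urlopen(req).status)
except urllib.error.HTTPError as e:
    print(e.code)  # prints 404 here

# MLXServer's own endpoint responds fine
params = urllib.parse.urlencode({"prompt": "Hello"})
print(urllib.request.urlopen(f"{base}/generate?{params}").status)  # 200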

As PicoMLXServer appears to be a thin wrapper/orchestrator around MLXServer, which does the actual listening and serving, I checked MLXServer to make sure it actually supports OpenAI's API. There is an open feature request from a week ago asking for OpenAI-style API support, but it has received no answer: mustafaaljadery/mlxserver#2. Browsing the MLXServer source, I see no support for OpenAI-style APIs. I know you've written an OpenAI proxy in another project, so perhaps support is yet to come?

Maybe https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/SERVER.md is a good starting point?

@sejmann changed the title "MLXServer doesn't appear to actually support OpenAI style API api calls as suggested in readme" → "MLXServer doesn't appear to actually support OpenAI style API calls as suggested in readme" Mar 23, 2024
@ronaldmannak self-assigned this Mar 23, 2024
@ronaldmannak added the bug and enhancement labels Mar 23, 2024
@M4RT1NJB

I got confused by this too, but I didn't create a new issue because I assumed I was using PicoMLXServer incorrectly.
As PicoMLXServer is based on MLXServer, it can only support the APIs that MLXServer supports, and the current OpenAI /v1/chat/completions API is not one of them.

If you're just after an MLX OpenAI API server written in Swift, you could take a look at swift-mlx-server. The mlx-examples server you linked also works, but needs modification to increase max_tokens etc. (as does the Swift example I gave).

@sejmann (Author) commented Mar 25, 2024

Right, okay, so swift-mlx-server supports the /v1/completions API, which takes a prompt property, but not the /v1/chat/completions API, which takes an array of messages. MLXServer exposes its own API and supports neither OpenAI's /v1/completions nor /v1/chat/completions.

I had tried to swap in swift-mlx-server by updating /Applications/PicoMLXServer.app/Contents/Resources/server.py to the script below (which works) before realizing swift-mlx-server also doesn't support chat with the newer API. (Still useful, I guess, if I just needed completions but couldn't call /generate for some reason.)

import sys
import subprocess

# Path to the executable
# note, this requires copying default.metallib to /usr/local/bin/ as mlx.metallib
executable_path = "/usr/local/bin/swift-mlx-server"

# Default model name and port
default_model = "mlx-community/Nous-Hermes-2-Mistral-7B-DPO-4bit-MLX"
default_port = 5000

# Set default values for model and port
model_name = default_model
port = default_port

# Process command-line arguments
for arg in sys.argv[1:]:
    if arg.startswith("model="):
        _, model_name = arg.split("=", 1)
    elif arg.startswith("port="):
        _, port = arg.split("=", 1)
    else:
        print("Invalid argument. Please use the format 'model=model_name' or 'port=port_number'.")

# Build the command to run the executable with the necessary arguments
command = [
    executable_path,
    "--model", model_name,
    "--port", str(port),
    "--host", "127.0.0.1"
]

# Execute the command
try:
    subprocess.run(command, check=True)
except subprocess.CalledProcessError as e:
    print(f"Error executing '{executable_path}': {e}")

Incidentally, are there Mac chat clients that don't rely on /v1/chat/completions? I'm using MindMac against LM Studio at present, but LM Studio feels heavier and only supports serving one model at a time. That's why Pico appealed to me: lightweight, living in the menu bar, and MLX instead of llama.cpp. As far as clients go, I figure Ronald's Pico AI Chatbot would probably work, but it doesn't seem to be available anymore.

Anyway, I'm not sure how I would support chat/completions in PicoMLXServer. Maybe write it in Swift with heavy inspiration from swift-mlx-server, but add routing for /v1/chat/completions? It has to be little more than syntactic sugar over /v1/completions, just getting the conversation formatting right, I would think? Roughly the idea in the sketch below.
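
A minimal sketch of that idea in Python, assuming a ChatML-style template (the exact template is model-specific, so treat the special tokens below as placeholders):

def messages_to_prompt(messages: list[dict]) -> str:
    """Flatten an OpenAI-style messages array into a single prompt string
    that a /v1/completions-only (or /generate) backend could consume."""
    parts = []
    for message in messages:
        # ChatML-ish framing; real models each expect their own chat template
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # cue the model to answer
    return "\n".join(parts)

# Example: what a /v1/chat/completions handler would pass on to the backend
prompt = messages_to_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)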

@ronaldmannak (Collaborator)

Sorry for the confusion everyone! In my hurry to launch Pico MLX Server I actually mixed up MLX Server and Swift MLX Server. I initially found Swift MLX Server, saw they claimed OpenAI compatibility, but somehow ended up including MLX Server in the project. 😂

@sejmann you are correct that MLX Server and therefore Pico MLX Server is indeed not compatible with OpenAI. I will update the readme.

I can definitely switch to (or add) Swift MLX Server, but as mentioned above, it only supports the legacy https://api.openai.com/v1/completions API, not the updated https://api.openai.com/v1/chat/completions API. Unless Swift MLX Server is updated, I see limited value in adding it. That said, keeping MLX Server's current custom, non-standard API isn't useful either.

We need to support the newer https://api.openai.com/v1/chat/completions API so existing clients can use Pico MLX Server. Also high on my list is support for the https://api.openai.com/v1/embeddings API.

There are a few ways forward. First of all, we can wait for either MLX Server or Swift MLX Server to add support for the https://api.openai.com/v1/chat/completions API.

A second option is to go back to my initial plan and use MLX-Swift and a web server package (and ditch the Python solutions). That has a few extra benefits as well, like improved error handling, more flexible model downloading, etc.

I will create a separate develop branch for the Swift-only version. I don't think it will be that hard (famous last words...), except perhaps for streaming (I don't know how that works server-side).
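
For future reference, streaming in the OpenAI chat API is plain Server-Sent Events: the server keeps the HTTP response open and writes one JSON delta per data: line, finishing with data: [DONE]. A rough Python sketch of the chunk framing (field names follow OpenAI's documented chat.completion.chunk objects; the id and created fields are omitted and the tokens below are faked):

import json

def sse_chunks(token_stream, model="local-model"):
    # One chat.completion.chunk object per generated token, SSE-framed
    for token in token_stream:
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [
                {"index": 0, "delta": {"content": token}, "finish_reason": None}
            ],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
    # Terminator that OpenAI streaming clients look for
    yield "data: [DONE]\n\n"

for line in sse_chunks(["Hel", "lo", "!"]):
    print(line, end="")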

@M4RT1NJB

My mistake, and you are correct: Swift MLX Server supports the legacy OpenAI API, not the updated one. I had realised this some time ago, then promptly forgot 🤦

@sejmann (Author) commented Mar 26, 2024

Interestingly, MindMac claims to support PicoMLXServer, as of yesterday. =)
https://mindmac.canny.io/changelog?search=MLX -- I haven't tested it yet; perhaps they support MLXServer's API.

Update: yup, confirmed, it expects MLXServer's unique /chat API, and it seems to work fine with MindMac so far.

It makes me think that maybe a less "pico" PicoMLXServer could provide access to multiple underlying servers, and even an optional rewriting proxy to add OpenAI API support to underlying servers where necessary, becoming a sort of universal translator. Although maybe that's too much. Anyway, since I'll be able to use the app roughly as intended with a chat client, I guess I'll start opening issues related to day-to-day use. =)

@ronaldmannak (Collaborator)

Oh that's so cool. I hadn't noticed MindMac added support.

As for a "universal translator" proxy server, that's what I had in mind for Swift OpenAI Proxy Server, though that's meant for external and commercial APIs. What you have in mind is a reverse proxy for local models? Do I understand you correctly?

@ronaldmannak (Collaborator)

I've created a separate project for handling the API and the conversion to MLX-Swift calls. Once I have a first successful round trip, I'll create a separate developer branch of MLXServer that will contain the web server and UI.

Does anyone have experience with tiny Swift HTTP servers that support streaming, by any chance? Vapor is definitely overkill, Hummingbird as well (though at least it's lighter than Vapor). I'm hoping there's a small one that doesn't require NIO.

https://github.com/ronaldmannak/MLXkit
