
candle vLLM


Efficient, easy-to-use platform for inference and serving local LLMs including an OpenAI compatible API server.

Please see mistral.rs, an efficient inference platform for many models, including quantized support. Additionally, it implements X-LoRA, a recently released method. X-LoRA introduces a MoE-inspired method to densely gate LoRA adapters, powered by a model self-reflection forward pass.

candle-vllm is in flux and under breaking development, and as such is currently unstable.

Features

  • OpenAI compatible API server provided for serving LLMs.
  • Highly extensible trait-based system to allow rapid implementation of new model pipelines.
  • Streaming support in generation.
  • Efficient management of key-value cache with PagedAttention.
  • Continuous batching.

Pipelines

  • Llama
    • 7b
    • 13b
    • 70b
  • Mistral
    • 7b

Examples

See this folder for some examples.

Example with Llama 7b

In your terminal, install the openai Python package by running pip install openai. I use version 1.3.5.

Then, create a new Python file and write the following code:

import openai

# Point the client at the local candle-vllm server rather than the OpenAI API.
openai.api_key = "EMPTY"
openai.base_url = "http://localhost:2000/v1/"

# Request a chat completion from the llama7b pipeline.
completion = openai.chat.completions.create(
    model="llama7b",
    messages=[
        {
            "role": "user",
            "content": "Explain how to best learn Rust.",
        },
    ],
    max_tokens=64,
)
print(completion.choices[0].message.content)

Next, launch a candle-vllm instance by running cargo run --release -- --port 2000 llama7b --repeat-last-n 64.

After the candle-vllm instance is running, run the Python script and enjoy efficient inference with an OpenAI compatible API server!
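Streaming is listed as a feature above. Below is a minimal sketch of consuming a streamed response with the same module-level openai client, assuming the server honors stream=True with OpenAI-style chunk deltas:

import openai

openai.api_key = "EMPTY"
openai.base_url = "http://localhost:2000/v1/"

# Ask for a streamed response; each chunk carries an incremental text delta.
stream = openai.chat.completions.create(
    model="llama7b",
    messages=[{"role": "user", "content": "Explain how to best learn Rust."}],
    max_tokens=64,
    stream=True,  # assumption: the server streams OpenAI-style chunks
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
print()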

Usage Help

For general configuration help, run cargo run -- --help.

For model-specific help, run cargo run -- --port 1234 <MODEL NAME> --help.

Installation

Installing candle-vllm is as simple as the following steps. If you have any problems, please create an issue.

  1. Be sure to install Rust here: https://www.rust-lang.org/tools/install
  2. Run sudo apt install libssl-dev or equivalent install command
  3. Run sudo apt install pkg-config or equivalent install command

Contributing

The following features are planned to be implemented, but contributions are especially welcome:

Resources
