
candle vLLM


Efficient, easy-to-use platform for inference and serving local LLMs including an OpenAI compatible API server.

Please see mistral.rs, an efficient inference platform for many models, including quantized support. Additionally, it implements X-LoRA, a recently released method. X-LoRA introduces a MoE-inspired method to densely gate LoRA adapters, powered by a model self-reflection forward pass.

candle-vllm is in flux and under breaking development, and as such is currently unstable.

Features

  • OpenAI compatible API server provided for serving LLMs.
  • Highly extensible trait-based system to allow rapid implementation of new model pipelines.
  • Streaming support in generation.
  • Efficient management of key-value cache with PagedAttention.
  • Continuous batching.

Pipelines

  • Llama
    • 7b
    • 13b
    • 70b
  • Mistral
    • 7b

Examples

See this folder for some examples.

Example with Llama 7b

In your terminal, install the openai Python package by running pip install openai. I use version 1.3.5.

Then, create a new Python file and write the following code:

import openai

# Point the client at the local candle-vllm server rather than the OpenAI API.
openai.api_key = "EMPTY"
openai.base_url = "http://localhost:2000/v1/"

# Request a chat completion from the llama7b pipeline.
completion = openai.chat.completions.create(
    model="llama7b",
    messages=[
        {
            "role": "user",
            "content": "Explain how to best learn Rust.",
        },
    ],
    max_tokens=64,
)
print(completion.choices[0].message.content)

Next, launch a candle-vllm instance by running cargo run --release -- --port 2000 llama7b --repeat-last-n 64.

After the candle-vllm instance is running, run the Python script and enjoy efficient inference with an OpenAI compatible API server!
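Streaming is listed as a feature above. Below is a minimal sketch of consuming a streamed response with the same module-level openai client, assuming the server honors stream=True with OpenAI-style chunk deltas:

import openai

openai.api_key = "EMPTY"
openai.base_url = "http://localhost:2000/v1/"

# Ask for a streamed response; each chunk carries an incremental text delta.
stream = openai.chat.completions.create(
    model="llama7b",
    messages=[{"role": "user", "content": "Explain how to best learn Rust."}],
    max_tokens=64,
    stream=True,  # assumption: the server streams OpenAI-style chunks
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
print()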

Usage Help

For general configuration help, run cargo run -- --help.

For model-specific help, run cargo run -- --port 1234 <MODEL NAME> --help.

Installation

Installing candle-vllm is as simple as the following steps. If you have any problems, please create an issue.

  1. Be sure to install Rust here: https://www.rust-lang.org/tools/install
  2. Run sudo apt install libssl-dev or equivalent install command
  3. Run sudo apt install pkg-config or equivalent install command

Contributing

The following features are planned to be implemented, but contributions are especially welcome:

Resources
