vllm

Here are 49 public repositories matching this topic...

Climatik-Project / Climatik-Project

Carbon Limiting Auto Tuning for Kubernetes

kubernetes sustainability kepler kubernetes-operator power-capping green-computing keda kserve llm vllm llm-inference

Updated May 18, 2024
Python

OpenLLMAI / OpenRLHF

An easy-to-use, scalable, and high-performance RLHF framework (supports 70B+ full tuning, LoRA, Mixtral, and KTO)

reinforcement-learning raylib transformers deepspeed large-language-models reinforcement-learning-from-human-feedback vllm

Updated May 18, 2024
Python

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps that showcase Meta Llama3 for WhatsApp & Messenger.

python machine-learning ai pytorch llama finetuning llm langchain vllm llama2

Updated May 17, 2024
Jupyter Notebook

bricks-cloud / BricksLLM

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.

api docker golang open-source security privacy ai azure rest-api postgresql self-hosted artificial-intelligence ycombinator openai gpt llm generative-ai anthropic vllm

Updated May 17, 2024
Go

xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you can run inference with any open-source language model, speech recognition model, or multimodal model, whether in the cloud, on-premises, or even on your laptop.

Updated May 17, 2024
Python

OSS-Pole-Emploi / happy_vllm

A REST API for vLLM, production ready

production transformers api-rest serving mlops llm llm-serving vllm

Updated May 17, 2024
Python

OpenCSGs / llm-inference

llm-inference is a platform for publishing and managing LLM inference. It provides a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, and monitoring.

transformer ray deepspeed llama-cpp vllm llm-inference

Updated May 17, 2024
Python

AgnostiqHQ / tutorials_covalent_pycon_2024

ai hpc gpu ml llama covalent agents autonomous-agents huggingface large-language-models llm chatgpt llamacpp vllm ai-foundry

Updated May 16, 2024
Jupyter Notebook

jasonacox / TinyLLM

Set up and run a local LLM and chatbot using consumer-grade hardware.

chatbot artificial-intelligence openai rag large-language-models llm vllm retrieval-augmented-generation llama-cpp-python

Updated May 16, 2024
JavaScript

microsoft / vidur

A large-scale simulation framework for LLM inference

simulation inference transformer llm vllm

Updated May 15, 2024
Python

runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

language-model llm runpod vllm

Updated May 15, 2024
Python

gotzmann / booster

Booster: an open platform for serving LLMs

openai llama gpt alpaca vicuna koboldai llm chatgpt open-assistant llamacpp llama-cpp vllm ggml stablelm wizardlm exllama oobabooga

Updated May 18, 2024
C++

DefTruth / Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

sora llm llms vllm llm-inference awesome-llm flash-attention flash-attention-2 tensorrt-llm paged-attention streaming-llm deepseek open-sora

Updated May 15, 2024

EvilPsyCHo / Open-LLM-Benchmark

Evaluate open-source language models on agent, formatted-output, instruction-following, long-text, multilingual, coding, and custom-task capabilities.

openai evaluation-framework huggingface large-language-models llamacpp vllm llm-agent llms-benchmarking