Carbon Limiting Auto Tuning for Kubernetes
-
Updated
May 18, 2024 - Python
Carbon Limiting Auto Tuning for Kubernetes
An Easy-to-use, Scalable and High-performance RLHF Framework (Support 70B+ full tuning & LoRA & Mixtral & KTO)
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama3 for WhatsApp & Messenger.
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
A REST API for vLLM, production ready
llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.
Setup and run a local LLM and Chatbot using consumer grade hardware.
A large-scale simulation framework for LLM inference
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
Evaluate open-source language models on Agent, formatted output, command following, long text, multilingual, coding, and custom task capabilities. 开源语言模型在Agent,格式化输出,指令追随,长文本,多语言,代码,自定义任务的能力基准测试。
大模型推理框架加速,让 LLM 飞起来
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
Fully-featured, beautiful web interface for vLLM - built with NextJS.
This repository has a lot of LLM projects done. It is the best place to start learning LLM.
Add a description, image, and links to the vllm topic page so that developers can more easily learn about it.
To associate your repository with the vllm topic, visit your repo's landing page and select "manage topics."