Awesome Local AI

If you tried Jan Desktop and liked it, please also check out the following awesome collection of open source and/or local AI tools and solutions.

Your contributions are always welcome!

Lists

awesome-local-llms - Table of open-source local LLM inference projects with their GitHub metrics.
llama-police - A list of Open Source LLM Tools from Chip Huyen

Inference Engine

Repository	Description	Supported model formats	CPU/GPU Support	UI	language	Platform Type
llama.cpp	- Inference of LLaMA model in pure C/C++	GGML/GGUF	Both	❌	C/C++	Text-Gen
Nitro	- 3MB inference engine embeddable in your apps. Uses Llamacpp and more	Both	Both	❌	Text-Gen
ollama	- CLI and local server. Uses Llamacpp	Both	Both	❌	Text-Gen
koboldcpp	- A simple one-file way to run various GGML models with KoboldAI's UI	GGML	Both	✅	C/C++	Text-Gen
LoLLMS	- Lord of Large Language Models Web User Interface.	Nearly ALL	Both	✅	Python	Text-Gen
ExLlama	- A more memory-efficient rewrite of the HF transformers implementation of Llama	AutoGPTQ/GPTQ	GPU	✅	Python/C++	Text-Gen
vLLM	- vLLM is a fast and easy-to-use library for LLM inference and serving.	GGML/GGUF	Both	❌	Python	Text-Gen
SGLang	- 3-5x higher throughput than vLLM (Control flow, RadixAttention, KV cache reuse)	Safetensor / AWQ / GPTQ	GPU	❌	Python	Text-Gen
LmDeploy	- LMDeploy is a toolkit for compressing, deploying, and serving LLMs.	Pytorch / Turbomind	Both	❌	Python/C++	Text-Gen
Tensorrt-llm	- Inference efficiently on NVIDIA GPUs	Python / C++ runtimes	Both	❌	Python/C++	Text-Gen
CTransformers	- Python bindings for the Transformer models implemented in C/C++ using GGML library	GGML/GPTQ	Both	❌	C/C++	Text-Gen
llama-cpp-python	- Python bindings for llama.cpp	GGUF	Both	❌	Python	Text-Gen
llama2.rs	- A fast llama2 decoder in pure Rust	GPTQ	CPU	❌	Rust	Text-Gen
ExLlamaV2	- A fast inference library for running LLMs locally on modern consumer-class GPUs	GPTQ/EXL2	GPU	❌	Python/C++	Text-Gen
LoRAX	- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs	Safetensor / AWQ / GPTQ	GPU	❌	Python/Rust	Text-Gen
text-generation-inference	- Inference serving toolbox with optimized kernels for each LLM architecture	Safetensors / AWQ / GPTQ	Both	❌	Python/Rust	Text-Gen

Inference UI

oobabooga - A Gradio web UI for Large Language Models.
LM Studio - Discover, download, and run local LLMs.
LocalAI - LocalAI is a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing.
FireworksAI - Experience the world's fastest LLM inference platform deploy your own at no additional cost.
faradav - Chat with AI Characters Offline, Runs locally, Zero-configuration.
GPT4All - A free-to-use, locally running, privacy-aware chatbot.
LLMFarm - llama and other large language models on iOS and MacOS offline using GGML library.
LlamaChat - LlamaChat allows you to chat with LLaMa, Alpaca and GPT4All models1 all running locally on your Mac.
LLM as a Chatbot Service - LLM as a Chatbot Service.
FuLLMetalAi - Fullmetal.Ai is a distributed network of self-hosted Large Language Models (LLMs).
Automatic1111 - Stable Diffusion web UI.
ComfyUI - A powerful and modular stable diffusion GUI with a graph/nodes interface.
Wordflow - Run, share, and discover AI prompts in your browsers
petals - Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading.
ChatUI - Open source codebase powering the HuggingChat app.
AI-Mask - Browser extension to provide model inference to web apps. Backed by web-llm and transformers.js
everything-rag - Interact with (virtually) any LLM on Hugging Face Hub with an asy-to-use, 100% local Gradio chatbot.
LmScript - UI for SGLang and Outlines

Platforms / full solutions

H2OAI - H2OGPT The fastest, most accurate AI Cloud Platform.
BentoML - BentoML is a framework for building reliable, scalable, and cost-efficient AI applications.
Predibase - Serverless LoRA Fine-Tuning and Serving for LLMs.

Developer tools

Jan Framework - At its core, Jan is a cross-platform, local-first and AI native application framework that can be used to build anything.
Pinecone - Long-Term Memory for AI.
PoplarML - PoplarML enables the deployment of production-ready, scalable ML systems with minimal engineering effort.
Datature - The All-in-One Platform to Build and Deploy Vision AI.
One AI - MAKING GENERATIVE AI BUSINESS-READY.
Gooey.AI - Create Your Own No Code AI Workflows.
Mixo.io - AI website builder.
Safurai - AI Code Assistant that saves you time in changing, optimizing, and searching code.
GitFluence - The AI-driven solution that helps you quickly find the right command. Get started with Git Command Generator today and save time.
Haystack - A framework for building NLP applications (e.g. agents, semantic search, question-answering) with language models.
LangChain - A framework for developing applications powered by language models.
gpt4all - A chatbot trained on a massive collection of clean assistant data including code, stories and dialogue.
LMQL - LMQL is a query language for large language models.
LlamaIndex - A data framework for building LLM applications over external data.
Phoenix - Open-source tool for ML observability that runs in your notebook environment, by Arize. Monitor and fine tune LLM, CV and tabular models.
trypromptly - Create AI Apps & Chatbots in Minutes.
BentoML - BentoML is the platform for software engineers to build AI products.
LiteLLM - Call all LLM APIs using the OpenAI format.

Agents

SuperAGI - Opensource AGI Infrastructure.
Auto-GPT - An experimental open-source attempt to make GPT-4 fully autonomous.
BabyAGI - Baby AGI is an autonomous AI agent developed using Python that operates through OpenAI and Pinecone APIs.
AgentGPT -Assemble, configure, and deploy autonomous AI Agents in your browser.
HyperWrite - HyperWrite helps you work smarter, faster, and with ease.
AI Agents - AI Agent that Power Up Your Productivity.
AgentRunner.ai - Leverage the power of GPT-4 to create and train fully autonomous AI agents.
GPT Engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.
GPT Prompt Engineer - Automated prompt engineering. It generates, tests, and ranks prompts to find the best ones.
MetaGPT - The Multi-Agent Framework: Given one line requirement, return PRD, design, tasks, repo.
Open Interpreter - Let language models run code. Have your agent write and execute code.
CrewAI - Cutting-edge framework for orchestrating role-playing, autonomous AI agents.

Training

FastChat - An open platform for training, serving, and evaluating large language models.
DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
BMTrain - Efficient Training for Big Models.
Alpa - Alpa is a system for training and serving large-scale neural networks.
Megatron-LM - Ongoing research training transformer models at scale.
Ludwig - Low-code framework for building custom LLMs, neural networks, and other AI models.
Nanotron - Minimalistic large language model 3D-parallelism training.
TRL - Language model alignment with reinforcement learning.
PEFT - Parameter efficient fine-tuning (LoRA, DoRA, model merger and more)

LLM Leaderboard

Open LLM Leaderboard - aims to track, rank and evaluate LLMs and chatbots as they are released.
Chatbot Arena Leaderboard - a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
AlpacaEval Leaderboard - An Automatic Evaluator for Instruction-following Language Models.
LLM-Leaderboard-streamlit - A joint community effort to create one central leaderboard for LLMs.
lmsys.org - Benchmarking LLMs in the Wild with Elo Ratings.

Research

Attention Is All You Need (2017): Presents the original transformer model. it helps with sequence-to-sequence tasks, such as machine translation. [Paper]
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018): Helps with language modeling and prediction tasks. [Paper]
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022): Mechanism to improve transformers. [paper]
Improving Language Understanding by Generative Pre-Training (2019): Paper is authored by OpenAI on GPT. [paper]
Cramming: Training a Language Model on a Single GPU in One Day (2022): Paper focus on a way too increase the performance by using minimum computing power. [paper]
LaMDA: Language Models for Dialog Applications (2022): LaMDA is a family of Transformer-based neural language models by Google. [paper]
Training language models to follow instructions with human feedback (2022): Use human feedback to align LLMs. [paper]
TurboTransformers: An Efficient GPU Serving System For Transformer Models (PPoPP'21) [paper]
Fast Distributed Inference Serving for Large Language Models (arXiv'23) [paper]
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs (arXiv'23) [paper]
Accelerating LLM Inference with Staged Speculative Decoding (arXiv'23) [paper]
ZeRO: Memory optimizations Toward Training Trillion Parameter Models (SC'20) [paper]
TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition 2023 [Paper]

Community

LocalLLaMA
singularity
ChatGPTCoding
StableDiffusion
Hugging Face
JanAI
oobabooga
GPT4
Artificial Intelligence
CrewAI

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Awesome Local AI

Lists

Inference Engine

Inference UI

Platforms / full solutions

Developer tools

User Tools

Agents

Training

LLM Leaderboard

Research

Community

Files

README.md

Latest commit

History

README.md

File metadata and controls

Awesome Local AI

Lists

Inference Engine

Inference UI

Platforms / full solutions

Developer tools

User Tools

Agents

Training

LLM Leaderboard

Research

Community