An endpoint server for efficiently serving quantized open-source LLMs for code.
Embedding-based semantic search app for poetry [App and EDA notebooks]
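Apps like this typically follow one pattern: embed the corpus once, embed each query, and rank by cosine similarity. A minimal sketch using sentence-transformers (the model name and sample corpus are illustrative, not taken from this repo):

```python
from sentence_transformers import SentenceTransformer, util

# Small general-purpose encoder; any sentence-embedding model works here.
model = SentenceTransformer("all-MiniLM-L6-v2")

poems = [
    "Two roads diverged in a yellow wood",
    "Shall I compare thee to a summer's day?",
    "Because I could not stop for Death",
]

# Encode the corpus once; at query time only the query is embedded.
corpus_embeddings = model.encode(poems, convert_to_tensor=True)

query = "a choice between two paths"
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine-similarity search returns the top-k closest corpus entries.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {poems[hit['corpus_id']]}")
```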
Preserving entities by integrating knowledge graphs, Llama 2, vLLM, and LangChain.
This repository demonstrates LLM execution on CPUs using packages like llamafile, emphasizing low latency, high throughput, and cost-effectiveness for inference and serving.
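As a rough illustration of the serving side: a llamafile running locally in server mode exposes llama.cpp's OpenAI-compatible chat endpoint, so it can be queried with plain HTTP. The port and payload below are assumptions for the sketch, not taken from this repo:

```python
import requests

# Assumes a llamafile is already running locally as a server on its
# default port and serves the OpenAI-compatible chat completions route.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local",  # a llamafile serves a single bundled model
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```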
EchoSight is a tool that helps visually impaired individuals by audibly describing images taken with a Raspberry Pi Camera or provided via an image path or URL, across different operating systems.
Run inference-only code benchmarks quickly using vLLM
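For reference, an inference-only throughput measurement with vLLM can be this short (the model and prompts below are placeholders, not this repo's benchmark suite):

```python
import time
from vllm import LLM, SamplingParams

# Illustrative model and prompts; substitute your own.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(max_tokens=128, temperature=0.0)
prompts = ["Explain what a hash map is."] * 32  # one batch of requests

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count only generated tokens to get decode throughput.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} generated tokens/s over {elapsed:.1f}s")
```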
Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper, and without rate limits. Compare the quality and latency to your current LLM API provider.
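Constrained decoding of this kind masks the sampler at each step so the output can only ever form valid JSON under a given schema. A hedged sketch in the style of earlier outlines releases (the outlines API has shifted across versions, and the model and schema here are illustrative):

```python
import outlines

# Any transformers-compatible model; this one is just an example.
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# Plain JSON Schema; the generator guarantees output conforming to it.
schema = """{
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}"""

generator = outlines.generate.json(model, schema)
person = generator("Invent a character: ")
print(person)  # always parses as JSON matching the schema
```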
MLOps library for LLM deployment with the vLLM engine on RunPod's infrastructure.
Low latency JSON generation using LLMs ⚡️
A simple implementation of U-Net, because all the implementations I've seen are way too complicated.
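In that same minimal spirit: a U-Net needs little more than double convolutions, pooling on the way down, and transposed-convolution upsampling with skip connections on the way up. A hypothetical two-level PyTorch sketch, not this repository's code:

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1),
        nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        self.down1 = double_conv(in_ch, 64)
        self.down2 = double_conv(64, 128)
        self.bottom = double_conv(128, 256)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.conv2 = double_conv(256, 128)  # 128 skip + 128 upsampled
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.conv1 = double_conv(128, 64)   # 64 skip + 64 upsampled
        self.out = nn.Conv2d(64, out_ch, 1)

    def forward(self, x):
        d1 = self.down1(x)              # full resolution
        d2 = self.down2(self.pool(d1))  # 1/2 resolution
        b = self.bottom(self.pool(d2))  # 1/4 resolution
        # Upsample and concatenate the matching encoder feature maps.
        u2 = self.conv2(torch.cat([self.up2(b), d2], dim=1))
        u1 = self.conv1(torch.cat([self.up1(u2), d1], dim=1))
        return self.out(u1)

x = torch.randn(1, 1, 64, 64)
print(TinyUNet()(x).shape)  # torch.Size([1, 1, 64, 64])
```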
A Discord bot that can call LLMs using either Hugging Face or vLLM on Windows, combined with function calling.
Context layer on top of your unstructured universe
Call many AIs from a single API.