📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
-
Updated
May 15, 2024
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
🚀 DeepSeek-V2大模型逆向API白嫖测试【特长:GPT4平替】,支持高速流式输出、多轮对话,零配置部署,多路token支持。
Add a description, image, and links to the deepseek topic page so that developers can more easily learn about it.
To associate your repository with the deepseek topic, visit your repo's landing page and select "manage topics."