0.3.2

@alpayariyak alpayariyak released this 12 Mar 23:11
cee4e48

Worker vLLM 0.3.2 - What's Changed

  • Updated vLLM from 0.3.2 to 0.3.3, which brings:
    • StarCoder2 support
    • Performance optimization for Gemma
    • 2/3/8-bit GPTQ support
    • Marlin kernels for INT4 GPTQ inference
    • Performance optimization for MoE kernel
  • Updated and refactored base image, sampling parameters, etc.
  • Various bug fixes