🪅 Benchmark Original StreamingLLM

📌 Overview

The original StreamingLLM repository provides a PyTorch implementation of StreamingLLM, along with an example script that showcases its generation quality. However, the script does not report any system metrics for evaluating how fast the model can generate text. We modify the example script to collect such metrics and evaluate its performance.
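As a minimal sketch of the kind of metric the modified script can report, the helper below turns a list of per-token generation durations into throughput and first-token latency figures. The function name and dictionary keys are illustrative, not the actual script's API.

```python
def generation_metrics(token_durations):
    """Summarize per-token wall-clock durations (in seconds) from a
    generation loop into simple system metrics.

    token_durations: list of floats, one entry per generated token,
    e.g. collected with time.perf_counter() around each decode step.
    """
    n = len(token_durations)
    total = sum(token_durations)
    return {
        "num_tokens": n,
        "total_time_s": total,
        # tokens per second over the whole generation
        "throughput_tok_per_s": n / total if total else 0.0,
        # time to produce the first token (prefill-dominated)
        "first_token_latency_s": token_durations[0] if token_durations else 0.0,
    }
```

In practice you would record one duration per decode step with `time.perf_counter()` and feed the list to this helper once generation finishes.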

🚗 Quick Start

You can use the following commands to install StreamingLLM easily.

```bash
# create a new conda env
conda create -yn streaming python=3.8
conda activate streaming

# install torch and related deps
pip install torch torchvision torchaudio
pip install transformers==4.33.0 accelerate datasets evaluate wandb scikit-learn scipy sentencepiece

# install streamingllm
# we fixed the commit for reproducibility
pip install git+https://github.com/mit-han-lab/streaming-llm.git@26b72ffa944c476a7a3c5efdfab6a9b49016aaac
```

You are then ready to run the benchmark script to evaluate the performance of the PyTorch version of StreamingLLM. Note that you need to replace `<model-dir>` with the actual path to the Hugging Face model repository, as mentioned in the root README.

```bash
python run_streaming_llama.py \
    --model_name_or_path <model-dir> \
    --enable_streaming \
    --max_output_len 1024 \
    --max_input_len 1024 \
    --start_size 4 \
    --only_n_first 5
```

You can tune the following arguments to evaluate performance under different settings.

  • start_size: the number of initial tokens to retain in the window
  • max_output_len: the maximum number of tokens to generate
  • only_n_first: the number of conversation rounds to run through; remove this flag to test all conversation data
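To see how `start_size` interacts with the rolling cache, here is a sketch of StreamingLLM's eviction policy: keep the first `start_size` "attention sink" positions plus the most recent positions, and evict the middle. The function below only computes which cache indices survive; the name `kept_positions` and the `recent_size` default are illustrative assumptions, and the real tensor-level implementation lives in the streaming-llm package.

```python
def kept_positions(seq_len, start_size=4, recent_size=1020):
    """Return the cache positions retained by a sink-plus-recent policy.

    Keeps the first `start_size` positions (attention sinks) and the last
    `recent_size` positions; everything in between is evicted once the
    sequence outgrows the window. A sketch of the policy, not the actual
    streaming-llm code.
    """
    if seq_len <= start_size + recent_size:
        # cache still fits entirely in the window: nothing is evicted
        return list(range(seq_len))
    return list(range(start_size)) + list(range(seq_len - recent_size, seq_len))
```

With the defaults above, the cache holds at most 4 + 1020 = 1024 positions regardless of how long generation runs, which is why throughput stays flat as the conversation grows.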