ComfyUI ExLlamaV2 Nodes

A simple local text generator for ComfyUI using ExLlamaV2.

Installation

Clone the repository to custom_nodes:

git clone https://github.com/Zuellni/ComfyUI-ExLlama-Nodes custom_nodes/ComfyUI-ExLlamaV2-Nodes

Install the requirements:

pip install -r custom_nodes/ComfyUI-ExLlamaV2-Nodes/requirements.txt

On Windows, install one of the precompiled wheels instead:

pip install https://github.com/turboderp/exllamav2/releases/download/v0.0.xx/exllamav2-0.0.xx+cuXXX-cpXXX-cpXXX-win_amd64.whl

Check which one you need with:

python -c "import sys, torch; print('cu{}-cp{}{}'.format(torch.version.cuda.replace('.', ''), *sys.version_info[:2]))"
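If this prints cu121-cp311, for example, pick the wheel whose name contains cu121 and cp311.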

Caution

If you see errors related to ExLlamaV2 while loading the nodes, try to install it following the official instructions.
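As one option, ExLlamaV2 is also published on PyPI, so installing the prebuilt package may be enough:

pip install exllamav2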

Usage

Only EXL2, 4-bit GPTQ, and unquantized HF models are supported. You can find them on Hugging Face. See the model card in each repository for details on instruction formats.

To use a model with the nodes, clone its repository with git or manually download all of its files and place them in models/llm. For example, to download the 6-bit Llama-3-8B-Instruct, use the following commands:

git lfs install
git clone https://huggingface.co/turboderp/Llama-3-8B-Instruct-exl2 -b 6.0bpw models/llm/Llama-3-8B-Instruct-exl2-6.0bpw
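The -b flag selects a branch of the Hugging Face repository; EXL2 repositories like this one typically keep each quantization level, such as the 6.0 bits per weight used here, on its own branch.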

Tip

You can add your own llm path to the extra_model_paths.yaml file and put the models there instead.
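A minimal sketch of such an entry, using a hypothetical entry name and base path; the llm key is what maps the folder to these nodes:

other_models:
    base_path: D:/storage
    llm: llm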

Nodes

Loader: Loads models from the llm directory.
    cache_bits: Lower values use less VRAM but also impact generation speed and quality.
    max_seq_len: Max context; higher values use more VRAM. 0 defaults to the model config.
Generator: Generates text based on the given prompt. Refer to text-generation-webui for the parameters.
    unload: Unloads the model after each generation.
    single_line: Stops generation on a newline.
    max_tokens: Max new tokens; 0 uses all available context.
Previewer: Displays the generated text in the UI.
Replacer: Replaces variable names enclosed in brackets, e.g. [a], with their values, as shown below.
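For example, with a variable named a set to cat, the Replacer would turn the prompt "a photo of [a], best quality" into "a photo of cat, best quality". The variable name and prompt here are purely illustrative.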

Workflow

The example workflow is embedded in the image below and can be opened in ComfyUI.

[workflow image]