
add chatglm3-6b model support [help wanted] #6999

Open · wants to merge 4 commits into base: master

Conversation


@mnlife mnlife commented Apr 30, 2024

Text generation has been implemented.

The following features (that I know of) have not been implemented yet, compared with the PyTorch version:

  • The model input does not include the prefix tokens {"[gMASK]", "sop", "<|user|>", "_", "<0x0A>"} or the suffix token {"<|assistant|>"}.
    • For example: when we input "hi", after tokenization it should be {"[gMASK]", "sop", "<|user|>", "_", "<0x0A>", "hi", "<|assistant|>"}. To implement this feature, what do we need to change in llama.cpp?
    • When I add 9a8db6b and run the command below, the changes do not take effect:
./build/bin/main -m ~/models/chatglm3-6b-Q4_K_M.gguf --verbose-prompt -p 你好
  • The inference results are incorrect with the CUDA version.
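
The prefix/suffix wrapping described above can be sketched as follows. This is a minimal illustration only, assuming the token sequence reported in this comment; `wrap_chatglm3_prompt` is a hypothetical helper, not a llama.cpp function. In llama.cpp itself this would correspond to emitting the special tokens during tokenization / chat-template handling.

```python
# Hypothetical sketch of ChatGLM3 prompt wrapping (not llama.cpp API).
# Token strings are taken verbatim from the tokenizer output quoted above.

def wrap_chatglm3_prompt(user_text: str) -> list[str]:
    """Wrap a raw user prompt with the special tokens ChatGLM3 expects."""
    prefix = ["[gMASK]", "sop", "<|user|>", "_", "<0x0A>"]  # prepended before the user text
    suffix = ["<|assistant|>"]  # appended so the model knows to start replying
    return prefix + [user_text] + suffix

# wrap_chatglm3_prompt("hi")
# -> ["[gMASK]", "sop", "<|user|>", "_", "<0x0A>", "hi", "<|assistant|>"]
```

The key point is that the wrapping happens at the token level, not by string concatenation: "[gMASK]", "sop", "<|user|>", and "<|assistant|>" must map to their dedicated special-token IDs rather than being re-tokenized as plain text.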

Below are some links about the ChatGLM model:
The Hugging Face model path for chatglm3-6b: https://huggingface.co/THUDM/chatglm3-6b
gguf model: https://modelscope.cn/api/v1/models/mnlife/chatglm3-6b-gguf/repo?Revision=master&FilePath=chatglm3-6b-Q4_K_M.gguf


github-actions bot commented Apr 30, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 531 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8835.79ms p(95)=22264.85ms fails=, finish reason: stop=477 truncated=54
  • Prompt processing (pp): avg=105.84tk/s p(95)=462.75tk/s
  • Token generation (tg): avg=49.51tk/s p(95)=47.34tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=chatglm3 commit=ed1d3ffc2a97d4d7aff94e419b9701c14487f6c0

prompt_tokens_seconds
[chart: llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 531 iterations; y-axis: llamacpp:prompt_tokens_seconds]
predicted_tokens_seconds
[chart: llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 531 iterations; y-axis: llamacpp:predicted_tokens_seconds]

kv_cache_usage_ratio
[chart: llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 531 iterations; y-axis: llamacpp:kv_cache_usage_ratio]
requests_processing
[chart: llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 531 iterations; y-axis: llamacpp:requests_processing]

@mofosyne mofosyne added help wanted Extra attention is needed enhancement New feature or request review complexity : medium Generally require more time to grok but manageable by beginner to medium expertise level labels May 9, 2024
convert-hf-to-gguf.py: 1 review comment (outdated, resolved)
@mofosyne mofosyne self-assigned this May 10, 2024
@mofosyne mofosyne removed their assignment May 10, 2024
huggingface model: https://hf-mirror.com/THUDM/chatglm3-6b

Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
@mofosyne mofosyne marked this pull request as ready for review May 15, 2024 03:12
convert-hf-to-gguf.py: 3 review comments (outdated, resolved)
Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
convert-hf-to-gguf.py: 4 review comments (outdated, resolved); 1 review comment (resolved)
Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
4 participants