Add BPE pre-tokenization for Command-R/R+. #7063

dranger003 · 2024-05-03T19:49:06Z

This replaces PR #7033 as a result of merging PR #6511.

github-actions · 2024-05-03T20:27:58Z

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 536 iterations 🚀

Expand details for performance related PR only

Concurrent users: 8, duration: 10m
HTTP request : avg=8756.89ms p(95)=21734.99ms fails=, finish reason: stop=469 truncated=67
Prompt processing (pp): avg=103.54tk/s p(95)=469.12tk/s
Token generation (tg): avg=32.32tk/s p(95)=46.55tk/s
ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=bpe-pretok-command-r-2 commit=f5806b2d09ba2dcf60d8d66046ed5853234f28de

[Chart: llamacpp:prompt_tokens_seconds over time (Unix time 1714885613 to 1714886239), llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 536 iterations]

[Chart: llamacpp:predicted_tokens_seconds over time (Unix time 1714885613 to 1714886239), llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 536 iterations]

[Chart: llamacpp:kv_cache_usage_ratio over time (Unix time 1714885613 to 1714886239), llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 536 iterations]

[Chart: llamacpp:requests_processing over time (Unix time 1714885613 to 1714886239), llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 536 iterations]

slaren · 2024-05-03T21:03:56Z

This and #7041 have different regex. Which one is correct?

dranger003 · 2024-05-03T23:05:28Z

> also has 'Digits' and individual_digits=True, so making an assumption there now.

@slaren There is mention of an assumption about digits, which I haven't included but I can include if needed. The regex in this PR has been tested with test-tokenizer-0 which I presume does not cover all scenarios?

araleza · 2024-05-03T23:43:13Z

Hi, does this mean that Command-R was always running at reduced quality, and we just didn't know until recently? Or have the recent Llama 3 changes to the llama.cpp tokenizer resulted in this update being needed to get it back to where it was before the Llama 3 changes went in?

eskeletor97 · 2024-05-04T07:41:08Z

> There is mention of an assumption about digits, which I haven't included but I can include if needed. The regex in this PR has been tested with test-tokenizer-0 which I presume does not cover all scenarios?

I haven't really tested command-r before with any math or numbers, but isn't it a similar issue to llama3 where digits were grouped and tokenized incorrectly?
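To make the digit question concrete, here is a minimal sketch of how a pre-tokenizer that keeps digit runs together differs from one that splits every digit on its own (what `individual_digits=True` refers to). Both patterns and the `pre_tokenize` helper are hypothetical and deliberately simplified; they are not the regexes from this PR or from #7041.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# GROUPED keeps a run of digits as one chunk; INDIVIDUAL emits one chunk per digit.
GROUPED_DIGITS = re.compile(r"\d+|[^\d\s]+|\s+")
INDIVIDUAL_DIGITS = re.compile(r"\d|[^\d\s]+|\s+")


def pre_tokenize(pattern, text):
    """Split text into pre-token chunks; BPE merges are applied inside each chunk only."""
    return pattern.findall(text)


text = "pi is 3141"
print(pre_tokenize(GROUPED_DIGITS, text))     # digits stay together as one chunk
print(pre_tokenize(INDIVIDUAL_DIGITS, text))  # each digit becomes its own chunk
```

If the model was trained with per-digit splitting but inference groups digits, BPE may merge digit sequences the model never saw as single tokens, which is the kind of mismatch being asked about here.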

ggerganov · 2024-05-04T08:41:12Z

I had to update to a newer transformers version:

diff --git a/requirements/requirements-convert.txt b/requirements/requirements-convert.txt
index a3d6ecec..5520ba73 100644
--- a/requirements/requirements-convert.txt
+++ b/requirements/requirements-convert.txt
@@ -1,5 +1,5 @@
 numpy~=1.24.4
 sentencepiece~=0.1.98
-transformers>=4.35.2,<5.0.0
+transformers>=4.40.1,<5.0.0
 gguf>=0.1.0
 protobuf>=4.21.0,<5.0.0

Otherwise, I got this error:

python3 convert-hf-to-gguf-update.py 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
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/Users/ggerganov/development/github/llama.cpp/convert-hf-to-gguf-update.py", line 135, in <module>
    tokenizer = AutoTokenizer.from_pretrained(f"models/tokenizers/{name}")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 784, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class CohereTokenizer does not exist or is not currently imported.
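The traceback suggests the installed transformers release did not yet provide `CohereTokenizer`. As a rough sketch, a script could check the installed version against the new lower bound from the diff before trying to load the tokenizer. The helper names (`version_tuple`, `meets_minimum`, `check_transformers`) are made up for this example and are not part of the convert scripts.

```python
from importlib import metadata

MIN_TRANSFORMERS = "4.40.1"  # lower bound taken from the requirements diff above


def version_tuple(version):
    """Turn '4.40.1' into (4, 40, 1); stop at the first non-numeric part (e.g. 'dev0')."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_TRANSFORMERS):
    """Compare versions numerically, so that 4.40.1 is newer than 4.35.2."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_transformers():
    """Return a readable status instead of failing later inside from_pretrained."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed):
        return f"transformers {installed} is new enough"
    return f"transformers {installed} is older than {MIN_TRANSFORMERS}; upgrade before converting"


print(check_transformers())
```

The comparison is intentionally simple and ignores pre-release ordering; a real check would more likely rely on the pinned requirements file itself.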

ggerganov · 2024-05-04T08:42:37Z

Let's rebase on latest master and I will run some extra tests to check if the regexes are correct

dranger003 · 2024-05-04T10:43:54Z

@ggerganov Thanks, the PR has been rebased and I added the transformers change.

* Add BPE pre-tokenization for Command-R/R+.
* Bump transformers convert requirement.
* command-r : add individual digits regex

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

dranger003 mentioned this pull request May 3, 2024

Add BPE pre-tokenization for Command-R. #7033

Closed

LostRuins mentioned this pull request May 4, 2024

BPE pretokenizer - add support for command-r-plus and command-r models #7041

Closed

Add BPE pre-tokenization for Command-R/R+.

d5d6731

dranger003 force-pushed the bpe-pretok-command-r-2 branch from 7bfc01b to d5d6731 Compare May 4, 2024 10:42

Bump transformers convert requirement.

20157cf

command-r : add individual digits regex

f5806b2

ggerganov merged commit 889bdd7 into ggerganov:master May 5, 2024
63 checks passed
