
Bug fix for server crash if first token is the stop word and asking for logprobs #7038

Merged
merged 1 commit into ggerganov:master from maor-ps:patch-2 on May 4, 2024

Conversation

@maor-ps (Contributor) commented May 2, 2024

If the stop token is the first token suggested by the model and logprobs are requested, the server crashes while generating the logprobs vector.

This fix makes the allocation of that vector safe, so the server no longer crashes. The response then has empty content, but it also does not return any logprobs...
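
A minimal sketch of the clamping logic, written in Python for illustration only (the actual fix is C++ iterator arithmetic in the server code; the variable names mirror the review comment below):

# Sketch only: mirrors the fix's clamping with a Python list.
# generated_token_probs holds one logprob entry per generated token;
# stop_word_toks is the tokenization of the matched stop word.
def trim_stop_word_probs(generated_token_probs, stop_word_toks):
    # Clamp the number of trailing entries to drop so the slice can never
    # reach past the front of the list. The pre-fix C++ effectively computed
    # end() - stop_word_toks.size(), which is undefined behavior when the
    # stop word has more tokens than were generated.
    n = min(len(stop_word_toks), len(generated_token_probs))
    return generated_token_probs[: len(generated_token_probs) - n]

# If the very first token is the stop word, everything is trimmed away,
# matching the empty content / no logprobs behavior described above:
assert trim_stop_word_probs([-0.01], [13, 13]) == []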

Toy example to reproduce on Llama-2-13B:

{
    'prompt': 'Q: hello world \nA: ',
    'stop': ['\n'],
    'temperature': 0.0,
    'n_predict': 10,
    'cache_prompt': True,
    'n_probs': 10
}
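
For completeness, a small script that sends this payload. Assumptions: a llama.cpp server listening on its default http://localhost:8080 and the standard /completion endpoint; before this fix, the request below crashed the server.

import requests

payload = {
    'prompt': 'Q: hello world \nA: ',
    'stop': ['\n'],
    'temperature': 0.0,
    'n_predict': 10,
    'cache_prompt': True,
    'n_probs': 10,
}

# After the fix, this returns normally with empty content and no logprobs
# instead of taking the server down.
r = requests.post('http://localhost:8080/completion', json=payload)
print(r.status_code)
print(r.json())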

github-actions bot commented May 2, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 549 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8545.68ms p(95)=20299.56ms fails=, finish reason: stop=487 truncated=62
  • Prompt processing (pp): avg=102.25tk/s p(95)=428.51tk/s
  • Token generation (tg): avg=34.57tk/s p(95)=47.19tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=patch-2 commit=534db8eb3e6c95712193809071b8a4036c2f2a07

[Benchmark charts omitted; the comment included four time-series plots for llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 (duration=10m, 549 iterations): prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, and requests_processing.]

@maor-ps changed the title from "Bug fix for server crash if first token is the stop word" to "Bug fix for server crash if first token is the stop word and asking for logprobs" on May 2, 2024
@ngxson self-requested a review on May 4, 2024
@ngxson (Collaborator) left a comment

LGTM. Thanks!

This seems to be an edge case where stop_word_toks.size() > generated_token_probs.size(), which makes slot.generated_token_probs.end() - stop_word_toks.size() point before the start of the vector and crashes the server. For example, when the very first generated token is the stop word and that stop word tokenizes to two tokens, only one entry has been generated, so end() - 2 lands before begin().

@ngxson merged commit 03fb8a0 into ggerganov:master on May 4, 2024
64 checks passed
nopperl pushed a commit to nopperl/llama.cpp that referenced this pull request May 5, 2024
teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 7, 2024