Running the sample wav file with a grammar gives no change in the output compared to without the grammar. I'm purposefully not giving any prompt because I want to see how it works without the help of a prompt.
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 1 (tiny)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 77.11 MB
whisper_model_load: model size = 77.11 MB
whisper_init_state: kv self size = 8.26 MB
whisper_init_state: kv cross size = 9.22 MB
whisper_init_state: compute buffer (conv) = 13.32 MB
whisper_init_state: compute buffer (encode) = 85.66 MB
whisper_init_state: compute buffer (cross) = 4.01 MB
whisper_init_state: compute buffer (decode) = 96.02 MB
main: grammar:
root ::= init color [.]
init ::= [ ] [r] [e] [d] [,] [ ] [g] [r] [e] [e] [n] [,] [ ] [b] [l] [u] [e]
color ::= [,] [ ] color_4
prompt ::= init [.]
color_4 ::= [r] [e] [d] | [g] [r] [e] [e] [n] | [b] [l] [u] [e]
system_info: n_threads = 8 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0
main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 8 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:07.960] And so my fellow Americans ask not what your country can do for you
[00:00:07.960 --> 00:00:10.760] ask what you can do for your country.
whisper_print_timings: load time = 288.52 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 12.49 ms
whisper_print_timings: sample time = 44.04 ms / 139 runs ( 0.32 ms per run)
whisper_print_timings: encode time = 466.53 ms / 1 runs ( 466.53 ms per run)
whisper_print_timings: decode time = 10.08 ms / 2 runs ( 5.04 ms per run)
whisper_print_timings: batchd time = 128.82 ms / 133 runs ( 0.97 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 962.72 ms
It doesn't seem to make any difference if I increase the grammar-penalty or use a different grammar. It doesn't even work when I run it with audio that is in the grammar.
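To make "no change" concrete, here is a minimal sketch that checks whether a transcript conforms to the grammar printed in the log above. It is a hand translation of that printed grammar into a regular expression, not anything whisper.cpp provides; the names `COLOR_GRAMMAR` and `conforms` are hypothetical, and the `prompt` rule is left out on the assumption that it is not reachable from `root`.

```python
import re

# Hand translation (an assumption, not whisper.cpp code) of the grammar in the log:
#   root    ::= init color [.]
#   init    ::= " red, green, blue"
#   color   ::= ", " color_4
#   color_4 ::= "red" | "green" | "blue"
COLOR_GRAMMAR = re.compile(r" red, green, blue, (?:red|green|blue)\.")


def conforms(text: str) -> bool:
    """Return True only if the entire text matches the root rule."""
    return COLOR_GRAMMAR.fullmatch(text) is not None


# Transcript from the log above, joined into one string.
jfk = (
    " And so my fellow Americans ask not what your country can do for you"
    " ask what you can do for your country."
)

print(conforms(jfk))
print(conforms(" red, green, blue, green."))
```

A transcript that fails this check was not forced into the grammar, which is the behaviour reported here even with a high grammar penalty.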
Command:
./main -f samples/jfk.wav -m models/ggml-tiny.en.bin -t 8 --grammar ./grammars/chess.gbnf --grammar-penalty 100
Output:
Command:
./main -f knight.wav -m models/ggml-tiny.en.bin -t 8 --grammar ./grammars/chess.gbnf --grammar-penalty 100
Output:
It should give "knight to e5" instead of "night to E5".