feature request: add override-kv, or at least a way to specify the pretokenizer #819
Comments
The CLI args for that flag are horrendous. I'm sure there must be a better way to do it.
Well, override-kv is more of a low-level tool, but it can be very useful. For the concrete problem of setting the pre-tokenizer type, a simple "--pretokenizer llama3" or similar would suffice: less generic, but much less horrendous, I would assume. The advantage of override-kv, other than being generic, would be compatibility with llama.cpp, where I see it used a lot. But having any way to override the pretokenizer in koboldcpp would greatly help. I don't expect the hundreds or even thousands of existing models to be requantized anytime soon, and this affects all kinds of models, not just llama3 models (deepseek, command-r, practically anything that is not llama 2). It would be pretty much in the same vein as --contextsize or --ropeconfig, which also override model-provided kv values.
I can't use koboldcpp for command-r plus anymore, because nobody wants to requantize it when they can just use --override-kv in llama.cpp.
There's a fix for command-r plus coming out in the next version. In the meantime, you can try using an older version first.
Cool, thank you!
llama.cpp has an override-kv option that can be used to override, well, model kv values. This can be useful with the myriad of existing GGUFs that don't have a pretokenizer specified. It would be nice if koboldcpp had such an option (either a generic override-kv option, or a way to specify/override the pretokenizer string). Or both. override-kv can be useful in a variety of other ways, but is of course more effort than just being able to specify the pretokenizer type.
Having a llama.cpp-compatible syntax for override-kv would also be a plus for users who could use instructions written for llama.cpp.
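To make the request a bit more concrete, here is a rough sketch of what parsing a llama.cpp-style override could look like on the Python side. It assumes the `KEY=TYPE:VALUE` form that llama.cpp accepts for `--override-kv` (types `int`, `float`, `bool`, `str`), with `tokenizer.ggml.pre` as the pretokenizer key. The function names `parse_override_kv` and `pretokenizer_override` are made up for illustration and are not part of koboldcpp; how the parsed pair would actually be handed to the model loader is left out.

```python
from typing import Tuple, Union

KVValue = Union[int, float, bool, str]


def parse_override_kv(spec: str) -> Tuple[str, KVValue]:
    """Parse one KEY=TYPE:VALUE override into (key, typed_value).

    Hypothetical helper: mirrors the llama.cpp-style syntax, nothing more.
    """
    key, sep, rest = spec.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=TYPE:VALUE, got {spec!r}")

    type_name, sep, raw = rest.partition(":")
    if not sep:
        raise ValueError(f"missing TYPE: prefix in {spec!r}")

    if type_name == "int":
        return key, int(raw)
    if type_name == "float":
        return key, float(raw)
    if type_name == "bool":
        if raw not in ("true", "false"):
            raise ValueError(f"bool value must be true or false, got {raw!r}")
        return key, raw == "true"
    if type_name == "str":
        return key, raw
    raise ValueError(f"unknown type {type_name!r} in {spec!r}")


def pretokenizer_override(name: str) -> Tuple[str, KVValue]:
    """Hypothetical shorthand: '--pretokenizer llama3' as a str override."""
    return parse_override_kv(f"tokenizer.ggml.pre=str:{name}")


if __name__ == "__main__":
    print(parse_override_kv("tokenizer.ggml.add_bos_token=bool:false"))
    print(pretokenizer_override("llama3"))
```

With something like this, a dedicated `--pretokenizer` flag would just be a thin wrapper over the generic option, so both could be offered without duplicating logic.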