
feat: ggml: support more parameters from llama.cpp #3314

Open
dm4 opened this issue Apr 2, 2024 · 4 comments
Labels
enhancement

Comments

@dm4
Collaborator

dm4 commented Apr 2, 2024

Summary

We currently support some parameters from llama.cpp, such as n_gpu_layers, ctx-size, and threads, and we expect to support even more.

Details

Referring to gpt_params_find_arg() in llama.cpp/common/common.cpp, we plan to support the additional parameters listed in the appendix below.
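
Since the ggml plugin takes these options through a JSON metadata string, supporting a new one is largely a matter of parsing one more key. Below is a minimal, hypothetical sketch of that pattern; it assumes simdjson-style parsing, and the field names, defaults, and the parseMetadata helper are illustrative rather than the actual plugin API:

#include <simdjson.h>

#include <cstdint>
#include <string>

struct Graph {
    // Options already supported (defaults here are placeholders).
    int64_t NGPULayers = 0;
    uint64_t CtxSize = 512;
    uint64_t Threads = 4;
    // Candidate additions from the list in the appendix.
    uint64_t NPredict = 512;
    double Temp = 0.8;
    double RepeatPenalty = 1.1;
};

// Fill the optional fields from the metadata JSON; absent keys keep defaults.
bool parseMetadata(Graph &GraphRef, const std::string &Metadata) {
    simdjson::dom::parser Parser;
    simdjson::dom::element Doc;
    if (Parser.parse(Metadata).get(Doc)) {
        return false; // Invalid JSON.
    }
    int64_t NPredict;
    if (!Doc["n-predict"].get(NPredict)) {
        GraphRef.NPredict = static_cast<uint64_t>(NPredict);
    }
    double Temp;
    if (!Doc["temp"].get(Temp)) {
        GraphRef.Temp = Temp;
    }
    double RepeatPenalty;
    if (!Doc["repeat-penalty"].get(RepeatPenalty)) {
        GraphRef.RepeatPenalty = RepeatPenalty;
    }
    return true;
}

A guest application would then pass a metadata string such as {"n-predict": 256, "temp": 0.7, "repeat-penalty": 1.1} when loading the graph.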

Appendix

Full list of options:

  • --seed
  • --threads
  • --threads-batch
  • --threads-draft
  • --threads-batch-draft
  • --prompt
  • --escape
  • --prompt-cache
  • --prompt-cache-all
  • --prompt-cache-ro
  • --binary-file
  • --file
  • --n-predict
  • --top-k
  • --ctx-size
  • --grp-attn-n
  • --grp-attn-w
  • --rope-freq-base
  • --rope-freq-scale
  • --rope-scaling
  • --rope-scale
  • --yarn-orig-ctx
  • --yarn-ext-factor
  • --yarn-attn-factor
  • --yarn-beta-fast
  • --yarn-beta-slow
  • --pooling
  • --defrag-thold
  • --samplers
  • --sampling-seq
  • --top-p
  • --min-p
  • --temp
  • --tfs
  • --typical
  • --repeat-last-n
  • --repeat-penalty
  • --frequency-penalty
  • --presence-penalty
  • --dynatemp-range
  • --dynatemp-exp
  • --mirostat
  • --mirostat-lr
  • --mirostat-ent
  • --cfg-negative-prompt
  • --cfg-negative-prompt-file
  • --cfg-scale
  • --batch-size
  • --ubatch-size
  • --keep
  • --draft
  • --chunks
  • --parallel
  • --sequences
  • --p-split
  • --model
  • --model-draft
  • --alias
  • --model-url
  • --hf-repo
  • --hf-file
  • --lora
  • --lora-scaled
  • --lora-base
  • --control-vector
  • --control-vector-scaled
  • --control-vector-layer-range
  • --mmproj
  • --image
  • --interactive
  • --embedding
  • --interactive-first
  • --instruct
  • --chatml
  • --infill
  • --dump-kv-cache
  • --no-kv-offload
  • --cache-type-k
  • --cache-type-v
  • --multiline-input
  • --simple-io
  • --cont-batching
  • --color
  • --mlock
  • --gpu-layers / --n-gpu-layers
  • --gpu-layers-draft / --n-gpu-layers-draft
  • --main-gpu
  • --split-mode
  • --tensor-split
  • --no-mmap
  • --numa
  • --verbose-prompt
  • --no-display-prompt
  • --reverse-prompt
  • --logdir
  • --lookup-cache-static
  • --lookup-cache-dynamic
  • --save-all-logits / --kl-divergence-base
  • --perplexity / --all-logits
  • --ppl-stride
  • --print-token-count
  • --ppl-output-type
  • --hellaswag
  • --hellaswag-tasks
  • --winogrande
  • --winogrande-tasks
  • --multiple-choice
  • --multiple-choice-tasks
  • --kl-divergence
  • --ignore-eos
  • --no-penalize-nl
  • --logit-bias
  • --help
  • --version
  • --random-prompt
  • --in-prefix-bos
  • --in-prefix
  • --in-suffix
  • --grammar
  • --grammar-file
  • --override-kv
dm4 added the enhancement label Apr 2, 2024
@jaydee029
Contributor

Is this issue open for contributions? If yes, I would love to look into this.

@dm4
Collaborator Author

dm4 commented Apr 6, 2024

> Is this issue open for contributions? If yes, I would love to look into this.

Yes, this issue is open for contributions. We welcome your input and any code related to this issue.

@Fusaaaann
Contributor

Fusaaaann commented May 11, 2024

Some parameters, such as --parallel and --draft, are not directly used in the internal implementation of llama.cpp (see the search results for "n_parallel" in llama.cpp).
Only some parameters, such as those related to RoPE, affect the internal behavior of llama.cpp functions; for the rest, integrating the processing logic needed to support them could completely change the implementation of compute(), as in the example below:

Abstract of integrating `--parallel` and `--draft`, parsed as optional parameters in WasmEdge:

struct Graph {
    // ...
    // New fields filled from the optional parameters.
    uint64_t NParallel = 1;
    uint64_t NDraft = 1;
};

Expect<ErrNo> compute(WasiNNEnvironment &Env, uint32_t ContextId) noexcept {
    // ...
    // If --draft and --parallel are set, take the speculative decoding path;
    // otherwise fall back to the current implementation.
    ReturnCode = SpeculativeDecoding(GraphRef, CxtRef);
    // ...
}

ErrNo SpeculativeDecoding(Graph &GraphRef, Context &CxtRef) noexcept {
    // Implementation along the lines of
    // https://github.com/ggerganov/llama.cpp/blob/3292733f95d4632a956890a438af5192e7031c12/examples/speculative/speculative.cpp
}
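
A companion sketch of how the two new fields could be populated (hypothetical, not the plugin's actual API: the keys n-parallel and n-draft are assumed to mirror llama.cpp's `--parallel` and `--draft`, and Doc is the simdjson element produced by the plugin's existing metadata parsing):

// Hypothetical helper: read the two new options from the parsed metadata JSON.
// Absent keys leave the defaults (NParallel = 1, NDraft = 1) untouched.
void parseSpeculativeOptions(Graph &GraphRef, const simdjson::dom::element &Doc) {
    int64_t Val;
    if (!Doc["n-parallel"].get(Val)) {
        GraphRef.NParallel = static_cast<uint64_t>(Val);
    }
    if (!Doc["n-draft"].get(Val)) {
        GraphRef.NDraft = static_cast<uint64_t>(Val);
    }
}

compute() could then take the SpeculativeDecoding() path only when GraphRef.NDraft > 1.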

Detailed code: https://github.com/Fusaaaann/WasmEdge/blob/ae718df452658df555e2b4fe35e8c90e69c5c55f/plugins/wasi_nn/strategies/strategies.cpp#L234

What is WasmEdge's plan for supporting these parameters if the wasi-nn functions become too complex to fit in one ggml.cpp file?

@hydai
Member

hydai commented May 14, 2024

Hi @Fusaaaann
We don't have a firm timeline for supporting the above parameters. If an application requires these options, we will raise their priority. There are already two different code paths in our plugin for handling normal LLM and LLaVA applications, so we don't mind if the complexity increases after adding more parameters.
