
LFX Mentorship (Jun-Aug, 2024): Support piper as a new backend of the WASI-NN WasmEdge plugin #3381

Open
hydai opened this issue May 2, 2024 · 20 comments
Labels
enhancement New feature or request LFX Mentorship Tasks for LFX Mentorship participants

Comments

@hydai
Member

hydai commented May 2, 2024

Summary

Motivation

WasmEdge supports PyTorch, TensorFlow Lite, llama.cpp, and more NN backends. Text-to-speech is a major capability we want to add. To make it possible, we would like to integrate piper, a fast, local neural text-to-speech system in C++, as a new WASI-NN backend.

Details

  • Expected Outcome: A new plugin providing a piper WASI-NN backend, a test suite for validating the plugin, and documents and examples explaining how to use the plugin.
  • Recommended Skills: C++, Wasm
  • Mentor(s):

Application link

https://mentorship.lfx.linuxfoundation.org/project/61014739-ac16-4188-bdab-c87c0a502470

Appendix

@hydai hydai added enhancement New feature or request LFX Mentorship Tasks for LFX Mentorship participants labels May 2, 2024
@hydai hydai changed the title feat: Support piper as a new backend of the WASI-NN WasmEdge plugin LFX Mentorship (Jun-Aug, 2024): Support piper as a new backend of the WASI-NN WasmEdge plugin May 2, 2024
@ToSeven

ToSeven commented May 9, 2024

This project looks very interesting, and I really want to attend it! I have rich experience in Rust/Wasm programming and deep learning, so I believe I am a good fit for this task.

@AZM999

AZM999 commented May 10, 2024

Hi @hydai,
I am intrigued by this project. If you could suggest where to start, along with any qualifying task, that would be very helpful.

@hydai
Member Author

hydai commented May 10, 2024

If you are interested in this project and would like to apply for it, please ensure you can build the piper framework and run the sample applications. Since the whole project is to integrate piper as one of the WASI-NN backends, the most important part is to understand the piper workflow.

@kumarutkarsh1248

@hydai

I have built Piper and run several applications on it; everything is working fine so far. Now, I want to begin working on the project and explore more about Piper as per the project's requirements. Has any previous work been done on this project, or do we have to start from scratch? Additionally, if we need to start from scratch, could you please provide some similar references?

@hydai
Member Author

hydai commented May 14, 2024

Hi @kumarutkarsh1248

Has any previous work been done on this project

None for the Piper integration. But there are lots of different backends for the WASI-NN plugin. You can see the appendix section.

or do we have to start from scratch? Additionally, if we need to start from scratch, could you please provide some similar references?

Same as the previous problem, start from scratch with the Piper part. There is an existing WASI-NN implementation for other backends.

@Raunak2024

@hydai
Hi there! I am interested in the project and have good experience with C++, but I am unable to figure out how to get started because I don't know anything about Wasm and piper. Could you suggest any resources, and clarify what exactly I need to contribute to this project?

@angad-singhh

angad-singhh commented May 18, 2024

@hydai Hi there! I am interested in the project and have good experience with C++, but I am unable to figure out how to get started because I don't know anything about Wasm and piper. Could you suggest any resources, and clarify what exactly I need to contribute to this project?

Hey, @Raunak2024
I'm also working on this same WasmEdge LFX project. I suggest you use this piper repo, as it has a lot of examples/samples and links to get you started; for Wasm, you can use this.

@kumarutkarsh1248

kumarutkarsh1248 commented May 18, 2024

@hydai
I am running some simple examples from the tutorial
and getting the final output at the end, but I have no idea why those initial errors are occurring.

[Screenshot from 2024-05-18 20-32-07]

Can someone please guide me on this?

Also, I am currently trying to understand the WASI-NN backend implementation for PyTorch. Is there any developer documentation or other resources available that can help me understand the implementation?

@angad-singhh

@hydai I am running some simple examples from the tutorial and getting the final output at the end, but I have no idea why those initial errors are occurring.
Can someone please guide me on this?

If you are still facing this, feel free to reach out to me on Discord. I ran into something similar when setting up the RAG server. Discord ID: angadsinghh

@Raunak2024

Raunak2024 commented May 21, 2024

Hey, @Raunak2024 I'm also working on WasmEdge LFX projects. I suggest you use this piper repo, as it has a lot of examples/samples and links to get you started; for Wasm, you can use this.

Okay! But I was referring to the official site of WasmEdge.

@Raunak2024

@hydai I am running some simple examples from the tutorial and getting the final output at the end, but I have no idea why those initial errors are occurring.

[Screenshot from 2024-05-18 20-32-07]

Can someone please guide me on this?

Also, I am currently trying to understand the WASI-NN backend implementation for PyTorch. Is there any developer documentation or other resources available that can help me understand the implementation?

Check out the official repo of WebAssembly once

@PeterD1524

@hydai

As far as I know, Piper seems to use ONNX models and onnxruntime. I think instead of making Piper a new WASI-NN backend, we should support ONNX.

There are some relationships between backends and model file formats:

  • OpenVINO - .bin and .xml
  • ONNX - .onnx
  • PyTorch - .pt
  • TensorflowLite - .tflite
  • GGML - .gguf

I think Piper is at a higher level. Therefore, in order for Piper to work in WasmEdge, it should have some libraries that use the ONNX backend for synthesis (phoneme ids to audio), while other processing (like phonemization) runs in wasm (the non-system-level part).

Please let me know what you think.
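A minimal Rust sketch of the split described above, with hypothetical helper names (`phonemize`, `to_le_bytes`): the guest turns text into I64 phoneme ids and packs them into the raw byte buffer that a WASI-NN set-input call would receive. The actual espeak-ng phonemization and ONNX synthesis are stubbed out, so this only illustrates the data flow, not piper's real pipeline.

```rust
// Sketch only: phonemization runs inside the wasm guest; only the ONNX
// synthesis step would cross the WASI-NN boundary as an I64 tensor.

fn phonemize(text: &str) -> Vec<i64> {
    // Stand-in for espeak-ng-based phonemization; real code would map
    // text -> phonemes -> phoneme ids using the voice's config.
    text.chars().map(|c| c as i64).collect()
}

fn to_le_bytes(ids: &[i64]) -> Vec<u8> {
    // WASI-NN tensors are raw bytes; an I64 tensor is just the ids'
    // little-endian bytes concatenated.
    ids.iter().flat_map(|id| id.to_le_bytes()).collect()
}

fn main() {
    let ids = phonemize("hi");
    let tensor = to_le_bytes(&ids);
    // Each phoneme id occupies 8 bytes in the tensor buffer.
    assert_eq!(tensor.len(), ids.len() * 8);
    // The buffer would then go to a hypothetical ONNX backend via
    // set_input(0, TensorType::I64, &[ids.len()], &tensor).
}
```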

@PeterD1524

Regarding ONNX, WasmEdge seems to have supported it before, but it was removed for some reason.

Related links:
#1063
#952
https://github.com/WasmEdge/WasmEdge/tree/proposal/wasi_nn_onnx

Anyway, I've modified the code in the master branch and successfully built WasmEdge with ONNX support using onnxruntime. It can run F32 models successfully.

However, piper uses models with an I64 tensor type, while WasmEdge currently understands only F16, F32, U8, and I32. We will need I64 support for Piper to work.

plugin/wasi_nn/types.h:

enum class TensorType : uint8_t { F16 = 0, F32 = 1, U8 = 2, I32 = 3 };

wasmedge-wasi-nn has more tensor types.

https://github.com/second-state/wasmedge-wasi-nn/blob/ggml/rust/src/tensor.rs:

pub enum TensorType {
    F16 = 0,
    F32,
    F64,
    U8,
    I32,
    I64,
}

The wasi-nn proposal also has BF16.

https://github.com/WebAssembly/wasi-nn/blob/main/wit/wasi-nn.wit:

enum tensor-type {
    FP16,
    FP32,
    FP64,
    BF16,
    U8,
    I32,
    I64
}

The problem here is that they use different values for the tensor type enum, which may cause problems if we update the enum in WasmEdge.

However, most backends return WASINN::ErrNo::InvalidArgument when given a tensor type other than F32. The example at https://github.com/second-state/WasmEdge-WASINN-examples uses U8 for GGML, but the GGML backend doesn't seem to check tensor type values anyway. The value of the F32 enum is unchanged, so the impact is likely to be small.

Is there a specific version of the wasi-nn proposal that WasmEdge currently supports or intends to support?
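A small illustration of the mismatch just described, with both enums transcribed (as Rust, for brevity) from the snippets above: F32 agrees on both sides, but the crate's insertion of F64 shifts U8 and I32, so adopting the crate's numbering wholesale would change wire values the plugin already uses. This is illustrative code, not upstream code.

```rust
// WasmEdge's plugin currently defines (plugin/wasi_nn/types.h):
#[allow(dead_code)]
#[repr(u8)]
enum PluginTensorType { F16 = 0, F32 = 1, U8 = 2, I32 = 3 }

// while the wasmedge-wasi-nn crate defines:
#[allow(dead_code)]
#[repr(u8)]
enum CrateTensorType { F16 = 0, F32 = 1, F64 = 2, U8 = 3, I32 = 4, I64 = 5 }

fn main() {
    // F32 agrees on both sides, so F32-only guests keep working...
    assert_eq!(PluginTensorType::F32 as u8, CrateTensorType::F32 as u8);
    // ...but U8 and I32 do not: this is the compatibility hazard if
    // WasmEdge updates its enum to match the crate's numbering.
    assert_ne!(PluginTensorType::U8 as u8, CrateTensorType::U8 as u8);
    assert_ne!(PluginTensorType::I32 as u8, CrateTensorType::I32 as u8);
}
```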

@Ananya05iit

@hydai
Hi! I am greatly interested in this project, being proficient in C++ and ML. I also know the piper workflow and Rust. This seems like a very interesting project to me, and I am looking forward to being a part of it!

@hydai
Member Author

hydai commented May 26, 2024

@hydai

As far as I know, Piper seems to use ONNX models and onnxruntime. I think instead of making Piper a new WASI-NN backend, we should support ONNX.

There are some relationships between backends and model file formats:

  • OpenVINO - .bin and .xml
  • ONNX - .onnx
  • PyTorch - .pt
  • TensorflowLite - .tflite
  • GGML - .gguf

I think Piper is at a higher level. Therefore, in order for Piper to work in WasmEdge, it should have some libraries that use the ONNX backend for synthesis (phoneme ids to audio), and other processing (like phonemization) should run in wasm (non system-level part).

Please let me know what you think.

Why not both? ONNX could be one of the WASI-NN backends, and so could Piper.

@hydai
Member Author

hydai commented May 26, 2024

Regarding ONNX, WasmEdge seems to support it before, but it was removed for some reasons.

Since nobody was available to maintain it, this backend is currently not enabled on the master branch.

Is there a specific version of wasi-nn proposal that WasmEdge currently supports or intend to support?

WASI-NN is migrating to the Component Model proposal; however, it makes no sense to wait for it to ship in the preview 2 proposals. We are forking wasmedge-wasi-nn as the SDK mentioned in the appendix section and using it for now.

@PeterD1524

PeterD1524 commented May 26, 2024

Why not both? ONNX could be one of the WASI-NN backends, and so does Piper.

The WASI-NN API is very powerful. Since it accepts bytes (tensors) and outputs bytes (tensors), almost any function can be wrapped around it. I think it would be abusing the API if we added too much stuff to it. One of the benefits of performing machine learning inference in WASM is its sandbox security. If there are too many backends, the security will depend heavily on the implementation of the backends.

If we really need Piper, would it be better to make just another plugin (such as "wasmedge-piper-tts")?

When I see something claiming to be WASI-NN, I expect it to follow the WASI-NN proposal. The WASI-NN Component Model currently does not have a graph-encoding (backend) enum for Piper. Also, Piper does not look like a graph IR to me, but rather a very specific TTS system. Do you think Piper will enter the WASI-NN list in the future?

If I understand correctly, WasmEdge currently does not support multiple WASI-NN backends at the same time (this can probably be changed). One possible use case of multiple backends is to use GGML to generate LLM response and pass it to Piper to synthesize audio. If Piper is in another plugin, there will be no problem.

To summarize, the benefits of making Piper another plugin outside of WASI-NN are:

  1. The WASI-NN plugin can still be consistent with the proposal.
  2. No need to follow the WASI-NN procedure load -> init-execution-context -> set-input -> compute -> get-output. We can probably design an API that is more suitable for Piper.
  3. Can coexist with other WASI-NN backends (Not a problem if the code structure of WASI-NN plugin is changed)

Implementing a completely new plugin can be more difficult though.

@hydai
Member Author

hydai commented May 26, 2024

The WASI-NN API is very powerful. Since it accepts bytes (tensors) and outputs bytes (tensors), almost any function can be wrapped around it. I think it would be abusing the API if we added too much stuff to it. One of the benefits of performing machine learning inference in WASM is its sandbox security. If there are too many backends, the security will depend heavily on the implementation of the backends.

The WASI-NN API should be a general API. I think all ML/AI-related frameworks should be among its backends if we can do this. Otherwise, there will be WASI-TTS, WASI-LLM, WASI-ObjectDetection, WASI-SpeechToText, and more. I don't think we will really need so many different specs in the future. Also, if there are too many plugins, the security will likewise depend heavily on their implementation. Backends and plugins are the same: both are host functions.

If we really need Piper, would it be better to make just another plugin (such as "wasmedge-piper-tts")?

We can do it, but we may not want to. If the execution flow is just like the WASI-NN style (load, init, set-input, compute, and get-output), why don't we choose WASI-NN as our spec instead of creating a brand-new one?

When I see something claiming to be WASI-NN, I expect it to follow the WASI-NN proposal. The WASI-NN Component Model currently does not have a graph-encoding (backend) enum for Piper. Also, Piper does not look like a graph IR to me, but rather a very specific TTS system. Do you think Piper will enter the WASI-NN list in the future?

I would say it is possible; the original WASI-NN proposal didn't contain ggml/gguf, but it does now.

If I understand correctly, WasmEdge currently does not support multiple WASI-NN backends at the same time (this can probably be changed). One possible use case of multiple backends is to use GGML to generate LLM response and pass it to Piper to synthesize audio. If Piper is in another plugin, there will be no problem.

We do support multiple backends; the only thing you need to do is enable multiple backends, that's all. We just don't release pre-built assets, since there are various combinations. The most important thing is that TF/PyTorch rely on dynamic libraries; that's why we don't want to create an all-in-one WASI-NN pre-built asset, to avoid a dependency nightmare. However, llama.cpp and piper.cpp are fine; they can be linked inside the plugins, and we can ship them together without dependency issues.

To summarize, the benefits of making Piper another plugin outside of WASI-NN are:

  1. The WASI-NN plugin can still be consistent with the proposal.
  2. No need to follow the WASI-NN procedure load -> init-execution-context -> set-input -> compute -> get-output. We can probably design an API that is more suitable for Piper.
  3. Can coexist with other WASI-NN backends (Not a problem if the code structure of WASI-NN plugin is changed)

Implementing a completely new plugin can be more difficult though.

So, I am fine with accepting Piper as a standalone plugin if it offers more benefits. Once the execution flow is complete, it's pretty easy to move it to a WASI-NN backend or keep it standalone.

@PeterD1524

PeterD1524 commented May 26, 2024

The WASI-NN API should be a general API. I think all ML/AI-related frameworks should be among its backends if we can do this. Otherwise, there will be WASI-TTS, WASI-LLM, WASI-ObjectDetection, WASI-SpeechToText, and more. I don't think we will really need so many different specs in the future. Also, if there are too many plugins, the security will likewise depend heavily on their implementation. Backends and plugins are the same: both are host functions.

Yes, I understand backends and plugins are the same. My original thought was that we would only provide a limited set of major NN graph backends, and that libraries for specific tasks would have to be implemented in wasm user functions. I get your idea though.

We do support multiple backends; the only thing you need to do is enable multiple backends, that's all. We just don't release pre-built assets, since there are various combinations. The most important thing is that TF/PyTorch rely on dynamic libraries; that's why we don't want to create an all-in-one WASI-NN pre-built asset, to avoid a dependency nightmare. However, llama.cpp and piper.cpp are fine; they can be linked inside the plugins, and we can ship them together without dependency issues.

I just realized WASMEDGE_PLUGIN_WASI_NN_BACKEND can be a semicolon- or whitespace-separated list because of the foreach command. Before, I had only read this CMakeLists.txt and thought multiple backends were not supported.
https://github.com/WasmEdge/WasmEdge/blob/b24b8f0bde9a9fb680613c1053ab5b23568c788d/plugins/wasi_nn/CMakeLists.txt#L4C18-L4C49
Perhaps this one should be wrapped in a foreach too?

Thank you for your detailed explanation. This really helps me understand the goal of this mentorship.
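For illustration, the splitting behavior Peter describes — a semicolon- or whitespace-separated WASMEDGE_PLUGIN_WASI_NN_BACKEND value broken into entries by CMake's foreach — can be modeled like this. The helper name `split_backends` is hypothetical; this is a sketch of the semantics, not WasmEdge's actual build code.

```rust
// Model of how a semicolon- or whitespace-separated backend list
// (e.g. "GGML;Piper" or "GGML Piper") breaks into individual entries.
fn split_backends(value: &str) -> Vec<String> {
    value
        .split(|c: char| c == ';' || c.is_whitespace())
        .filter(|s| !s.is_empty()) // drop empty pieces from repeated separators
        .map(str::to_string)
        .collect()
}

fn main() {
    assert_eq!(split_backends("GGML;Piper"), vec!["GGML", "Piper"]);
    assert_eq!(split_backends("GGML  Piper"), vec!["GGML", "Piper"]);
}
```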

@PeterD1524

PeterD1524 commented May 27, 2024

I have created a fork of WasmEdge which supports piper as a backend at https://github.com/PeterD1524/WasmEdge/tree/wasi_nn_piper. It works almost the same as piper's command line, but with JSON as input. Piper currently does not expose its code as a library, so I had to patch its CMakeLists to make things work. Maybe there is a better way to do this.

Here is an example using the API with the rust wasmedge-wasi-nn crate.

fn main() {
    let graph = wasmedge_wasi_nn::GraphBuilder::new(
        wasmedge_wasi_nn::GraphEncoding::Piper,
        wasmedge_wasi_nn::ExecutionTarget::CPU,
    )
    .build_from_bytes([serde_json::json!({
        "model": "en_US-lessac-medium.onnx", // path to .onnx voice file, required
        "config": "en_US-lessac-medium.onnx.json", // path to model config, default is model path + .json
        "espeak_data": "espeak-ng-data", // path to espeak-ng data directory, required for espeak phonemes
    })
    .to_string()])
    .unwrap();

    let mut context = graph.init_execution_context().unwrap();

    context
        .set_input(
            0,
            wasmedge_wasi_nn::TensorType::U8,
            &[1],
            "Welcome to the world of speech synthesis!".as_bytes(),
        )
        .unwrap();
    context.compute().unwrap();

    // output is wav by default
    let mut out_buffer = vec![0u8; 1 << 20];
    let size = context.get_output(0, &mut out_buffer).unwrap();
    std::fs::write("welcome.wav", &out_buffer[..size]).unwrap();
}

The enum variant wasmedge_wasi_nn::GraphEncoding::Piper has to be added to make the API work; right now I just patch the crate locally.
