Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] risc0_zkvm serde fails to deserialize if the serialized type differs #1742

Open
zvolin opened this issue Apr 29, 2024 · 5 comments
Open
Labels
bug Something isn't working

Comments

@zvolin
Copy link

zvolin commented Apr 29, 2024

Bug Report

The title is pretty generic but I don't know how to express that cleaner. It's pretty common to use some tricks when implementing Serialize and Deserialize by hand. One such trick is to serialize String and deserialize &str, to avoid double allocations.

This example will fail:

use serde::de::Error as _;
use serde::{Deserialize, Deserializer, Serialize, Serializer};

#[derive(Debug)]
struct Abc(Vec<u8>);

impl Serialize for Abc {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        let hex = hex::encode(&self.0);
        serializer.serialize_str(&hex)
    }
}

impl<'de> Deserialize<'de> for Abc {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        // use <&str> to avoid intermediate allocation
        let hex = <&str>::deserialize(deserializer)?;
        hex::decode(hex).map(Self).map_err(D::Error::custom)
    }
}

fn main() {
    let abc = Abc(vec![1, 2, 3, 4]);
    let serialized = risc0_zkvm::serde::to_vec(&abc).unwrap();
    let deserialized: Abc = risc0_zkvm::serde::from_slice(&serialized).unwrap();

    println!("{deserialized:?}");
}

Another pattern I'm aware of (eg. tendermint-rs does that) is to serialize T but deserialize Option<T> and have some fallback in case it's not present.

Expected behavior

It would be cool if risc0_zkvm would support such tricks as they're not so uncommon in the ecosystem. Ciborium is an example of binary serializer that supports both of the above, and bincode seems to support the &str trick.

Your Environment

  • risc0-zkvm version: 0.21.0
  • Rust version: cargo 1.76.0 (c84b36747 2024-01-18)
  • Platform/OS: archlinux 6.8.7-zen1-1-zen
@zvolin zvolin added the bug Something isn't working label Apr 29, 2024
Copy link

linear bot commented Apr 29, 2024

@preston-evans98
Copy link

Hey @zvolin! I took a quick look at this and it's looks like the root cause is that risc0's from_slice method delegates to an implementation over a Reader, so borrowing data is disallowed. You'll get the same error (verbatim) if you try to deserialize an &str using serde_json::from_reader.

It seems like fixing this should be relatively straightforward, but I couldn't get the zkvm crate to build locally on my first try, so I'll wait for the actual team to come along :)

@preston-evans98
Copy link

As for the Option optimization you mention, I think that's unsound and the fact that it happens to work in some implementations is just coincidence. In the serde data model, Some(T) is not equivalent to T - you use serialize_some for the first case, but not in the second. It would be sound to use #[serde(from = "Option<T>", into = "Option<T>")], but it's unsound to use just one.

This trick seems to be used in Tendermint because they only care about supporting json, which serializes Option<T> and T identically .

@flaub
Copy link
Member

flaub commented Apr 30, 2024

Note: to build on main you'll need to install cargo-risczero from main as well. This situation should be resolved once the next release lands.

cargo install --path risc0/cargo-risczero
cargo risczero install

@zvolin
Copy link
Author

zvolin commented Apr 30, 2024

As for the Option optimization you mention, I think that's unsound and the fact that it happens to work in some implementations is just coincidence. In the serde data model, Some(T) is not equivalent to T - you use serialize_some for the first case, but not in the second. It would be sound to use #[serde(from = "Option", into = "Option")], but it's unsound to use just one.

I've read the serde data model and I'm not sure I agree.

the Serialize implementation for the data structure is responsible for mapping the data structure into the Serde data model by invoking exactly one of the Serializer methods

the Deserialize implementation for the data structure is responsible for mapping the data structure into the Serde data model by passing to the Deserializer a Visitor implementation that can receive the various types of the data model

But there is no fundamental reason that these mappings need to be straightforward. The Serialize and Deserialize traits can perform any mapping between Rust type and Serde data model that is appropriate for the use case.

This doesn't answer the question directly, but more or less I understand it that Visitor can accept different types, and mapping don't need to be straighforward. There is even example of serializing path as one enum variant and deserializing into other.

I always thought of <Option<T>>::deserialize being a trick to avoid using ugly Visitor api, but my understanding is that if I had a visitor with visit_i32 and visit_none that would be equivalent of <Option<i32>>::deserialize and it would be sound. Not sure too if sound and unsound aren't too big words for this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants