feat(all): add versioning of serializable types on tfhe-rs 0.6 #1151

Merged
nsarlin-zama merged 4 commits into integration/0.6/versioning from ns/0.6_with_versionize
Jun 4, 2024

Conversation

nsarlin-zama
Contributor

@nsarlin-zama nsarlin-zama commented May 15, 2024

closes: zama-ai/tfhe-rs-internal#538

PR content/description

This PR adds data versioning to serialized types for backward compatibility between tfhe-rs versions. This is done using a new crate, tfhe-versionable, that adds a set of derive macros. These macros derive a pair of traits (Versionize/Unversionize) that add conversion functions between a type and its "versioned" representation. The versioned representation of a type is an enum where each variant is a version of the type.

Before serialization, the type is wrapped into the latest variant of the enum in the versionize method of the Versionize trait. To be able to use it after deserialization, the enum is converted into the target type with the unversionize method of the Unversionize trait. To make this work, we have to define for each older version of a type an upgrade method that is able to transform version Vn into Vn+1. The generated unversionize method will chain calls of upgrade enough times to get to the latest version.
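To make the mechanism concrete, here is a hand-written sketch of roughly what the macros automate, using only the standard library. All names (Point, PointV0, PointV1, PointVersions, UpgradeTo) are hypothetical and chosen for illustration; this is not the code generated by tfhe-versionable, and serialization is left out.

```rust
/// Oldest layout of the type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct PointV0 {
    pub x: u32,
}

/// Intermediate layout: a `y` field was added.
#[derive(Debug, Clone, PartialEq)]
pub struct PointV1 {
    pub x: u32,
    pub y: u32,
}

/// Current layout, the one used by the rest of the code.
#[derive(Debug, Clone, PartialEq)]
pub struct Point {
    pub x: u64,
    pub y: u64,
}

/// Versioned representation: one variant per version of the type.
#[derive(Debug, Clone, PartialEq)]
pub enum PointVersions {
    V0(PointV0),
    V1(PointV1),
    V2(Point),
}

/// One upgrade step, from version n to version n + 1 (simplified stand-in).
pub trait UpgradeTo<Next> {
    fn upgrade(self) -> Next;
}

impl UpgradeTo<PointV1> for PointV0 {
    fn upgrade(self) -> PointV1 {
        PointV1 { x: self.x, y: 0 }
    }
}

impl UpgradeTo<Point> for PointV1 {
    fn upgrade(self) -> Point {
        Point { x: u64::from(self.x), y: u64::from(self.y) }
    }
}

impl Point {
    /// Wrap the value in the latest variant before serialization.
    pub fn versionize(&self) -> PointVersions {
        PointVersions::V2(self.clone())
    }

    /// Chain `upgrade` calls until the latest version is reached.
    pub fn unversionize(versioned: PointVersions) -> Point {
        match versioned {
            PointVersions::V0(v0) => {
                let v1: PointV1 = v0.upgrade();
                v1.upgrade()
            }
            PointVersions::V1(v1) => v1.upgrade(),
            PointVersions::V2(latest) => latest,
        }
    }
}
```

Each old version only needs to know about its direct successor, so adding a V3 later means writing a single new upgrade step and one extra match arm.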

For a given type that has to be versioned, three macros should be used:

  • Versionize: used on the main type, the one used elsewhere in the code. Derives the Versionize/Unversionize traits.
  • Version: used on a previous version of the type. Versionize also automatically derives Version for the latest version.
  • VersionsDispatch: used on the enum that holds all the versions. Each variant should derive Version, except the last one, which derives Versionize.

A fourth proc macro, NotVersioned, can be used on a type that should not be versioned. The Versionize/Unversionize traits are then implemented using Self as the versioned representation of the type. This is used for built-in types.
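As an illustration of the "Self as versioned representation" idea, here is a minimal sketch with a simplified trait. SimpleVersionize and impl_not_versioned are hypothetical names, much smaller than the real Versionize/Unversionize traits, and are only meant to show the shape of such an implementation.

```rust
/// Simplified stand-in for the Versionize/Unversionize pair (hypothetical).
pub trait SimpleVersionize: Sized {
    /// Type that is actually serialized.
    type Versioned;
    fn versionize(&self) -> Self::Versioned;
    fn unversionize(versioned: Self::Versioned) -> Self;
}

/// For a type that is never versioned, the versioned representation is the
/// type itself and both conversions are trivial.
macro_rules! impl_not_versioned {
    ($($t:ty),*) => {
        $(
            impl SimpleVersionize for $t {
                type Versioned = $t;

                fn versionize(&self) -> Self::Versioned {
                    self.clone()
                }

                fn unversionize(versioned: Self::Versioned) -> Self {
                    versioned
                }
            }
        )*
    };
}

impl_not_versioned!(u32, u64, bool, String);
```

With this in place, a struct field of type u32 can go through the same versionize/unversionize calls as any other field, without needing a dispatch enum of its own.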

Here is an example of the workflow:

use tfhe_versionable::{Unversionize, Upgrade, Version, Versionize, VersionsDispatch};

// The structure that should be versioned, as defined in tfhe-rs
#[derive(Versionize)]
#[versionize(MyStructVersions)] // Link to the enum type that will hold all the versions of this type
struct MyStruct<T: Default> {
    attr: T,
    builtin: u32,
}

// To avoid polluting the main code, the old versions are defined in another module/file, along with the dispatch enum
#[derive(Version)] // Used to mark an old version of the type
struct MyStructV0 {
    builtin: u32,
}

// The Upgrade trait tells how to go from one version to the next. During unversioning, the
// upgrade method will be called on the deserialized value enough times to reach the last variant.
impl<T: Default> Upgrade<MyStruct<T>> for MyStructV0 {
    fn upgrade(self) -> MyStruct<T> {
        MyStruct {
            attr: T::default(),
            builtin: self.builtin,
        }
    }
}

// This is the dispatch enum, that holds one variant for each version of your type.
#[derive(VersionsDispatch)]
// This enum is not directly used but serves as a template to generate new enums that will be
// serialized. This allows recursive versioning.
#[allow(unused)]
enum MyStructVersions<T: Default> {
    V0(MyStructV0),
    V1(MyStruct<T>),
}

fn main() {
    let ms = MyStruct {
        attr: 37u64,
        builtin: 1234,
    };

    let serialized = bincode::serialize(&ms.versionize()).unwrap();

    // This can be called in future versions of tfhe-rs, when more variants have been added
    let _unserialized = MyStruct::<u64>::unversionize(bincode::deserialize(&serialized).unwrap());
}

The proc macros handle the recursive nature of versioning. If we see a type definition as a tree where each type is a node and its children are the types of its attributes, the version of a given type is independent of the versions of its children. That way, when we update a type, we don't have to manually update the version of every type that recursively uses it.
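The independence between a parent's version and its children's versions can be sketched by hand as follows. The types Inner, Outer and their companions are hypothetical; the point is only that Outer stays at V0 even though Inner already has two versions.

```rust
/// Child type, old layout (hypothetical).
pub struct InnerV0 {
    pub value: u8,
}

/// Child type, current layout.
pub struct Inner {
    pub value: u32,
}

/// Versioned representation of the child.
pub enum InnerVersions {
    V0(InnerV0),
    V1(Inner),
}

impl Inner {
    pub fn unversionize(versioned: InnerVersions) -> Inner {
        match versioned {
            InnerVersions::V0(old) => Inner { value: u32::from(old.value) },
            InnerVersions::V1(latest) => latest,
        }
    }
}

/// Parent type: its own layout never changed, so it has a single version.
pub struct Outer {
    pub inner: Inner,
    pub tag: u32,
}

/// Versioned form of `Outer`: same shape, but the `inner` field is stored
/// as the child's versioned representation.
pub struct OuterVersion {
    pub inner: InnerVersions,
    pub tag: u32,
}

pub enum OuterVersions {
    V0(OuterVersion),
}

impl Outer {
    /// Unversioning recurses into the fields; the child handles its own
    /// upgrades, so no new `Outer` version is needed when `Inner` changes.
    pub fn unversionize(versioned: OuterVersions) -> Outer {
        match versioned {
            OuterVersions::V0(v) => Outer {
                inner: Inner::unversionize(v.inner),
                tag: v.tag,
            },
        }
    }
}
```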

The macros handle:

  • Struct/enum/union
  • generics
  • conversion with a call to into/from/try_from before and after the versioning/unversioning (similarly to serde)
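The conversion feature in the last item can be pictured as follows: the type used by the main code is first converted into a plain representation, and only that representation is versioned. This is a hand-written sketch with hypothetical types (Seconds, MillisRepr, MillisReprVersions); it does not show the actual attribute syntax of the macro.

```rust
/// Type used by the main code (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct Seconds(pub u64);

/// Plain representation that is actually versioned and serialized.
#[derive(Debug, Clone, PartialEq)]
pub struct MillisRepr {
    pub millis: u64,
}

/// Dispatch enum of the representation, with a single version so far.
pub enum MillisReprVersions {
    V0(MillisRepr),
}

impl From<Seconds> for MillisRepr {
    fn from(s: Seconds) -> Self {
        MillisRepr { millis: s.0.saturating_mul(1000) }
    }
}

impl From<MillisRepr> for Seconds {
    fn from(r: MillisRepr) -> Self {
        Seconds(r.millis / 1000)
    }
}

impl Seconds {
    /// `into` runs first, then the result is wrapped in the latest variant.
    pub fn versionize(&self) -> MillisReprVersions {
        MillisReprVersions::V0(MillisRepr::from(self.clone()))
    }

    /// The representation is unversioned first, then `from` converts it back.
    pub fn unversionize(versioned: MillisReprVersions) -> Seconds {
        let MillisReprVersions::V0(repr) = versioned;
        Seconds::from(repr)
    }
}
```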

Internals

Internally, the Version proc macro generates a pair of associated types for each version of the type. Each associated type has the same shape as the type the macro is derived on, except that its fields are replaced by their versioned representation. The difference between the two types is that one is defined using references and the other using owned data, which avoids copies wherever possible.

For example for this type:

struct MyStruct {
  inner: MyStructInner
}

the macro will generate these types:

#[derive(Serialize)]
struct MyStructVersion<'vers> {
  inner: MyStructInner::Versioned<'vers>
}

#[derive(Serialize, Deserialize)]
struct MyStructVersionOwned {
  inner: MyStructInner::VersionedOwned
}

MyStructVersion will be used for versioning if possible, and MyStructVersionOwned for unversioning and for versioning if it is not possible to use a reference. The macro also generates conversion methods between a type and its Version associated types. It also implements a Version trait that allows easier access to these generated types in other macros.
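The reason for having both a borrowing and an owning form can be seen in a small sketch. Payload, PayloadVersion and PayloadVersionOwned are hypothetical names; the real generated types also derive the serde traits, which are omitted here to keep the example standard-library only.

```rust
/// Type used by the main code (hypothetical).
pub struct Payload {
    pub data: Vec<u8>,
}

/// Borrowing form: built without copying `data`. Enough for serialization,
/// which only needs to read the value.
pub struct PayloadVersion<'vers> {
    pub data: &'vers [u8],
}

/// Owning form: what a deserializer can produce, since there is nothing
/// to borrow from at that point.
pub struct PayloadVersionOwned {
    pub data: Vec<u8>,
}

impl<'vers> From<&'vers Payload> for PayloadVersion<'vers> {
    fn from(p: &'vers Payload) -> Self {
        // No allocation: the versioned value points into the original.
        PayloadVersion { data: p.data.as_slice() }
    }
}

impl From<&Payload> for PayloadVersionOwned {
    fn from(p: &Payload) -> Self {
        // Fallback when a reference cannot be used: the data is cloned.
        PayloadVersionOwned { data: p.data.clone() }
    }
}

impl From<PayloadVersionOwned> for Payload {
    fn from(v: PayloadVersionOwned) -> Self {
        // Moving out of the owned form is free.
        Payload { data: v.data }
    }
}
```

For large buffers such as ciphertexts, going through the borrowing form on the serialization path avoids duplicating the data just to wrap it in a version tag.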

Similarly, the VersionsDispatch macro will generate for the dispatch enum two associated enums, one with references and one with owned data. These enums will be used as the versioned representation for the type. They are the result and parameters of the versionize and unversionize methods and can be serialized/deserialized:

enum MyStructVersions {
  V0(MyStructV0),
  V1(MyStruct)
}

// this is generated by `VersionsDispatch`
#[derive(Serialize)]
enum MyStructVersionsDispatch<'vers> {
  V0(MyStructV0Version<'vers>),
  V1(MyStructVersion<'vers>)
}

#[derive(Serialize, Deserialize)]
enum MyStructVersionsDispatchOwned {
  V0(MyStructV0VersionOwned),
  V1(MyStructVersionOwned)
}

Finally, the Versionize macro uses the generated enums. versionize is just a conversion between MyStruct and the latest variant of MyStructVersionsDispatch, and unversionize is a conversion between MyStructVersionsDispatchOwned and MyStruct (slightly more complicated because of the chained calls to upgrade).

TODO

Versionize is currently implemented for the shortint ciphertext and all its subtypes.

  • Implement the proc-macro
  • Versionize all the things !
  • Handle errors during unversioning (ex: failed upgrades or conversion)
  • Generate test data

Check-list:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Relevant issues are marked as resolved/closed, related issues are linked in the description
  • Check for breaking changes (including serialization changes) and add them to commit message following the conventional commit specification

@cla-bot cla-bot bot added the cla-signed label May 15, 2024
@nsarlin-zama nsarlin-zama changed the base branch from main to release/0.6.x May 15, 2024 14:06
@nsarlin-zama
Contributor Author

I don't think this is a breaking change for serialization/deserialization, since this PR only adds an optional set of methods on every type. That is, messages serialized directly will still be deserializable after this PR. To use the versioning you need to use the versionize/unversionize methods.

@nsarlin-zama nsarlin-zama force-pushed the ns/0.6_with_versionize branch 4 times, most recently from a31dd7b to 25cf708 Compare May 15, 2024 15:18
@nsarlin-zama nsarlin-zama changed the base branch from release/0.6.x to integration/versioning May 16, 2024 14:27
Cargo.toml (outdated, resolved)
Member

@IceTDrinker IceTDrinker left a comment

minor comments that have nothing to do with the code to start

@nsarlin-zama nsarlin-zama force-pushed the ns/0.6_with_versionize branch 2 times, most recently from 51694f7 to be733d5 Compare May 17, 2024 11:58
Member

@IceTDrinker IceTDrinker left a comment

Some additional non code comments, then we'll get to the meat of the proc macro review (though I'm far from a pro on that)

I like how little code it ends up requiring in TFHE-rs (though I know there are no upgrade implementations for now, it is looking very promising)

tfhe/src/core_crypto/commons/ciphertext_modulus.rs (outdated, resolved)
utils/tfhe-versionable/Cargo.toml (resolved)
utils/tfhe-versionable/derive/Cargo.toml (outdated, resolved)
@tmontaigu
Contributor

tmontaigu commented May 21, 2024

Still in the process of reviewing,

I have a doubt about the derive crate living inside the tfhe-versionable crate dir. I wonder whether, when doing cargo package/publish, the code from the derive crate is going to be included in the tarball that cargo uploads (even though cargo is not going to use it to build tfhe-versionable, as it is going to download the proper tfhe-versionable-derive crate)

@nsarlin-zama
Contributor Author

I have a doubt about the derive crate living inside the tfhe-versionable crate dir. I wonder whether, when doing cargo package/publish, the code from the derive crate is going to be included in the tarball that cargo uploads (even though cargo is not going to use it to build tfhe-versionable, as it is going to download the proper tfhe-versionable-derive crate)

Maybe I can just move it up a level into tfhe-rs/utils/tfhe-versionable-derive ?

@tmontaigu
Contributor

I tried with cargo package --list, and it seems the derive crate sources are not included.

It may still be worth moving the crate up a level, as it's a crate, not a module

@nsarlin-zama nsarlin-zama force-pushed the ns/0.6_with_versionize branch 2 times, most recently from b317723 to c4c6b85 Compare May 21, 2024 13:36
@IceTDrinker
Member

This will need a rebase: the action fix has been merged in release/0.6.x, as it was needed

utils/tfhe-versionable/derive/src/lib.rs (outdated, resolved)
if let Some(target) = &self.from {
quote! { #target::unversionize(#arg_name).into() }
} else if let Some(target) = &self.try_from {
quote! { #target::unversionize(#arg_name).try_into().unwrap() } // TODO: handle errors
Contributor

Can this be handled now ?

Contributor Author

Ok, I will also update the signature of the upgrade and unversionize methods to make them return a Result

Contributor Author

The upgrade and unversionize methods now return a Result. This result is propagated to the outermost unversionize call. This Result chain also catches errors returned by a try_from conversion provided inside the versionize attribute.
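A minimal sketch of this kind of propagation, written by hand with the standard library only. The trait TryUpgrade, the error type UnversionError and the Config types are hypothetical and do not reproduce the signatures in tfhe-versionable; they only show how a failing upgrade step surfaces at the outermost unversionize call.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Error reported when unversioning fails (hypothetical).
#[derive(Debug, PartialEq)]
pub struct UnversionError {
    pub from_version: &'static str,
    pub reason: String,
}

impl fmt::Display for UnversionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "upgrade from {} failed: {}", self.from_version, self.reason)
    }
}

impl std::error::Error for UnversionError {}

pub struct ConfigV0 {
    pub size: i64,
}

pub struct ConfigV1 {
    pub size: u32,
}

#[derive(Debug, PartialEq)]
pub struct Config {
    pub size: u32,
    pub label: String,
}

pub enum ConfigVersions {
    V0(ConfigV0),
    V1(ConfigV1),
    V2(Config),
}

/// Fallible upgrade step from one version to the next.
pub trait TryUpgrade<Next> {
    fn upgrade(self) -> Result<Next, UnversionError>;
}

impl TryUpgrade<ConfigV1> for ConfigV0 {
    fn upgrade(self) -> Result<ConfigV1, UnversionError> {
        // A negative or oversized value cannot be represented in V1.
        let size = u32::try_from(self.size).map_err(|e| UnversionError {
            from_version: "V0",
            reason: e.to_string(),
        })?;
        Ok(ConfigV1 { size })
    }
}

impl TryUpgrade<Config> for ConfigV1 {
    fn upgrade(self) -> Result<Config, UnversionError> {
        Ok(Config { size: self.size, label: String::new() })
    }
}

impl Config {
    /// The first failing step short-circuits the whole chain through `?`.
    pub fn unversionize(versioned: ConfigVersions) -> Result<Config, UnversionError> {
        match versioned {
            ConfigVersions::V0(v0) => {
                let v1: ConfigV1 = v0.upgrade()?;
                v1.upgrade()
            }
            ConfigVersions::V1(v1) => v1.upgrade(),
            ConfigVersions::V2(latest) => Ok(latest),
        }
    }
}
```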

@nsarlin-zama nsarlin-zama force-pushed the ns/0.6_with_versionize branch 2 times, most recently from a48d17a to 329c3d7 Compare June 3, 2024 09:31
@nsarlin-zama nsarlin-zama marked this pull request as ready for review June 4, 2024 07:54
@IceTDrinker
Member

I haven't followed the latest changes, so Thomas will know better, but if you followed his recommendations it should be good

@nsarlin-zama nsarlin-zama merged commit 6c8f972 into integration/0.6/versioning Jun 4, 2024
15 checks passed
@nsarlin-zama nsarlin-zama deleted the ns/0.6_with_versionize branch June 4, 2024 14:06