Alternative serialization implementation that compactly stores bytes #1303
base: main
Conversation
Added a third test that checks edge cases. Some edge cases are as follows: (Vec<u8>, u32) with 0, 1, 2, 3, 4, 5 bytes.
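A round-trip test along those lines might look like the following sketch. This is hypothetical: it assumes the `to_vec`/`from_slice` helpers from risc0's serde module, and the actual test in the PR may differ.

```rust
use risc0_zkvm::serde::{from_slice, to_vec};

// Round-trip (Vec<u8>, u32) for byte lengths 0 through 5, exercising every
// possible amount of trailing byte padding before the u32 is written.
#[test]
fn round_trip_bytes_then_word() {
    for n in 0u8..=5 {
        let value: (Vec<u8>, u32) = ((0..n).collect(), 0xdead_beef);
        let words = to_vec(&value).unwrap();
        let decoded: (Vec<u8>, u32) = from_slice(&words).unwrap();
        assert_eq!(value, decoded);
    }
}
```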
Let me take a look at the failed tests and think about how to fix them.
@flaub ready for CI again. My local run has one failing test that does not seem to be due to this PR.
Just some initial thoughts, not a thorough review
```rust
if self.status == 1 {
    if self.buffer[0] != 0 || self.buffer[1] != 0 || self.buffer[2] != 0 {
        return Err(Error::DeserializeBadByte);
    }
} else if self.status == 2 {
    if self.buffer[1] != 0 || self.buffer[2] != 0 {
        return Err(Error::DeserializeBadByte);
    }
} else if self.status == 3 {
    if self.buffer[2] != 0 {
        return Err(Error::DeserializeBadByte);
    }
}
```
Feels like this could just be slicing the range you're checking. Or is it written out like this for perf reasons?
Sort of, for performance, though I feel the compiler would be able to optimize a sliced version in the same way. None of these cases is supposed to appear; the checks are here to detect errors.
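For reference, a minimal sketch of the slicing the reviewer suggests. The indexing assumes `buffer` holds the word's three trailing bytes and `status` counts consumed bytes in 1..=3, which is an inference from the snippet above rather than confirmed from the source:

```rust
// Hypothetical slice form: everything past the consumed prefix must be
// zero padding. status == 1 checks buffer[0..], 2 checks [1..], 3 checks [2..].
if self.status != 0 && self.buffer[self.status as usize - 1..].iter().any(|&b| b != 0) {
    return Err(Error::DeserializeBadByte);
}
```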
That's just an issue with ethers-rs. The CI uses an old version of foundry because there was a breaking change (and no released version of ethers that works with the new one), so you can ignore that. The fix would be patching ethers-rs to a recent git version, pinning the old foundry version, or just ignoring that test.
To summarize, Austin pointed out that this would have an issue if the user uses multiple FdWriters for the same resource; for example, the user can obtain a writer to JOURNAL by just calling …. This issue is real because each FdWriter may have "leftover" bytes in its buffer that have not yet been written out. A solution is to have something similar to this in the guest:

```rust
#[derive(Clone)]
pub struct PosixIo<'a> {
    pub(crate) read_fds: BTreeMap<u32, Rc<RefCell<dyn BufRead + 'a>>>,
    pub(crate) write_fds: BTreeMap<u32, Rc<RefCell<dyn Write + 'a>>>,
}
```

So each time a user requests an FdReader or FdWriter, it actually gets a reference to an Rc<RefCell<>> one. Ideally there should be another layer of abstraction, so that regular users do not have to operate on Rc<RefCell<>>, which would require ….
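For illustration, a minimal sketch of what such an abstraction layer could look like. The name SharedFdWriter and its shape are hypothetical, not the actual guest API; the point is that all clones of one handle share the same underlying writer, so leftover bytes live in exactly one place:

```rust
use std::{cell::RefCell, io::{self, Write}, rc::Rc};

/// Hypothetical wrapper: users get a cheap, cloneable handle instead of
/// touching Rc<RefCell<...>> (and its borrow_mut ceremony) directly.
#[derive(Clone)]
pub struct SharedFdWriter<'a> {
    inner: Rc<RefCell<dyn Write + 'a>>,
}

impl<'a> Write for SharedFdWriter<'a> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Every clone borrows the same underlying writer, so any buffered
        // "leftover" bytes are shared rather than duplicated per handle.
        self.inner.borrow_mut().write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.borrow_mut().flush()
    }
}
```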
Let me experiment with that idea a bit. In general, we do not want to patch things here and there; we want to resolve this systematically. Adding a new syscall is also not reasonable.
@austinabell Circulating a third version to you. This version no longer modifies how …; we want to restrict ourselves to serialization. The problem we are encountering is that there are "buffered" bytes that have not been written to the stream. First of all, we can split all types in Rust into three groups: ….
Note that if the buffered bytes have not been fully written to the stream, the last primitive type written has to be either bool or u8 (we will just treat bool as u8 in the following). There is no primitive type after it; otherwise, the buffered bytes would have been written to the stream, because the byte handler would have been reset. So there are only two possibilities for this u8: either it sits inside a wrapper type, or it sits at the top level.
We handle both separately.

For the first case, we introduce a notion of "depth". When the serializer enters a wrapper, the depth is increased by one; when it leaves the wrapper, the depth is decreased by one. When the depth reaches zero, the serializer must have left the last layer of meaningful wrapper, and no new bytes can follow, so the buffered word needs to be written to the stream immediately. The implementation looks like this:

```rust
#[inline]
fn decrease_depth<W: WordWrite>(&mut self, stream: &mut W) -> Result<()> {
    self.depth -= 1;
    if self.depth == 0 && self.status != 0 {
        stream.write_words(&[self.byte_holder])?;
        self.status = 0;
    }
    Ok(())
}
```

For the second case, we notice that the last u8 is at depth 0; therefore, this u8 is put into the buffer and immediately written to the stream, i.e., the u8 is treated as a u32:

```rust
fn handle<W: WordWrite>(&mut self, stream: &mut W, v: u8) -> Result<()> {
    if self.depth == 0 {
        stream.write_words(&[v as u32])?;
    } else {
        // ...
    }
    Ok(())
}
```

Let me know what you think. In addition, @austinabell, can you approve the GitHub Actions runs, so I can get CI to help find bugs? I am fighting the error "incompatible client version: 0.21.0-alpha.1" locally right now.
I'll need to review before we run on CI, thanks.
If you want to test locally, you'll need to install the server from local source. You can do that with: …
Just finished local testing. The local tests pass.
So for context, I'm not a stakeholder for these changes; I was just reviewing to help improve the proposal, since I had thought through similar patterns and personally couldn't rationalize proposing to change the default. I haven't checked the semantics in depth, and it seems like a reasonable implementation, but I'm still unsure about these two points mainly: …
The first point from Austin related to breaking existing implementations. There are two thoughts here: …
Regarding cycle count, I have a separate question. At the moment, every time a call to read or write is made, a syscall to sys_read or sys_write happens. This is where I am more concerned about the cycle count. Would it be more performant if a user wraps it with, for example, BufReader and BufWriter? I just feel that the current implementation makes syscalls very frequently.
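For illustration, a minimal sketch of that wrapping idea. The `raw_in`/`raw_out` parameters stand in for whatever guest handles expose Read/Write over sys_read/sys_write; that mapping is an assumption, not the actual guest API:

```rust
use std::io::{BufReader, BufWriter, Read, Write};

// BufReader/BufWriter batch many small reads and writes into a few large
// underlying calls, so syscall frequency drops even if the caller still
// performs tiny reads and writes.
fn copy_buffered<R: Read, W: Write>(raw_in: R, raw_out: W) -> std::io::Result<()> {
    let mut reader = BufReader::new(raw_in);
    let mut writer = BufWriter::new(raw_out);
    let mut chunk = [0u8; 64];
    loop {
        let n = reader.read(&mut chunk)?; // mostly served from the in-memory buffer
        if n == 0 {
            break;
        }
        writer.write_all(&chunk[..n])?; // accumulated; flushed in large writes
    }
    writer.flush() // one final underlying write for the remaining bytes
}
```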
Reading through this, it seems the implementation is reasonable, and I can follow what the resulting codec looks like (bytes get packed, with padding whenever we transition from bytes to non-bytes). As a result, we should expect a 75% reduction in the number of calls to read a word from the host.

I went ahead and ran this on a couple of our examples to get samples. With the …, I also went ahead and compared this to an implementation using ….

Overall, I'm a bit reluctant to merge this approach. It feels a bit "stapled on" to the core (de)serializer implementations, and it optimizes the single case of serializing …. All together, it feels like there is still significant work to be done, so the concern is that we will later need to come back to this and rebuild the serializer a second time. Both of these transitions introduce some pain for developers, so ideally we'd do it once. On the other hand, we are pre-1.0.

One direction we still need to explore is using alternative, off-the-shelf codecs. We need to explore e.g. changing out our codec for ….

@flaub do you have any opinions or intuitions on this given this information?

Profiles for this branch (e8d40279faf2928a6b29d37671e3a48c2f98336b), main (8650ac9), and a version of the example using …
Here is a question: is the 90% reduction in the number of calls to "read" related to the ability to read many words at the same time, because it can forecast how many need to be read? If so, this calls for the following to be implemented in the guest: a "BufReader", so that if there are available words in the file descriptor, it reads more of them at once (see the sketch below).

But a follow-up question is: how big is the overhead of read_words? Is the serde cycle count mostly due to the inherent machinery of serde? Note that serde cycles may, for a lot of applications, not be the bottleneck, but reducing them helps users who are writing the code.

One solution to help with stability is to allow customized serializers and deserializers, or to implement this as a third-party dependency that wraps the existing implementation. This could be implemented in a way that is transparent and does not require modifications to RISC Zero itself: basically an env2::, or #[features(experimental_serde)].
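To make the guest-side "BufReader" idea concrete, here is a minimal sketch. The `WordRead` trait and all names are hypothetical stand-ins for the guest's word-oriented read interface, the refill policy is only illustrative, and a real implementation would need to handle the case where fewer words than a full block remain:

```rust
/// Hypothetical word-oriented read interface; one call is assumed to cost
/// one host round-trip.
trait WordRead {
    fn read_words(&mut self, out: &mut [u32]) -> Result<(), ()>;
}

/// Sketch of a guest-side "BufReader" over words: fetch a block per
/// underlying call and serve later requests from memory, so host
/// round-trips drop roughly by the block size.
struct BufWordReader<R: WordRead> {
    inner: R,
    buf: Vec<u32>,
    pos: usize,
}

impl<R: WordRead> BufWordReader<R> {
    const BLOCK: usize = 256; // words fetched per underlying read (illustrative)

    fn new(inner: R) -> Self {
        Self { inner, buf: Vec::new(), pos: 0 }
    }

    fn read_words(&mut self, out: &mut [u32]) -> Result<(), ()> {
        let mut filled = 0;
        while filled < out.len() {
            if self.pos == self.buf.len() {
                // Refill with one bulk call instead of one call per word.
                self.buf.resize(Self::BLOCK, 0);
                self.inner.read_words(&mut self.buf)?;
                self.pos = 0;
            }
            let n = (out.len() - filled).min(self.buf.len() - self.pos);
            out[filled..filled + n].copy_from_slice(&self.buf[self.pos..self.pos + n]);
            self.pos += n;
            filled += n;
        }
        Ok(())
    }
}
```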
Related to the stable release, the question is whether we want to move forward with this new spec of the input, where contiguous bytes are packed. Improving the performance could then come as subsequent non-breaking changes.
In the meantime, I uploaded a crate that converts a data struct to and back from u32 slices. (This would avoid the compatibility issue.)
This solves #1298. Below I explain some changes that require attention.
Below is obsolete.
FdWriter:

The same approach does not work for FdWriter, because it does not offer a similar interface that allows me to "modify" the last written word. Therefore, it gets a single-word buffer that is only used during u8 serialization. This buffer is flushed when full words are written, or when the FdWriter is dropped (in Rust, a variable drops when it leaves its scope, such as at the end of the main() function; this should work, as JOURNAL processing happens after that).

Byte Handler:

Location of handler reset:

Vec<u8> and the rest: per #1298, there are only the following basic types that RISC Zero handles (and serializes):

bool, u8 -> u8
i8, i16, i32, i64, u16, u32, u64, f32, f64, char -> u32
string, byte array, i128, u128 write bytes directly (i128 calls u128 in the serializer, but is separate in the deserializer)

So we just need to make sure that for string, byte array, i128, u128, and u32, we reset the handler. String and byte array start their serialization by calling u32, so the handler is reset before any byte is written; we don't need to add an additional reset there. As a result, …
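To illustrate the reset, here is a minimal hypothetical sketch. The names reset_byte_handler, byte_holder, and status are assumptions patterned on the snippets above, not the actual risc0 source; every non-byte primitive funnels through the u32 path, which first pads and flushes any buffered bytes:

```rust
fn serialize_u32<W: WordWrite>(&mut self, stream: &mut W, v: u32) -> Result<()> {
    // Reset first: any buffered bytes get zero-padded out to a full word
    // before the non-byte value is written.
    self.reset_byte_handler(stream)?;
    stream.write_words(&[v])
}

fn reset_byte_handler<W: WordWrite>(&mut self, stream: &mut W) -> Result<()> {
    if self.status != 0 {
        // The unoccupied lanes of byte_holder are already zero padding.
        stream.write_words(&[self.byte_holder])?;
        self.status = 0;
        self.byte_holder = 0;
    }
    Ok(())
}
```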