-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Comparison (LT, GT, LTE, GTE) and prefix operations on arrow blockfile #2223
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
e60c1d7
to
f97f4a8
Compare
5ddcd3b
to
7fa7f5f
Compare
rust/worker/proptest-regressions/blockstore/arrow/blockfile.txt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good -- sorry my previous comment made you think through all this fiddly logic but I think it's correct now. A couple small questions to make sure I'm reading the code correctly.
while let Some((curr_key, curr_uuid)) = curr_iter.next() { | ||
let non_start_curr_key: Option<&CompositeKey>; | ||
match curr_key { | ||
SparseIndexDelimiter::Start => non_start_curr_key = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to simplify this by setting non_start_curr_key
to ""
if we're at the start so that every prefix is > it, right? I'm not asking for this change; partly saying this to make sure I'm correctly understanding what we're doing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For prefix yes. But then for the actual key it becomes complicated depending on the type. Honestly, I feel we should remove this enum that treats start separately from everything else. and instead have default values like you are suggesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strongly prefer an explicit start over defaults that implicitly represent some contract. We could modify the iteration scheme to make start not a factor.
} else { | ||
// Last block. | ||
if non_start_curr_key.is_none() | ||
|| prefix >= non_start_curr_key.unwrap().prefix.as_str() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure I understand: this means that if there's one block in the blockfile we always load and query it, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep! that's the first part of the OR
block
block_ids.push(*curr_uuid); | ||
} | ||
} | ||
if curr_key.is_some() && curr_key.unwrap().prefix.as_str() == prefix { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why you wanted a thorough review on this. This looks correct to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In hindsight this could be a good coding question to test ability to think through cases
Description of changes
Summarize the changes made by this PR.
Test plan
Unit-tests in arrow blockfile
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
None