-
Notifications
You must be signed in to change notification settings - Fork 24.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use zero page "holes" to optimize sparse byte array usage #108709
Use zero page "holes" to optimize sparse byte array usage #108709
Conversation
for (int i = 0; i < pages.length; ++i) { | ||
pages[i] = newBytePage(i); | ||
Arrays.fill(pages, ZERO_PAGE); | ||
assert assertZeroPageClean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had some bugs with this, it's a little wasteful but having this in place lets me sleep better I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
protected byte[] getPageForWriting(int pageIndex) { | ||
byte[] foundPage = pages[pageIndex]; | ||
if (foundPage == ZERO_PAGE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the new overhead we add to the set
path relative to not having this. It seems to me, that it's a trivial overhead, especially when considering that we're fine with lots of redundant reads in set
today. Plus, I've never seen set
hot in profiling.
@@ -162,8 +164,8 @@ public BytesRef next() { | |||
} | |||
|
|||
@Override | |||
public void fillWith(StreamInput in) throws IOException { | |||
in.readBytes(array, 0, Math.toIntExact(size())); | |||
public void fillWith(InputStream in) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See CCR changes, we were doing a really hacking thing there reading into a BytesReference
via it's iterator over BytesRef
. Just passing the ByteArray
through the CCR code and reading via this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems worth a shot!
for (int i = 0; i < pages.length; ++i) { | ||
pages[i] = newBytePage(i); | ||
Arrays.fill(pages, ZERO_PAGE); | ||
assert assertZeroPageClean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thanks Nik, 🤞 :) |
Pinging @elastic/es-core-infra (Team:Core/Infra) |
Add the notion of a "zero page" or "hole" to big arrays. We have some use cases where we run up byte arrays of hundreds of MB that are extremely sparse.
Each page starts out as a "hole" and only gets replaced by a real page from the pool on write similar to how FS holes work.
This change adds a small amount of overhead to the write side but is performance neutral or better on the read side (for sparse arrays we likely get a big improvement from using less CPU cache).
The only change outside of the array itself this needed was in CCR, see inline comments for that.