Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use zero page "holes" to optimize sparse byte array usage #108709

Merged

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented May 16, 2024

Add the notion of a "zero page" or "hole" to big arrays. We have some use cases where we run up byte arrays of hundreds of MB that are extremely sparse.
Each page starts out as a "hole" and only gets replaced by a real page from the pool on write similar to how FS holes work.
This change adds a small amount of overhead to the write side but is performance neutral or better on the read side (for sparse arrays we likely get a big improvement from using less CPU cache).

The only change outside of the array itself this needed was in CCR, see inline comments for that.

@original-brownbear original-brownbear added WIP :Core/Infra/Core Core issues without another label labels May 16, 2024
for (int i = 0; i < pages.length; ++i) {
pages[i] = newBytePage(i);
Arrays.fill(pages, ZERO_PAGE);
assert assertZeroPageClean();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some bugs with this, it's a little wasteful but having this in place lets me sleep better I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


protected byte[] getPageForWriting(int pageIndex) {
byte[] foundPage = pages[pageIndex];
if (foundPage == ZERO_PAGE) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the new overhead we add to the set path relative to not having this. It seems to me, that it's a trivial overhead, especially when considering that we're fine with lots of redundant reads in set today. Plus, I've never seen set hot in profiling.

@@ -162,8 +164,8 @@ public BytesRef next() {
}

@Override
public void fillWith(StreamInput in) throws IOException {
in.readBytes(array, 0, Math.toIntExact(size()));
public void fillWith(InputStream in) throws IOException {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See CCR changes, we were doing a really hacking thing there reading into a BytesReference via it's iterator over BytesRef. Just passing the ByteArray through the CCR code and reading via this method.

@original-brownbear original-brownbear marked this pull request as ready for review May 17, 2024 15:07
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems worth a shot!

for (int i = 0; i < pages.length; ++i) {
pages[i] = newBytePage(i);
Arrays.fill(pages, ZERO_PAGE);
assert assertZeroPageClean();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@original-brownbear
Copy link
Member Author

Thanks Nik, 🤞 :)

@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label May 17, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@original-brownbear original-brownbear merged commit 8ff8eff into elastic:main May 17, 2024
14 of 15 checks passed
@original-brownbear original-brownbear deleted the clever-holes-pages branch May 17, 2024 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants