
Implement histogram statistics decoder #14097

Open · wants to merge 7 commits into main from the histogram-stats-decoder branch
Conversation

fpetkovski
Contributor

@fpetkovski fpetkovski commented May 14, 2024

This PR speeds up the histogram_count and histogram_sum functions on native histograms. The idea is to use a separate decoder which the engine can use to skip histogram buckets when decoding histogram objects. This should help reduce allocations and make the sum aggregation faster, since it will operate only on primitive values, excluding buckets and spans.
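The approach can be sketched roughly as follows (a minimal illustration with hypothetical names, not the actual PR code): the engine decodes only the scalar fields of each histogram, and histogram_count then reduces to a plain float aggregation:

```go
package main

import "fmt"

// statsHistogram is a hypothetical stats-only view of a native histogram:
// only the fields needed by histogram_count and histogram_sum are decoded,
// so no bucket or span slices are allocated per sample.
type statsHistogram struct {
	Count float64
	Sum   float64
}

// histogramCountSum evaluates a sum over histogram_count as a plain float
// aggregation over stats-only samples, never materializing buckets.
func histogramCountSum(samples []statsHistogram) float64 {
	var total float64
	for _, s := range samples {
		total += s.Count
	}
	return total
}

func main() {
	samples := []statsHistogram{{Count: 10, Sum: 1.5}, {Count: 32, Sum: 4.2}}
	fmt.Println(histogramCountSum(samples)) // 42
}
```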

Current benchmark results

prometheus $ benchstat main.out new.out                                               
name                                                          old time/op    new time/op    delta
NativeHistograms/sum-11                                          404ms ± 6%     391ms ± 1%     ~     (p=0.222 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            500ms ± 8%     492ms ± 8%     ~     (p=0.548 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             803ms ± 1%     815ms ± 3%     ~     (p=0.095 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     477ms ± 4%     318ms ± 5%  -33.29%  (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-11      818ms ± 4%     582ms ± 1%  -28.77%  (p=0.008 n=5+5)

name                                                          old alloc/op   new alloc/op   delta
NativeHistograms/sum-11                                          413MB ± 0%     414MB ± 0%     ~     (p=0.841 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            213MB ± 0%     213MB ± 0%     ~     (p=0.095 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             314MB ± 0%     314MB ± 0%     ~     (p=0.841 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     213MB ± 0%     143MB ± 0%  -32.94%  (p=0.016 n=5+4)
NativeHistograms/histogram_count_with_long_rate_interval-11      321MB ± 2%     202MB ± 6%  -37.15%  (p=0.008 n=5+5)

name                                                          old allocs/op  new allocs/op  delta
NativeHistograms/sum-11                                          5.16M ± 0%     5.16M ± 0%     ~     (p=1.000 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            4.61M ± 0%     4.61M ± 0%     ~     (p=0.381 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             6.12M ± 0%     6.12M ± 0%     ~     (p=1.000 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     4.61M ± 0%     1.66M ± 0%  -63.99%  (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-11      6.14M ± 0%     2.01M ± 3%  -67.36%  (p=0.008 n=5+5)

This commit is a POC of how we can speed up histogram_count and histogram_sum
functions on native histograms. The idea is to have separate decoders which can be
used by the engine to only read count/sum values from histogram objects. This should help
with reducing allocations when decoding histograms, as well as with speeding up aggregations
like sum since they will be done on floats and not on histogram objects.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@fpetkovski
Contributor Author

I've been tinkering with this approach a bit and I wonder if it could interfere with various mixed-type checks, as well as with histogram counter resets. I will investigate another solution which returns a partially filled histogram object so that we can still go through the same histogramRate path.

@fpetkovski fpetkovski marked this pull request as draft May 15, 2024 05:38
@fpetkovski fpetkovski force-pushed the histogram-stats-decoder branch 3 times, most recently from 8df13fa to 0a72cc5 Compare May 15, 2024 07:43
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@fpetkovski
Contributor Author

Looking at profiles, I see CopyToSchema taking a lot of time. Do we need to decode/fill the schema field if we're only operating on Count/Sum values, @krajorama?
[profile screenshot]

@fpetkovski fpetkovski marked this pull request as ready for review May 15, 2024 07:58
@beorn7 beorn7 requested review from beorn7 and removed request for aknuds1, roidelapluie and jesusvazquez May 16, 2024 13:26
@beorn7
Member

beorn7 commented May 16, 2024

Generally, this looks very nice.

However, there is a nasty edge case: In rare cases, a counter reset might not be visible in the count but only in individual buckets. We do detect those cases on ingestion and set the counter reset hint accordingly, so generally, this should make it through the special decoder here via the counter reset hint. However (edge case of an edge case of an edge case…), the following scenario could happen:

  • A chunk gets cut just because the previous chunk was full (and not because of a counter reset). In that case, we mark the counter reset hint as "unknown" because we cannot know if the previous chunk gets deleted. The idea is that we do manual counter reset detection in that case, just to play safe.
  • If the previous chunk gets really deleted, and if there is really a counter reset between the pre-previous chunk and the current chunk, and if that counter reset isn't detectable via the count alone, but only via (some of) the buckets, then this PR creates a problem (a different result than the one we get with the full decoding).

Maybe we can solve that cleanly by doing the counter reset detection directly in the iterator (and set the counter reset hint accordingly) in cases where the "unknown" hint is coming from TSDB.

Or we can just wing it and claim that this case is so outrageously unlikely that we don't need to handle it.
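The edge case can be illustrated with a minimal sketch (hypothetical names, float buckets for simplicity): the total count grows between two samples, yet an individual bucket drops, which only full bucket decoding can detect:

```go
package main

import "fmt"

// bucketResetHidden reports whether a counter reset is visible only in the
// buckets: the overall count did not drop, but some bucket's count did.
// All names here are illustrative, not the PR's actual code.
func bucketResetHidden(prevCount, curCount float64, prevBuckets, curBuckets []float64) bool {
	if curCount < prevCount {
		return false // reset already visible in the count itself
	}
	for i := range curBuckets {
		if i < len(prevBuckets) && curBuckets[i] < prevBuckets[i] {
			return true // a bucket dropped even though the total did not
		}
	}
	return false
}

func main() {
	// Total count grows 10 -> 12, but bucket 0 drops 6 -> 2: a reset that a
	// stats-only decoder (seeing only Count/Sum) would miss.
	fmt.Println(bucketResetHidden(10, 12, []float64{6, 4}, []float64{2, 10})) // true
}
```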

```go
ZeroCount:     f.hReader.ZeroCount,
Sum:           f.hReader.Sum,
ZeroThreshold: f.hReader.ZeroThreshold,
Schema:        f.hReader.Schema,
```
Member

Since the schema is only for buckets, I think we can set this to any number without doing harm. If we set it consistently to the same number (e.g. 0), we avoid the CopyToSchema step.

Contributor Author

I removed both the zero threshold and schema fields.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@fpetkovski
Contributor Author

Let me take a look at how we can do the detection transparently in an efficient manner. In the worst case we will just store the last decoded histogram as another field and use it for detection when the hint is unknown.
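That fallback can be sketched as follows (a minimal illustration with hypothetical names; this sketch compares only counts, whereas a real implementation would keep the whole last decoded histogram so it can also catch bucket-only resets):

```go
package main

import "fmt"

type hint int

const (
	unknownReset hint = iota
	notReset
	reset
)

// statsIterator keeps the previously decoded count and, when the chunk's
// counter-reset hint is "unknown", falls back to comparing against it.
// Names are hypothetical, not the PR's actual code.
type statsIterator struct {
	prevCount float64
	hasPrev   bool
}

// resolveHint resolves an "unknown" hint by manual detection against the
// previous sample, then records the current count for the next call.
func (it *statsIterator) resolveHint(h hint, curCount float64) hint {
	if h == unknownReset && it.hasPrev {
		if curCount < it.prevCount {
			h = reset
		} else {
			h = notReset
		}
	}
	it.prevCount = curCount
	it.hasPrev = true
	return h
}

func main() {
	it := &statsIterator{}
	fmt.Println(it.resolveHint(unknownReset, 10)) // 0 (still unknown: no previous sample)
	fmt.Println(it.resolveHint(unknownReset, 4))  // 2 (count dropped -> reset)
	fmt.Println(it.resolveHint(unknownReset, 9))  // 1 (count grew -> not a reset)
}
```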

@fpetkovski
Contributor Author

We currently don't have a function to detect resets for integer histograms. Since PromQL always works with float histograms, I might need to add the same functionality just operating on integer buckets and spans.

Member

@beorn7 beorn7 left a comment

Just some comment nits.

promql/engine.go Outdated
```go
// detectHistogramStatsDecoding modifies the expression by setting the
// SkipHistogramBuckets field in those vector selectors for which it is safe to
// return only histogram statistics (sum and count), excluding histogram spans
// and buckets. The function can be treated as an optimization and does is not
```
Member

"does" appears spurious. Maybe "thus"? Or just delete it?

Contributor Author

Yeah this was spurious.

```go
LabelMatchers        []*labels.Matcher
Offset               time.Duration
Timestamp            *int64
SkipHistogramBuckets bool // Set when do not need to decode native histogram buckets for evaluating a query.
```
Member

Maybe a bit easier to read:

Suggested change

```diff
-SkipHistogramBuckets bool // Set when do not need to decode native histogram buckets for evaluating a query.
+SkipHistogramBuckets bool // Set when decoding native histogram buckets isn't needed for query evaluation.
```

@beorn7
Member

beorn7 commented May 22, 2024

Let me take a look at how we can do the detection transparently in an efficient manner. In the worst case we will just store the last decoded histogram as another field and use it for detection when the hint is unknown.

Any ideas so far?

It would actually be quite cool if we could detect counter resets in the iterator already. Then we could remove it entirely in the PromQL layer (for native histograms, at least).

If we can somehow safely assess if a chunk got deleted before the current chunk, we could avoid the manual counter reset detection even more. A heuristic would be if there is a gap larger than the typical scrape interval, but that's not strictly correct. I have a vague memory that we never delete chunks from the middle of a block, so that might narrow down the problem.

Just brainstorming here…

I'm almost ready to believe that the above mentioned edge case of an edge case of an edge case is rare enough that we can ignore it (but document it, just in case it becomes relevant after all).

@beorn7
Member

beorn7 commented May 22, 2024

Another thought that crossed my mind: What about overlapping blocks and (soon) out-of-order handling?
This might create a lot more cases where we set the counter reset hint to "unknown". Maybe we should simply skip the optimization in those cases. (Again, I have to apologize that I just type out thoughts crossing my mind, based on vague memories from the past, without having done proper research into the matter…)

@fpetkovski
Contributor Author

fpetkovski commented May 22, 2024

I've been busy moving apartments this week and did not have time to work on this further. I'll have my time back next week and will continue looking into the reset.

My opinion is that we should handle the reset in the iterator by keeping track of the last decoded histogram. This way the code can also remain future-proof in case we introduce more edge cases down the line.

What about overlapping blocks and (soon) out-of-order handling?

I remember looking into this code as well, and we seem to set unknown in the chain iterator whenever we change chunks. So I think the logic would break there unless we do explicit detection.

@beorn7
Member

beorn7 commented May 23, 2024

Good luck with the move. Looking forward to further explorations into this issue.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@fpetkovski
Contributor Author

@beorn7 I've addressed all comments and we now detect the reset in the iterator itself. I also moved the iterator to the promql package, since that is the only place where it is used. It relies only on public methods, so we don't need access to encoding internals from the chunkenc package.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@beorn7
Member

beorn7 commented May 30, 2024

Thank you very much. This is now very close to the top of my review queue, but I won't get to it this week. Sorry for that.
