Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow RA metrics to be reported upon parsing completed or accumulated #108726

Merged
merged 13 commits into from
May 17, 2024

Conversation

pgomulka
Copy link
Contributor

RAmetric can be implemented so that they could be reported before they are being indexed (like with a new field being added)
or they could be accumulated and reported upon shard commit as an additional metadata

This commit addes new method to DocumentSizeReporter#onParsingComplted DocumentSizeAccumulator that is being used to accumulate the size inbetween the commits DocumentSizeReporter can be parametrised with a DocumentSizeAccumulator

based on #108449

expose size observer plugin to stateless

integ test

remove unused assertions

spotless

spotless
RAmetric can be implemented so that they could be reported before they are being indexed (like
with a new field being added)
or they could be accumulated and reported upon shard commit as an additional metadata

This commit addes new method to DocumentSizeReporter#onParsingComplted
DocumentSizeAccumulator that is being used to accumulate the size inbetween the commits
DocumentSizeReporter can be parametrised with a DocumentSizeAccumulator

based on elastic#108449
@pgomulka pgomulka added >enhancement :Core/Infra/Metrics Metrics and metering infrastructure labels May 16, 2024
@pgomulka pgomulka self-assigned this May 16, 2024
@pgomulka pgomulka requested a review from a team as a code owner May 16, 2024 13:46
@pgomulka pgomulka changed the title Allow ra metrics to be reported upon parsing completed or accumulated Allow RA metrics to be reported upon parsing completed or accumulated May 16, 2024
@elasticsearchmachine elasticsearchmachine added v8.15.0 Team:Core/Infra Meta label for core/infra team labels May 16, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine
Copy link
Collaborator

Hi @pgomulka, I've created a changelog YAML for you.

* @param map a map with previous value of size
* @return an map with a new value of size
*/
Map<String, String> totalDocSizeMap(Map<String, String> map);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That signature seems match a very specific use case only, making it hard to understand in this context. Could you remind me again how we are using it? Could this just be getAndReset()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are using this in IndexEngine. we are returning this map so that it can be stored in a commit userdata.
getAndReset feels fine to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to bother again about naming, but looking at what EMPTY_INSTANCE does and from what I remember, this is supposed to enrich an existing map passed via argument, adding the accumulated size, right?
If so, getAndReset seems a bit off. I know enrichMapAndResetAccumulator is a bit much, but if this is what it does..
(I might be wrong here, please correct me!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from what I remember

From the POC on which this PR is based, I mean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enrich an existing map passed via argument
not really enrich, but fetch a previous value of ra_storage_key and add a new accumulated size
We should not try to enrich it as it will contain other values (TRANSLOG_RECOVERY_START_FILE) which will be overriden later in getCommitExtraUserData

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I just noticed the change in serverless. That was a misunderstanding, I had int getAndReset(); in mind, but I see why you want to pass the map through here. My bad :(

How about the following? That might better capture the intention of this?

Map<String, String> incrementCommitExtraUserData(SegmentInfos segmentInfos)

Or more "precise", though clunky (and not really a serious recommendation)

Optional<Entry<String, String>> incrementCommitExtraUserData(SegmentInfos segmentInfos)

Or

Map<String, String> incrementAndReset(Map<String, String>)

Though I honestly don't like the signature that passes in a map. It makes it look like the input map is modified, which isn't really the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or

Map<String, String> incrementSizeAndReset(SegmentInfos segmentInfos)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, I int getAndReset(); would mean we would have to 'leak' the logic of the aggregation of previous value into the IndexEngine
I think we don't really have to explicitly say in a method name that we will reset the accumulator. We could keep this as an implementation detail, it does not matter from IndexEngine point of view what happens with the value in the accumulator - but let's leave this for the implementation Accumulator interface.

how about
Map<String, String> getAsCommitUserDataMap(SegmentInfos segmentInfos) ?
returning an empty map (/just noticed this bug/) should act as a null object pattern

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what confused me was that in EMPTY_INSTANCE we were passing the map in and out for the "null object". So I thought: we get an existing map, and we change it. The no-op/null object would just not change it.
If we have instead Map<String, String> getAsCommitUserDataMap(SegmentInfos segmentInfos), and the no-op implementation returning an empty map, that would be clearer to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound good to me, but I think we can drop the redundant Map suffix 👍

Copy link
Contributor

@ldematte ldematte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking on name :) but otherwise looks good.

@ldematte
Copy link
Contributor

👍 Still good, even better after the renaming

Copy link
Contributor

@mosche mosche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@pgomulka pgomulka merged commit 1803320 into elastic:main May 17, 2024
15 checks passed
parkertimmins pushed a commit to parkertimmins/elasticsearch that referenced this pull request May 17, 2024
…elastic#108726)

RAmetric can be implemented so that they could be reported before they are being indexed (like with a new field being added)
or they could be accumulated and reported upon shard commit as an additional metadata

This commit addes new method to DocumentSizeReporter#onParsingComplted DocumentSizeAccumulator that is being used to accumulate the size inbetween the commits DocumentSizeReporter can be parametrised with a DocumentSizeAccumulator

based on elastic#108449
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Metrics Metrics and metering infrastructure >enhancement Team:Core/Infra Meta label for core/infra team v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants