Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate OpenTelemetry to enhance observability #379

Open
springmin opened this issue May 13, 2024 · 2 comments
Open

Integrate OpenTelemetry to enhance observability #379

springmin opened this issue May 13, 2024 · 2 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@springmin
Copy link

Feature request type

enhancement

Is your feature request related to a problem? Please describe

integrate OpenTelemetry or other trance component to enhance observability。

Describe the solution you'd like

integrate OpenTelemetry or other trance component to enhance observability。

Describe alternatives you've considered

No response

Additional context

No response

@PaulusParssinen
Copy link
Contributor

PaulusParssinen commented May 13, 2024

I've been thinking about this for a while (and implementing the server using Generic Host in general).

By using the new System.Diagnostics.Metrics APIs (Counter<T>, UpDownCounter<T>, Histogram<T> etc. ..) would allow us to delete HdrHistogram fork altogether from the repo and simplify our metrics/monitoring logic a lot. The places where we pull the metrics inside the server (LATENCY etc?) we should be able to use simple MeterListener to aggregate them). I'm not sure if Histogram<T> covers everything that HdrHistogram offers and someone who knows statistics better than me should weigh in on that (dotnet/runtime#63650 seems to be useful addition).

We should follow the existing OTel semantic conventions for Redis but of course modify/extend it if needed. Also configurations knob whether to record db.statement or not, as I believe there is currently no redaction capabilities in Garnet for potentially sensitive parts of the command.

Measuring anything should become a lot easier, but we should follow how ASP.NET & System.Net guards their metrics logic to keep the overhead next to none IF the metrics are disabled (Instrument.Enabled). Should the metrics codepaths should be entirely eliminated when Garnet is built with <MetricsSupport>false</MetricsSupport> given that would also affect the behavior of the RESP commands that rely on it, if they are implemented using MeterListener?

We should also measure if any extra contention (IIRC currently Garnet's counters are done by atomic Interlocked.Increments) for normal use-cases. We should keep eye on how big difference there is in RPS with, for example, dotnet-counters hooked.

Some existing tracking issues for S.D.Metrics performance concerns dotnet/aspnetcore#50412, dotnet/runtime#71563 to keep eye on too.

(I've been thinking about this for quite bit because I'm planning to write Crank infrastructure & scenarios to bench Garnet, someday.)

@badrishc
Copy link
Contributor

It would be a great idea to support OpenTelemetry on the server side, and contributions in this space are highly encouraged! Not only is this generally useful, but when used with .NET Aspire it will make the metrics automatically available. Doing this seems to basically require some glue code to feed data from our current Metrics API endpoint to OpenTelemetry:

public MetricsApi Metrics;

The potential migration to System.Diagnostics.Metrics is orthogonal though. It needs to be considered with care:

We should also measure if any extra contention (IIRC currently Garnet's counters are done by atomic Interlocked.Increments)

HdrHistogram is very efficient, and was better than the Metrics API last we checked. To avoid contention, we keep per-session metrics so that they can be updated without requiring atomics such as Interlocked.Increment. In the critical path, it is just a simple add on the session:

_counts[index]++;
_totalCount++;

Then we aggregate periodically (or on-demand) across all active and inactive sessions to perform the metrics computation. HdrHistogram is really efficient on the record metrics path and does the expensive percentile computations off the critical path.

@darrenge darrenge added the enhancement New feature or request label May 14, 2024
@badrishc badrishc changed the title integrate OpenTelemetry to enhance observability Integrate OpenTelemetry to enhance observability May 14, 2024
@badrishc badrishc added the help wanted Extra attention is needed label May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants