Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NEW] Introduce slot level metrics to Redis cluster #20

Open
PingXie opened this issue Mar 25, 2024 · 12 comments
Open

[NEW] Introduce slot level metrics to Redis cluster #20

PingXie opened this issue Mar 25, 2024 · 12 comments
Labels
cluster major-decision-pending Major decision pending by TSC team

Comments

@PingXie
Copy link
Member

PingXie commented Mar 25, 2024

I’m revisiting the feature proposal we discussed in redis/redis#10472, which aims at providing metrics at the slot level. Despite the substantial effort and detailed discussions back then, we didn’t land this feature. I believe it’s worth reconsidering, given the potential benefits and previous interest.

@kyle-yh-kim @zuiderkwast @madolson

@madolson
Copy link
Member

I fully agree!

@madolson
Copy link
Member

I've also re-added my favorite creature comfort. @placeholderkv/core-team thoughts?

@zuiderkwast
Copy link
Contributor

Sure, metrics seem fine. I don't have strong opinions about it, only that I think fixing the cluster consistency problems is more important than metrics.

@zuiderkwast
Copy link
Contributor

redis/redis#11432

@madolson
Copy link
Member

I think our big initial play should be cluster overhaul. I think a lot of us want it, and it makes the most compelling sense as the big "next major features".

@kyle-yh-kim
Copy link

Good to hear back on this thread, hope you all have been doing well.

Where we left-off

In total, there were 3 proposed metrics under CLUSTER SLOT-STATS command group;

  1. key_count
  2. cpu_usec
  3. memory_bytes

Next steps

memory_bytes is the most complex of all, but this shouldn't stop us from implementing the first two metrics to gain some momentum.

I will open two PRs for key_count and cpu_usec in the coming days. These PRs will be based off of the already existing PRs for key_count and cpu_usec under Redis repository.

As for CLUSTER SLOT-STATS command format, below was the latest development we agreed upon. Lengthy discussion and rationale can be found here and here.

CLUSTER SLOT-STATS
[SLOTSRANGE start-slot end-slot [start-slot end-slot ...]]|
[ORDERBY column [LIMIT limit] [ASC|DESC]]

@hwware
Copy link
Member

hwware commented Mar 27, 2024

It is great, and I prefer to add this feature in CLUSTER INFO.

@madolson
Copy link
Member

It is great, and I prefer to add this feature in CLUSTER INFO.

Why cluster info? It's a free form field I guess, it could be a new sub info field I suppose

@madolson madolson added the major-decision-pending Major decision pending by TSC team label Mar 27, 2024
@kyle-yh-kim
Copy link

Thanks for chiming in. Personally, I'm opposed to CLUSTER INFO. We could perhaps add aggregated information under CLUSTER INFO, but not for the slot level metrics themselves.

Imagine dumping ~16384 slot level metrics under CLUSTER INFO. This would unnecessarily bloat the info string, when the user may have only wanted to check cluster_state:ok.

A new command dedicated for slot level metrics querying, in this case, CLUSTER SLOT-STATS, is more suitable. For reference, below was the latest command format we agreed on.

CLUSTER SLOT-STATS
[SLOTSRANGE start-slot end-slot [start-slot end-slot ...]]|
[ORDERBY column [LIMIT limit] [ASC|DESC]]

I'll wait for the core team to finalize this decision, before opening the PRs.

@zuiderkwast
Copy link
Contributor

@kyle-yh-kim Yeah CLUSTER SLOT-STATS. We're a bit overloaded with the forking stuff, new core team, new project, etc. but I think we want this for our next release. There was already a lot of review done and I think it was almost ready to merge. Do you want to bring over your PR?

@PingXie PingXie changed the title Introduce slot level metrics to Redis cluster [NEW] Introduce slot level metrics to Redis cluster Apr 20, 2024
kyle-yh-kim added a commit to kyle-yh-kim/valkey that referenced this issue Apr 22, 2024
The command provides detailed slot usage statistics upon invocation,
with initial support for key_count metric. cpu_usec (approved) and
memory_bytes (pending-approval) metrics will soon follow after the
merger of this PR.
kyle-yh-kim added a commit to kyle-yh-kim/valkey that referenced this issue Apr 23, 2024
The command provides detailed slot usage statistics upon invocation,
with initial support for key_count metric. cpu_usec (approved) and
memory_bytes (pending-approval) metrics will soon follow after the
merger of this PR.
kyle-yh-kim added a commit to kyle-yh-kim/valkey that referenced this issue Apr 23, 2024
The command provides detailed slot usage statistics upon invocation,
with initial support for key_count metric. cpu_usec (approved) and
memory_bytes (pending-approval) metrics will soon follow after the
merger of this PR.
kyle-yh-kim added a commit to kyle-yh-kim/valkey that referenced this issue Apr 23, 2024
The command provides detailed slot usage statistics upon invocation,
with initial support for key_count metric. cpu_usec (approved) and
memory_bytes (pending-approval) metrics will soon follow after the
merger of this PR.
kyle-yh-kim added a commit to kyle-yh-kim/valkey that referenced this issue Apr 23, 2024
The command provides detailed slot usage statistics upon invocation,
with initial support for key-count metric. cpu-usec (approved) and
memory-bytes (pending-approval) metrics will soon follow after the
merger of this PR.
kyle-yh-kim added a commit to kyle-yh-kim/valkey that referenced this issue Apr 23, 2024
The command provides detailed slot usage statistics upon invocation,
with initial support for key-count metric. cpu-usec (approved) and
memory-bytes (pending-approval) metrics will soon follow after the
merger of this PR.
@kyle-yh-kim
Copy link

Ignore my spam references above, I was reviewing the diff manually over Github UI.

PR has now been opened; #351

This PR is part one of the three upcoming PRs;

  1. CLUSTER SLOT-STATS command introduction, with key-count support --> This PR.
  2. cpu-usec support
  3. memory-bytes support

kyle-yh-kim added a commit to kyle-yh-kim/valkey that referenced this issue Apr 30, 2024
The command provides detailed slot usage statistics upon invocation,
with initial support for key-count metric. cpu-usec (approved) and
memory-bytes (pending-approval) metrics will soon follow after the
merger of this PR.

Signed-off-by: Kyle Kim <kimkyle@amazon.com>
@kyle-yh-kim
Copy link

Moving ahead, I would like to resume our conversation on per-slot memory metrics. I'd argue this is the most important per-slot metric of all, as it enables for smoother horizontal scaling given the accurate memory tracking at per-slot granularity.

Last time, we converged on its high-level strategy in "online analysis" (amortizing memory tracking cost per mutative-command, over offline RDB snapshot analysis / forking a process), as well as its performance and memory impact. The following conclusion was drawn, before the issue was then put on halt by the previously open-sourced Redis-core team.

Overall this data seems really good to me. There is the separate project for improving main memory efficiency of the dictionary, so if these two features are released together it might not be noticeable.

Source: redis/redis#10472 (comment)

As for module consideration, I mention in details here to keep this feature as an opt-in service to maintain backwards compatibility. For opt-in modules, they will be required to accurately track its value size, and call a newly introduced hook RM_ModuleUpdateMemorySlotStats() upon its mutation, to signal valkey-server to register the memory size gain / loss from the module’s registered write commands.

If we are still aligned to this strategy, I will start on its implementation, and open incremental PRs following the merger of the above CLUSTER SLOT-STATS command PR #351.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cluster major-decision-pending Major decision pending by TSC team
Projects
Status: Todo
Development

No branches or pull requests

5 participants