rgw: add shard reduction ability to dynamic resharding #57538
Conversation
👍
src/common/options/rgw.yaml.in (Outdated)
desc: Whether dynamic resharding can reduce the number of shards
long_desc: If true, RGW's dynamic resharding ability is allowed to
  reduce the number of shards if it appears there are too many.
default: false
is there a reason this shouldn't default to true?
Where do we sit with changes in default behaviour between releases? Asking because when users upgrade their clusters, this feature would be disabled by default, which is preferable, IMO - that way clusters will continue to behave as they did prior to the upgrade, and users can opt in if they choose to by enabling the new feature.
thanks @nhoad. conversely, if we invest in general enhancements that aren't enabled by default, the majority of users won't benefit because they don't know to enable them. ceph has too many knobs already and can be difficult to tune correctly. we really should strive for optimal behavior by default
i think all clusters should benefit from appropriately-sized bucket indices. if this change adds new risks to existing deployments, then i'd like to explore those and try to resolve them at the design level
force-pushed from fc59dd4 to 70fc0a7
force-pushed from 70fc0a7 to af0101e
  if (reshard_log) {
-   ret = reshard_log->update(dpp, bucket_info, y);
+   ret = reshard_log->update(dpp, bucket_info, initiator, y);
    if (ret < 0) {
      return ret;
    }
@cbodley : I'm trying to think of why when having decided to execute a reshard we would update the reshard queue (log). In testing I've commented out this logic and I've seen no adverse impact, although maybe I haven't brushed up against the reason it's there. Do you have any thoughts?
@cbodley: Following your insight during the first review, I've added a commit that now tracks the initiator of the entry on the reshard queue (log). It adds a field to the at-rest data structure. Valid values are Admin and Dynamic, and it's an enum class, so other options can be added in the future. And there are logic updates that depend on this setting.
I approve the content and the name of this PR
force-pushed from af0101e to 05ff067
force-pushed from 05ff067 to c2e5f76
Adds the ability to prevent overwriting a reshard queue (log) entry for a given bucket with a newer entry. This adds a flag to the op, so it will either CREATE or make no changes. If an entry already exists when this flag is set, -EEXIST will be returned. This is a preparatory step to adding shard reduction to dynamic resharding.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
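The create-or-fail semantics this commit describes can be sketched as follows. This is purely illustrative: the real change is a flag on the RADOS class op, and the function and container here are hypothetical stand-ins.

```cpp
#include <cerrno>
#include <map>
#include <string>

// Sketch (not the actual cls_rgw op): an "exclusive create" that refuses
// to overwrite an existing reshard-queue entry for a bucket.
// Returns 0 on success, -EEXIST if an entry already exists and the
// create-only flag is set.
int queue_add_entry(std::map<std::string, std::string>& queue,
                    const std::string& bucket, const std::string& entry,
                    bool create_only) {
  auto it = queue.find(bucket);
  if (it != queue.end()) {
    if (create_only) {
      return -EEXIST;  // leave the existing entry untouched
    }
    it->second = entry;  // overwrite, as before this change
    return 0;
  }
  queue.emplace(bucket, entry);
  return 0;
}
```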
force-pushed from c2e5f76 to 2d6e9d6
jenkins test make check
force-pushed from 2d6e9d6 to a7e2f62
Previously, dynamic resharding could only *increase* the number of bucket index shards for a given bucket. This adds the ability to also *reduce* the number of shards. So in addition to the existing trigger of 100,000 entries per shard (the current default) for an increase, there's a new trigger of 10,000 entries per shard for a decrease.

However, for buckets with object counts that go up and down regularly, we don't want to keep resharding up and down to chase the number of objects. So for shard reduction to take place there's also a time delay (default 5 days). Once the entry for a reduction is added to the reshard queue (log), processing the queue within this delay period will not result in a shard reduction. Only when the reshard entry is processed after this delay can it perform the shard reduction. However, if at any point the entry is processed and there are *not* few enough entries to trigger a shard reduction, the entry on the reshard queue will be discarded. So using the defaults, this effectively means the bucket must have few enough objects for a shard reduction for 5 consecutive days before the reshard will take place.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Adds a config option rgw_reshard_debug_interval that allows the resharding algorithms to run on a faster schedule by simulating one day with a set number of seconds.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
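The effect of such a debug interval can be sketched as below. The option name comes from the commit, but the scaling function and its signature are hypothetical, for illustration only.

```cpp
#include <cstdint>

constexpr uint64_t seconds_per_day = 24 * 60 * 60;

// Sketch: when rgw_reshard_debug_interval is set (non-zero), one simulated
// "day" lasts that many seconds, so a delay expressed in days is rescaled.
// With the interval at 0 (disabled), real days are used.
uint64_t effective_delay_secs(uint64_t delay_days,
                              uint64_t debug_interval_secs) {
  if (debug_interval_secs > 0) {
    // e.g. a 5-day delay with a 10-second "day" becomes 50 seconds
    return delay_days * debug_interval_secs;
  }
  return delay_days * seconds_per_day;
}
```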
The logic for managing the reshard queue (log) can vary depending on whether the entry was added by an admin or by dynamic resharding. For example, if it's a reshard reduction, dynamic resharding won't overwrite the queue entry so as not to disrupt the reduction wait period. On the other hand, an admin should be able to overwrite the entry at will. So we now track the initiator of each entry on the queue. This adds another field to the at-rest data structure, and it updates the logic to make use of it.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
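The initiator tracking described above might look roughly like this. The enum values match the ones named in the review discussion, but the function and field names are illustrative, not the actual RGW code.

```cpp
// Sketch: record who added a reshard-queue entry. An enum class leaves
// room for additional initiators in the future.
enum class Initiator { Admin, Dynamic };

// Overwrite policy sketched from the commit message: an admin may always
// overwrite; dynamic resharding must not disturb a pending reduction's
// wait period.
bool may_overwrite(Initiator requester, bool pending_is_reduction) {
  if (requester == Initiator::Admin) {
    return true;  // an admin can overwrite the entry at will
  }
  return !pending_is_reduction;  // dynamic: only if not a pending reduction
}
```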
force-pushed from a7e2f62 to 9302fbb
jenkins test api
jenkins test windows
seeing a new test_rgw_reshard.py failure on main that i suspect is related: https://tracker.ceph.com/issues/66367
p.s. this change deserves a mention in PendingReleaseNotes
Previously, dynamic resharding could only increase the number of bucket index shards for a given bucket. This adds the ability to also reduce the number of shards.
So in addition to the existing trigger of 100,000 entries per shard (the current default) for an increase, there's a new trigger of 10,000 entries per shard for a decrease.
However, for buckets with object-counts that go up and down regularly, we don't want to keep resharding up and down to chase the number of objects. So for shard reduction to take place there's also a time delay (default 5 days). Once the entry on the reshard queue (log) is added for reduction, processing will not result in a reshard reduction within this delay period as the queue is processed. Only when the reshard entry is processed after this delay can it perform the shard reduction.
However, if at any point the entry is processed and there are not few enough entries to trigger a shard reduction, the entry on the reshard queue will be discarded.
So using the defaults, this effectively means the bucket must have few enough objects for a shard reduction for 5 consecutive days before the reshard will take place.
Fixes: https://tracker.ceph.com/issues/66104
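The two triggers described above can be sketched as a decision function. The thresholds are the stated defaults; the names and exact conditions are illustrative, not the actual RGW implementation.

```cpp
#include <cstdint>

enum class ReshardAction { None, Increase, Reduce };

// Sketch of the dual-trigger check: grow when the bucket exceeds
// 100,000 objects per shard; shrink (if enabled, and only after the
// separate delay period elapses) when it falls below 10,000 per shard.
ReshardAction check_bucket(uint64_t num_objects, uint32_t num_shards,
                           bool reduction_enabled,
                           uint64_t max_objs_per_shard = 100000,
                           uint64_t min_objs_per_shard = 10000) {
  if (num_objects > max_objs_per_shard * num_shards) {
    return ReshardAction::Increase;
  }
  if (reduction_enabled && num_shards > 1 &&
      num_objects < min_objs_per_shard * num_shards) {
    // queue the reduction now; it only proceeds if this condition still
    // holds once the delay (default 5 days) has passed
    return ReshardAction::Reduce;
  }
  return ReshardAction::None;
}
```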
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an `x` between the brackets: `[x]`. Spaces and capitalization matter when checking off items this way.
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e