tests: relax check in AutomaticLeadershipBalancingTest #18497
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49162#018f7d17-82bc-48cf-b433-0c9851414504
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49162#018f7d1f-a86b-476f-af6b-18eede55586a
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49162#018f7d1f-a86e-4f3a-bf44-258891b862ca
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49228#018f8226-1f81-40f2-8711-9c4910f8634f
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49396#018f9daf-68ec-4efa-96e8-a7e7a72f486f
for s, count in shard2leaders.items():
    expected_min = math.floor(expected_on_shard * 0.8)
    # Check with a lot of slack because leader balancer may not be able to achieve
wonder if we should mark this as ok_to_fail instead, so we don't lose track of tightening the check once the underlying issue is fixed.
Well if it is marked ok_to_fail we will surely lose track :) and I don't think we can mark individual assertions ok_to_fail... Also even in this form the check is somewhat useful
Just-restarted nodes may have incomplete health reports because not all partitions have started yet. Also, right after a restart the node is probably busy catching up and replicating data that was produced in its absence. For these two reasons, just-restarted nodes are bad candidates for leadership transfers, so mute them.
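The muting idea above can be illustrated with a small Python sketch. This is not the leader_balancer code from the PR; the function names, the uptime map, and the `mute_window_s` value are all hypothetical, chosen only to show how recently restarted nodes could be excluded as leadership transfer targets.

```python
def recently_restarted(uptime_s_by_node, mute_window_s):
    """Return the set of node ids whose uptime is shorter than the mute window.

    Both arguments are hypothetical: the PR does not say how uptime is
    obtained or how long a node stays muted.
    """
    return {n for n, up in uptime_s_by_node.items() if up < mute_window_s}


def eligible_targets(replicas, muted):
    """Drop muted nodes from the candidate list for a leadership transfer."""
    return [n for n in replicas if n not in muted]


# Illustrative values only: node 2 restarted 12 seconds ago.
uptimes = {1: 3600.0, 2: 12.0, 3: 5400.0}
muted = recently_restarted(uptimes, mute_window_s=60.0)
targets = eligible_targets([1, 2, 3], muted)
```

With these made-up numbers, node 2 is muted and only nodes 1 and 3 remain candidates.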
new failures in https://buildkite.com/redpanda/redpanda/builds/49228#018f8226-1f86-4639-be96-7e0e03f9e76b:
new failures in https://buildkite.com/redpanda/redpanda/builds/49228#018f8215-c581-409c-b052-78e58d78c3c4:
new failures in https://buildkite.com/redpanda/redpanda/builds/49228#018f8226-1f81-40f2-8711-9c4910f8634f:
new failures in https://buildkite.com/redpanda/redpanda/builds/49228#018f8226-1f89-46ed-b36f-d8f40d5f346a:
new failures in https://buildkite.com/redpanda/redpanda/builds/49228#018f8215-c585-4e68-89d5-245de545bb40:
new failures in https://buildkite.com/redpanda/redpanda/builds/49228#018f8215-c583-4ebe-a3d7-96108f6a4b42:
new failures in https://buildkite.com/redpanda/redpanda/builds/49228#018f8215-c57e-4a43-97a7-8423e98bb3c6:
new failures in https://buildkite.com/redpanda/redpanda/builds/49228#018f8300-c364-4b33-9d07-7b67d4bb629d:
new failures in https://buildkite.com/redpanda/redpanda/builds/49396#018f9da7-7fbe-433f-9aa9-bf906ba7c3c4:
new failures in https://buildkite.com/redpanda/redpanda/builds/49396#018f9da7-7fbc-46e2-94ab-2fea83f5e43d:
new failures in https://buildkite.com/redpanda/redpanda/builds/49396#018f9da7-7fc2-4294-9018-8ace8d69812e:
new failures in https://buildkite.com/redpanda/redpanda/builds/49396#018f9daf-68e3-4246-a9f1-0e9ff37873de:
new failures in https://buildkite.com/redpanda/redpanda/builds/49396#018f9daf-68e6-4789-bc86-964fc89cb749:
new failures in https://buildkite.com/redpanda/redpanda/builds/49396#018f9daf-68e9-47b9-9923-9e87cf7f343c:
new failures in https://buildkite.com/redpanda/redpanda/builds/49396#018f9daf-68ec-4efa-96e8-a7e7a72f486f:
@ztlpn is this ready for review? Lots of failures, so unsure if they are related or not.
@bharathv they are related, though this is more of a test problem. Currently discussing with the storage team how to fix the test.
merged #18603, retrying ci...
/ci-repeat
Relax the shard leader count check because the leader balancer may not be able to achieve balanced counts due to the interplay between the topic-aware and total-count objectives (see https://github.com/redpanda-data/core-internal/issues/1282).
Fixes #17150
Also mute just-restarted nodes in leader_balancer, as their health reports can have incomplete partition info, and they are probably busy recovering partitions anyway.
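As a rough illustration of what relaxing the shard leader count check means, the sketch below computes which shards fall under a minimum leader count for a given slack. The names `shard2leaders`, `expected_on_shard`, and `expected_min` come from the diff excerpt in the review thread; the helper function, the `min_fraction` parameter, and all numbers are assumptions, and the slack value actually used in the test is not shown here.

```python
import math


def underloaded_shards(shard2leaders, expected_on_shard, min_fraction):
    """Return {shard: count} for shards whose leader count is below the floor.

    min_fraction is a hypothetical knob: lowering it gives the check more slack.
    """
    expected_min = math.floor(expected_on_shard * min_fraction)
    return {
        s: count
        for s, count in shard2leaders.items() if count < expected_min
    }


# Illustrative numbers only, not taken from the test. Keys are (node, shard).
shard2leaders = {(1, 0): 10, (1, 1): 7, (2, 0): 13}
strict = underloaded_shards(shard2leaders, expected_on_shard=10, min_fraction=0.8)
relaxed = underloaded_shards(shard2leaders, expected_on_shard=10, min_fraction=0.5)
```

With these numbers the stricter fraction flags the shard holding 7 leaders, while the looser one flags nothing, which is the kind of slack the description asks for.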
Backports Required
Release Notes
Improvements