
kvserver: rebalance ranges with one voter using joint configurations #124284

Merged 2 commits into cockroachdb:master on May 20, 2024

Conversation

@kvoli (Collaborator) commented May 16, 2024

The allocator would only add a voter, instead of both adding and removing
the existing voter, when rebalancing ranges with one replica. Removing
the leaseholder replica was not possible prior to #74077, so the
addition-only approach was necessary.

This restriction is no longer necessary. Allow rebalancing a one-voter
range between stores using joint configurations, where the lease is
transferred from the outgoing, demoting voter to the incoming voter's
store.
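The shape of the change can be sketched as follows. This is not the PR's actual code (the real change lives in the allocator's planning code and uses `roachpb` types); the types and function below are hypothetical stand-ins for illustration only:

```go
package main

import "fmt"

// Hypothetical stand-ins for the replication-change types; the names and
// shapes are illustrative, not CockroachDB's actual API.
type changeType string

const (
	addVoter    changeType = "ADD_VOTER"
	removeVoter changeType = "REMOVE_VOTER"
)

type replicationTarget struct{ nodeID, storeID int }

type change struct {
	typ changeType
	tgt replicationTarget
}

// rebalanceOneVoter sketches the new behavior: rather than emitting only
// an addition (which left the range with two voters), emit an add and a
// remove together. The pair is applied atomically as a joint
// configuration, and the lease transfers to the incoming voter's store as
// the old voter is demoted.
func rebalanceOneVoter(from, to replicationTarget) []change {
	return []change{
		{typ: addVoter, tgt: to},
		{typ: removeVoter, tgt: from},
	}
}

func main() {
	for _, c := range rebalanceOneVoter(
		replicationTarget{nodeID: 1, storeID: 1},
		replicationTarget{nodeID: 2, storeID: 2},
	) {
		fmt.Printf("%s n%d,s%d\n", c.typ, c.tgt.nodeID, c.tgt.storeID)
	}
}
```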

Scattering a range with one voter will now leave the range with exactly
one voter; previously, both the leaseholder voter evaluating the scatter
and the new voter were left behind.

Before this patch, scattering 1000 ranges with RF=1 on a 5-store
cluster:

```
  store_id | replica_count | replica_distribution | lease_count | lease_distribution
-----------+---------------+----------------------+-------------+---------------------
         1 |          1001 | ##########           |         500 | ##########
         5 |           291 | ###                  |         147 | ###
         4 |           275 | ###                  |         137 | ###
         3 |           229 | ###                  |         118 | ###
         2 |           206 | ###                  |          99 | ##
```

After:

```
  store_id | replica_count | replica_distribution | lease_count | lease_distribution
-----------+---------------+----------------------+-------------+---------------------
         2 |           242 | ##########           |         241 | ##########
         4 |           227 | ##########           |         227 | ##########
         5 |           217 | #########            |         216 | #########
         3 |           209 | #########            |         208 | #########
         1 |           106 | #####                |         109 | #####
```

Fixes: #108420
Fixes: #124171

Release note (bug fix): Scattering a range with replication factor 1 no
longer erroneously up-replicates the range to two replicas. Leases also
no longer thrash between nodes when a range with replication factor 1 is
perturbed.

@kvoli kvoli self-assigned this May 16, 2024
@cockroach-teamcity (Member) commented: This change is Reviewable

@kvoli kvoli changed the title asim: add one voter lease thrashing reproduction kvserver: rebalance ranges with one voter using joint configurations May 16, 2024
@kvoli kvoli force-pushed the 240516.joint-config-rebalance-rf1 branch 2 times, most recently from 9bb5dff to 0a12f1c Compare May 16, 2024 17:02
@kvoli kvoli added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. labels May 16, 2024
@kvoli kvoli force-pushed the 240516.joint-config-rebalance-rf1 branch 2 times, most recently from d3edeaa to e6eaca8 Compare May 16, 2024 18:01
@kvoli (Collaborator, Author) commented May 16, 2024

The existing leaseholder store is never selected as the rebalance target. That is expected initially, since scatter only jitters replica-count stats rather than fully randomizing them, and the leaseholder store has 1k+ replicas. However, once the replica counts draw closer, we'd expect scatter to also do nothing 1/n of the time, yet with this patch it still only chooses other stores.

@kvoli kvoli force-pushed the 240516.joint-config-rebalance-rf1 branch from e6eaca8 to 79f3177 Compare May 16, 2024 20:20
Add a simulation test where the replication factor is set to 1. This
highlights lease thrashing, which causes replica and lease count
convergence to take upwards of 20 minutes with two nodes.

Informs: cockroachdb#108420
Release note: None
@kvoli kvoli force-pushed the 240516.joint-config-rebalance-rf1 branch 2 times, most recently from db23e58 to 0ecb487 Compare May 20, 2024 17:36
@kvoli (Collaborator, Author) commented May 20, 2024

> The existing leaseholder store is never selected as the rebalance target. That is expected initially, since scatter only jitters replica-count stats rather than fully randomizing them, and the leaseholder store has 1k+ replicas. However, once the replica counts draw closer, we'd expect scatter to also do nothing 1/n of the time, yet with this patch it still only chooses other stores.

Turns out this is just due to timing. The allocator's rebalance targets are returned before any rebalancing activity occurs, and the initial disparity in replica counts causes the initial node not to be selected. If we ran scatter iteratively, the results would be closer to the expected even distribution. The lease and replicate queues also adjust lease/replica counts as the scatter proceeds.

@kvoli kvoli force-pushed the 240516.joint-config-rebalance-rf1 branch from 0ecb487 to 06635fc Compare May 20, 2024 18:47
@kvoli kvoli requested a review from nvanbenschoten May 20, 2024 19:28
@kvoli kvoli marked this pull request as ready for review May 20, 2024 19:28
@kvoli kvoli requested a review from a team as a code owner May 20, 2024 19:28
@nvanbenschoten (Member) left a comment


:lgtm:

Reviewed 5 of 5 files at r5, 6 of 7 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/asim/tests/testdata/non_rand/one_voter line 0 at r7 (raw file):
🔥


pkg/kv/kvserver/allocator/plan/util_test.go line 56 at r7 (raw file):

				{ChangeType: roachpb.REMOVE_VOTER, Target: roachpb.ReplicationTarget{NodeID: 1, StoreID: 1}},
			},
			expectedErrStr: "",

nit: expectedPerformingSwap: false,


pkg/sql/scatter_test.go line 215 at r7 (raw file):

		t, tc.ServerConn(0), "t",
		"k INT PRIMARY KEY, v INT",
		500,

nit: /* numRows */


pkg/sql/scatter_test.go line 262 at r7 (raw file):

	}

	// Expect that the number of replicas on store 1 to have changed. We can't

It feels like we're leaving test flakiness up to chance here, but I guess (1/3)^50 is a small enough number that we don't think it's at risk of flaking.

@kvoli (Collaborator, Author) left a comment


TYFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/asim/tests/testdata/non_rand/one_voter line at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

🔥

Thanks 😄


pkg/kv/kvserver/allocator/plan/util_test.go line 56 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: expectedPerformingSwap: false,

Updated.


pkg/sql/scatter_test.go line 215 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: /* numRows */

Updated.


pkg/sql/scatter_test.go line 262 at r7 (raw file):
Seems reasonably small, in addition to:

> Turns out this is just due to timing. The allocator's rebalance targets are returned before any rebalancing activity occurs, and the initial disparity in replica counts causes the initial node not to be selected. If we ran scatter iteratively, the results would be closer to the expected even distribution. The lease and replicate queues also adjust lease/replica counts as the scatter proceeds.

@kvoli kvoli force-pushed the 240516.joint-config-rebalance-rf1 branch from 06635fc to 1995005 Compare May 20, 2024 20:53
@kvoli (Collaborator, Author) commented May 20, 2024

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request May 20, 2024
122864: sql: add randomized test for plan-gist logic r=yuzefovich a=yuzefovich

This commit adds a randomized test for plan-gist logic. It utilizes sqlsmith to generate random stmts, then runs EXPLAIN (GIST) for it to retrieve the gist followed by decoding that gist, and ensures that no internal errors occur. All generated stmts are also executed with a short timeout to allow for DB state to evolve. Additionally, all gists are accumulated and decoded against an empty DB.

The test is placed in the ccl package to exercise features gated behind CCL license.

The test quickly discovered bug fixed in 1315dda (when the corresponding fix is reverted) but haven't found anything new for plan-gist logic (but did hit a few issues elsewhere which are currently ignored in the test).

Fixes: #117634.

Release note: None

124284: kvserver: rebalance ranges with one voter using joint configurations  r=nvanbenschoten a=kvoli


124294: sql/lexbase: remove map access overhead when encoding SQL strings r=mgartner a=mgartner

#### opt: add benchmark for simple INSERT with JSON column

Release note: None

#### sql/lexbase: remove map access overhead when encoding SQL strings

Map accesses have been replaced with more efficient array accesses for
determining characters to quote in `EncodeSQLStringWithFlags`. In a CPU
profile from a recent POC these map access accounted for a significant
portion of CPU time spent in query optimization because the optimizer's
interner converts `tree.DJSON` objects to strings using this encoding
function when hashing and checking equality of JSON constants.

Epic: None

Release note: None


124444: packer: move `add-apt-repository` call to after `sleep` r=jlinder a=rickystewart

Updates are still going on on the VM while this is happening.

Epic: none
Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
@craig (bot, Contributor) commented May 20, 2024

Build failed (retrying...):

@craig craig bot merged commit e96ffcb into cockroachdb:master May 20, 2024
22 checks passed
@blathers-crl (bot) commented May 20, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 1995005 to blathers/backport-release-23.1-124284: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

You may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


error creating merge commit from 1995005 to blathers/backport-release-23.2-124284: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

You may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
