How to reproduce this bug?

All commands assume the use of this reproduction script.

1. Set up a three-node cluster locally (the script assumes the HTTP ports are 8080, 8081, 8082 and the gRPC ports are 50051, 50052, 50053, respectively).
2. Import data using `python3 read_repair_bug.py import`. You can verify that the data was imported correctly using `python3 read_repair_bug.py query`; this should show 10 objects on each node.
3. Kill node 2 or 3 (I recommend not killing node 1, because the local scripts don't like it when the "root" memberlist node dies).
4. Run a batch delete using `python3 read_repair_bug.py delete`.
5. Restart the dead node.
6. Verify that the nodes are now out of sync using `python3 read_repair_bug.py query`. You should see 6 objects on the healthy nodes but 10 objects on the node that missed the update.
7. Query with consistency level ALL using `python3 read_repair_bug.py query --consistency-level ALL`.

Note: it may take more than one iteration for the bug to show up; I had to run this command 3 times in my last attempt.

EDIT: this step may depend on timing. It seemed I needed to wait ~60 s before the repair messed things up. If this is correct, it could mean the issue is related to flushing memtables, as idle memtables would be flushed about 60 s later.
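For reference, the per-node count check in step 6 can be approximated with a plain GraphQL `Aggregate` query against each node's HTTP port. This is a sketch, not the actual `read_repair_bug.py`: the class name `ReadRepairBug` is a placeholder, and querying a node's port directly may still go through that node's coordinator logic rather than returning purely local data.

```python
"""Sketch of a per-node object count check. Assumptions: the collection
is named ReadRepairBug (hypothetical) and the nodes listen on the HTTP
ports from the local three-node setup."""
import json
import urllib.request

NODE_PORTS = [8080, 8081, 8082]  # HTTP ports of the three local nodes
CLASS_NAME = "ReadRepairBug"     # hypothetical class name

def count_query(class_name: str) -> dict:
    """Build a GraphQL Aggregate payload that counts objects of a class."""
    return {"query": "{ Aggregate { %s { meta { count } } } }" % class_name}

def extract_count(response: dict, class_name: str) -> int:
    """Pull the object count out of an Aggregate response."""
    return response["data"]["Aggregate"][class_name][0]["meta"]["count"]

def node_count(port: int, class_name: str = CLASS_NAME) -> int:
    """Ask a single node for its object count via /v1/graphql."""
    req = urllib.request.Request(
        f"http://localhost:{port}/v1/graphql",
        data=json.dumps(count_query(class_name)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_count(json.load(resp), class_name)

# Usage against a live cluster:
#   for port in NODE_PORTS:
#       print(f"node on :{port} ->", node_count(port), "objects")
```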
What is the expected behavior?
The node that missed the update is repaired, and eventually all nodes show 6 objects.
What is the actual behavior?
Instead of replicating the delete, we seem to replicate the inconsistent state from the out-of-sync node and end up with 10 objects on all nodes. In other words, the objects that should have been deleted were incorrectly recreated.
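One way this resurrection can arise (a toy model, not Weaviate's actual repair code): if reconciliation copies any object that a replica is missing, without consulting delete tombstones, then a deletion is indistinguishable from a replica that simply never received the object, and the stale replica re-seeds the deleted objects:

```python
def read_repair(replicas: dict[str, set[str]]) -> dict[str, set[str]]:
    """Toy reconciliation: give every replica the union of all objects
    seen anywhere. With no delete tombstones, "object missing on a node"
    looks like lost data rather than a delete, so deletes are undone."""
    merged = set()
    for objs in replicas.values():
        merged |= objs
    for name in replicas:
        replicas[name] = set(merged)
    return replicas

healthy = {f"obj{i}" for i in range(6)}    # state after the batch delete
stale = {f"obj{i}" for i in range(10)}     # node that missed the delete
replicas = {"node1": set(healthy), "node2": set(healthy), "node3": set(stale)}
read_repair(replicas)
# Every replica now holds 10 objects: the 4 deleted ones came back.
assert all(len(objs) == 10 for objs in replicas.values())
```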
Supporting information
No response
Server Version
So far this has only been tested on v1.23.9; I will test more versions.