Read-repair re-adds objects that should have been deleted #4945

Open
etiennedi opened this issue May 16, 2024 · 0 comments

etiennedi commented May 16, 2024

How to reproduce this bug?

All commands assume the use of this reproduction script (a rough sketch of what such a script could look like follows the steps below).

  1. Set up a three-node cluster locally (the script assumes the HTTP ports are 8080, 8081, 8082 and the gRPC ports are 50051, 50052, 50053, respectively)
  2. Import data using python3 read_repair_bug.py import
    • You can verify that the data was imported correctly using python3 read_repair_bug.py query.
    • This should show 10 objects on each node
  3. Kill node 2 or 3 (I recommend not killing node 1 because the local scripts don't like it when the "root" memberlist node dies)
  4. Run a batch delete using python3 read_repair_bug.py delete
  5. Restart the dead node
  6. Verify that the nodes are now out of sync using python3 read_repair_bug.py query
    • You should see 6 objects on the healthy nodes, but 10 objects on the node that missed the delete
  7. Query with consistency level ALL using python3 read_repair_bug.py query --consistency-level ALL
    • Note: It may take more than one iteration for the bug to show up. I had to run this command 3 times in my last attempt.
    • EDIT: This step may depend on timing. Right now it seemed as though I needed to wait ~60s before the repair messed things up. If this is correct, it could mean the issue is related to flushing memtables, as idle memtables would be flushed about 60s later.
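
For context, here is a rough sketch of what such a reproduction script could look like. This is not the actual read_repair_bug.py referenced above; the collection name, property names, replication settings, and delete filter are assumptions chosen only to match the object counts in the steps (10 imported, 4 deleted, 6 remaining), using the Weaviate Python client v4:

```python
# Hedged sketch only -- this is NOT the actual read_repair_bug.py linked above.
# Assumes the weaviate-client v4 package and the three local nodes from step 1.
# Collection name, properties, and the delete filter are made up for illustration.
import sys

import weaviate
from weaviate.classes.config import Configure, ConsistencyLevel
from weaviate.classes.query import Filter

NODES = [(8080, 50051), (8081, 50052), (8082, 50053)]  # (http, grpc) per node
COLLECTION = "ReadRepairBug"  # assumed name


def do_import() -> None:
    with weaviate.connect_to_local(port=NODES[0][0], grpc_port=NODES[0][1]) as client:
        if client.collections.exists(COLLECTION):
            client.collections.delete(COLLECTION)
        client.collections.create(
            COLLECTION,
            replication_config=Configure.replication(factor=3),
            vectorizer_config=Configure.Vectorizer.none(),
        )
        col = client.collections.get(COLLECTION)
        # 10 objects; 4 are flagged for deletion so the delete step leaves 6,
        # matching the counts described in the steps above.
        for i in range(10):
            col.data.insert({"idx": i, "to_delete": i < 4})


def do_delete() -> None:
    # Run this while one node is down so that node misses the deletions.
    with weaviate.connect_to_local(port=NODES[0][0], grpc_port=NODES[0][1]) as client:
        col = client.collections.get(COLLECTION)
        col.data.delete_many(where=Filter.by_property("to_delete").equal(True))


def do_query(level: ConsistencyLevel) -> None:
    # Count objects as seen through each node's coordinator.
    for http_port, grpc_port in NODES:
        with weaviate.connect_to_local(port=http_port, grpc_port=grpc_port) as client:
            col = client.collections.get(COLLECTION).with_consistency_level(level)
            res = col.query.fetch_objects(limit=100)
            print(f"node :{http_port} -> {len(res.objects)} objects")


if __name__ == "__main__":
    cmd = sys.argv[1] if len(sys.argv) > 1 else "query"
    if cmd == "import":
        do_import()
    elif cmd == "delete":
        do_delete()
    else:
        level = (
            ConsistencyLevel.ALL
            if "--consistency-level" in sys.argv and "ALL" in sys.argv
            else ConsistencyLevel.ONE
        )
        do_query(level)
```

Note that querying each node separately is only an approximation of per-node counts: with a replication factor of 3, a coordinator may fan out to other replicas depending on the consistency level.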

What is the expected behavior?

The node that missed the update is repaired, and eventually all nodes show 6 objects.

What is the actual behavior?

Instead of replicating the delete, we seem to replicate the inconsistent state from the out-of-sync node and end up with 10 objects on all nodes. In other words, the objects that should have been deleted were incorrectly recreated.

Supporting information

No response

Server Version

So far only tested on v1.23.9; I will test more versions.

Code of Conduct
