There is a race condition in the internal schema implementation when using RAFT on 1.25+ versions. The problem is that while holding an RLock on an RWMutex, we recursively try to re-acquire an RLock on the same mutex. This is a recipe for a deadlock and should be removed.
Full details of how it can happen in cluster/meta_class.go
We successfully obtain an RLock in line 38
We will not release that RLock until we have finished execution
In line 40, we jump to line 60 (MultiTenancyConfig)
Suppose that right now another goroutine tries to obtain a Lock() (the call stack shows we do have routines waiting for a Lock()). This request has two effects:
It must wait until all current readers have finished, so it is itself blocked
Because the RWMutex is write-preferring, any new RLock calls will have to wait for this Lock() call to finish
We move on to line 64 where we try to obtain another RLock
We are now deadlocked; neither side can proceed. The routine waiting for a Lock() cannot acquire it because we still hold the RLock() granted earlier. We cannot proceed because our second RLock() queues behind the pending Lock() request.
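The sequence above can be reproduced with a minimal standalone sketch (this is illustrative code, not the actual Weaviate implementation; the sleeps only force the ordering described above: reader holds RLock, writer queues a Lock, reader attempts a second RLock). A timeout is used to detect the deadlock instead of hanging forever:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex
	done := make(chan struct{})

	// Reader: acquires an RLock, then tries to re-acquire recursively.
	go func() {
		mu.RLock() // first RLock succeeds (analogous to line 38)
		// Give the writer time to queue a Lock() while we hold the RLock.
		time.Sleep(100 * time.Millisecond)
		mu.RLock() // recursive RLock: queues behind the pending writer
		mu.RUnlock()
		mu.RUnlock()
		close(done)
	}()

	// Writer: queues a Lock() after the first RLock is held.
	go func() {
		time.Sleep(50 * time.Millisecond)
		mu.Lock() // blocks until all readers release -> never happens
		mu.Unlock()
	}()

	select {
	case <-done:
		fmt.Println("no deadlock")
	case <-time.After(500 * time.Millisecond):
		fmt.Println("deadlocked: recursive RLock blocked behind a pending Lock")
	}
}
```

This prints the "deadlocked" branch, matching the behavior documented for sync.RWMutex: if a goroutine holds the mutex for reading and another goroutine might call Lock, no goroutine should expect to acquire a read lock until the initial read lock is released.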
Reported by: @etiennedi