There is a race condition in the internal schema implementation when using RAFT on 1.25+ versions. The problem is that while holding an RLock on an RWMutex, we recursively try to re-acquire an RLock on the same mutex. This is a recipe for a deadlock and should be removed.
Full details of how it can happen in cluster/meta_class.go
We successfully obtain an RLock in line 38
We will not release that RLock until we have finished execution
In line 40, we jump to line 60 (MultiTenancyConfig)
Suppose that right now another goroutine tries to obtain a Lock() (the call stack shows we do have routines waiting for a Lock()). This request has two effects:
It must wait until all current readers have finished, so it is itself blocked
Because the RWMutex is write-preferring, any new RLock calls will have to wait for this Lock() call to finish
We move on to line 64 where we try to obtain another RLock
We are now deadlocked; neither side can proceed. The routine waiting for a Lock() cannot acquire it because we still hold the RLock() granted earlier. We cannot proceed because our second RLock() queues behind the pending Lock() request.
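The sequence above can be reproduced with a minimal standalone sketch (this is illustrative code, not the actual Weaviate implementation; the sleeps only force the ordering described above: reader holds RLock, writer queues a Lock, reader attempts a second RLock). A timeout is used to detect the deadlock instead of hanging forever:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex
	done := make(chan struct{})

	// Reader: acquires an RLock, then tries to re-acquire recursively.
	go func() {
		mu.RLock() // first RLock succeeds (analogous to line 38)
		// Give the writer time to queue a Lock() while we hold the RLock.
		time.Sleep(100 * time.Millisecond)
		mu.RLock() // recursive RLock: queues behind the pending writer
		mu.RUnlock()
		mu.RUnlock()
		close(done)
	}()

	// Writer: queues a Lock() after the first RLock is held.
	go func() {
		time.Sleep(50 * time.Millisecond)
		mu.Lock() // blocks until all readers release -> never happens
		mu.Unlock()
	}()

	select {
	case <-done:
		fmt.Println("no deadlock")
	case <-time.After(500 * time.Millisecond):
		fmt.Println("deadlocked: recursive RLock blocked behind a pending Lock")
	}
}
```

This prints the "deadlocked" branch, matching the behavior documented for sync.RWMutex: if a goroutine holds the mutex for reading and another goroutine might call Lock, no goroutine should expect to acquire a read lock until the initial read lock is released.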
Reported by: @etiennedi