RAFT Deadlock on race condition on multi tenancy config read #4957

Open
reyreaud-l opened this issue May 17, 2024 · 0 comments · Fixed by #4959

reyreaud-l commented May 17, 2024

There is a race condition in the internal schema implementation when using RAFT on versions 1.25+. We hold an RLock on an RWMutex and then, further down the same call path, try to acquire a second RLock on the same mutex. Recursive read locking like this is a recipe for a deadlock and should be removed.

Full details of how it can happen in cluster/meta_class.go:

  1. We successfully obtain an RLock in line 38
  2. We will not release that RLock until we have finished execution
  3. In line 40, we jump to line 60 (MultiTenancyConfig)
  4. Let’s say that at this point another goroutine tries to obtain a Lock() (I can show from the call stack that we have goroutines waiting for a Lock()). This request has two effects:
    • It has to wait until all current readers have finished, so it is itself blocked
    • Because the RWMutex is write-preferring, any new RLock calls have to wait for this Lock() call to finish
  5. We move on to line 64, where we try to obtain another RLock
  6. We are now deadlocked. The goroutine waiting for the Lock() cannot proceed because we still hold the RLock that was granted earlier. We cannot proceed either, because we are requesting another RLock after the Lock() request came in.
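
The steps above can be reproduced in isolation. The following is a minimal sketch, not the actual code from cluster/meta_class.go and not necessarily the approach taken in the linked fix: `metaClass`, `multiTenancyLocked`, `Describe` and `nestedRLockWouldBlock` are hypothetical names chosen for illustration. It assumes Go 1.18 or later, because it uses `TryRLock` to observe, without actually hanging, that a second read lock is refused once a writer is queued. It also shows one common way to avoid the nesting: a lock-free helper that callers invoke while already holding the lock.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// metaClass is a hypothetical stand-in for a struct guarded by an RWMutex.
type metaClass struct {
	mu           sync.RWMutex
	multiTenancy bool
}

// multiTenancyLocked reads the field without locking.
// Callers must already hold mu (read or write).
func (m *metaClass) multiTenancyLocked() bool { return m.multiTenancy }

// MultiTenancy is the public accessor; it takes the read lock exactly once.
func (m *metaClass) MultiTenancy() bool {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.multiTenancyLocked()
}

// Describe needs the flag while holding the read lock. It calls the
// lock-free helper instead of MultiTenancy, so RLock calls never nest.
func (m *metaClass) Describe() string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return fmt.Sprintf("multiTenancy=%v", m.multiTenancyLocked())
}

// nestedRLockWouldBlock holds a read lock, lets a writer queue up, and
// reports whether a second read lock on the same mutex is refused while
// that writer is pending. A blocking RLock at that point would deadlock.
func nestedRLockWouldBlock() bool {
	var mu sync.RWMutex
	mu.RLock() // first read lock, held for the whole call

	writerDone := make(chan struct{})
	go func() { // a writer arrives and waits for the reader to finish
		mu.Lock()
		mu.Unlock()
		close(writerDone)
	}()

	// Poll until the writer is queued; from then on TryRLock fails.
	blocked := false
	for i := 0; i < 1000 && !blocked; i++ {
		if mu.TryRLock() {
			mu.RUnlock() // writer not queued yet, undo and retry
			time.Sleep(time.Millisecond)
		} else {
			blocked = true
		}
	}

	mu.RUnlock() // release the first read lock so the writer can proceed
	<-writerDone
	return blocked
}

func main() {
	m := &metaClass{multiTenancy: true}
	fmt.Println(m.Describe())
	fmt.Println("nested RLock would block:", nestedRLockWouldBlock())
}
```

The helper-with-a-`Locked`-suffix convention keeps locking at the public boundary only, so internal call chains cannot re-enter the mutex regardless of how writers interleave.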

Reported by: @etiennedi

@reyreaud-l reyreaud-l self-assigned this May 17, 2024
@reyreaud-l reyreaud-l linked a pull request May 17, 2024 that will close this issue
@reyreaud-l reyreaud-l added bug and removed type-bug labels May 21, 2024