RedisCluster becomes unrecoverable if all nodes timeout #3221

Open
kuza55 opened this issue May 2, 2024 · 1 comment

kuza55 commented May 2, 2024

Version: 5.1.2

Platform: Ubuntu 22.04

Description:

RedisCluster becomes unrecoverable and raises on every subsequent command if all of its nodes time out at the same time. This is particularly likely if the RedisCluster has only one node.

The crash that happens is:

Traceback (most recent call last):
  File "/app/lib/python3.11/site-packages/opentelemetry/trace/__init__.py", line 573, in use_span
    yield span
  File "/app/lib/python3.11/site-packages/opentelemetry/sdk/trace/__init__.py", line 1046, in start_as_current_span
    yield span
  File "/app/lib/python3.11/site-packages/opentelemetry/instrumentation/redis/__init__.py", line 263, in _async_traced_execute_command
    response = await func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/redis/asyncio/cluster.py", line 721, in execute_command
    await self.initialize()
  File "/app/lib/python3.11/site-packages/redis/asyncio/cluster.py", line 419, in initialize
    await self.nodes_manager.initialize()
  File "/app/lib/python3.11/site-packages/redis/asyncio/cluster.py", line 1347, in initialize
    raise RedisClusterException(
redis.exceptions.RedisClusterException: Redis Cluster cannot be connected. Please provide at least one reachable node: None

I think the cause is the following line, which removes the timed-out node from the startup nodes on the assumption that the client can connect to another node and rediscover the cluster from there:

self.nodes_manager.startup_nodes.pop(target_node.name, None)
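
To illustrate the suspected failure mode, here is a toy model. It is not redis-py's actual code: `ToyNodesManager`, `on_timeout`, and `ClusterUnreachable` are made-up names, and the only thing taken from the library is the `startup_nodes.pop(...)` call quoted above. It sketches how dropping the only seed node on a transient timeout could leave `initialize()` with nothing left to try.

```python
class ClusterUnreachable(Exception):
    """Stand-in for redis.exceptions.RedisClusterException (toy model only)."""


class ToyNodesManager:
    """Hypothetical, simplified model of a cluster client's node registry."""

    def __init__(self, startup_nodes):
        # Maps node name -> "is reachable right now"; a real client would
        # store node objects here instead of booleans.
        self.startup_nodes = dict(startup_nodes)

    def initialize(self):
        # Try each remaining seed node and succeed on the first reachable one.
        for name, reachable in self.startup_nodes.items():
            if reachable:
                return name
        raise ClusterUnreachable("Please provide at least one reachable node")

    def on_timeout(self, name):
        # Mirrors the quoted line: forget the node that just timed out.
        self.startup_nodes.pop(name, None)


def simulate_single_node_outage():
    manager = ToyNodesManager({"10.0.0.1:6379": True})
    # A transient timeout hits the only node in the cluster.
    manager.on_timeout("10.0.0.1:6379")
    # The node is healthy again, but the registry no longer knows about it.
    try:
        manager.initialize()
    except ClusterUnreachable:
        return "unrecoverable", len(manager.startup_nodes)
    return "recovered", len(manager.startup_nodes)
```

In this model the same thing happens with several nodes if every one of them times out before a successful re-initialization, since each timeout removes one more seed.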

This bug seems similar to, but distinct from, #3130.

It also seems related to #2472.

kuza55 changed the title from "RedisCluster becomes unrecoverable if al nodes timeout" to "RedisCluster becomes unrecoverable if all nodes timeout" on May 2, 2024

julianogv commented May 8, 2024

I'm having the same issue here.

A simple way to reproduce it is to connect to a Redis cluster over the internet (AWS ElastiCache, for example), turn your Wi-Fi/Ethernet off, and then turn it back on. The error does not go away after the connection returns: every command keeps raising RedisClusterException indefinitely.
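
Until this is fixed in the library, one possible application-side workaround is to throw away the broken client and build a fresh one from the original seed nodes. The following is only a sketch under assumptions: `RebuildingClient`, `factory`, and `fatal_error` are hypothetical names, and the wrapper assumes the client exposes `execute_command()` and `aclose()` coroutines, which I believe the asyncio RedisCluster does in 5.x (check your version).

```python
import asyncio
import contextlib


class RebuildingClient:
    """Hypothetical wrapper that rebuilds the client after a fatal error."""

    def __init__(self, factory, fatal_error):
        # factory: zero-argument callable returning a new client built from
        # the original seed nodes (assumption: for redis-py this could be
        # something like `lambda: redis.asyncio.RedisCluster(host=..., port=...)`).
        # fatal_error: the exception type that marks the client as dead
        # (assumption: redis.exceptions.RedisClusterException).
        self._factory = factory
        self._fatal_error = fatal_error
        self._client = factory()
        self.rebuilds = 0

    async def execute(self, *args):
        try:
            return await self._client.execute_command(*args)
        except self._fatal_error:
            # Swap in a fresh client so the seed nodes are known again.
            old, self._client = self._client, self._factory()
            self.rebuilds += 1
            # Best-effort cleanup; the old client may already be broken.
            with contextlib.suppress(Exception):
                await old.aclose()
            # Retry once; if the network is still down this raises again,
            # and the next call starts over with the fresh client.
            return await self._client.execute_command(*args)


def execute_blocking(client, *args):
    # Convenience helper for trying the wrapper outside an event loop.
    return asyncio.run(client.execute(*args))
```

The single retry is deliberate: it keeps the wrapper from looping while the network is down, and the next call simply tries again with the rebuilt client.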
