
Events being lost in worker mode #7223

Open
mmoorfield opened this issue May 3, 2024 · 4 comments

Comments

@mmoorfield

Describe the bug

When a new medusa instance is started in worker mode, it consumes messages that were already on redis before the subscribers are initialized.

The start order of the server and worker instances therefore matters: the worker must be active before new events arrive. This is problematic because it means auto-scaling of workers is not really possible without introducing the potential to lose events.
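The race can be illustrated with a minimal in-memory sketch (simplified, not Medusa code; `NaiveBus`, `dispatch`, and `subscribe` are hypothetical names). Like the worker, it runs whatever handlers exist at the moment an event is consumed, so an event pulled before subscribers load is simply gone:

```typescript
type Handler = (event: string) => void;

class NaiveBus {
  private handlers = new Map<string, Handler[]>();

  subscribe(event: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  // Mirrors the "Processing <event> which has N subscribers" log line:
  // dispatch runs whatever handlers exist right now and returns the count.
  dispatch(event: string): number {
    const list = this.handlers.get(event) ?? [];
    for (const handler of list) handler(event);
    return list.length;
  }
}

// A freshly started worker consumes a queued event before its
// subscribers are loaded: 0 handlers run, and the event is lost.
const bus = new NaiveBus();
const early = bus.dispatch("order.placed"); // 0 subscribers, event lost
bus.subscribe("order.placed", () => {});
const late = bus.dispatch("order.placed"); // now 1 subscriber
```

Nothing re-delivers the first `order.placed`, which is exactly the failure mode in the logs below.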

System information

Medusa version (including plugins): 1.20.4
Node.js version: 18
Database: Postgres
Operating system: Linux and Mac
Browser (if relevant): N/A
Event Bus: Redis

Steps to reproduce the behavior

  1. Start a medusa instance in worker_mode: server
  2. Submit a new order (or any other activity that triggers an event on the redis event bus)
  3. Start a second medusa instance in worker_mode: worker
  4. Observe the logs: the events are processed before the subscribers are initialized, so they are effectively lost
```
✔ Models initialized – 19ms
✔ Plugin models initialized – 11ms
✔ Strategies initialized – 18ms
✔ Database initialized – 56ms
✔ Repositories initialized – 22ms
✔ Services initialized – 6ms
⠋ Initializing modules
info:    Connection to Redis in module 'event-bus-redis' established
info:    Connection to Redis in module 'cache-redis' established
info:    Processing cart.created which has 0 subscribers
info:    Processing cart.updated which has 0 subscribers
info:    Processing cart.updated which has 0 subscribers
info:    Processing cart.updated which has 0 subscribers
info:    Processing payment.updated which has 0 subscribers
info:    Processing order.placed which has 0 subscribers
info:    Processing cart.updated which has 0 subscribers
✔ Modules initialized – 90ms
✔ Express intialized – 1ms
✔ Plugins intialized – 554ms
✔ Subscribers initialized – 5ms
✔ API initialized – 29ms
⠋ Initializing defaults
✔ Defaults initialized – 67ms
⠋ Initializing search engine indexing
✔ Indexing event emitted – 3ms
✔ Server is ready on port: 9000 – 19ms
```

Expected behavior

Multiple worker instances should be able to subscribe to events, and all custom subscribers should be initialised before any worker starts processing.

@olivermrbl
Contributor

@mmoorfield, thanks for submitting the issue.

We hadn't thought about this scenario, but you are right. This can indeed happen with the current way our subscribers are loaded, which is after the worker starts processing events.

The solution to this is likely not a quick fix – is this something you need urgently?

I have some ideas about how we can solve this, but those are more comprehensive changes, e.g. introducing a new application life cycle hook that is executed after the entire application has started. There, we would be able to tell the worker to start processing without worrying about whether subscribers are registered or not.
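A minimal sketch of what such a hook could look like (a sketch under assumptions: `onApplicationStart`, `bootstrap`, and the step names are hypothetical, not actual Medusa API). A module registers a callback during loading, and the boot sequence only fires the callbacks after every loader step has completed:

```typescript
// Hypothetical life cycle hook: callbacks registered here run only
// after the entire application has started.
type Hook = () => void;

const postStartHooks: Hook[] = [];

// A module loader would call this, e.g. event-bus-redis could register
// `() => bullWorker.run()` instead of letting its worker autorun.
function onApplicationStart(hook: Hook): void {
  postStartHooks.push(hook);
}

// Boot sequence: run every loader step (modules, plugins, subscribers,
// API...), and only then fire the post-start hooks.
function bootstrap(steps: Array<() => void>): void {
  for (const step of steps) step();
  for (const hook of postStartHooks) hook();
}

// Illustration: the hook always fires after all startup steps.
const order: string[] = [];
onApplicationStart(() => order.push("worker.run()"));
bootstrap([
  () => order.push("modules"),
  () => order.push("subscribers"),
]);
// order is ["modules", "subscribers", "worker.run()"]
```

By construction the worker cannot begin processing before subscribers are registered, since its start is deferred to the hook phase.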

@mmoorfield
Author

Thanks @olivermrbl - it's quite a significant problem for us, as we deploy the API and workers separately on AWS ECS (Fargate) with auto-scaling of the containers. The split deployment is a great improvement to the scalability of the stack.

But with this new approach we can't guarantee the start order of things, and under high load a new container introduced to the environment can consume events with 0 subscribers registered and lose them.

We have implemented a temporary workaround using the BullMQ worker autorun = false config option, which stops the worker from starting automatically on creation. We then modified the redis event bus module to add the ability to start the worker explicitly via bullWorker_.run(), which we invoke from a custom API route.

In our docker start script we wait for the worker instance to fully start before invoking this custom API route.

It works, but forking the event bus module is not ideal. Welcome any other ideas you have.
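For reference, the gating pattern behind the workaround can be sketched without BullMQ or Redis (a minimal sketch under assumptions: `GatedWorker`, `receive`, and `start` are hypothetical names, and the in-memory buffer stands in for jobs that, with autorun disabled, would simply remain on the Redis queue until the worker runs):

```typescript
type Handler = (event: string) => void;

class GatedWorker {
  private started = false;
  private pending: string[] = [];
  private handlers = new Map<string, Handler[]>();

  subscribe(event: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  // Events received before start() are buffered, not dropped
  // (the in-memory analogue of jobs waiting on the queue).
  receive(event: string): void {
    if (!this.started) {
      this.pending.push(event);
      return;
    }
    this.dispatch(event);
  }

  // Invoked once all subscribers are registered, e.g. from the custom
  // API route hit by the docker start script.
  start(): void {
    this.started = true;
    for (const event of this.pending.splice(0)) {
      this.dispatch(event);
    }
  }

  private dispatch(event: string): void {
    for (const handler of this.handlers.get(event) ?? []) {
      handler(event);
    }
  }
}

// Usage: an event arriving during startup is held until start().
const worker = new GatedWorker();
const seen: string[] = [];
worker.receive("order.placed"); // arrives before subscribers load
worker.subscribe("order.placed", (e) => seen.push(e));
worker.start(); // invoked after full startup
// seen is now ["order.placed"]
```

The key design point is that "consume" and "process" are decoupled: nothing is handed to handlers until the explicit start signal, so start order no longer matters.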

@olivermrbl
Contributor

> But with this new approach we can't guarantee the start order of things and under high load introducing a new container to the environment does have the potential to consume events with 0 subscribers registered and lose events.

Can I get you to elaborate on this? At first glance, with a life cycle hook, we are guaranteed that a specific instance won't pick up events before subscribers are registered.

@mmoorfield
Author

Sorry for the confusion. Your proposed approach of a life cycle hook is a good one and would address this.

I was referring to the new approach of running medusa instances in server and worker mode separately where we can't control the start sequence.
