Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow skipping a workflow if mutex cannot be acquired #13055

Open
nettrino opened this issue May 15, 2024 · 4 comments
Open

Allow skipping a workflow if mutex cannot be acquired #13055

nettrino opened this issue May 15, 2024 · 4 comments
Labels
area/mutex-semaphore area/spec Changes to the workflow specification. type/feature Feature request

Comments

@nettrino
Copy link

nettrino commented May 15, 2024

Summary

If a mutex is at the workflow level, other workflows with the same mutex will wait till the mutex is acquired. However, there would be cases where we would want to skip a workflow entirely if another instance of the workflow has the mutex. This could be doable via some configuration option like the following:

                synchronization:
                    mutex:
                        name: test
                    acquireFailurePolicy: skipWorkflow

We could expose different types of policies like skipWorkflow, wait, etc.

Use Cases

There are cases were a workflow runs to ingest / update some data set, for instance fetch new rows from a database for a given entity. Running two of the same workflows will have no meaning as they will fetch the exact same data, thus a mutex prevents this duplication, however currently it does not prevent duplicating work back-to-back without custom logic within the workflow executable. A policy that would determine what to do if a lock is not acquired would allow skipping the secondary (redundant) workflow entirely.

Beyond this issue, one could allow more complex blocking policies as well, where a mutex is considered to be "held" for longer periods of time than a workflow is running for. For instance, say I ingest data that get updated every 12h from some remote source. If a single one of the workflows has been run within that time window, we could force the mutex to be held for 12h, even if the workflow has been completed, so as no other workflows can run and do redundant work within that time period. This would be useful because, even if skipWorkflow is accepted as a feature request, one could schedule redundant workflows to be running one after the other without lock contention.


Message from the maintainers:

Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.

@nettrino nettrino added the type/feature Feature request label May 15, 2024
@agilgur5 agilgur5 added area/mutex-semaphore area/spec Changes to the workflow specification. labels May 15, 2024
@agilgur5
Copy link
Member

agilgur5 commented May 15, 2024

Sounds like an extension / more concrete proposal of #10396

And a variant of #12757

Regarding the specific spec -- fields are camelCase by convention in k8s, so I've edited your proposal to match that.

@Joibel
Copy link
Member

Joibel commented May 17, 2024

As semaphores and mutexes can be used at a finer grained level than a whole Workflow, I'd suggest the policy should be just called skip, but with the implementation implied by skipTemplate.

To implement #12757 (which is mine, so I do have an interest here) you'd need a steal policy, which is problematic to implement for semaphores, as which one do you steal - I guess you could have stealOldest or stealNewest. We could just disallow steal on semaphores, and only allow it on mutex. Do we implement stealWithStop separately from stealWithTerminate?

It's somewhat preferable to implement this mechanism over #12757's proposal because it's more powerful. It's also much less intuitive for the simple case that that issue is aiming for and requires more typing.

@agilgur5
Copy link
Member

It's somewhat preferable to implement this mechanism over #12757's proposal because it's more powerful. It's also much less intuitive for the simple case that that issue is aiming for and requires more typing.

Yea, I was thinking about that, but as concurrencyPolicy already exists elsewhere, I was thinking maybe we just add more verbs to the existing enum? Or would that not work naming-wise?

As semaphores and mutexes can be used at a finer grained level than a whole Workflow, I'd suggest the policy should be just called skip, but with the implementation implied by skipTemplate.

Agree

you'd need a steal policy

We should use a different word to avoid negative connotations. take is more neutral. But Replace from concurrencyPolicy might be better as it uses existing terminology, no?

@nettrino
Copy link
Author

I'd suggest the policy should be just called skip

+1 on the above definitely cleaner.

Re:

Beyond this issue, one could allow more complex blocking policies as well, where a mutex is considered to be "held" for longer periods of time than a workflow is running for. For instance, say I ingest data that get updated every 12h from some remote source. If a single one of the workflows has been run within that time window, we could force the mutex to be held for 12h, even if the workflow has been completed, so as no other workflows can run and do redundant work within that time period. This would be useful because, even if skipWorkflow is accepted as a feature request, one could schedule redundant workflows to be running one after the other without lock contention.

do you think that warrants a separate ticket entirely? Or would it be something that could be added as part of the policy implementation? (e.g., for a lock to be held longer than the mere execution time of the workflow)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/mutex-semaphore area/spec Changes to the workflow specification. type/feature Feature request
Projects
None yet
Development

No branches or pull requests

3 participants