Allow skipping a workflow if mutex cannot be acquired #13055

nettrino · 2024-05-15T06:49:38Z

Summary

If a mutex is at the workflow level, other workflows with the same mutex will wait till the mutex is acquired. However, there would be cases where we would want to skip a workflow entirely if another instance of the workflow has the mutex. This could be doable via some configuration option like the following:

                synchronization:
                    mutex:
                        name: test
                    acquireFailurePolicy: skipWorkflow

We could expose different types of policies like skipWorkflow, wait, etc.

Use Cases

There are cases were a workflow runs to ingest / update some data set, for instance fetch new rows from a database for a given entity. Running two of the same workflows will have no meaning as they will fetch the exact same data, thus a mutex prevents this duplication, however currently it does not prevent duplicating work back-to-back without custom logic within the workflow executable. A policy that would determine what to do if a lock is not acquired would allow skipping the secondary (redundant) workflow entirely.

Beyond this issue, one could allow more complex blocking policies as well, where a mutex is considered to be "held" for longer periods of time than a workflow is running for. For instance, say I ingest data that get updated every 12h from some remote source. If a single one of the workflows has been run within that time window, we could force the mutex to be held for 12h, even if the workflow has been completed, so as no other workflows can run and do redundant work within that time period. This would be useful because, even if skipWorkflow is accepted as a feature request, one could schedule redundant workflows to be running one after the other without lock contention.

Message from the maintainers:

Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.

The text was updated successfully, but these errors were encountered:

agilgur5 · 2024-05-15T16:11:27Z

Sounds like an extension / more concrete proposal of #10396

And a variant of #12757

Regarding the specific spec -- fields are camelCase by convention in k8s, so I've edited your proposal to match that.

Joibel · 2024-05-17T12:35:36Z

As semaphores and mutexes can be used at a finer grained level than a whole Workflow, I'd suggest the policy should be just called skip, but with the implementation implied by skipTemplate.

To implement #12757 (which is mine, so I do have an interest here) you'd need a steal policy, which is problematic to implement for semaphores, as which one do you steal - I guess you could have stealOldest or stealNewest. We could just disallow steal on semaphores, and only allow it on mutex. Do we implement stealWithStop separately from stealWithTerminate?

It's somewhat preferable to implement this mechanism over #12757's proposal because it's more powerful. It's also much less intuitive for the simple case that that issue is aiming for and requires more typing.

agilgur5 · 2024-05-17T18:49:14Z

It's somewhat preferable to implement this mechanism over #12757's proposal because it's more powerful. It's also much less intuitive for the simple case that that issue is aiming for and requires more typing.

Yea, I was thinking about that, but as concurrencyPolicy already exists elsewhere, I was thinking maybe we just add more verbs to the existing enum? Or would that not work naming-wise?

As semaphores and mutexes can be used at a finer grained level than a whole Workflow, I'd suggest the policy should be just called skip, but with the implementation implied by skipTemplate.

Agree

you'd need a steal policy

We should use a different word to avoid negative connotations. take is more neutral. But Replace from concurrencyPolicy might be better as it uses existing terminology, no?

nettrino · 2024-05-27T07:09:20Z

I'd suggest the policy should be just called skip

+1 on the above definitely cleaner.

Re:

Beyond this issue, one could allow more complex blocking policies as well, where a mutex is considered to be "held" for longer periods of time than a workflow is running for. For instance, say I ingest data that get updated every 12h from some remote source. If a single one of the workflows has been run within that time window, we could force the mutex to be held for 12h, even if the workflow has been completed, so as no other workflows can run and do redundant work within that time period. This would be useful because, even if skipWorkflow is accepted as a feature request, one could schedule redundant workflows to be running one after the other without lock contention.

do you think that warrants a separate ticket entirely? Or would it be something that could be added as part of the policy implementation? (e.g., for a lock to be held longer than the mere execution time of the workflow)

nettrino added the type/feature Feature request label May 15, 2024

agilgur5 added area/mutex-semaphore area/spec Changes to the workflow specification. labels May 15, 2024

This was referenced May 15, 2024

Feature Request: Skip blocked Workflow execution #10396

Closed

proposal: workflow concurrency control #12757

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow skipping a workflow if mutex cannot be acquired #13055

Allow skipping a workflow if mutex cannot be acquired #13055

nettrino commented May 15, 2024 •

edited by agilgur5

agilgur5 commented May 15, 2024 •

edited

Joibel commented May 17, 2024

agilgur5 commented May 17, 2024

nettrino commented May 27, 2024

Allow skipping a workflow if mutex cannot be acquired #13055

Allow skipping a workflow if mutex cannot be acquired #13055

Comments

nettrino commented May 15, 2024 • edited by agilgur5

Summary

Use Cases

agilgur5 commented May 15, 2024 • edited

Joibel commented May 17, 2024

agilgur5 commented May 17, 2024

nettrino commented May 27, 2024

nettrino commented May 15, 2024 •

edited by agilgur5

agilgur5 commented May 15, 2024 •

edited