Cancelled jobs on CI are still not handled correctly #4997

Open
huydhn opened this issue Mar 11, 2024 · 6 comments

Comments

@huydhn
Contributor

huydhn commented Mar 11, 2024

There is feedback that cancelled signals on CI are still showing up. This causes confusion and also blocks merges. For example, pytorch/pytorch#121522 (comment)

Looking at the workflow summary https://github.com/pytorch/pytorch/actions/runs/8207746512, it's clear that the workflow was cancelled by its concurrency rule:

Canceling since a higher priority waiting request for 'linux-binary-libtorch-pre-cxx11-ciflow/trunk/121522-false-false' exists

If we can query this information, it should be a reliable way to handle cancelled signals on CI.
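As a rough sketch of the idea, assuming the cancellation message can be retrieved as plain text (how to query it is still the open question here), a concurrency cancellation could be recognized by its wording. The regex is modelled only on the single message quoted above, and `concurrency_group_from_message` is a hypothetical helper name, not an existing function in this repo.

```python
import re
from typing import Optional

# Assumption: the pattern is modelled on the one message quoted in this issue;
# other wordings may exist and would not be matched.
_CONCURRENCY_CANCEL = re.compile(
    r"^Canceling since a higher priority waiting request for '(?P<group>[^']+)' exists"
)


def concurrency_group_from_message(message: str) -> Optional[str]:
    """Return the concurrency group named in a concurrency cancellation
    message, or None if the message does not look like one."""
    match = _CONCURRENCY_CANCEL.match(message.strip())
    return match.group("group") if match else None
```

A non-`None` result would mean the workflow was cancelled by its concurrency rule rather than by a user.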

@huydhn
Contributor Author

huydhn commented Mar 12, 2024

AI: Cancelled signals should be surfaced in a clear way: 1) if the job is cancelled by the user, we should say that the merge is cancelled; 2) if the job is cancelled because a higher priority job runs, it shouldn't show up as a failure.

@ZainRizvi
Contributor

To clarify the ask, would it be correct to say we want:

  1. Jobs cancelled by the user should be marked as "cancelled by user" by Dr. CI and mergebot
  2. Jobs cancelled due to infra reasons (like higher priority jobs) should be marked as "cancelled by infra" in Dr. CI and mergebot
  3. For both of the above, mergebot should continue failing the merge, but give a more precise reason for why it's blocking the merge

@huydhn
Contributor Author

huydhn commented Apr 5, 2024

The first point is correct: when jobs are cancelled by the user, we want them to show up as failures and block the merge. However, if jobs are cancelled by a higher priority request, as in the example above, we don't want them to show up on Dr. CI. Instead, we need to use the status of the newer set of jobs.
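To make the intended behavior concrete, here is a minimal sketch of that policy. It assumes the cancellation reason is already known for each job. `Job`, `CancelReason`, and `blocking_jobs` are hypothetical names used for illustration, not the actual Dr. CI or mergebot code. Treating a cancellation with an unknown reason as blocking is also an assumption, chosen to be conservative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CancelReason(Enum):
    USER = "user"
    SUPERSEDED = "superseded"  # cancelled by a higher priority request


@dataclass
class Job:
    name: str
    conclusion: str  # e.g. "success", "failure", "cancelled"
    cancel_reason: Optional[CancelReason] = None


def blocking_jobs(jobs: List[Job]) -> List[Job]:
    """Return the jobs that should show up as failures and block the merge."""
    blocking = []
    for job in jobs:
        if job.conclusion == "failure":
            blocking.append(job)
        elif (
            job.conclusion == "cancelled"
            and job.cancel_reason is not CancelReason.SUPERSEDED
        ):
            # User cancellations block the merge. Assumption: cancellations
            # with no known reason are treated the same way.
            blocking.append(job)
        # Superseded jobs are skipped; the newer set of jobs carries the signal.
    return blocking
```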

@ZainRizvi
Contributor

IIRC if we just show cancelled jobs as cancelled, when the new job kicks off we'd automatically show the status of the now-running job, right?
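A small sketch of what "automatically show the status of the now-running job" could look like, assuming jobs can be grouped by name and ordered by start time. The tuple layout and the `latest_status_per_job` name are illustrative assumptions, not how Dr. CI actually stores jobs.

```python
from typing import Dict, Iterable, Tuple


def latest_status_per_job(jobs: Iterable[Tuple[str, int, str]]) -> Dict[str, str]:
    """Given (name, started_at, status) tuples, keep only the status of the
    most recently started job for each name."""
    newest: Dict[str, Tuple[int, str]] = {}
    for name, started_at, status in jobs:
        # A later start time replaces whatever was recorded for this name.
        if name not in newest or started_at > newest[name][0]:
            newest[name] = (started_at, status)
    return {name: status for name, (_, status) in newest.items()}
```

With this grouping, an older cancelled job is replaced by the newer job of the same name as soon as that job starts.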

@ZainRizvi
Contributor

Related Issue: #4644

@huydhn
Contributor Author

huydhn commented Apr 6, 2024

IIRC if we just show cancelled jobs as cancelled, when the new job kicks off we'd automatically show the status of the now-running job, right?

Yeah, that's what I think too. This issue is kind of hard to track and reproduce.

Projects
Status: Cold Storage