mpijob will stuck if LastReconcileTime is updated in 1 second #2118

Open
shadowdsp opened this issue May 17, 2024 · 0 comments

My MPIJob gets stuck forever when SyncPodGroup hits errors within 1 second.

For example:

  1. At 00:00:00.100, SyncPodGroup creates the pod group, but getting the pod group then fails.
  2. At 00:00:00.200, SyncPodGroup tries to update the pod group, but hits a conflict error such as "Operation cannot be fulfilled on ...".
    1. The controller then sets LastReconcileTime to the same value as in step 1.
    2. Finally, the controller calls UpdateJobStatusInApiServer, but because the job spec has not changed, this does not trigger the next reconcile.
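
The timing in the steps above can be illustrated with a minimal sketch. It is not the operator's code: it assumes the timestamp is stored with one-second resolution and that the status is compared by value, and `serialize_seconds` and `status_changed` are hypothetical helpers introduced only for this illustration.

```python
from datetime import datetime, timezone


def serialize_seconds(ts: datetime) -> str:
    """Format a timestamp with one-second resolution (assumed storage format)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def status_changed(old_status: dict, new_status: dict) -> bool:
    """Hypothetical change check: the status differs only if a field differs."""
    return old_status != new_status


# Two reconciles 100 ms apart, as in the example, plus one a second later.
first = datetime(2024, 5, 17, 0, 0, 0, 100_000, tzinfo=timezone.utc)
second = datetime(2024, 5, 17, 0, 0, 0, 200_000, tzinfo=timezone.utc)
later = datetime(2024, 5, 17, 0, 0, 1, 200_000, tzinfo=timezone.utc)

old = {"lastReconcileTime": serialize_seconds(first)}
same_second = {"lastReconcileTime": serialize_seconds(second)}
next_second = {"lastReconcileTime": serialize_seconds(later)}

print(status_changed(old, same_second))  # False: nothing observable changed
print(status_changed(old, next_second))  # True: the timestamp moved forward
```

Under these assumptions, the second reconcile writes a status identical to the first, so no update event would be produced to drive another reconcile.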

shadowdsp changed the title from "mpijob will not reconcile if LastReconcileTime is updated in 1 second" to "mpijob will stuck if LastReconcileTime is updated in 1 second" on May 17, 2024