Nova GPU-based workflows are failing in Conda builds. The failures happen mostly with CUDA 12.4, but can also be observed with 12.1 and 11.8.
The failure is flaky: the same jobs pass from time to time.
The failure occurs when starting the container:
Status: Downloaded newer image for pytorch/conda-builder:cuda12.4
docker.io/pytorch/conda-builder:cuda12.4
/usr/bin/docker create --name 42fc3baf03494dc5b4f0bc0b1e8e1dc4_pytorchcondabuildercuda124_f46ca7 --label 9f63b4 --workdir /__w/audio/audio --network github_network_68d0125cf865468b98e48e08a98dd61d --gpus all -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/ec2-user/actions-runner/_work":"/__w" -v "/home/ec2-user/actions-runner/externals":"/__e":ro -v "/home/ec2-user/actions-runner/_work/_temp":"/__w/_temp" -v "/home/ec2-user/actions-runner/_work/_actions":"/__w/_actions" -v "/home/ec2-user/actions-runner/_work/_tool":"/__w/_tool" -v "/home/ec2-user/actions-runner/_work/_temp/_github_home":"/github/home" -v "/home/ec2-user/actions-runner/_work/_temp/_github_workflow":"/github/workflow" --entrypoint "tail" pytorch/conda-builder:cuda12.4 "-f" "/dev/null"
f9e4cf858076e4a6ba5faeaa81174b7d4398938049e3afb4add73fff065874d6
/usr/bin/docker start f9e4cf858076e4a6ba5faeaa81174b7d4398938049e3afb4add73fff065874d6
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.4, please update your driver to a newer version, or use an earlier cuda container: unknown
Error: failed to start containers: f9e4cf858076e4a6ba5faeaa81174b7d4398938049e3afb4add73fff065874d6
Error: Docker start fail with exit code 1
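The log above shows `nvidia-container-cli` rejecting the container because the runner's driver does not satisfy `cuda>=12.4`. As a rough triage aid, the sketch below extracts that condition from a log line and compares it with a host CUDA version. The function names are hypothetical and not part of any Nova or NVIDIA tooling, and the host version must be supplied by the caller (for example, as read from the runner); this only illustrates the comparison, not how the toolkit itself evaluates requirements.

```python
import re

# Matches the condition reported by nvidia-container-cli, e.g. "cuda>=12.4".
# The pattern is derived from the single error line in this issue and is an
# assumption about the message format, not a documented contract.
_CONDITION_RE = re.compile(
    r"unsatisfied condition: (\w+)\s*(>=|<=|=|>|<)\s*(\d+(?:\.\d+)*)"
)


def parse_unsatisfied_condition(log_line):
    """Return (name, operator, version) from an error line, or None if absent."""
    match = _CONDITION_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


def version_tuple(version):
    """Convert '12.4' into (12, 4) so that 12.10 compares greater than 12.4."""
    return tuple(int(part) for part in version.split("."))


def host_satisfies(host_cuda, required_cuda):
    """True if the host CUDA version is at least the required version."""
    return version_tuple(host_cuda) >= version_tuple(required_cuda)


if __name__ == "__main__":
    line = (
        "nvidia-container-cli: requirement error: unsatisfied condition: "
        "cuda>=12.4, please update your driver to a newer version, "
        "or use an earlier cuda container: unknown"
    )
    name, op, required = parse_unsatisfied_condition(line)
    # "12.2" is a made-up host version used only for illustration.
    print(name, op, required, host_satisfies("12.2", required))
```

Grouping failed jobs by the parsed requirement and by runner could help confirm whether the flakiness comes from some runners having an older driver than others, which is only a hypothesis at this point.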
atalman changed the title from "[NOVA] Flaky torchaudio and torchvision conda GPU builds are faling" to "[NOVA] Flaky torchaudio and torchvision conda GPU builds are failing during Initialize Containers step" on May 9, 2024
atalman changed the title from "[NOVA] Flaky torchaudio and torchvision conda GPU builds are failing during Initialize Containers step" to "[NOVA] Flaky conda GPU builds during Initialize Containers step" on May 9, 2024
Failing job runs:
Audio: https://github.com/pytorch/audio/actions/runs/9016656951/job/24773679889
Vision: https://github.com/pytorch/vision/actions/runs/9001122121/job/24726698313
This may be related to changes in:
and