Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nova Linux build job runs as root inside its Docker container #5091

Open
huydhn opened this issue Apr 12, 2024 · 4 comments
Open

Nova Linux build job runs as root inside its Docker container #5091

huydhn opened this issue Apr 12, 2024 · 4 comments
Labels
bug Something isn't working triage review

Comments

@huydhn
Copy link
Contributor

huydhn commented Apr 12, 2024

The container in question https://hub.docker.com/r/pytorch/manylinux-builder/tags.

There are reports from ExecuTorch and Torchtune about the problematic way of running Nova Linux build job as root inside the container.

  1. On ExecuTorch, buck2 refuses to run as root https://github.com/pytorch/executorch/actions/runs/8655199802/job/23733768589#step:14:125, which blocks ET from using the workflow. For the context, ET use buck2 to gather the source files before passing them to cmake.
  2. On TorchTune, running the build as root leaves some artifacts like the conda folder on the runner, which could prevent subsequent jobs to clean it up as @clee2000 discovered it https://github.com/pytorch/torchtune/actions/runs/8639310064/job/23685409883?pr=688.

Usually, running thing as root is bad, and we should figure out a way to not do it anymore

cc @seemethere @atalman @kit1980 @clee2000 @dbort @malfet

@huydhn huydhn added bug Something isn't working high priority labels Apr 12, 2024
@clee2000
Copy link
Contributor

clee2000 commented Apr 16, 2024

cc @atalman for NOVA workflows

@malfet: This sort of runner clean up shoud be done by the runner and not in the workflows, ARC should be able to do this cc @DanilBaibak @jeanschmidt

@huydhn
Copy link
Contributor Author

huydhn commented Apr 30, 2024

AI: To figure out a POC going forward for Nova workflow.

@huydhn
Copy link
Contributor Author

huydhn commented Apr 30, 2024

cc @atalman

@dbort
Copy link
Contributor

dbort commented May 3, 2024

pytorch/executorch#3502 is caused by the hack I added to executorch to work around this "running as root" issue: buck2 does not like running as root.

dbort added a commit to dbort/executorch-1 that referenced this issue May 3, 2024
This hack is required to work around pytorch/test-infra#5091, which
runs some CI jobs as root, which buck2 doesn't like.

But we saw in pytorch#3502 that this can break things for some normal users.

Reduce the blast radius of this hack, only modifying HOME when actually
running as root.
facebook-github-bot pushed a commit to pytorch/executorch that referenced this issue May 4, 2024
Summary:
This hack is required to work around pytorch/test-infra#5091, which runs some CI jobs as root, which buck2 doesn't like.

But we saw in #3502 that this can break things for some normal users.

Reduce the blast radius of this hack, only modifying HOME when actually running as root.

Mitigates #3502

Pull Request resolved: #3507

Test Plan:
`./install_requirements.sh` succeeded locally.

The build-wheels jobs for this PR do not break during the buck2 phase, and show that they're unsetting HOME.

https://github.com/pytorch/executorch/actions/runs/8946028682/job/24575999986?pr=3507#step:14:118
```
2024-05-03T23:32:08.9616508Z temporarily unsetting HOME while running as root
```

https://github.com/pytorch/executorch/actions/runs/8946028682/job/24575999986?pr=3507#step:14:465
```
2024-05-03T23:32:08.9914557Z restored HOME
```

Reviewed By: larryliu0820

Differential Revision: D56958571

Pulled By: dbort

fbshipit-source-id: c7c6abdd52361af8253ce068002e3c23dee16f6b
dbort added a commit to dbort/executorch-1 that referenced this issue May 4, 2024
This hack is required to work around pytorch/test-infra#5091, which
runs some CI jobs as root, which buck2 doesn't like.

But we saw in pytorch#3502 that this can break things for some normal users.

Reduce the blast radius of this hack, only modifying HOME when actually
running as root.
mcr229 pushed a commit to pytorch/executorch that referenced this issue May 21, 2024
This hack is required to work around pytorch/test-infra#5091, which
runs some CI jobs as root, which buck2 doesn't like.

But we saw in #3502 that this can break things for some normal users.

Reduce the blast radius of this hack, only modifying HOME when actually
running as root.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage review
Projects
Status: Prioritized
Development

No branches or pull requests

3 participants