
Issue with 70B instruct #157

Closed
AdamMiltonBarker opened this issue Apr 26, 2024 · 5 comments
Labels
needs-more-information: Issue is not fully clear to be acted upon

Comments

@AdamMiltonBarker

AdamMiltonBarker commented Apr 26, 2024

Machine: Standard NC96ads A100 v4 (96 vCPUs, 880 GiB memory)

W0426 23:50:13.559000 140316710531456 torch/distributed/run.py:757]
W0426 23:50:13.559000 140316710531456 torch/distributed/run.py:757] *****************************************
W0426 23:50:13.559000 140316710531456 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0426 23:50:13.559000 140316710531456 torch/distributed/run.py:757] *****************************************
> initializing model parallel with size 8
> initializing ddp with size 1
> initializing pipeline with size 1
[rank6]: Traceback (most recent call last):
[rank6]:   File "/home/tmsisa/llama3/cognitech.py", line 83, in <module>
[rank6]:     fire.Fire(main)
[rank6]:   File "/home/tmsisa/.conda/envs/llama_3_pytorch_env/lib/python3.12/site-packages/fire/core.py", line 143, in Fire
[rank6]:     component_trace = _Fire(component, args, parsed_flag_args, context, name)
[rank6]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/home/tmsisa/.conda/envs/llama_3_pytorch_env/lib/python3.12/site-packages/fire/core.py", line 477, in _Fire
[rank6]:     component, remaining_args = _CallAndUpdateTrace(
[rank6]:                                 ^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/home/tmsisa/.conda/envs/llama_3_pytorch_env/lib/python3.12/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
[rank6]:     component = fn(*varargs, **kwargs)
[rank6]:                 ^^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/home/tmsisa/llama3/cognitech.py", line 58, in main
[rank6]:     generator = Llama.build(
[rank6]:                 ^^^^^^^^^^^^
[rank6]:   File "/home/tmsisa/llama3/llama/generation.py", line 75, in build
[rank6]:     torch.cuda.set_device(local_rank)
[rank6]:   File "/home/tmsisa/.conda/envs/llama_3_pytorch_env/lib/python3.12/site-packages/torch/cuda/__init__.py", line 399, in set_device
[rank6]:     torch._C._cuda_setDevice(device)
[rank6]: RuntimeError: CUDA error: invalid device ordinal
[rank6]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank6]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[rank6]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

[ranks 4, 5, and 7 print the identical traceback, each ending in the same RuntimeError: CUDA error: invalid device ordinal]

W0426 23:50:18.567000 140316710531456 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 126717 closing signal SIGTERM
W0426 23:50:18.568000 140316710531456 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 126718 closing signal SIGTERM
W0426 23:50:18.568000 140316710531456 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 126719 closing signal SIGTERM
W0426 23:50:18.568000 140316710531456 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 126720 closing signal SIGTERM
E0426 23:50:18.796000 140316710531456 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 4 (pid: 126721) of binary: /home/tmsisa/.conda/envs/llama_3_pytorch_env/bin/python
Traceback (most recent call last):
  File "/home/tmsisa/.conda/envs/llama_3_pytorch_env/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.3.0', 'console_scripts', 'torchrun')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tmsisa/.conda/envs/llama_3_pytorch_env/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/tmsisa/.conda/envs/llama_3_pytorch_env/lib/python3.12/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/home/tmsisa/.conda/envs/llama_3_pytorch_env/lib/python3.12/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/home/tmsisa/.conda/envs/llama_3_pytorch_env/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tmsisa/.conda/envs/llama_3_pytorch_env/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
cognitech.py FAILED
------------------------------------------------------------
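
For context, the failing call is torch.cuda.set_device(local_rank), and "invalid device ordinal" means that local rank has no matching GPU on the node, i.e. torchrun started more processes per node than there are visible CUDA devices (only ranks 4-7 fail above). A quick way to check what the processes can actually see (a minimal sketch; run it in the same conda environment used for torchrun):

# Compare the number of visible GPUs against torchrun's --nproc_per_node.
# "invalid device ordinal" is raised when torch.cuda.set_device(i) is
# called with i >= torch.cuda.device_count() in that process.
import torch

n = torch.cuda.device_count()
print(f"Visible CUDA devices: {n}")
for i in range(n):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")
# The 70B checkpoints in this repo are sharded 8 ways (MP=8), so the
# example commands launch torchrun with --nproc_per_node 8.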

@edzq

edzq commented Apr 27, 2024

I have the same issue.

@subramen added the needs-more-information label on Apr 30, 2024
@Itime-ren

I have the same issue

@nightsSeeker

nightsSeeker commented May 11, 2024

@subramen Same issue. What more information would you need? The current state for me is:

  1. I have PyTorch and CUDA 12.1 installed
  2. the 70B-Instruct model is downloaded to the correct directory, as in the example repo
  3. I run the torchrun command as specified

and I get the above error. I also changed the backend from nccl to gloo to account for the warnings that were appearing; maybe that has something to do with it?
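
If it helps, here is one way to gather the environment details that usually matter for this error (a minimal sketch; run it inside the same environment used for torchrun):

# Print a summary of the local setup: PyTorch build, CUDA/driver
# versions, and the GPUs visible to this interpreter.
from torch.utils import collect_env

collect_env.main()  # the printed report can be pasted into the issue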

@subramen
Contributor

How many GPUs are you using? The 70B model needs 8 GPUs to run from this repo. If you have fewer than 8 GPUs, please use the model from HF.
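
For the HF route, a minimal sketch (the model ID, dtype, and generation settings below are illustrative; it assumes transformers and accelerate are installed, access to the gated meta-llama repo has been granted, and there is enough GPU/CPU memory to hold the weights):

# Load Llama 3 70B Instruct through Hugging Face instead of this repo's
# 8-way model-parallel checkpoints; device_map="auto" shards the layers
# across whatever GPUs are visible.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))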

@Lynn1

Lynn1 commented May 14, 2024

My solution, for reference:
https://github.com/Lynn1/llama3-stream
