
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' #185

Open
fhlkm opened this issue Jan 25, 2024 · 3 comments

Comments

fhlkm commented Jan 25, 2024

I am using WSL 2 with Ubuntu 22.04. This is the GPU information:
(screenshot)

When I run "sudo lshw -C display":
(screenshot)

I installed torch with this command:
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

When I run the command
"torchrun --nproc_per_node 1 example_instructions.py
--ckpt_dir CodeLlama-7b-Instruct/
--tokenizer_path CodeLlama-7b-Instruct/tokenizer.model
--max_seq_len 512 --max_batch_size 4",

it fails with: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'.

Here is the full log:


:/mnt/c/Users/john.john/codelama/weight/codellama-main$ torchrun --nproc_per_node 1 example_instructions.py \
>     --ckpt_dir CodeLlama-7b-Instruct/ \
>     --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
>     --max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 38.10 seconds
Traceback (most recent call last):
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/example_instructions.py", line 68, in <module>
    fire.Fire(main)
  File "/home/john/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/john/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/john/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/example_instructions.py", line 51, in main
    results = generator.chat_completion(
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/generation.py", line 351, in chat_completion
    generation_tokens, generation_logprobs = self.generate(
  File "/home/john/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/generation.py", line 164, in generate
    logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
  File "/home/john/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/model.py", line 300, in forward
    h = layer(h, start_pos, freqs_cis, (mask.to(device) if mask is not None else mask))
  File "/home/john/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/model.py", line 252, in forward
    h = x + self.attention.forward(
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/model.py", line 165, in forward
    xq, xk, xv = self.wq(x), self.wk(x), self.wv(x)
  File "/home/john/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/john/.local/lib/python3.10/site-packages/fairscale/nn/model_parallel/layers.py", line 290, in forward
    output_parallel = F.linear(input_parallel, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 433) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/home/john/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

example_instructions.py FAILED

Failures:
  <NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
  time      : 2024-01-24_16:51:25
  host      : company
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 433)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
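For context on the error itself (my own note, not from the maintainers): the CPU backend of older PyTorch releases, including 1.12.x, has no half-precision matmul kernel, so fp16 weights fail as soon as inference falls back to the CPU. A minimal sketch of the usual workaround, choosing the dtype per device:

```python
import torch

# Pick fp16 only when a GPU is actually available; older CPU backends
# cannot run half-precision F.linear / addmm at all.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

layer = torch.nn.Linear(8, 8).to(device=device, dtype=dtype)
x = torch.randn(2, 8, device=device, dtype=dtype)
out = layer(x)  # works on either backend once the dtype matches the device
print(out.shape)  # torch.Size([2, 8])
```

This doesn't fix the root cause here (the model should be running on the GPU in the first place), it only shows why Half on CPU raises the error.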

@akashdhruv

I am getting the same error, but in CPU mode. It looks like your model is running on CPU as well.
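(An aside: a quick way to confirm where the weights ended up is to check a parameter's device. This is a generic sketch, not taken from the Code Llama sources.)

```python
import torch

# Stand-in for the loaded transformer; the same check works on any nn.Module.
model = torch.nn.Linear(4, 4)
first_param = next(model.parameters())
print(first_param.device)  # cpu -> the model silently fell back to CPU
print(first_param.dtype)   # fp16 weights on CPU trigger the addmm error in old builds
```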


fhlkm commented Jan 30, 2024

Hi @akashdhruv,
My PC has an NVIDIA GPU; please check the screenshot above.

It is supposed to run on the GPU. Do you know why it only runs on the CPU?


akashdhruv commented Jan 30, 2024

> hi @akashdhruv , My pc has nvidia GPU, please check above screenshot.
>
> It is suppose to run on GPU. do you know why it only runs on CPU?

I think you need to look into your system and torchrun configuration to figure out why the GPU is not being identified. Is your PyTorch installed with GPU support? If yes, maybe try:

export CUDA_VISIBLE_DEVICES="0"
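A quick way to check whether the installed wheel was built with CUDA at all (independent of CUDA_VISIBLE_DEVICES) is:

```python
import torch

print(torch.__version__)          # a "+cu113" suffix indicates a CUDA build
print(torch.version.cuda)         # None on a CPU-only wheel
print(torch.cuda.is_available())  # False if the driver/WSL setup hides the GPU
```

If `is_available()` prints False under WSL 2 even with a CUDA wheel, the usual suspects are the Windows-side NVIDIA driver or the WSL CUDA setup rather than PyTorch itself.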
