
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' #185

Open
fhlkm opened this issue Jan 25, 2024 · 3 comments

Comments

fhlkm commented Jan 25, 2024

I am using WSL 2 with Ubuntu 22.04. This is the GPU information:
(screenshot)

When I run "sudo lshw -C display":
(screenshot)

I installed torch with this command:
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

When I run the command
"torchrun --nproc_per_node 1 example_instructions.py
--ckpt_dir CodeLlama-7b-Instruct/
--tokenizer_path CodeLlama-7b-Instruct/tokenizer.model
--max_seq_len 512 --max_batch_size 4",

it fails with: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'.

Here is the full log:


:/mnt/c/Users/john.john/codelama/weight/codellama-main$ torchrun --nproc_per_node 1 example_instructions.py \
>     --ckpt_dir CodeLlama-7b-Instruct/ \
>     --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
>     --max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 38.10 seconds
Traceback (most recent call last):
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/example_instructions.py", line 68, in <module>
    fire.Fire(main)
  File "/home/john/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/john/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/john/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/example_instructions.py", line 51, in main
    results = generator.chat_completion(
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/generation.py", line 351, in chat_completion
    generation_tokens, generation_logprobs = self.generate(
  File "/home/john/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/generation.py", line 164, in generate
    logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
  File "/home/john/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/model.py", line 300, in forward
    h = layer(h, start_pos, freqs_cis, (mask.to(device) if mask is not None else mask))
  File "/home/john/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/model.py", line 252, in forward
    h = x + self.attention.forward(
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/model.py", line 165, in forward
    xq, xk, xv = self.wq(x), self.wk(x), self.wv(x)
  File "/home/john/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/john/.local/lib/python3.10/site-packages/fairscale/nn/model_parallel/layers.py", line 290, in forward
    output_parallel = F.linear(input_parallel, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 433) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/home/john/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

example_instructions.py FAILED

Failures:
  <NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
  time      : 2024-01-24_16:51:25
  host      : company
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 433)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
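For context on the error itself (my own note, not from the maintainers): the CPU backend of older PyTorch releases, including 1.12.x, has no half-precision matmul kernel, so fp16 weights fail as soon as inference falls back to the CPU. A minimal sketch of the usual workaround, choosing the dtype per device:

```python
import torch

# Pick fp16 only when a GPU is actually available; older CPU backends
# cannot run half-precision F.linear / addmm at all.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

layer = torch.nn.Linear(8, 8).to(device=device, dtype=dtype)
x = torch.randn(2, 8, device=device, dtype=dtype)
out = layer(x)  # works on either backend once the dtype matches the device
print(out.shape)  # torch.Size([2, 8])
```

This doesn't fix the root cause here (the model should be running on the GPU in the first place), it only shows why Half on CPU raises the error.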

@akashdhruv

I am getting the same error, but in CPU mode. It looks like your model is running on CPU as well.
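(An aside: a quick way to confirm where the weights ended up is to check a parameter's device. This is a generic sketch, not taken from the Code Llama sources.)

```python
import torch

# Stand-in for the loaded transformer; the same check works on any nn.Module.
model = torch.nn.Linear(4, 4)
first_param = next(model.parameters())
print(first_param.device)  # cpu -> the model silently fell back to CPU
print(first_param.dtype)   # fp16 weights on CPU trigger the addmm error in old builds
```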


fhlkm commented Jan 30, 2024

Hi @akashdhruv,
My PC has an NVIDIA GPU; please check the screenshot above.

It is supposed to run on the GPU. Do you know why it only runs on the CPU?


akashdhruv commented Jan 30, 2024

> hi @akashdhruv , My pc has nvidia GPU, please check above screenshot.
>
> It is suppose to run on GPU. do you know why it only runs on CPU?

I think you need to look into your system and torchrun configuration to figure out why the GPU is not being identified. Is your PyTorch installed with GPU support? If yes, maybe try:

export CUDA_VISIBLE_DEVICES="0"
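A quick way to check whether the installed wheel was built with CUDA at all (independent of CUDA_VISIBLE_DEVICES) is:

```python
import torch

print(torch.__version__)          # a "+cu113" suffix indicates a CUDA build
print(torch.version.cuda)         # None on a CPU-only wheel
print(torch.cuda.is_available())  # False if the driver/WSL setup hides the GPU
```

If `is_available()` prints False under WSL 2 even with a CUDA wheel, the usual suspects are the Windows-side NVIDIA driver or the WSL CUDA setup rather than PyTorch itself.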
