
[Bug] Example stops with error when running with GPU (CUDA) "Expected all tensors to be on the same device, but found at least two devices" #214

Open
fantauzzi opened this issue Jun 13, 2023 · 0 comments
Labels
bug Something isn't working

Comments


fantauzzi commented Jun 13, 2023

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When running the example notebook abstractive-question-answering.ipynb, I get the following error in cell 18 when calling generate_answer(query):

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[38], line 1
----> 1 generate_answer(query)

Cell In[37], line 5, in generate_answer(query)
      3 inputs = tokenizer([query], max_length=1024, return_tensors="pt")
      4 # use generator to predict output ids
----> 5 ids = generator.generate(inputs["input_ids"], num_beams=2, min_length=20, max_length=40)
      6 # use tokenizer to decode the output ids
      7 answer = tokenizer.batch_decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/transformers/generation/utils.py:1329, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, **kwargs)
   1321         logger.warning(
   1322             "A decoder-only architecture is being used, but right-padding was detected! For correct "
   1323             "generation results, please set `padding_side='left'` when initializing the tokenizer."
   1324         )
   1326 if self.config.is_encoder_decoder and "encoder_outputs" not in model_kwargs:
   1327     # if model is encoder decoder encoder_outputs are created
   1328     # and added to `model_kwargs`
-> 1329     model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
   1330         inputs_tensor, model_kwargs, model_input_name
   1331     )
   1333 # 5. Prepare `input_ids` which will be used for auto-regressive generation
   1334 if self.config.is_encoder_decoder:

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/transformers/generation/utils.py:642, in GenerationMixin._prepare_encoder_decoder_kwargs_for_generation(self, inputs_tensor, model_kwargs, model_input_name)
    640 encoder_kwargs["return_dict"] = True
    641 encoder_kwargs[model_input_name] = inputs_tensor
--> 642 model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
    644 return model_kwargs

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/transformers/models/bart/modeling_bart.py:811, in BartEncoder.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
    808     raise ValueError("You have to specify either input_ids or inputs_embeds")
    810 if inputs_embeds is None:
--> 811     inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale
    813 embed_pos = self.embed_positions(input)
    814 embed_pos = embed_pos.to(inputs_embeds.device)

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/nn/modules/sparse.py:162, in Embedding.forward(self, input)
    161 def forward(self, input: Tensor) -> Tensor:
--> 162     return F.embedding(
    163         input, self.weight, self.padding_idx, self.max_norm,
    164         self.norm_type, self.scale_grad_by_freq, self.sparse)

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/nn/functional.py:2210, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2204     # Note [embedding_renorm set_grad_enabled]
   2205     # XXX: equivalent to
   2206     # with torch.no_grad():
   2207     #   torch.embedding_renorm_
   2208     # remove once script supports set_grad_enabled
   2209     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
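
The mismatch is between the generator's weights, which live on cuda:0, and the tokenized inputs, which stay on the CPU. A minimal sketch of the check, assuming a BART-based generator as shown in the traceback above (the vblagoje/bart_lfqa checkpoint here is an assumption for illustration; the notebook's exact checkpoint may differ):

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# Assumed checkpoint for illustration; the traceback only tells us the model is BART-based.
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = BartTokenizer.from_pretrained("vblagoje/bart_lfqa")
generator = BartForConditionalGeneration.from_pretrained("vblagoje/bart_lfqa").to(device)

inputs = tokenizer(["example query"], max_length=1024, return_tensors="pt")
print(next(generator.parameters()).device)  # cuda:0 once the model has been moved to the GPU
print(inputs["input_ids"].device)           # cpu -- the tokenizer output is never moved
```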

Expected Behavior

The notebook should run without errors.

Steps To Reproduce

On a system with an NVIDIA GPU supported by PyTorch:

  1. Clone the repo
  2. Start Jupyter Lab
  3. Open and run the abstractive-question-answering.ipynb notebook

Relevant log output

N/A

Environment

- **OS**: Ubuntu 22.04
- **Language version**: Python 3.11.3
- **Pinecone client version**: 2.2.2

Additional Context

To fix the bug, change the following line in the generate_answer(query) function:
inputs = tokenizer([query], max_length=1024, return_tensors="pt")
to:
inputs = tokenizer([query], max_length=1024, return_tensors="pt").to(device)
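
For completeness, this is roughly what the patched function looks like with that change applied. It is a sketch reconstructed from the code visible in the traceback; `device` is assumed to be the same variable the notebook already uses to move the generator to the GPU, and the trailing return is an assumption:

```python
def generate_answer(query):
    # tokenize the query, then move the resulting tensors to the same device as the generator
    inputs = tokenizer([query], max_length=1024, return_tensors="pt").to(device)
    # use generator to predict output ids
    ids = generator.generate(inputs["input_ids"], num_beams=2, min_length=20, max_length=40)
    # use tokenizer to decode the output ids
    answer = tokenizer.batch_decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    return answer
```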
