LLM-VM does not support multiple GPUs currently #397

Open · MehmetMHY opened this issue Nov 15, 2023 · 4 comments · May be fixed by #411
Labels: bug (Something isn't working), feat/enhancement (New feature or request), HIGH-PRIORITY, improvement

Comments

@MehmetMHY (Contributor) commented Nov 15, 2023

Currently, LLM-VM does not support multi-GPU setups. Using RunPod, I rented a machine with two RTX 3090 GPUs. While running the local Bloom model example from the docs, I ran into this error:

```
`EleutherAI/pythia-70m-deduped` loaded on 2 GPUs.
Using model: bloom
Running with an empty context
Exception in thread Thread-2 (new_thread):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llm_vm/completion/optimize.py", line 45, in new_thread
    t[0] = foo()
  File "/usr/local/lib/python3.10/dist-packages/llm_vm/completion/optimize.py", line 259, in promiseCompletion
    best_completion = self.call_big(prompt, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llm_vm/client.py", line 102, in CALL_BIG
    return self.teacher.generate(prompt, max_len,**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llm_vm/onsite_llm.py", line 153, in generate
    generate_ids=self.model.generate(inputs.input_ids, max_length=max_length, **generation_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DataParallel' object has no attribute 'generate'
{'status': 0, 'resp': 'cannot unpack non-iterable NoneType object'}
```

One fix I had for this error was adding the following change to src/llm_vm/onsite_llm.py (the if-else statement under the "account for cases where the model is wrapped..." comment):

```diff
diff --git a/src/llm_vm/onsite_llm.py b/src/llm_vm/onsite_llm.py
index 9fcfe3c..613acbf 100644
--- a/src/llm_vm/onsite_llm.py
+++ b/src/llm_vm/onsite_llm.py

@@ -151,7 +141,13 @@ class BaseOnsiteLLM(ABC):
             inputs = self.tokenizer(prompt, return_tensors="pt", **tokenizer_kwargs).to(device[0])
         else:
             inputs = self.tokenizer(prompt, return_tensors="pt").to(device)
-        generate_ids=self.model.generate(inputs.input_ids, max_length=max_length, **generation_kwargs)
+        
+        # account for cases where the model is wrapped in DataParallel
+        if isinstance(self.model, torch.nn.DataParallel):
+            generate_ids = self.model.module.generate(inputs.input_ids, max_length=max_length, **generation_kwargs)
+        else:
+            generate_ids = self.model.generate(inputs.input_ids, max_length=max_length, **generation_kwargs)
+
         resp= self.tokenizer.batch_decode(generate_ids,skip_special_tokens=True,clean_up_tokenization_spaces=False)[0]
         # need to drop the len(prompt) prefix with these sequences generally
         # because they include the prompt.
```

☝️ This change resolves the error, but it does not fix the core issue: it uses DataParallel.module to reach the model's generate() function, which bypasses the DataParallel wrapper and skips the parallelism it provides (DataParallel only parallelizes forward(), so generation runs on a single device). I believe we should implement a different solution.
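For context, here is a minimal sketch of why the wrapper breaks generation (assuming only torch and transformers are installed; pythia-70m-deduped is just the small model from the log above). nn.DataParallel only overrides forward(), so every other method has to be reached through .module:

```python
# Minimal repro: DataParallel proxies forward() but not generate().
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m-deduped")
dp = torch.nn.DataParallel(model)

print(hasattr(dp, "generate"))         # False -> the AttributeError above
print(hasattr(dp.module, "generate"))  # True: the underlying model has it
# dp.module.generate(...) runs, but on a single device only, because the
# scatter/gather parallelism lives in DataParallel's forward() override.
```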

@MehmetMHY changed the title from "LLM-VM is not support multiple GPUs currently" to "LLM-VM does not support multiple GPUs currently" on Nov 15, 2023
@VictorOdede added the bug, feat/enhancement, $100, HIGH-PRIORITY, and improvement labels on Nov 21, 2023
@VictorOdede (Collaborator)

This issue can be solved using Ray.io.
For distributed training: https://docs.ray.io/en/latest/train/train.html
For distributed inference: https://docs.ray.io/en/latest/serve/index.html
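A minimal sketch of what the Ray Serve route could look like (assumes `ray[serve]` and `transformers` are installed; the deployment class, model name, and replica count here are illustrative, not LLM-VM's actual integration):

```python
# Hypothetical sketch: one replica per GPU, each answering generate() requests.
from ray import serve
from starlette.requests import Request
from transformers import AutoModelForCausalLM, AutoTokenizer

@serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 1})
class LLMDeployment:
    def __init__(self, model_name: str = "bigscience/bloom-560m"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

    async def __call__(self, request: Request) -> str:
        prompt = (await request.json())["prompt"]
        inputs = self.tokenizer(prompt, return_tensors="pt").to("cuda")
        ids = self.model.generate(inputs.input_ids, max_length=100)
        return self.tokenizer.batch_decode(ids, skip_special_tokens=True)[0]

app = LLMDeployment.bind()
# serve.run(app)  # then POST {"prompt": "..."} to http://localhost:8000/
```

Note this parallelizes across requests (one replica per GPU), not within a single generate() call.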

@VictorOdede (Collaborator)

Also look into: https://huggingface.co/docs/accelerate/index
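For single-node inference this may be the lighter-weight fix: loading with device_map="auto" lets Accelerate shard the layers across all visible GPUs, and generate() stays callable on the model directly, so no DataParallel wrapper is needed. A sketch (bloom-560m is just a small stand-in for whatever model LLM-VM loads):

```python
# Sketch assuming transformers and accelerate are installed. device_map="auto"
# spreads layers across available GPUs (model parallelism, not data parallelism).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # illustrative stand-in
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("The meaning of life is", return_tensors="pt").to(model.device)
ids = model.generate(inputs.input_ids, max_length=50)
print(tokenizer.batch_decode(ids, skip_special_tokens=True)[0])
```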

@Aryan8912

Can you please assign this to me?

Aryan8912 added a commit to Aryan8912/LLM-VM that referenced this issue Dec 1, 2023
@Aryan8912 linked a pull request on Dec 1, 2023 that will close this issue
@berkay3500 (Contributor)

Is this issue still open?

@mmirman removed the $100 label on Mar 12, 2024