Inference on an L20 (48 GB) GPU still runs out of video memory #94

Open
try2020-code opened this issue May 16, 2024 · 0 comments

Comments

@try2020-code

 [2024-05-16 13:48:21,126] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 4/4 [01:31<00:00, 22.93s/it]
Some weights of the model checkpoint at work_dirs/llama-vid/llama-vid-7b-full-224-long-video-MovieLLM were not used when initializing LlavaLlamaAttForCausalLM: ['model.vision_tower.vision_tower.blocks.34.attn.v_bias', 'model.vlm_att_encoder.bert.encoder.layer.10.attention.output.LayerNorm.weight', 'model.vision_tower.vision_tower.blocks.1.norm1.weight', 'model.vision_tower.vision_tower.blocks.17.attn.q_bias', 'model.vlm_att_encoder.bert.encoder.layer.0.output.LayerNorm.bias', 'model.vision_tower.vision_tower.blocks.10.attn.proj.weight', 'model.vlm_att_encoder.bert.encoder.layer.5.output_query.LayerNorm.weight', 'model.vision_tower.vision_tower.blocks.13.attn.q_bias', 'too many data']
- This IS expected if you are initializing LlavaLlamaAttForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaLlamaAttForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
_IncompatibleKeys(missing_keys=[], unexpected_keys=['norm.weight', 'norm.bias', 'head.weight',......too many data']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Freezing all qformer weights...
Loading pretrained weights...
Loading vlm_att_query weights...
Loading vlm_att_ln weights...
Text with video
> Input token num: 32096
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (4096). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
Traceback (most recent call last):
  File "/root/autodl-tmp/autodl-tmp/MovieLLM-code/LLaMA-VID/llamavid/serve/run_llamavid_movie.py", line 112, in <module>
    run_inference(args)
  File "/root/autodl-tmp/autodl-tmp/MovieLLM-code/LLaMA-VID/llamavid/serve/run_llamavid_movie.py", line 87, in run_inference
    output_ids = model.generate(
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/transformers/generation/utils.py", line 1588, in generate
    return self.sample(
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/transformers/generation/utils.py", line 2642, in sample
    outputs = self(
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/autodl-tmp/autodl-tmp/MovieLLM-code/LLaMA-VID/llamavid/model/language_model/llava_llama_vid.py", line 85, in forward
    outputs = self.model(
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 693, in forward
    layer_outputs = decoder_layer(
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/autodl-tmp/autodl-tmp/MovieLLM-code/LLaMA-VID/llamavid/train/llama_flash_attn_monkey_patch.py", line 157, in forward_inference
    v = torch.cat([past_key_value[1].transpose(1, 2), v], dim=1)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 252.00 MiB (GPU 0; 47.50 GiB total capacity; 41.09 GiB already allocated; 132.56 MiB free; 47.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(MovieLLM) root@autodl-container-307c46a8f1-f6e37430:~/autodl-tmp/autodl-tmp/MovieLLM-code/LLaMA-VID# 
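
For context on the numbers in the log above, here is a rough back-of-the-envelope estimate of the key/value cache size for the reported 32096 input tokens. It is only a sketch: the layer count (32), hidden size (4096), and 2-byte element size are assumed values typical of a LLaMA-7B model in fp16 and were not read from this checkpoint, and `DecoderShape`, `kv_tensor_bytes`, and `kv_cache_bytes` are hypothetical helper names.

```python
from dataclasses import dataclass

MIB = 1024 ** 2
GIB = 1024 ** 3


@dataclass(frozen=True)
class DecoderShape:
    """Assumed LLaMA-7B-style shape; NOT read from the checkpoint in this issue."""
    num_layers: int = 32        # assumption
    hidden_size: int = 4096     # assumption
    bytes_per_element: int = 2  # assumption: fp16 / bf16


def kv_tensor_bytes(tokens: int, shape: DecoderShape = DecoderShape(), batch: int = 1) -> int:
    """Bytes held by ONE layer's key cache (or value cache) for `tokens` positions."""
    return batch * tokens * shape.hidden_size * shape.bytes_per_element


def kv_cache_bytes(tokens: int, shape: DecoderShape = DecoderShape(), batch: int = 1) -> int:
    """Bytes held by the key AND value caches across all decoder layers."""
    return 2 * shape.num_layers * kv_tensor_bytes(tokens, shape, batch)


if __name__ == "__main__":
    tokens = 32096  # "Input token num" reported in the log
    per_tensor = kv_tensor_bytes(tokens)
    total = kv_cache_bytes(tokens)
    print(f"one K or V tensor per layer: {per_tensor / MIB:.2f} MiB")
    print(f"full KV cache, all layers:   {total / GIB:.2f} GiB")
```

Under these assumptions a single per-layer value tensor is about 250.75 MiB, which is close to the 252.00 MiB allocation that failed, and the full cache is about 15.67 GiB on top of the model weights. Because `torch.cat` returns a new tensor, the old and new copies of a layer's cache can both be resident for a moment during each decoding step, so peak usage may be higher than the steady-state figure. The estimate scales linearly with the token count, so a shorter input would shrink it proportionally.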