scripts/visual_video.sh fails #11

Open
SergeySandler opened this issue Jun 15, 2023 · 3 comments

Comments

@SergeySandler

SergeySandler commented Jun 15, 2023

With my current configuration, which follows the requirements, `bash scripts/visual_video.sh` fails with

ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'
Please compile MultiScaleDeformableAttention CUDA op with the following commands:
        `cd mask2former/modeling/pixel_decoder/ops`
        `sh make.sh`

even though running `sh make.sh` produces

Installed /usr/lib/python3.8/site-packages/MultiScaleDeformableAttention-1.0-py3.8-linux-x86_64.egg
Processing dependencies for MultiScaleDeformableAttention==1.0
Finished processing dependencies for MultiScaleDeformableAttention==1.0
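The output above shows the egg being installed into `/usr/lib/python3.8/site-packages`, i.e. the system Python. One possible explanation for the `ModuleNotFoundError` (an assumption, not confirmed in this thread) is that `scripts/visual_video.sh` runs a different interpreter, such as a conda environment, that does not search that directory. A minimal standard-library check, meant to be run with the same `python` the script invokes; `find_extension` is a hypothetical helper name:

```python
import importlib.util
import sys


def find_extension(name: str = "MultiScaleDeformableAttention") -> bool:
    """Report whether `name` can be located by the interpreter running this code."""
    # sys.executable is the Python actually running; compare its prefix with the
    # site-packages directory that `sh make.sh` reported installing into.
    print(f"interpreter: {sys.executable}")
    spec = importlib.util.find_spec(name)
    if spec is None:
        print(f"{name}: not found on sys.path")
        return False
    print(f"{name}: found at {spec.origin}")
    return True


if __name__ == "__main__":
    find_extension()
```

`find_spec` only locates the module without importing it, so a positive result can still fail at import time if the compiled op was built against a different PyTorch or CUDA version.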

Meanwhile, inference with detectron2 (`python3 detectron2/demo/demo.py`) works as expected.

A reproducible configuration would hopefully eliminate the `scripts/visual_video.sh` failure; a Dockerfile would be ideal.

SergeySandler changed the title from "Incomplete requirements.txt, scripts/visual_video.sh fails" to "scripts/visual_video.sh fails" on Jun 15, 2023
@lkeab
Collaborator

lkeab commented Jun 22, 2023

What are your CUDA version and GPU type?

@SergeySandler
Author

SergeySandler commented Jun 26, 2023

@lkeab, the attached Dockerfile helped to eliminate the problem.

There are two minor issues with `demo_video/demo.py`:

  1. Modify line 162 to add the missing parameter:
     predictions, visualized_output = demo.run_on_video(vid_frames, args.confidence_threshold)
  2. Modify line 140, replacing `fps=5` with `duration=200`, to align with `imageio==2.31.1`.
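The two values in item 2 are equivalent if `duration` is read as milliseconds per frame, which is how imageio's Pillow-based GIF writer documents it (worth checking against the installed imageio version, since older GIF plugins may interpret it in seconds). The conversion as a small helper; `fps_to_duration_ms` is a hypothetical name:

```python
def fps_to_duration_ms(fps: float) -> float:
    """Per-frame display time in milliseconds for a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# fps=5 in the original call corresponds to 200 ms per frame.
print(fps_to_duration_ms(5))  # 200.0
```

Computing the value from the frame rate, rather than hard-coding 200, keeps the output speed unchanged if the frame rate is later adjusted.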

There is also one major issue. With a folder of images as input, `demo_video/demo.py` seems to run fine for up to 7 images; with more images it fails with a CUDA error. Not surprisingly, `demo_video/demo.py` with a video as input fails with a similar error, e.g.:

[06/23 13:07:08 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /dataout/MaskFreeVis/mfvis_models/model_final_swinl_0560.pth ...
[06/23 13:07:08 fvcore.common.checkpoint]: [Checkpointer] Loading from /dataout/MaskFreeVis/mfvis_models/model_final_swinl_0560.pth ...
Traceback (most recent call last):
  File "demo_video/demo.py", line 162, in <module>
    predictions, visualized_output = demo.run_on_video(vid_frames, args.confidence_threshold)
  File "/MaskFreeVIS/demo_video/predictor.py", line 46, in run_on_video
    predictions = self.predictor(frames)
  File "/MaskFreeVIS/demo_video/predictor.py", line 111, in __call__
    predictions = self.model([inputs])
  File "/opt/conda/envs/maskfreevis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/MaskFreeVIS/demo_video/../mask2former_video/video_maskformer_model.py", line 291, in forward
    features = self.backbone(images.tensor)
  File "/opt/conda/envs/maskfreevis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/MaskFreeVIS/demo_video/../mask2former/modeling/backbone/swin.py", line 753, in forward
    y = super().forward(x)
  File "/MaskFreeVIS/demo_video/../mask2former/modeling/backbone/swin.py", line 672, in forward
    x_out = norm_layer(x_out)
  File "/opt/conda/envs/maskfreevis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/maskfreevis/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 189, in forward
    return F.layer_norm(
  File "/opt/conda/envs/maskfreevis/lib/python3.8/site-packages/torch/nn/functional.py", line 2347, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
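The traceback shows `run_on_video` calling `self.predictor(frames)` and the model then running the backbone on `images.tensor`, so all frames presumably go through the backbone in one forward pass, and memory grows with clip length. That would fit the observation that up to 7 frames work. One hedged workaround, not something suggested by the maintainers, is to process the video in shorter clips. `split_into_clips` and `max_frames` are hypothetical names; only the `demo.run_on_video(vid_frames, args.confidence_threshold)` call comes from this thread:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_clips(frames: Sequence[T], max_frames: int) -> List[List[T]]:
    """Split a frame sequence into consecutive clips of at most `max_frames` frames."""
    if max_frames < 1:
        raise ValueError("max_frames must be >= 1")
    return [list(frames[i:i + max_frames]) for i in range(0, len(frames), max_frames)]


# Hypothetical use in demo_video/demo.py (untested): run each clip separately.
# Instance IDs are NOT linked across clips, so tracks restart at every boundary.
# for clip in split_into_clips(vid_frames, max_frames=7):
#     predictions, visualized_output = demo.run_on_video(clip, args.confidence_threshold)
```

The trade-off is temporal consistency: a video instance segmentation model associates instances only within the frames it sees together, so this sketch trades cross-clip tracking for a bounded memory footprint.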

My environment:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

nvidia-smi
Mon Jun 26 00:24:28 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.89       Driver Version: 513.63       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 3000     On   | 00000000:01:00.0  On |                  N/A |
| N/A   59C    P8    14W /  N/A |    765MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
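Reading the two outputs together: the `CUDA Version` reported by nvidia-smi (11.6) is the highest CUDA version the installed driver supports, while nvcc reports toolkit 11.1. A conservative sanity check (not from the thread; CUDA 11.x minor-version compatibility can relax it) is that the toolkit should not be newer than the driver's reported version. The helper names below are hypothetical:

```python
def parse_cuda_version(text: str) -> tuple:
    """Turn a version string such as '11.1' into a comparable (major, minor) tuple."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def toolkit_within_driver(toolkit: str, driver_cuda: str) -> bool:
    """Conservative check: the toolkit version should not exceed the driver's CUDA version."""
    return parse_cuda_version(toolkit) <= parse_cuda_version(driver_cuda)


# Values from the output above: nvcc reports 11.1, nvidia-smi reports CUDA Version 11.6.
print(toolkit_within_driver("11.1", "11.6"))  # True
```

Both 11.1 and the later 11.3 toolkit pass this check against an 11.6 driver, so a plain driver/toolkit version mismatch seems an unlikely explanation for the errors.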

@SergeySandler
Author

@lkeab, with a modified Dockerfile that uses CUDA 11.3 (instead of 11.1 as before), and with CUDA_LAUNCH_BLOCKING=1 so that CUDA kernels run synchronously, demo_video/demo.py fails with either

  File "/MaskFreeVIS/demo_video/../mask2former/modeling/backbone/swin.py", line 159, in forward
    attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)
RuntimeError: CUDA error: out of memory

or

  File "/MaskFreeVIS/demo_video/../mask2former/modeling/backbone/swin.py", line 159, in forward
    attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)
RuntimeError: CUDA error: an illegal memory access was encountered

Do you think the problem is having only 6 GB of dedicated GPU memory?
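A rough way to reason about this question: the failing line in `swin.py` builds a window-attention tensor of shape `(B_ // nW, nW, num_heads, N, N)`, where `B_ // nW` is the number of images in the batch, so its size grows linearly with the number of frames and with image resolution. The sketch below estimates one such tensor for the first Swin stage under assumptions that are not taken from the thread: Swin-L stage-1 settings (window size 12, 6 heads), a 4x4 patch embedding, float32 elements, and an illustrative 480x640 input. It ignores weights, other activations, and the softmax output, so it sizes a single allocation rather than predicting peak usage. `swin_window_attn_bytes` is a hypothetical helper:

```python
import math


def swin_window_attn_bytes(frames, height, width, patch=4, window=12, heads=6, bytes_per_el=4):
    """Bytes in one stage-1 attention tensor of shape (B_ // nW, nW, heads, N, N).

    Assumptions: each frame is one batch element, the stage-1 feature map is
    ceil(height / patch) x ceil(width / patch), padded up to a multiple of
    `window`, and elements are float32. Defaults mirror a Swin-L config (assumed).
    """
    h = math.ceil(math.ceil(height / patch) / window) * window
    w = math.ceil(math.ceil(width / patch) / window) * window
    n_windows = (h // window) * (w // window)  # nW
    n_tokens = window * window                 # N, tokens per window
    return frames * n_windows * heads * n_tokens * n_tokens * bytes_per_el


MiB = 1024 ** 2
for t in (1, 7, 14):
    print(t, "frames:", round(swin_window_attn_bytes(t, 480, 640) / MiB, 1), "MiB")
```

Under these assumptions a single such tensor is about 66 MiB per frame, roughly 465 MiB for 7 frames, and the addition on line 159 allocates another tensor of the same size. Against about 5.4 GB free (765 MiB of 6144 MiB already in use), this alone does not prove that the 6 GB card is the cause; it only shows how quickly the budget shrinks as frames are added.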
