Hi all,

After I run the inference command:
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path OpenSora-v1-HQ-16x256x256.pth --prompt-path ./assets/texts/t2v_samples.txt
The run fails with the following errors. The complete output is:
[2024-04-27 14:23:05,034] torch.distributed.run: [WARNING] master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
Config (path: configs/opensora/inference/16x256x256.py): {'num_frames': 16, 'fps': 8, 'image_size': (256, 256), 'model': {'type': 'STDiT-XL/2', 'space_scale': 0.5, 'time_scale': 1.0, 'enable_flashattn': True, 'enable_layernorm_kernel': True, 'from_pretrained': 'OpenSora-v1-HQ-16x256x256.pth'}, 'vae': {'type': 'VideoAutoencoderKL', 'from_pretrained': 'stabilityai/sd-vae-ft-ema', 'micro_batch_size': 4}, 'text_encoder': {'type': 't5', 'from_pretrained': 'DeepFloyd/t5-v1_1-xxl', 'model_max_length': 120}, 'scheduler': {'type': 'iddpm', 'num_sampling_steps': 100, 'cfg_scale': 7.0}, 'dtype': 'fp16', 'batch_size': 1, 'seed': 42, 'prompt_path': './assets/texts/t2v_samples.txt', 'save_dir': './outputs/samples/', 'multi_resolution': False}
/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/colossalai/initialize.py:48: UserWarning: config is deprecated and will be removed soon.
warnings.warn("config is deprecated and will be removed soon.")
[04/27/24 14:23:14] INFO colossalai - colossalai - INFO: /root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/colossalai/initialize.py:67 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, world size: 1
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:28<00:00, 14.11s/it]
Missing keys: []
Unexpected keys: []
0%| | 0/100 [00:03<?, ?it/s]
Traceback (most recent call last):
File "/home/yilinchen/Open-Sora/scripts/inference.py", line 112, in <module>
main()
File "/home/yilinchen/Open-Sora/scripts/inference.py", line 93, in main
samples = scheduler.sample(
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/opensora/schedulers/iddpm/__init__.py", line 72, in sample
samples = self.p_sample_loop(
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/opensora/schedulers/iddpm/gaussian_diffusion.py", line 434, in p_sample_loop
for sample in self.p_sample_loop_progressive(
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/opensora/schedulers/iddpm/gaussian_diffusion.py", line 485, in p_sample_loop_progressive
out = self.p_sample(
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/opensora/schedulers/iddpm/gaussian_diffusion.py", line 388, in p_sample
out = self.p_mean_variance(
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/opensora/schedulers/iddpm/respace.py", line 94, in p_mean_variance
return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/opensora/schedulers/iddpm/gaussian_diffusion.py", line 267, in p_mean_variance
model_output = model(x, t, **model_kwargs)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/opensora/schedulers/iddpm/respace.py", line 127, in __call__
return self.model(x, new_ts, **kwargs)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/opensora/schedulers/iddpm/__init__.py", line 89, in forward_with_cfg
model_out = model.forward(combined, timestep, y, **kwargs)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/opensora/models/stdit/stdit.py", line 267, in forward
x = auto_grad_checkpoint(block, x, y, t0, y_lens, tpe)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/opensora/acceleration/checkpoint.py", line 24, in auto_grad_checkpoint
return module(*args, **kwargs)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/opensora/models/stdit/stdit.py", line 98, in forward
x_s = self.attn(x_s)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/opensora/models/layers/blocks.py", line 152, in forward
from flash_attn import flash_attn_func
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
from flash_attn.flash_attn_interface import (
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
import flash_attn_2_cuda as flash_attn_cuda
ImportError: /root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops15sum_IntList_out4callERKNS_6TensorEN3c1016OptionalArrayRefIlEEbSt8optionalINS5_10ScalarTypeEERS2_
[2024-04-27 14:24:15,201] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 14511) of binary: /root/anaconda3/envs/env_open_sora/bin/python
Traceback (most recent call last):
File "/root/anaconda3/envs/env_open_sora/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/anaconda3/envs/env_open_sora/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
scripts/inference.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2024-04-27_14:24:15
host : yilinchen-X10SRA
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 14511)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
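The traceback ends at `import flash_attn_2_cuda`, so the run dies while loading flash-attn's compiled extension, before any attention is computed. An `undefined symbol` error naming a PyTorch internal (`at::_ops::...`) commonly indicates that the extension was compiled against a different PyTorch version than the one installed. A minimal sketch for repeating the failing imports outside `torchrun`, assuming the same conda environment is active; `try_import` is a hypothetical helper, not part of Open-Sora or flash-attn:

```python
import importlib
from typing import Tuple


def try_import(module_name: str) -> Tuple[bool, str]:
    """Import a module by name; return (success, error text or 'ok')."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError/OSError raised when a compiled .so fails to load
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # Import torch first, mirroring flash_attn_interface.py, because the
    # compiled extension depends on PyTorch's shared libraries.
    for name in ("torch", "flash_attn_2_cuda", "flash_attn"):
        ok, message = try_import(name)
        print(f"{name:18s} {'OK' if ok else 'FAILED'}: {message}")
```

If `flash_attn_2_cuda` fails here with the same `undefined symbol` message, the problem is reproducible without Open-Sora or `torchrun` involved.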
My installed packages related to opensora-1.0.0 are:
apex 0.1
flash-attn 2.5.6
ninja 1.11.1.1
torch 2.1.2+cu121
torchaudio 2.1.2+cu121
torchvision 0.16.2+cu121
xformers 0.0.23.post1
packaging 24.0
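The CUDA build of each package can be read from the PEP 440 local version label (the part after `+`). A small sketch that parses the list above and prints that label; `parse_pip_list` and `local_tag` are hypothetical helpers written for illustration:

```python
from typing import Dict, Optional

# Package list copied from the issue above.
PIP_LIST = """
apex 0.1
flash-attn 2.5.6
ninja 1.11.1.1
torch 2.1.2+cu121
torchaudio 2.1.2+cu121
torchvision 0.16.2+cu121
xformers 0.0.23.post1
packaging 24.0
"""


def parse_pip_list(text: str) -> Dict[str, str]:
    """Turn 'name version' lines into a {name: version} dict."""
    packages = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def local_tag(version: str) -> Optional[str]:
    """Return the local version label (e.g. 'cu121'), or None if there is none."""
    _, sep, label = version.partition("+")
    return label if sep else None


if __name__ == "__main__":
    for name, version in parse_pip_list(PIP_LIST).items():
        print(f"{name:12s} {version:16s} local tag: {local_tag(version)}")
```

torch, torchaudio and torchvision all carry `cu121`, while flash-attn and xformers carry no label, so this listing alone does not show which torch build flash-attn was compiled against.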
The system CUDA version and the CUDA version PyTorch was built with are the same, both 12.1:
(env_open_sora) root@yilinchen-X10SRA:/home/yilinchen/Open-Sora# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
(env_open_sora) root@yilinchen-X10SRA:/home/yilinchen/Open-Sora# python
Python 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.version.cuda)
12.1
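The same check can be collected into one script that also reports the installed package versions. A minimal sketch using only the standard library's `importlib.metadata`; `installed_version` and `torch_cuda_build` are hypothetical helpers, and torch is imported only if it is installed:

```python
import importlib.metadata
import importlib.util
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def torch_cuda_build() -> Optional[str]:
    """Return torch.version.cuda, or None if torch is missing or CPU-only."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return torch.version.cuda


if __name__ == "__main__":
    for dist in ("torch", "flash-attn", "xformers", "opensora"):
        print(f"{dist:12s} {installed_version(dist)}")
    print(f"torch.version.cuda = {torch_cuda_build()}")
```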
My GPU is an RTX 4090:
Product Name : NVIDIA GeForce RTX 4090
Product Brand : GeForce
Product Architecture : Ada Lovelace
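The flash-attn README lists Ampere, Ada and Hopper GPUs as supported by FlashAttention-2. A sketch of that check by compute capability, assuming those architectures correspond to majors 8 (Ampere/Ada) and 9 (Hopper); `supports_flash_attn_2` is a hypothetical helper, not a flash-attn API:

```python
import importlib.util
from typing import Tuple

# Assumption: Ampere/Ada report compute capability 8.x, Hopper reports 9.0.
SUPPORTED_MAJORS = (8, 9)


def supports_flash_attn_2(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability is Ampere, Ada or Hopper."""
    major, _minor = capability
    return major in SUPPORTED_MAJORS


if __name__ == "__main__":
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            name = torch.cuda.get_device_name(0)
            print(f"{name}: sm_{cap[0]}{cap[1]}, FlashAttention-2 supported: {supports_flash_attn_2(cap)}")
```

An RTX 4090 (Ada Lovelace) reports compute capability 8.9, so by this check the GPU architecture itself should not be what blocks flash-attn.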
The installation of opensora-1.0.0 succeeded:
Successfully installed opensora-1.0.0
All the files are in the Open-Sora directory. I also put the downloaded .pth files directly in the Open-Sora directory rather than in any sub-directory. These .pth files are:
OpenSora-v1-16x256x256.pth
OpenSora-v1-HQ-16x512x512.pth
OpenSora-v1-HQ-16x256x256.pth
Could anyone help me fix the problem described above?
Thanks a lot!