RuntimeError: No such operator tutel_ops::cumsum #190

Open
sharkdrop opened this issue Nov 14, 2022 · 10 comments

@sharkdrop

sharkdrop commented Nov 14, 2022

Hello, thanks for providing such great work. However, I cannot get Tutel to run. I followed the library installation steps:

* Install Pytorch for NVIDIA CUDA >= 11.3:
        $ python3 -m pip install --user torch==1.10.0+cu113 torchvision==0.11.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
       

* Install Tutel Online:

        $ python3 -m pip uninstall tutel -y
        $ python3 -m pip install --user --upgrade git+https://github.com/microsoft/tutel@main
        $ python3 ./tutel/setup.py install --user

But when I run the following test:

* Quick Test on Single-GPU:

        $ python3 -m tutel.examples.helloworld --batch_size=16               # Test Tutel-optimized MoE + manual distribution

The following error is reported:

Traceback (most recent call last):
  File "/root/miniconda3/envs/widenet/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/miniconda3/envs/widenet/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/tutel-main/tutel/examples/helloworld.py", line 120, in <module>
    output = model(x)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/tutel-main/tutel/examples/helloworld.py", line 85, in forward
    result = self._moe_layer(input)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/tutel-main/tutel/impls/moe_layer.py", line 267, in forward
    logits_dtype, (crit, l_aux) = routing()
  File "/root/tutel-main/tutel/impls/moe_layer.py", line 261, in routing
    inequivalent_tokens = inequivalent_tokens,
  File "/root/tutel-main/tutel/impls/fast_dispatch.py", line 158, in extract_critical
    locations1 = compute_location(masks_se[0])
  File "/root/tutel-main/tutel/jit_kernels/gating.py", line 22, in fast_cumsum_sub_one
    return torch.ops.tutel_ops.cumsum(data)
  File "/root/.local/lib/python3.7/site-packages/torch/_ops.py", line 63, in __getattr__
    op = torch._C._jit_get_operation(qualified_op_name)
RuntimeError: No such operator tutel_ops::cumsum
@ghostplant
Contributor

ghostplant commented Nov 15, 2022

What about export FAST_CUMSUM=0 first?
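
For anyone launching the test from Python rather than a shell, the same switch can be passed to a child process. This is only a minimal sketch: `env_without_fast_cumsum` and `run_helloworld_without_fast_cumsum` are hypothetical helper names, and the module path and batch size are taken from the quick test quoted in the report.

```python
import os
import subprocess
import sys


def env_without_fast_cumsum(base_env=None):
    """Return a copy of the environment with FAST_CUMSUM=0 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["FAST_CUMSUM"] = "0"  # same effect as `export FAST_CUMSUM=0` in the shell
    return env


def run_helloworld_without_fast_cumsum(batch_size=16):
    """Run the quick test from this issue in a child process (requires Tutel)."""
    cmd = [sys.executable, "-m", "tutel.examples.helloworld",
           f"--batch_size={batch_size}"]
    # check=False: return the exit code instead of raising on failure
    return subprocess.run(cmd, env=env_without_fast_cumsum(), check=False).returncode
```

Copying the environment keeps the parent process untouched, so the setting only affects the one test run.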

@sharkdrop
Author

Thank you. I tried this, and a new error is reported:

Traceback (most recent call last):
  File "helloworld.py", line 120, in <module>
    output = model(x)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "helloworld.py", line 85, in forward
    result = self._moe_layer(input)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/tutel-0.1-py3.7-linux-x86_64.egg/tutel/impls/moe_layer.py", line 271, in forward
    y = fast_encode(x.to(logits_dtype), crit, self.is_postscore).to(x.dtype)
  File "/root/.local/lib/python3.7/site-packages/tutel-0.1-py3.7-linux-x86_64.egg/tutel/impls/fast_dispatch.py", line 203, in fast_encode
    dispatcher.update(*critial_data[1:], is_postscore=is_postscore)
  File "/root/.local/lib/python3.7/site-packages/tutel-0.1-py3.7-linux-x86_64.egg/tutel/impls/fast_dispatch.py", line 108, in update
    self.func_fwd = jit_kernel.create_forward(self.dtype, indices_[0].is_cuda)
  File "/root/.local/lib/python3.7/site-packages/tutel-0.1-py3.7-linux-x86_64.egg/tutel/jit_kernels/sparse.py", line 35, in create_forward
    ''')
  File "/root/.local/lib/python3.7/site-packages/tutel-0.1-py3.7-linux-x86_64.egg/tutel/impls/jit_compiler.py", line 37, in generate_kernel
    return JitCompiler.create_raw(template)
  File "/root/.local/lib/python3.7/site-packages/tutel-0.1-py3.7-linux-x86_64.egg/tutel/impls/jit_compiler.py", line 26, in create_raw
    raise Exception('CUDA support is disabled during Tutel installation. Please configure CUDA correctly and reinstall Tutel to enable CUDA support, or report Tutel installation logs for help.')
Exception: CUDA support is disabled during Tutel installation. Please configure CUDA correctly and reinstall Tutel to enable CUDA support, or report Tutel installation logs for help.

@ghostplant
Contributor

Gotcha, this problem does not come from tutel_ops::cumsum. Instead, your Tutel installation was probably built with CPU support only, without CUDA.

The root cause could be an improper CUDA SDK configuration on your system that makes Tutel fail to build some components, for example a missing cuda.h or libnccl.so.

Suggestion:
Can you first purge the old Tutel (pip3 uninstall -y tutel) and then reinstall it from source (python3 ./tutel/setup.py install --user)? This method prints the full installation log, including error messages, so you may be able to spot what is wrong with the environment. If you cannot find a way to fix the environment, paste the full installation log here so that we can help you diagnose the problem.
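
Before rebuilding, it can help to confirm that the pieces mentioned above (cuda.h, libnccl.so) are visible at all. The following is a minimal sketch, not part of Tutel: `check_cuda_build_environment` is a hypothetical helper, and the fallback path `/usr/local/cuda` is an assumption about a typical Linux layout.

```python
import ctypes.util
import os
import shutil


def check_cuda_build_environment(cuda_home=None):
    """Collect basic facts about the CUDA build environment.

    The fallback /usr/local/cuda is an assumption; adjust it to your system.
    """
    cuda_home = cuda_home or os.environ.get("CUDA_HOME") or "/usr/local/cuda"
    return {
        "cuda_home": cuda_home,
        # Full path of nvcc if it is on PATH, else None
        "nvcc_on_path": shutil.which("nvcc"),
        "nvcc_in_cuda_home": os.path.isfile(os.path.join(cuda_home, "bin", "nvcc")),
        "cuda_h_found": os.path.isfile(os.path.join(cuda_home, "include", "cuda.h")),
        # Library name such as 'libnccl.so.2' if the loader can find it, else None
        "libnccl": ctypes.util.find_library("nccl"),
    }


if __name__ == "__main__":
    for name, value in check_cuda_build_environment().items():
        print(f"{name}: {value}")
```

If `cuda_h_found` is False or `libnccl` is None, the missing component is a likely reason for a CPU-only build, although the installation log remains the authoritative source.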

@sharkdrop
Author

Thank you for your suggestion, I'll give it a try.

@gaow0007

Hi, I have run into a similar issue. I've attached my installation log (from a source installation) below, but I don't see any clues in it that identify the cause. Do you have any further suggestions for finding the problem?

running install
running bdist_egg
running egg_info
writing tutel.egg-info/PKG-INFO
writing dependency_links to tutel.egg-info/dependency_links.txt
writing requirements to tutel.egg-info/requires.txt
writing top-level names to tutel.egg-info/top_level.txt
reading manifest file 'tutel.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'tutel.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-cpython-38/tutel_custom_kernel.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/tutel
creating build/bdist.linux-x86_64/egg/tutel/custom
copying build/lib.linux-x86_64-cpython-38/tutel/custom/__init__.py -> build/bdist.linux-x86_64/egg/tutel/custom
creating build/bdist.linux-x86_64/egg/tutel/experts
copying build/lib.linux-x86_64-cpython-38/tutel/experts/ffn.py -> build/bdist.linux-x86_64/egg/tutel/experts
copying build/lib.linux-x86_64-cpython-38/tutel/experts/__init__.py -> build/bdist.linux-x86_64/egg/tutel/experts
creating build/bdist.linux-x86_64/egg/tutel/gates
copying build/lib.linux-x86_64-cpython-38/tutel/gates/cosine_top.py -> build/bdist.linux-x86_64/egg/tutel/gates
copying build/lib.linux-x86_64-cpython-38/tutel/gates/__init__.py -> build/bdist.linux-x86_64/egg/tutel/gates
copying build/lib.linux-x86_64-cpython-38/tutel/gates/top.py -> build/bdist.linux-x86_64/egg/tutel/gates
copying build/lib.linux-x86_64-cpython-38/tutel/moe.py -> build/bdist.linux-x86_64/egg/tutel
copying build/lib.linux-x86_64-cpython-38/tutel/net.py -> build/bdist.linux-x86_64/egg/tutel
creating build/bdist.linux-x86_64/egg/tutel/jit_kernels
copying build/lib.linux-x86_64-cpython-38/tutel/jit_kernels/gating.py -> build/bdist.linux-x86_64/egg/tutel/jit_kernels
copying build/lib.linux-x86_64-cpython-38/tutel/jit_kernels/__init__.py -> build/bdist.linux-x86_64/egg/tutel/jit_kernels
copying build/lib.linux-x86_64-cpython-38/tutel/jit_kernels/sparse.py -> build/bdist.linux-x86_64/egg/tutel/jit_kernels
copying build/lib.linux-x86_64-cpython-38/tutel/__init__.py -> build/bdist.linux-x86_64/egg/tutel
creating build/bdist.linux-x86_64/egg/tutel/impls
copying build/lib.linux-x86_64-cpython-38/tutel/impls/moe_layer.py -> build/bdist.linux-x86_64/egg/tutel/impls
copying build/lib.linux-x86_64-cpython-38/tutel/impls/losses.py -> build/bdist.linux-x86_64/egg/tutel/impls
copying build/lib.linux-x86_64-cpython-38/tutel/impls/__init__.py -> build/bdist.linux-x86_64/egg/tutel/impls
copying build/lib.linux-x86_64-cpython-38/tutel/impls/jit_compiler.py -> build/bdist.linux-x86_64/egg/tutel/impls
copying build/lib.linux-x86_64-cpython-38/tutel/impls/communicate.py -> build/bdist.linux-x86_64/egg/tutel/impls
copying build/lib.linux-x86_64-cpython-38/tutel/impls/overlap.py -> build/bdist.linux-x86_64/egg/tutel/impls
copying build/lib.linux-x86_64-cpython-38/tutel/impls/fast_dispatch.py -> build/bdist.linux-x86_64/egg/tutel/impls
creating build/bdist.linux-x86_64/egg/tutel/checkpoint
copying build/lib.linux-x86_64-cpython-38/tutel/checkpoint/scatter.py -> build/bdist.linux-x86_64/egg/tutel/checkpoint
copying build/lib.linux-x86_64-cpython-38/tutel/checkpoint/__init__.py -> build/bdist.linux-x86_64/egg/tutel/checkpoint
copying build/lib.linux-x86_64-cpython-38/tutel/checkpoint/gather.py -> build/bdist.linux-x86_64/egg/tutel/checkpoint
creating build/bdist.linux-x86_64/egg/tutel/examples
copying build/lib.linux-x86_64-cpython-38/tutel/examples/moe_mnist.py -> build/bdist.linux-x86_64/egg/tutel/examples
copying build/lib.linux-x86_64-cpython-38/tutel/examples/helloworld_amp.py -> build/bdist.linux-x86_64/egg/tutel/examples
copying build/lib.linux-x86_64-cpython-38/tutel/examples/helloworld_from_scratch.py -> build/bdist.linux-x86_64/egg/tutel/examples
copying build/lib.linux-x86_64-cpython-38/tutel/examples/helloworld_switch.py -> build/bdist.linux-x86_64/egg/tutel/examples
copying build/lib.linux-x86_64-cpython-38/tutel/examples/__init__.py -> build/bdist.linux-x86_64/egg/tutel/examples
copying build/lib.linux-x86_64-cpython-38/tutel/examples/helloworld_deepspeed.py -> build/bdist.linux-x86_64/egg/tutel/examples
copying build/lib.linux-x86_64-cpython-38/tutel/examples/helloworld.py -> build/bdist.linux-x86_64/egg/tutel/examples
copying build/lib.linux-x86_64-cpython-38/tutel/examples/moe_cifar10.py -> build/bdist.linux-x86_64/egg/tutel/examples
copying build/lib.linux-x86_64-cpython-38/tutel/examples/helloworld_ddp.py -> build/bdist.linux-x86_64/egg/tutel/examples
copying build/lib.linux-x86_64-cpython-38/tutel/examples/helloworld_ddp_tutel.py -> build/bdist.linux-x86_64/egg/tutel/examples
copying build/lib.linux-x86_64-cpython-38/tutel/jit.py -> build/bdist.linux-x86_64/egg/tutel
creating build/bdist.linux-x86_64/egg/tutel/launcher
copying build/lib.linux-x86_64-cpython-38/tutel/launcher/run.py -> build/bdist.linux-x86_64/egg/tutel/launcher
copying build/lib.linux-x86_64-cpython-38/tutel/launcher/__init__.py -> build/bdist.linux-x86_64/egg/tutel/launcher
copying build/lib.linux-x86_64-cpython-38/tutel/launcher/execl.py -> build/bdist.linux-x86_64/egg/tutel/launcher
copying build/lib.linux-x86_64-cpython-38/tutel/system.py -> build/bdist.linux-x86_64/egg/tutel
creating build/bdist.linux-x86_64/egg/tutel/parted
copying build/lib.linux-x86_64-cpython-38/tutel/parted/patterns.py -> build/bdist.linux-x86_64/egg/tutel/parted
creating build/bdist.linux-x86_64/egg/tutel/parted/backend
copying build/lib.linux-x86_64-cpython-38/tutel/parted/backend/__init__.py -> build/bdist.linux-x86_64/egg/tutel/parted/backend
creating build/bdist.linux-x86_64/egg/tutel/parted/backend/torch
copying build/lib.linux-x86_64-cpython-38/tutel/parted/backend/torch/executor.py -> build/bdist.linux-x86_64/egg/tutel/parted/backend/torch
copying build/lib.linux-x86_64-cpython-38/tutel/parted/backend/torch/__init__.py -> build/bdist.linux-x86_64/egg/tutel/parted/backend/torch
copying build/lib.linux-x86_64-cpython-38/tutel/parted/backend/torch/config.py -> build/bdist.linux-x86_64/egg/tutel/parted/backend/torch
copying build/lib.linux-x86_64-cpython-38/tutel/parted/solver.py -> build/bdist.linux-x86_64/egg/tutel/parted
copying build/lib.linux-x86_64-cpython-38/tutel/parted/__init__.py -> build/bdist.linux-x86_64/egg/tutel/parted
copying build/lib.linux-x86_64-cpython-38/tutel/parted/spmdx.py -> build/bdist.linux-x86_64/egg/tutel/parted
byte-compiling build/bdist.linux-x86_64/egg/tutel/custom/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/experts/ffn.py to ffn.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/experts/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/gates/cosine_top.py to cosine_top.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/gates/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/gates/top.py to top.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/moe.py to moe.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/net.py to net.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/jit_kernels/gating.py to gating.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/jit_kernels/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/jit_kernels/sparse.py to sparse.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/impls/moe_layer.py to moe_layer.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/impls/losses.py to losses.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/impls/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/impls/jit_compiler.py to jit_compiler.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/impls/communicate.py to communicate.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/impls/overlap.py to overlap.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/impls/fast_dispatch.py to fast_dispatch.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/checkpoint/scatter.py to scatter.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/checkpoint/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/checkpoint/gather.py to gather.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/moe_mnist.py to moe_mnist.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/helloworld_amp.py to helloworld_amp.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/helloworld_from_scratch.py to helloworld_from_scratch.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/helloworld_switch.py to helloworld_switch.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/helloworld_deepspeed.py to helloworld_deepspeed.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/helloworld.py to helloworld.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/moe_cifar10.py to moe_cifar10.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/helloworld_ddp.py to helloworld_ddp.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/helloworld_ddp_tutel.py to helloworld_ddp_tutel.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/jit.py to jit.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/launcher/run.py to run.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/launcher/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/launcher/execl.py to execl.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/system.py to system.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/patterns.py to patterns.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/backend/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/backend/torch/executor.py to executor.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/backend/torch/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/backend/torch/config.py to config.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/solver.py to solver.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/__init__.py to __init__.cpython-38.pyc
byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/spmdx.py to spmdx.cpython-38.pyc
creating stub loader for tutel_custom_kernel.cpython-38-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/tutel_custom_kernel.py to tutel_custom_kernel.cpython-38.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying tutel.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying tutel.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying tutel.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying tutel.egg-info/not-zip-safe -> build/bdist.linux-x86_64/egg/EGG-INFO
copying tutel.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying tutel.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
creating 'dist/tutel-0.2-py3.8-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing tutel-0.2-py3.8-linux-x86_64.egg
creating /mnt/lustre/wgao/.local/lib/python3.8/site-packages/tutel-0.2-py3.8-linux-x86_64.egg
Extracting tutel-0.2-py3.8-linux-x86_64.egg to /mnt/lustre/wgao/.local/lib/python3.8/site-packages
Adding tutel 0.2 to easy-install.pth file

Installed /mnt/lustre/wgao/.local/lib/python3.8/site-packages/tutel-0.2-py3.8-linux-x86_64.egg
Processing dependencies for tutel==0.2
Finished processing dependencies for tutel==0.2
/mnt/lustre/wgao/miniconda3/envs/prompt/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/mnt/lustre/wgao/miniconda3/envs/prompt/lib/python3.8/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/mnt/lustre/wgao/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py:381: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))

@ghostplant
Contributor

Have you tried export FAST_CUMSUM=0?

@0x6b64

0x6b64 commented Apr 30, 2023

I get this error after setting FAST_CUMSUM=0:

[1,319]<stderr>:│ /root/.local/lib/python3.9/site-packages/tutel/impls/fast_dispatch.py:108 in │
[1,319]<stderr>:│ update                                                                       │
[1,319]<stderr>:│                                                                              │
[1,319]<stderr>:│   105 │   │   if self.is_cuda != indices_[0].is_cuda:                        │
[1,319]<stderr>:│   106 │   │   │   self.is_cuda = indices_[0].is_cuda                         │
[1,319]<stderr>:│   107 │   │   │   if self.is_cuda not in TutelMoeFastDispatcher.kernel_pool: │
[1,319]<stderr>:│ ❱ 108 │   │   │   │   self.func_fwd = jit_kernel.create_forward(self.dtype,  │
[1,319]<stderr>:│   109 │   │   │   │   self.func_bwd_data = jit_kernel.create_backward_data(s │
[1,319]<stderr>:│   110 │   │   │   │   self.func_bwd_gate = jit_kernel.create_backward_gate(s │
[1,319]<stderr>:│   111 │   │   │   │   TutelMoeFastDispatcher.kernel_pool[self.is_cuda] = sel │
[1,319]<stderr>:│                                                                              │
[1,319]<stderr>:│ /root/.local/lib/python3.9/site-packages/tutel/jit_kernels/sparse.py:21 in   │
[1,319]<stderr>:│ create_forward                                                               │
[1,319]<stderr>:│                                                                              │
[1,319]<stderr>:│    18   if not is_cuda:                                                      │
[1,319]<stderr>:│    19 │   return JitCompiler.generate_cpu_kernel(kernel_type=0)              │
[1,319]<stderr>:│    20                                                                        │
[1,319]<stderr>:│ ❱  21   return JitCompiler.generate_kernel({'dtype': get_kernel_dtype(param_ │
[1,319]<stderr>:│    22 │   #define __dtype @dtype@                                            │
[1,319]<stderr>:│    23 │                                                                      │
[1,319]<stderr>:│    24 │   extern "C" __global__ __launch_bounds__(1024) void execute(__dtype │
[1,319]<stderr>:│                                                                              │
[1,319]<stderr>:│ /root/.local/lib/python3.9/site-packages/tutel/impls/jit_compiler.py:40 in   │
[1,319]<stderr>:│ generate_kernel                                                              │
[1,319]<stderr>:│                                                                              │
[1,319]<stderr>:│   37 │   def generate_kernel(keyword_dict, template):                        │
[1,319]<stderr>:│   38 │     for key in keyword_dict:                                          │
[1,319]<stderr>:│   39 │   │   template = template.replace('@%s@' % key, str(keyword_dict[key] │
[1,319]<stderr>:│ ❱ 40 │     return JitCompiler.create_raw(template)                           │
[1,319]<stderr>:│   41 │                                                                       │
[1,319]<stderr>:│   42 │   @staticmethod                                                       │
[1,319]<stderr>:│   43 │   def generate_cpu_kernel(kernel_type):                               │
[1,319]<stderr>:│                                                                              │
[1,319]<stderr>:│ /root/.local/lib/python3.9/site-packages/tutel/impls/jit_compiler.py:29 in   │
[1,319]<stderr>:│ create_raw                                                                   │
[1,319]<stderr>:│                                                                              │
[1,319]<stderr>:│   26 │   def create_raw(source):                                             │
[1,319]<stderr>:│   27 │   │   torch.cuda.init()                                               │
[1,319]<stderr>:│   28 │   │   if not hasattr(tutel_custom_kernel, 'inject_source'):           │
[1,319]<stderr>:│ ❱ 29 │   │   │   raise Exception('CUDA support is disabled during Tutel inst │
[1,319]<stderr>:│   30 │   │   __ctx__ = tutel_custom_kernel.inject_source(source)             │
[1,319]<stderr>:│   31 │   │                                                                   │
[1,319]<stderr>:│   32 │   │   def func(*inputs, extra=[], blocks=[]):                         │
[1,319]<stderr>:╰──────────────────────────────────────────────────────────────────────────────╯
[1,319]<stderr>:Exception: CUDA support is disabled during Tutel installation. Please configure 
[1,319]<stderr>:CUDA correctly and reinstall Tutel to enable CUDA support, or report Tutel 
[1,319]<stderr>:installation logs for help.
[1,176]<stderr>:╭───────────────────── Traceback (most recent call last) ──────────────────────╮
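
The traceback above shows the check that raises this exception: `create_raw` tests whether `tutel_custom_kernel` has an `inject_source` attribute. A minimal sketch of running the same check by hand, without starting a job, might look like this. `kernel_build_status` is a hypothetical helper, and importing `torch` first is an assumption made so that the extension's shared libraries can resolve.

```python
import importlib


def kernel_build_status(module_name="tutel_custom_kernel"):
    """Report whether the compiled extension exposes `inject_source`."""
    try:
        # Assumption: torch must be loaded first so the extension can link.
        importlib.import_module("torch")
    except Exception:
        pass  # torch unavailable; the import below will then most likely fail too
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or OSError from a broken shared library
        return f"import failed: {exc!r}"
    # Same attribute test as the one visible in jit_compiler.py's create_raw
    return "cuda-enabled" if hasattr(module, "inject_source") else "cpu-only"


if __name__ == "__main__":
    print(kernel_build_status())
```

A result of `cpu-only` means the installed extension was built without CUDA, so a reinstall is needed once the toolkit is in place.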

@0x6b64

0x6b64 commented Apr 30, 2023

The suggestion from @ghostplant above (purge the old Tutel, then reinstall it from source after fixing the CUDA SDK configuration) is the solution. I was missing the CUDA toolkit: apt-get install -y cuda-toolkit

@ghostplant
Contributor

I don't recommend installing the CUDA toolkit from the default Ubuntu repository, as its packages are too old.

You should follow the instruction here: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu

After the CUDA SDK is successfully installed, please purge the previous Tutel and do a fresh installation against the new CUDA SDK libraries.
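
One way to confirm that the freshly installed toolkit matches the PyTorch build (the original report used `torch==1.10.0+cu113`, i.e. CUDA 11.3) is to compare the release printed by `nvcc --version` with the wheel's `+cuXYZ` suffix. This is a minimal sketch: the helper names are hypothetical, and the parsing assumes the usual "release X.Y" wording in nvcc's output.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Extract 'MAJOR.MINOR' from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def cuda_tag_to_release(tag):
    """Turn a wheel suffix such as 'cu113' into '11.3', or None."""
    match = re.fullmatch(r"cu(\d+)(\d)", tag)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def installed_nvcc_release():
    """Return the release of the nvcc found on PATH, or None if there is none."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
    return parse_nvcc_release(out)


if __name__ == "__main__":
    print("nvcc release:", installed_nvcc_release())
    print("expected for cu113:", cuda_tag_to_release("cu113"))
```

If the two values differ, or `installed_nvcc_release()` returns None, the build is probably not seeing the intended toolkit.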

@0x6b64

0x6b64 commented May 1, 2023

Thanks for the suggestion! I'm using an older driver, and once I added the NVIDIA repo I was able to select the version that I needed: add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
