I am using a custom segnet model trained by following the steps from Onixaz Pytorch Segmentation. I can run the model with device GPU without any issues, but when I run it with device DLA, I get the following errors:
[TRT] =============== Computing costs for
[TRT] *************** Autotuning format combination: Half(1572864,524288,1024,1) -> Half(6144,512,32,1) ***************
[TRT] --------------- Timing Runner: {ForeignNode[/backbone/conv1/Conv.../classifier/classifier.4/Conv]} (DLA)
[TRT] Skipping tactic 0x0000000000000003 due to exception Assertion context.dlaContext != nullptr failed.
[TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[TRT] *************** Autotuning format combination: Half(1572864,524288,1024,1) -> Half(512,512:16,32,1) ***************
[TRT] --------------- Timing Runner: {ForeignNode[/backbone/conv1/Conv.../classifier/classifier.4/Conv]} (DLA)
[TRT] Skipping tactic 0x0000000000000003 due to exception Assertion context.dlaContext != nullptr failed.
[TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[TRT] *************** Autotuning format combination: Half(524288,1:4,1024,1) -> Half(6144,512,32,1) ***************
[TRT] --------------- Timing Runner: {ForeignNode[/backbone/conv1/Conv.../classifier/classifier.4/Conv]} (DLA)
[TRT] Skipping tactic 0x0000000000000003 due to exception Assertion context.dlaContext != nullptr failed.
[TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[TRT] *************** Autotuning format combination: Half(524288,1:4,1024,1) -> Half(512,512:16,32,1) ***************
[TRT] --------------- Timing Runner: {ForeignNode[/backbone/conv1/Conv.../classifier/classifier.4/Conv]} (DLA)
[TRT] Skipping tactic 0x0000000000000003 due to exception Assertion context.dlaContext != nullptr failed.
[TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[TRT] *************** Autotuning format combination: Half(524288,524288:16,1024,1) -> Half(6144,512,32,1) ***************
[TRT] --------------- Timing Runner: {ForeignNode[/backbone/conv1/Conv.../classifier/classifier.4/Conv]} (DLA)
[TRT] Skipping tactic 0x0000000000000003 due to exception Assertion context.dlaContext != nullptr failed.
[TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[TRT] *************** Autotuning format combination: Half(524288,524288:16,1024,1) -> Half(512,512:16,32,1) ***************
[TRT] --------------- Timing Runner: {ForeignNode[/backbone/conv1/Conv.../classifier/classifier.4/Conv]} (DLA)
[TRT] Skipping tactic 0x0000000000000003 due to exception Assertion context.dlaContext != nullptr failed.
[TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[TRT] 10: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/backbone/conv1/Conv.../classifier/classifier.4/Conv]}.)
[TRT] device DLA_0, failed to build CUDA engine
[TRT] device DLA_0, failed to load fcn_resnet18.onnx
[TRT] segNet -- failed to load.
segnet: failed to initialize segNet
I tested with trtexec and it does not show any error. Can you please tell me what these errors mean?
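As background on the log lines above: the tuples in a format string such as Half(1572864,524288,1024,1) are the per-dimension element strides of an FP16 tensor in row-major (kLINEAR) layout, and the ":16" variants denote channel-vectorized layouts the DLA prefers. Assuming a 1x3x512x1024 NCHW input (an inference from the log, not stated in the post), the strides can be reproduced with a short sketch:

```python
def row_major_strides(shape):
    """Compute row-major (C-contiguous) element strides for a tensor shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)

# Assumed 1x3x512x1024 NCHW FP16 input, inferred from the log line
# "Half(1572864,524288,1024,1)"; 512x1024 matches a typical segnet input.
print(row_major_strides((1, 3, 512, 1024)))
# -> (1572864, 524288, 1024, 1)
```

The "Autotuning format combination" lines simply show TensorRT trying each supported input/output layout pair for the DLA subgraph; every attempt fails here, which is why the build ends with Error Code 10.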
What does it mean to provide a valid calibrator? To run on the DLA, I only changed the device type to DEVICE_DLA here; do I have to change anything else?
[TRT] requested fasted precision for device DLA_0 without providing valid calibrator, disabling INT8
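For context on that warning: INT8 inference requires a calibrator object that feeds representative input batches so TensorRT can compute quantization scales; without one, the builder falls back from INT8 (here to FP16), which is a warning rather than the cause of the build failure. A minimal sketch of such a calibrator in the TensorRT Python API is below; the class name, batch source, and cache path are illustrative, not from the original post, and the import guard exists only so the sketch loads without TensorRT installed.

```python
import os

try:
    import tensorrt as trt
    _Base = trt.IInt8EntropyCalibrator2
except ImportError:  # lets the sketch be inspected without TensorRT present
    _Base = object

class SegCalibrator(_Base):
    """Illustrative INT8 calibrator: feeds batches and caches scale results."""

    def __init__(self, batches, cache_file="calib.cache"):
        if _Base is not object:
            super().__init__()
        self.batches = iter(batches)  # iterable of per-batch device pointers
        self.cache_file = cache_file

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        # Return a list of device pointers for the next calibration batch,
        # or None when calibration data is exhausted.
        return next(self.batches, None)

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

If you do not specifically need INT8, the simpler option is to request FP16 on the DLA, in which case no calibrator is needed.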
Why is only the classifier.4 layer shown running on DLA?
[TRT] ---------- Layers Running on DLA ----------
[TRT] [DlaLayer] {ForeignNode[/backbone/conv1/Conv.../classifier/classifier.4/Conv]}
[TRT] ---------- Layers Running on GPU ----------
[TRT] Trying to load shared library libcublas.so.11
[TRT] Loaded shared library libcublas.so.11
[TRT] Using cublas as plugin tactic source
[TRT] Trying to load shared library libcublasLt.so.11
[TRT] Loaded shared library libcublasLt.so.11
[TRT] Using cublasLt as core library tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +260, GPU +322, now: CPU 660, GPU 5053 (MiB)
[TRT] Trying to load shared library libcudnn.so.8
[TRT] Loaded shared library libcudnn.so.8
[TRT] Using cuDNN as plugin tactic source
[TRT] Using cuDNN as core library tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +82, GPU +125, now: CPU 742, GPU 5178 (MiB)
[TRT] Global timing cache in use. Profiling results in this builder pass will be stored.
[TRT] Constructing optimization profile number 0 [1/1].
[TRT] Reserving memory for host IO tensors. Host: 0 bytes
And what does this error mean? I checked, and all of the layers are supported by DLA:
[TRT] --------------- Timing Runner: {ForeignNode[/backbone/conv1/Conv.../classifier/classifier.4/Conv]} (DLA)
[TRT] Skipping tactic 0x0000000000000003 due to exception Assertion context.dlaContext != nullptr failed.
The model seems to run without any error when tested with trtexec:
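The exact trtexec command was not included in the post, but for reference, an invocation that targets the DLA with FP16 and GPU fallback looks like the following; fcn_resnet18.onnx is taken from the error log above:

```shell
# Build and time the engine on DLA core 0 in FP16; with --allowGPUFallback,
# layers the DLA cannot run are placed on the GPU instead of failing the build.
trtexec --onnx=fcn_resnet18.onnx \
        --useDLACore=0 \
        --fp16 \
        --allowGPUFallback
```

If trtexec succeeds with these flags but the jetson-inference build fails, comparing which layers each run assigns to the DLA can help narrow down the difference.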
@khsafkatamin I'm not sure; I haven't tried those models on the DLA, and it doesn't seem to support all layer types. You could check DeepStream for other models that are known to work with the DLA.