TensorRT EP could not deserialize engine from binary data #22139
Labels
api:CSharp (issues related to the C# API)
ep:TensorRT (issues related to the TensorRT execution provider)
performance (issues related to performance regressions)
Describe the issue
Hi,
I've wrapped a TensorRT engine in a _ctx.onnx file using the official Python script (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/tensorrt/gen_trt_engine_wrapper_onnx_model.py#L156-L187).
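For context, as far as I understand it, the script packs the serialized engine into a single EPContext contrib node, roughly like this (a minimal sketch, not the actual script; the engine path, input/output names, and shapes are placeholders):

```python
import onnx
from onnx import TensorProto, helper

# Placeholder path to the serialized TensorRT engine.
with open("model_fp16.engine", "rb") as f:
    engine_bytes = f.read()

# A single EPContext node; embed_mode=1 means the engine bytes are
# embedded directly in the ep_cache_context attribute.
node = helper.make_node(
    "EPContext",
    inputs=["input"],
    outputs=["output"],
    domain="com.microsoft",
    embed_mode=1,
    ep_cache_context=engine_bytes,
)

# Graph I/O names, dtypes, and shapes have to match the engine bindings.
graph = helper.make_graph(
    [node],
    "trt_engine_wrapper",
    [helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, "H", "W", 3])],
    [helper.make_tensor_value_info("output", TensorProto.FLOAT, None)],
)
model = helper.make_model(
    graph,
    opset_imports=[helper.make_opsetid("", 19), helper.make_opsetid("com.microsoft", 1)],
)
onnx.save(model, "EmbededTrtEngine_FP16_ctx.onnx")
```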
The problem is that loading the wrapped model fails with "TensorRT EP could not deserialize engine from binary data". The same engine runs fine through the TensorRT API directly, and since there is no further diagnostic output, I am stuck on why deserialization fails.
I've tried different ortTrtOptions settings, but to no avail. The error occurs while the inference session is being created, and both the FP16 and INT8 versions fail the same way.
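On my side the session creation looks like this, sketched with the Python API for brevity (my real code uses the C# API and its TensorRT provider options; the option values below are illustrative, not the exact combinations I tried):

```python
import onnxruntime as ort

# A few of the TensorRT EP options I varied; none changed the outcome.
trt_options = {
    "device_id": 0,
    "trt_fp16_enable": True,
    "trt_engine_cache_enable": False,
}

# The "could not deserialize engine" error is raised here, during
# session creation, before any inference is run.
session = ort.InferenceSession(
    "EmbededTrtEngine_FP16_ctx.onnx",
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",
    ],
)
```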
I've uploaded the FP16 version; it would be great if you could take a look.
Thanks!
Edit:
Graphics card: NVIDIA GeForce RTX 3090
The TensorRT engine was built with the following optimization profile shapes:
min: 1x1024x128x3
opt: 1x4096x640x3
max: 1x8000x1400x3
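In builder terms the profile was configured along these lines (a sketch using the standard TensorRT Python API; "input" is a placeholder for the actual binding name):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Optimization profile matching the shapes above (batch x H x W x C).
profile = builder.create_optimization_profile()
profile.set_shape(
    "input",
    (1, 1024, 128, 3),   # min
    (1, 4096, 640, 3),   # opt
    (1, 8000, 1400, 3),  # max
)
config.add_optimization_profile(profile)
```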
To reproduce
EmbededTrtEngine_FP16_ctx.zip
Urgency
Either a workaround or a fix would help.
Platform
Windows
OS Version
10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.19.0
ONNX Runtime API
C#
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.8, cuDNN 8.9.7.29, TensorRT 10.4.0.26 and 10.1.0.27
Model File
EmbededTrtEngine_FP16_ctx.zip
Is this a quantized model?
No