
Trtexec multi-source (streams) and multi-batch performance test failed #47

Open
YunghuiHsu opened this issue Jun 2, 2023 · 1 comment

YunghuiHsu commented Jun 2, 2023

Description
I want to test the model's performance with multiple streams and multiple batch sizes (https://github.com/NVIDIA-AI-IOT/yolo_deepstream#performance) using the trtexec command, so I ran the following:

/usr/src/tensorrt/bin/trtexec --loadEngine=yolov7_b16_int8_qat_640.engine --shapes=images:4x3x640x640 --streams=4
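For context, my understanding is that --streams=4 makes trtexec create one IExecutionContext per stream and enqueue all of them concurrently. A minimal sketch of that pattern with the TensorRT Python API and pycuda (buffer allocation omitted; this is an illustration of the pattern, not trtexec's actual code):

```python
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context

NUM_STREAMS = 4
logger = trt.Logger(trt.Logger.WARNING)

with open("yolov7_b16_int8_qat_640.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

# One execution context and one CUDA stream per concurrent source.
contexts = [engine.create_execution_context() for _ in range(NUM_STREAMS)]
streams = [cuda.Stream() for _ in range(NUM_STREAMS)]

for ctx in contexts:
    # Dynamic-shape engines need the runtime shape set on every context.
    # NOTE: each context also needs its own optimization profile -- that
    # is exactly the part that fails in the log below.
    ctx.set_input_shape("images", (4, 3, 640, 640))
    # ... allocate device buffers and call ctx.set_tensor_address(...) ...

# Enqueue everything, then wait; the contexts overlap on the GPU.
for ctx, stream in zip(contexts, streams):
    ctx.execute_async_v3(stream.handle)
for stream in streams:
    stream.synchronize()
```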

Note:
The .engine file was built from the ONNX model with the following command (dynamic batch):

/usr/src/tensorrt/bin/trtexec --verbose --onnx=yolov7_qat_640.onnx --workspace=4096 --minShapes=images:1x3x640x640 --optShapes=images:12x3x640x640 --maxShapes=images:16x3x640x640 --saveEngine=yolov7_b16_int8_qat_640.engine --fp16 --int8
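For reference, a rough Python-API equivalent of this build step (a sketch under my assumptions, not trtexec's exact internals; note that it adds only a single optimization profile, which turns out to matter for the error below):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("yolov7_qat_640.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)  # --workspace=4096
config.set_flag(trt.BuilderFlag.FP16)  # --fp16
config.set_flag(trt.BuilderFlag.INT8)  # --int8 (scales come from the QAT Q/DQ nodes)

# Exactly one profile, mirroring --minShapes/--optShapes/--maxShapes.
profile = builder.create_optimization_profile()
profile.set_shape("images",
                  (1, 3, 640, 640),   # min
                  (12, 3, 640, 640),  # opt
                  (16, 3, 640, 640))  # max
config.add_optimization_profile(profile)

with open("yolov7_b16_int8_qat_640.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```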

Running the benchmark command above then fails with the following error:

[06/02/2023-09:24:37] [I] === Model Options ===
[06/02/2023-09:24:37] [I] Format: *
[06/02/2023-09:24:37] [I] Model: 
[06/02/2023-09:24:37] [I] Output:
[06/02/2023-09:24:37] [I] === Build Options ===
[06/02/2023-09:24:37] [I] Max batch: explicit batch
[06/02/2023-09:24:37] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[06/02/2023-09:24:37] [I] minTiming: 1
[06/02/2023-09:24:37] [I] avgTiming: 8
[06/02/2023-09:24:37] [I] Precision: FP32
[06/02/2023-09:24:37] [I] LayerPrecisions: 
[06/02/2023-09:24:37] [I] Calibration: 
[06/02/2023-09:24:37] [I] Refit: Disabled
[06/02/2023-09:24:37] [I] Sparsity: Disabled
[06/02/2023-09:24:37] [I] Safe mode: Disabled
[06/02/2023-09:24:37] [I] DirectIO mode: Disabled
[06/02/2023-09:24:37] [I] Restricted mode: Disabled
[06/02/2023-09:24:37] [I] Build only: Disabled
[06/02/2023-09:24:37] [I] Save engine: 
[06/02/2023-09:24:37] [I] Load engine: yolov7_b16_int8_qat_640.engine
[06/02/2023-09:24:37] [I] Profiling verbosity: 0
[06/02/2023-09:24:37] [I] Tactic sources: Using default tactic sources
[06/02/2023-09:24:37] [I] timingCacheMode: local
[06/02/2023-09:24:37] [I] timingCacheFile: 
[06/02/2023-09:24:37] [I] Heuristic: Disabled
[06/02/2023-09:24:37] [I] Preview Features: Use default preview flags.
[06/02/2023-09:24:37] [I] Input(s)s format: fp32:CHW
[06/02/2023-09:24:37] [I] Output(s)s format: fp32:CHW
[06/02/2023-09:24:37] [I] Input build shape: images=4x3x640x640+4x3x640x640+4x3x640x640
[06/02/2023-09:24:37] [I] Input calibration shapes: model
[06/02/2023-09:24:37] [I] === System Options ===
[06/02/2023-09:24:37] [I] Device: 0
[06/02/2023-09:24:37] [I] DLACore: 
[06/02/2023-09:24:37] [I] Plugins:
[06/02/2023-09:24:37] [I] === Inference Options ===
[06/02/2023-09:24:37] [I] Batch: Explicit
[06/02/2023-09:24:37] [I] Input inference shape: images=4x3x640x640
[06/02/2023-09:24:37] [I] Iterations: 10
[06/02/2023-09:24:37] [I] Duration: 3s (+ 200ms warm up)
[06/02/2023-09:24:37] [I] Sleep time: 0ms
[06/02/2023-09:24:37] [I] Idle time: 0ms
[06/02/2023-09:24:37] [I] Streams: 4
[06/02/2023-09:24:37] [I] ExposeDMA: Disabled
[06/02/2023-09:24:37] [I] Data transfers: Enabled
[06/02/2023-09:24:37] [I] Spin-wait: Disabled
[06/02/2023-09:24:37] [I] Multithreading: Disabled
[06/02/2023-09:24:37] [I] CUDA Graph: Disabled
[06/02/2023-09:24:37] [I] Separate profiling: Disabled
[06/02/2023-09:24:37] [I] Time Deserialize: Disabled
[06/02/2023-09:24:37] [I] Time Refit: Disabled
[06/02/2023-09:24:37] [I] NVTX verbosity: 0
[06/02/2023-09:24:37] [I] Persistent Cache Ratio: 0
[06/02/2023-09:24:37] [I] Inputs:
[06/02/2023-09:24:37] [I] === Reporting Options ===
[06/02/2023-09:24:37] [I] Verbose: Disabled
[06/02/2023-09:24:37] [I] Averages: 10 inferences
[06/02/2023-09:24:37] [I] Percentiles: 90,95,99
[06/02/2023-09:24:37] [I] Dump refittable layers:Disabled
[06/02/2023-09:24:37] [I] Dump output: Disabled
[06/02/2023-09:24:37] [I] Profile: Disabled
[06/02/2023-09:24:37] [I] Export timing to JSON file: 
[06/02/2023-09:24:37] [I] Export output to JSON file: 
[06/02/2023-09:24:37] [I] Export profile to JSON file: 
[06/02/2023-09:24:37] [I] 
[06/02/2023-09:24:37] [I] === Device Information ===
[06/02/2023-09:24:37] [I] Selected Device: Xavier
[06/02/2023-09:24:37] [I] Compute Capability: 7.2
[06/02/2023-09:24:37] [I] SMs: 8
[06/02/2023-09:24:37] [I] Compute Clock Rate: 1.377 GHz
[06/02/2023-09:24:37] [I] Device Global Memory: 31002 MiB
[06/02/2023-09:24:37] [I] Shared Memory per SM: 96 KiB
[06/02/2023-09:24:37] [I] Memory Bus Width: 256 bits (ECC disabled)
[06/02/2023-09:24:37] [I] Memory Clock Rate: 1.377 GHz
[06/02/2023-09:24:37] [I] 
[06/02/2023-09:24:37] [I] TensorRT version: 8.5.2
[06/02/2023-09:24:38] [I] Engine loaded in 0.0275892 sec.
[06/02/2023-09:24:38] [I] [TRT] Loaded engine size: 39 MiB
[06/02/2023-09:24:39] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +41, now: CPU 0, GPU 41 (MiB)
[06/02/2023-09:24:39] [I] Engine deserialized in 1.04122 sec.
[06/02/2023-09:24:39] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +364, now: CPU 0, GPU 405 (MiB)
[06/02/2023-09:24:39] [I] Setting persistentCacheLimit to 0 bytes.
[06/02/2023-09:24:39] [I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[06/02/2023-09:24:39] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +363, now: CPU 0, GPU 768 (MiB)
[06/02/2023-09:24:39] [I] Setting persistentCacheLimit to 0 bytes.
[06/02/2023-09:24:39] [E] Error[1]: Unexpected exception cannot create std::vector larger than max_size()
[06/02/2023-09:24:39] [I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[06/02/2023-09:24:39] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +363, now: CPU 0, GPU 1131 (MiB)
[06/02/2023-09:24:39] [I] Setting persistentCacheLimit to 0 bytes.
[06/02/2023-09:24:39] [E] Error[1]: Unexpected exception cannot create std::vector larger than max_size()
[06/02/2023-09:24:39] [I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[06/02/2023-09:24:39] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +364, now: CPU 1, GPU 1495 (MiB)
[06/02/2023-09:24:39] [I] Setting persistentCacheLimit to 0 bytes.
[06/02/2023-09:24:39] [E] Error[1]: Unexpected exception cannot create std::vector larger than max_size()
[06/02/2023-09:24:39] [I] Using random values for input images
[06/02/2023-09:24:39] [I] Created input binding for images with dimensions 4x3x640x640
[06/02/2023-09:24:39] [I] Using random values for input images
[06/02/2023-09:24:39] [I] Created input binding for images with dimensions 4x3x640x640
[06/02/2023-09:24:39] [I] Using random values for input images
[06/02/2023-09:24:39] [I] Created input binding for images with dimensions 4x3x640x640
[06/02/2023-09:24:39] [I] Using random values for input images
[06/02/2023-09:24:39] [I] Created input binding for images with dimensions 4x3x640x640
[06/02/2023-09:24:39] [I] Using random values for output outputs
[06/02/2023-09:24:39] [I] Created output binding for outputs with dimensions 4x25200x85
[06/02/2023-09:24:39] [I] Using random values for output outputs
[06/02/2023-09:24:39] [I] Created output binding for outputs with dimensions 4x25200x85
[06/02/2023-09:24:39] [I] Using random values for output outputs
[06/02/2023-09:24:39] [I] Created output binding for outputs with dimensions 4x25200x85
[06/02/2023-09:24:39] [I] Using random values for output outputs
[06/02/2023-09:24:39] [I] Created output binding for outputs with dimensions 4x25200x85
[06/02/2023-09:24:39] [I] Starting inference
[06/02/2023-09:24:39] [E] Error[2]: [executionContext.cpp::enqueueV3::2386] Error Code 2: Internal Error (Assertion mOptimizationProfile >= 0 failed. )
[06/02/2023-09:24:39] [E] Error occurred during inference
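My reading of the log: the four IExecutionContexts all compete for the single optimization profile 0 ("Could not set default profile 0 for execution context. Profile index must be set explicitly."), so three contexts end up with no profile bound and enqueueV3 asserts mOptimizationProfile >= 0. If that is right, the engine needs at least one optimization profile per concurrent context, each selected explicitly. A sketch of the runtime side, assuming the engine is rebuilt with NUM_STREAMS identical profiles (one config.add_optimization_profile() call per stream in the builder sketch above):

```python
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401

NUM_STREAMS = 4
logger = trt.Logger(trt.Logger.WARNING)

with open("yolov7_b16_int8_qat_640.engine", "rb") as f:  # rebuilt engine
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
assert engine.num_optimization_profiles >= NUM_STREAMS

contexts = [engine.create_execution_context() for _ in range(NUM_STREAMS)]
streams = [cuda.Stream() for _ in range(NUM_STREAMS)]

for i, (ctx, stream) in enumerate(zip(contexts, streams)):
    # Bind profile i to context i so no two contexts share a profile --
    # this is the explicit selection the error message asks for.
    ctx.set_optimization_profile_async(i, stream.handle)
    stream.synchronize()
    ctx.set_input_shape("images", (4, 3, 640, 640))
```

I have not verified this on 8.5.2, so treat it as a guess at the root cause rather than a confirmed workaround.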

Environment

TensorRT Version: 8.5.2
GPU Type: Jetson AGX Xavier
Nvidia Driver Version:
CUDA Version: 11.4.315
CUDNN Version: 8.6.0.166
Operating System + Version: L4T 35.2.1 (JetPack 5.1)
Python Version (if applicable): Python 3.8.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.12.0a0+2c916ef.nv22.3
@wanghr323 (Collaborator) commented:

Hi, I am not sure if you can still reproduce this issue on the latest TensorRT. Please give it a try.
