
Yolov7 performance bad - 10 FPS only on Orin AGX 64 GB #61

Open
Ben93kie opened this issue Jul 12, 2024 · 2 comments

Ben93kie commented Jul 12, 2024

I followed the Yolov7 tutorial here.

I exported the ONNX model from the official .pt file and adjusted the paths in the config files. The engine builds successfully, but I'm only getting about 10 FPS (compared with the advertised >100 FPS).

Here is the output after conversion:

WARNING: [TRT]: If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
WARNING: [TRT]: Check verbose logs for the list of affected weights.
WARNING: [TRT]: - 82 weights are affected by this issue: Detected subnormal FP16 values.
WARNING: [TRT]: - 2 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
0:40:28.444976296 26592 0xaaaad0efc090 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: serialize cuda engine to file: /home/nvidia/Documents/yolo_deepstream/deepstream_yolo/yolov7.onnx_b16_gpu0_fp16.engine successfully
INFO: [FullDims Engine Info]: layers num: 2
0 INPUT kFLOAT images 3x640x640 min: 1x3x640x640 opt: 16x3x640x640 Max: 16x3x640x640
1 OUTPUT kFLOAT output 25200x85 min: 0 opt: 0 Max: 0

...

**PERF: 9.78 (9.76) 9.62 (9.60) 9.63 (9.61) 9.78 (9.76) 9.64 (9.62) 9.62 (9.60) 9.63 (9.61) 9.63 (9.61) 9.78 (9.76) 9.63 (9.61) 9.78 (9.76) 10.20 (10.17) 9.78 (9.76) 9.63 (9.61) 9.78 (9.76) 9.78 (9.76)
**PERF: 10.11 (9.89) 10.11 (9.81) 10.11 (9.81) 10.11 (9.89) 10.11 (9.82) 10.11 (9.81) 10.11 (9.81) 10.11 (9.81) 10.11 (9.89) 10.11 (9.81) 10.11 (9.89) 10.11 (10.08) 10.11 (9.89) 10.11 (9.81) 10.11 (9.89) 10.11 (9.89)
**PERF: 10.11 (10.00) 10.11 (9.95
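The serialized engine filename in the log (yolov7.onnx_b16_gpu0_fp16.engine) appears to encode how the engine was built. Below is a minimal sketch that pulls those fields out. It assumes the `<model>_b<batch>_gpu<id>_<precision>.engine` pattern visible in this log. The pattern, `EngineInfo` and `parse_engine_name` are illustrative helpers, not a DeepStream API.

```python
import re
from typing import NamedTuple


class EngineInfo(NamedTuple):
    model: str      # source model file, e.g. "yolov7.onnx"
    batch: int      # batch size encoded as b<N> in the filename
    gpu: int        # GPU id used for the build
    precision: str  # e.g. "fp16"


# Assumed naming pattern, inferred from the filename in this log:
#   <model>_b<batch>_gpu<gpu id>_<precision>.engine
ENGINE_NAME = re.compile(
    r"^(?P<model>.+)_b(?P<batch>\d+)_gpu(?P<gpu>\d+)_(?P<precision>[a-z0-9]+)\.engine$"
)


def parse_engine_name(path: str) -> EngineInfo:
    """Split a serialized engine filename into its build settings."""
    name = path.rsplit("/", 1)[-1]  # drop any directory prefix
    m = ENGINE_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized engine filename: {name}")
    return EngineInfo(m["model"], int(m["batch"]), int(m["gpu"]), m["precision"])


if __name__ == "__main__":
    print(parse_engine_name(
        "/home/nvidia/Documents/yolo_deepstream/deepstream_yolo/yolov7.onnx_b16_gpu0_fp16.engine"
    ))
```

For the engine above this reports batch=16. That is consistent with the opt/max input profile 16x3x640x640 in the engine info.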

Did I export the ONNX model incorrectly, or have I missed something?

Ben93kie (author) commented:

I had num-sources=16 set, even though I only wanted one source.
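deepstream-app appears to print one "current (average)" FPS pair per source on each **PERF line. With num-sources=16, the output above therefore lists 16 streams at roughly 10 FPS each, rather than 10 FPS for the whole pipeline. Below is a minimal sketch for totalling a **PERF line, assuming that format. `parse_perf_line` and `aggregate_fps` are hypothetical helper names.

```python
import re
from typing import List, Tuple

# One "<current> (<average>)" pair per stream; an incomplete trailing pair is ignored.
PAIR = re.compile(r"(\d+(?:\.\d+)?)\s*\((\d+(?:\.\d+)?)\)")


def parse_perf_line(line: str) -> List[Tuple[float, float]]:
    """Return (current_fps, average_fps) for each stream on a **PERF line."""
    return [(float(cur), float(avg)) for cur, avg in PAIR.findall(line)]


def aggregate_fps(line: str) -> float:
    """Sum the current per-stream FPS to get total pipeline throughput."""
    return sum(cur for cur, _ in parse_perf_line(line))


if __name__ == "__main__":
    line = ("**PERF: 10.11 (9.89) 10.11 (9.81) 10.11 (9.81) 10.11 (9.89) 10.11 (9.82) "
            "10.11 (9.81) 10.11 (9.81) 10.11 (9.81) 10.11 (9.89) 10.11 (9.81) "
            "10.11 (9.89) 10.11 (10.08) 10.11 (9.89) 10.11 (9.81) 10.11 (9.89) 10.11 (9.89)")
    print(f"{len(parse_perf_line(line))} streams, {aggregate_fps(line):.2f} FPS total")
```

Applied to the second **PERF line above, this gives 16 streams and about 161.76 FPS in aggregate.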


Ben93kie commented Jul 12, 2024

A follow-up question: I'm now getting ~60 FPS on an Orin AGX 64 GB in MAXN mode with the following config:

config_infer_primary_yolov7.txt:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=0
model-engine-file=yolov7_dy.onnx_b16_gpu0_fp16.engine
#model-engine-file=yolov7_1280.onnx_b1_gpu0_fp16.engine
#model-engine-file=/home/nvidia/Documents/yolo_deepstream/deepstream_yolo/yolov7.onnx_b16_gpu0_fp16.engine
#onnx-file=yolov7_dy.onnx
#onnx-file=yolov7_1280.onnx
labelfile-path=labels.txt
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=80
gie-unique-id=1
network-type=0
is-classifier=0
## 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
## Bilinear Interpolation
scaling-filter=1
#parse-bbox-func-name=NvDsInferParseCustomYoloV7
parse-bbox-func-name=NvDsInferParseCustomYoloV7_cuda
#disable-output-host-copy=0
disable-output-host-copy=1
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
#scaling-compute-hw=0
## start from DS6.2
crop-objects-to-roi-boundary=1


[class-attrs-all]
#nms-iou-threshold=0.3
#threshold=0.7
nms-iou-threshold=0.65
pre-cluster-threshold=0.25
topk=300
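Assume Gst-nvinfer's documented normalization, y = net-scale-factor × (x − mean). Here the per-channel mean comes from the optional offsets property, which is not set in this config and so is taken as 0. Under that assumption, the net-scale-factor value above is approximately 1/255, and it maps 8-bit pixel values to roughly [0, 1]:

```latex
y = s\,(x - m), \qquad s = \texttt{net-scale-factor} \approx \frac{1}{255} \approx 0.0039216, \qquad m = 0
\quad\Longrightarrow\quad x \in [0, 255] \;\mapsto\; y \approx \frac{x}{255} \in [0, 1]
```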

and the deepstream-app config:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=0
rows=1
columns=1
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
uri=file:/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
num-sources=1
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
output-file=yolov4.mp4

[osd]
enable=1
gpu-id=0
border-width=1
text-size=12
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=720
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
labelfile-path=labels.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
gie-unique-id=1
nvbuf-memory-type=0
#config-file=config_infer_primary_yoloV4.txt
config-file=config_infer_primary_yoloV7.txt

[tracker]
enable=0
# For NvDCF and DeepSORT tracker, tracker-width and tracker-height must be a multiple of 32, respectively
tracker-width=640
tracker-height=384
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
# ll-config-file required to set different tracker types
# ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
# ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_accuracy.yml
# ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_DeepSORT.yml
gpu-id=0
enable-batch-process=1
enable-past-frame=1
display-tracking-id=1

[tests]
file-loop=0

Have I missed a config setting? The advertised figure is 120 FPS.
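A small sketch like the following can scan a deepstream-app config with Python's configparser for settings, other than the model, that might hold the measured rate down. The checks are heuristics chosen for illustration; they are not taken from this thread. `benchmark_notes` and the `APP_CONFIG` excerpt are hypothetical. Whether the on-screen EGL sink explains the ~60 FPS here is not confirmed.

```python
import configparser
from typing import List

# Hypothetical excerpt mirroring the relevant keys of the deepstream-app config above.
APP_CONFIG = """
[source0]
enable=1
type=3
num-sources=1

[sink0]
enable=1
type=2
sync=0

[osd]
enable=1

[streammux]
batch-size=1

[primary-gie]
enable=1
batch-size=1
"""


def benchmark_notes(cfg: configparser.ConfigParser) -> List[str]:
    """Heuristic checks (illustrative only) for settings that can affect measured FPS."""
    notes = []
    sources = cfg.getint("source0", "num-sources", fallback=1)
    mux_batch = cfg.getint("streammux", "batch-size", fallback=1)
    gie_batch = cfg.getint("primary-gie", "batch-size", fallback=1)
    # deepstream-app configs commonly set both batch sizes to the number of sources.
    if not (sources == mux_batch == gie_batch):
        notes.append(f"num-sources={sources}, streammux batch-size={mux_batch}, "
                     f"primary-gie batch-size={gie_batch} differ")
    # Per the [sink0] comment: 1=FakeSink, 2=EglSink, 3=File.
    if cfg.getint("sink0", "enable", fallback=0) and cfg.getint("sink0", "type", fallback=1) == 2:
        notes.append("sink0 type=2 (EglSink) renders on screen and may cap FPS; "
                     "type=1 (FakeSink) measures the pipeline without display")
    if cfg.getint("osd", "enable", fallback=0):
        notes.append("osd is enabled; drawing boxes adds per-frame work")
    return notes


if __name__ == "__main__":
    parser = configparser.ConfigParser()
    parser.read_string(APP_CONFIG)  # or parser.read("path/to/app_config.txt")
    for note in benchmark_notes(parser):
        print("-", note)
```

For the config as posted, the sketch flags the EGL sink and the enabled OSD. It does not flag a batch mismatch, because num-sources and both batch-size values are 1.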

Ben93kie reopened this on Jul 12, 2024