
feat(autoware_tensorrt_yolox): add GPU - CUDA device option #8245

Merged

Conversation

ismetatabay
Member

Description

This PR adds a parameter to specify which GPU is used for tensorrt_yolox inference on systems with multiple GPU devices.

Related links

Parent Issue:

How was this PR tested?

This PR has been tested with 8 cameras on a computer with 3 GPUs. Since GPU 0 is used for the LiDAR CenterPoint node and other workloads, the 8 cameras are divided between GPU 1 and GPU 2. The nvidia-smi output can be seen in the following image.

[Image: nvidia-smi output showing tensorrt_yolox GPU usage]

Notes for reviewers

None.

Interface changes

Change type | Parameter Name | Type | Default Value | Description
Added       | gpu_id         | int  | 0             | GPU ID to select the CUDA device
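As a usage illustration, the new parameter could be set from a ROS 2 parameter file; the snippet below is a hypothetical sketch (the file layout and the value 1 are assumptions for illustration, not taken from the PR):

```yaml
/**:
  ros__parameters:
    # Hypothetical example: run tensorrt_yolox inference on CUDA device 1
    # instead of the default device 0.
    gpu_id: 1
```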

Effects on system behavior

None.

@github-actions github-actions bot added the component:perception Advanced sensor data processing and environment understanding. (auto-assigned) label Jul 29, 2024

github-actions bot commented Jul 29, 2024

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@ismetatabay ismetatabay marked this pull request as ready for review July 29, 2024 13:03
@kminoda kminoda added the tag:run-build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) label Jul 29, 2024
@kminoda
Contributor

kminoda commented Jul 29, 2024

@ismetatabay Please make sure to submit the PR after all the CIs are passing (json schema check is failing)

@ismetatabay ismetatabay marked this pull request as draft July 29, 2024 13:11

codecov bot commented Jul 29, 2024

Codecov Report

Attention: Patch coverage is 0% with 22 lines in your changes missing coverage. Please review.

Project coverage is 23.99%. Comparing base (a64566e) to head (d3889b5).
Report is 3 commits behind head on main.

Files Patch % Lines
...ion/autoware_tensorrt_yolox/src/tensorrt_yolox.cpp 0.00% 11 Missing ⚠️
...utoware_tensorrt_yolox/src/tensorrt_yolox_node.cpp 0.00% 5 Missing ⚠️
..._detector/src/traffic_light_fine_detector_node.cpp 0.00% 5 Missing ⚠️
...include/autoware/tensorrt_yolox/tensorrt_yolox.hpp 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8245      +/-   ##
==========================================
- Coverage   24.11%   23.99%   -0.13%     
==========================================
  Files        1399     1397       -2     
  Lines      102423   102224     -199     
  Branches    38926    38778     -148     
==========================================
- Hits        24702    24529     -173     
+ Misses      75223    75187      -36     
- Partials     2498     2508      +10     
Flag Coverage Δ *Carryforward flag
differential 0.00% <0.00%> (?)
total 24.00% <ø> (-0.12%) ⬇️ Carriedforward from 8f97a9d

*This pull request uses carry forward flags.


@ismetatabay ismetatabay marked this pull request as ready for review July 29, 2024 16:21
@ismetatabay ismetatabay linked an issue Jul 29, 2024 that may be closed by this pull request
@ismetatabay ismetatabay self-assigned this Jul 30, 2024
@@ -90,6 +91,10 @@ TrtYoloXNode::TrtYoloXNode(const rclcpp::NodeOptions & node_options)
const tensorrt_common::BatchConfig batch_config{1, 1, 1};
const size_t max_workspace_size = (1 << 30);

if (!setCudaDeviceId(gpu_id_)) {
Contributor
@ismetatabay
We appreciate your contribution. I think engine generation by TensorRT is better performed on the target device (GPU) on which inference will run. From this perspective, I wonder whether we need to set the target device before engine generation; I would appreciate it if you could consider that order of operations.

Member Author

@manato -san, thank you for your valuable comments. Since all three GPUs in our test setup are the same model, the engine file created on GPU 0 (the default) is also valid for the other two GPUs. If a GPU of a different model were present, it would be a problem. Therefore, we could update the tensorrt_common package to handle different GPU models as well. What do you think?

Contributor

@ismetatabay
Thank you for your consideration!
Yes, my main concern was exactly such environments, where GPUs of different models coexist. Regarding building the engine on a specific device, it appears to be enough to call cudaSetDevice() before the engine builder starts, according to the TensorRT FAQ ("Q: How do I use TensorRT on multiple GPUs?"). (You can confirm this behavior by monitoring which GPU is used with nvidia-smi during engine building.) If that is the case, we don't need to modify tensorrt_common; we only need to call cudaSetDevice() before constructing tensorrt_common::TrtCommon.

Sorry for asking repeatedly, but could you consider moving this CUDA-related code into tensorrt_yolox.cpp for further encapsulation, so that other nodes that use YOLOX as a module can also benefit from your improvement?

Member Author

Thank you @manato -san, you are right. I am okay with moving everything to tensorrt_yolox.cpp. With this update, traffic_light_fine_detector can also select a GPU, as you mentioned. However, if I move everything into the TrtYoloX class, I still get an illegal memory access error from CUDA unless I call cudaSetDevice before creating the trt_yolox_ object in the TrtYoloXNode constructor. Do you have any suggestions on that?

trt_yolox_ = std::make_unique<tensorrt_yolox::TrtYoloX>(
model_path, precision, label_map_.size(), score_threshold, nms_threshold, build_config,
preprocess_on_gpu, calibration_image_list_path, norm_factor, cache_dir, batch_config,
max_workspace_size, color_map_path);

Contributor

Thank you @ismetatabay -san for your thoughtful consideration!

I suspect the cause of the issue you observed is this line: it attempts to create a cudaStream on the default device (0) during construction of the instance, and that stream is later used for memory copies before/after inference. Could you please try moving the makeCudaStream() call to after the device is set in the class constructor and see if it makes a difference?

Member Author

Hello @manato -san, sorry for the late update. I was on leave and only had a chance to update it today. I have moved the changes into tensorrt_yolox. Could you please check it? Thank you 🙌

Contributor

Hi @ismetatabay -san, thank you for your continuous updates. The modifications look good to me!

@ismetatabay ismetatabay force-pushed the feat/yolox-add-device-option branch 2 times, most recently from 136a5ff to e581f64 Compare August 2, 2024 09:22
@ismetatabay ismetatabay marked this pull request as draft August 12, 2024 12:59
@ismetatabay ismetatabay force-pushed the feat/yolox-add-device-option branch 3 times, most recently from 941c3f9 to 6e136f5 Compare August 14, 2024 15:58
@ismetatabay ismetatabay marked this pull request as ready for review August 14, 2024 15:59
@Shin-kyoto
Contributor

/review

@Shin-kyoto
Contributor

/improve


PR Reviewer Guide 🔍

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Key issues to review

Possible Bug
The setCudaDeviceId method returns a boolean, but the constructor throws an exception when it fails. This inconsistency in error handling should be addressed; consider having the method throw an exception instead of returning a boolean.

Logging
The use of std::cout for logging GPU selection is not ideal for production code. Consider using a proper logging framework.

@ismetatabay ismetatabay changed the title feat(tensorrt_yolox): add GPU - CUDA device option feat(autoware_tensorrt_yolox): add GPU - CUDA device option Aug 16, 2024
Contributor

@Shin-kyoto Shin-kyoto left a comment


Thank you for the changes.
Finally, could you please perform a functionality check with all the changes included, and document the verification method and results?

ismetatabay and others added 10 commits August 23, 2024 15:20
Signed-off-by: ismetatabay <[email protected]>

style(pre-commit): autofix
Signed-off-by: ismetatabay <[email protected]>
@github-actions github-actions bot added the type:documentation Creating or refining documentation. (auto-assigned) label Aug 23, 2024
@ismetatabay
Member Author

Hello @Shin-kyoto -san, @manato -san, @Owen-Liuyuxuan -san,

Thank you for your review and valuable comments. Today, I tested the latest version of the PR with our test vehicle, and everything worked as expected. The vehicle has 8 cameras and 3 GPUs; for the tests, I divided the 8 cameras between GPU 1 and GPU 2 (4 cameras each). The nvidia-smi output is shown in the following image.

[Image: nvidia-smi output from the vehicle test]

I used the following launch files to run the tensorrt_yolox package on our vehicle:

If it is okay for you, I think we can merge this PR.

@Owen-Liuyuxuan
Contributor

I have no further issue.

@Shin-kyoto Shin-kyoto self-requested a review August 27, 2024 03:15
Contributor

@Shin-kyoto Shin-kyoto left a comment


LGTM

  • I confirmed that the content of this PR is enough to merge.
  • I did NOT check this PR by running autoware.

Contributor

@manato manato left a comment


@ismetatabay
Thank you for your continuous updates to resolve my concerns. LGTM!
I confirmed that this modification causes no problems, at least on an x86 laptop with one GPU.

@ismetatabay ismetatabay merged commit e434372 into autowarefoundation:main Aug 27, 2024
29 of 31 checks passed
@ismetatabay ismetatabay deleted the feat/yolox-add-device-option branch August 27, 2024 09:16
a-maumau pushed a commit to a-maumau/autoware.universe that referenced this pull request Sep 2, 2024
…foundation#8245)

* init CUDA device option

Signed-off-by: ismetatabay <[email protected]>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
batuhanbeytekin pushed a commit to batuhanbeytekin/autoware.universe that referenced this pull request Sep 2, 2024
Signed-off-by: Batuhan Beytekin <[email protected]>
ktro2828 pushed a commit to ktro2828/autoware.universe that referenced this pull request Sep 18, 2024
Labels
component:perception Advanced sensor data processing and environment understanding. (auto-assigned) tag:require-cuda-build-and-test tag:run-build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) type:documentation Creating or refining documentation. (auto-assigned)
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

tensorrt_yolox node does not support GPU selection
5 participants