Realtime performance question #73

Closed
lachose1 opened this issue Aug 17, 2020 · 24 comments
Labels
community/help wanted extra attention is needed enhancement New feature or request

Comments

@lachose1

Hello, and first of all thanks for the great tool you have built.

I have a question: I ran the pretrained models on a real-time video with a GTX 1080 Ti and got mediocre performance:
res50_coco_256x192: 12 FPS
mobilenetv2_coco_256x192: 15 FPS

Am I missing something, or is the GPU just not strong enough? Thanks!

@innerlee
Contributor

innerlee commented Aug 17, 2020

hi, which script was run? currently the inference code is not optimized yet. see #40. you may vote here #9 to help us prioritize

@jin-s13
Collaborator

jin-s13 commented Aug 18, 2020

Please set flip_test=False in the configs for higher speed.
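
For readers unsure where this option lives: MMPose configs are Python files, and in the top-down configs of that period `flip_test` sat under the model's `test_cfg`. The sketch below assumes that layout. The `cfg` dict is a pared-down, hypothetical stand-in for a real config, and `disable_flip_test` is an illustrative helper, not part of MMPose.

```python
import copy

# Hypothetical, pared-down stand-in for a top-down config. Real configs
# contain many more keys (backbone, keypoint head, data pipelines, ...).
cfg = {
    "model": {
        "type": "TopDown",
        "test_cfg": {"flip_test": True, "shift_heatmap": True},
    }
}


def disable_flip_test(config: dict) -> dict:
    """Return a copy of `config` with flip testing switched off.

    Assumes the flag lives at config["model"]["test_cfg"]["flip_test"].
    """
    new_config = copy.deepcopy(config)
    new_config["model"]["test_cfg"]["flip_test"] = False
    return new_config


fast_cfg = disable_flip_test(cfg)
print(fast_cfg["model"]["test_cfg"]["flip_test"])  # False
```

Flip testing runs the network on both the image and its mirrored copy and merges the two predictions, so turning it off removes roughly one forward pass per input, usually at a small cost in accuracy.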

@jin-s13 jin-s13 added status/duplicate issue/PR already exists enhancement New feature or request labels Aug 18, 2020
@lachose1
Author

> hi, which script was run? currently the inference code is not optimized yet. see #40. you may vote here #9 to help us prioritize

Yes, I used pretty much the same settings as #40

> Please set flip_test=False in the configs for higher speed.

This was already done, but thanks for clarifying. I guess you can close the thread if this is on the roadmap. Thanks a lot for your help!

@innerlee
Contributor

innerlee commented Aug 20, 2020

thanks! it would be great if people could help with profiling and identifying the bottleneck. here is a guide we wrote earlier (you can safely ignore the Chinese characters and guess the content):

image

@innerlee innerlee added community/help wanted extra attention is needed and removed status/duplicate issue/PR already exists labels Aug 20, 2020
@fabro66

fabro66 commented Jan 15, 2021

Hi~
I found that cv2.ellipse2Poly in top_down.py/bottom_up.py slows down inference considerably.
If I replace it with cv2.line, hrnet_w32_wholebody_256×192_dark speeds up from 3.0 fps to 16.0 fps on a GTX 1060. There are other areas that can be optimized further.

Replace it with the following:

    # iterate over each detected pose once; the index is unused, so
    # enumerate is dropped
    for kpts in pose_result:
        # draw each keypoint whose score exceeds the threshold
        if pose_kpt_color is not None:
            assert len(pose_kpt_color) == len(kpts)
            for kid, kpt in enumerate(kpts):
                x_coord, y_coord, kpt_score = int(kpt[0]), int(kpt[1]), kpt[2]
                if kpt_score > kpt_score_thr:
                    r, g, b = pose_kpt_color[kid]
                    cv2.circle(img, (x_coord, y_coord), radius,
                               (int(r), int(g), int(b)), -1)

        # draw limbs as plain line segments (replaces cv2.ellipse2Poly)
        if skeleton is not None and pose_limb_color is not None:
            assert len(pose_limb_color) == len(skeleton)
            for sk_id, sk in enumerate(skeleton):
                # skeleton entries are 1-based keypoint indices
                idx1, idx2 = sk[0] - 1, sk[1] - 1
                pos1 = (int(kpts[idx1, 0]), int(kpts[idx1, 1]))
                pos2 = (int(kpts[idx2, 0]), int(kpts[idx2, 1]))
                # skip limbs with an endpoint outside the image or a low score
                if (0 < pos1[0] < img_w and 0 < pos1[1] < img_h
                        and 0 < pos2[0] < img_w and 0 < pos2[1] < img_h
                        and kpts[idx1, 2] > kpt_score_thr
                        and kpts[idx2, 2] > kpt_score_thr):
                    r, g, b = pose_limb_color[sk_id]
                    cv2.line(img, pos1, pos2, (int(r), int(g), int(b)),
                             thickness=thickness)
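
The bounds-and-score condition in the limb loop above can be easier to read and test when pulled into a small function. The following is only a sketch of that check in isolation; `limb_is_drawable` and its parameter names are hypothetical and do not exist in MMPose.

```python
def limb_is_drawable(pos1, pos2, score1, score2, img_w, img_h, score_thr):
    """Return True when a limb should be drawn.

    Both endpoints must lie strictly inside the image and both keypoint
    scores must exceed the threshold, mirroring the condition in the
    snippet above.
    """

    def inside(pos):
        return 0 < pos[0] < img_w and 0 < pos[1] < img_h

    return (inside(pos1) and inside(pos2)
            and score1 > score_thr and score2 > score_thr)


# Both endpoints inside a 100x100 image, both scores above 0.3.
print(limb_is_drawable((10, 10), (20, 20), 0.9, 0.8, 100, 100, 0.3))  # True
# Second endpoint falls outside the image width.
print(limb_is_drawable((10, 10), (150, 20), 0.9, 0.8, 100, 100, 0.3))  # False
```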

@jin-s13
Collaborator

jin-s13 commented Jan 15, 2021

Thanks for reporting @fabro66. Will try this out.

@lucasjinreal

@fabro66 How much time does the whole post-processing step take in your measurements so far?

@fabro66

fabro66 commented Jan 22, 2021

@jinfagang I did not measure the time taken by post-processing. I just replaced cv2.ellipse2Poly with cv2.line to speed up inference.
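
One way to get the per-stage numbers asked about here is to wrap each stage of the frame loop in a timer. The sketch below uses only the standard library. `StageTimer` and the stage names are hypothetical, and the `time.sleep` calls are placeholders for the real detector, pose model, and drawing code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return the mean time in seconds per call for each stage."""
        return {name: self.totals[name] / self.counts[name]
                for name in self.totals}


timer = StageTimer()
for _ in range(3):  # stand-in for the per-frame loop
    with timer.stage("detect"):
        time.sleep(0.01)   # placeholder for the person detector
    with timer.stage("pose"):
        time.sleep(0.02)   # placeholder for the pose model
    with timer.stage("draw"):
        time.sleep(0.005)  # placeholder for visualization

for name, seconds in timer.report().items():
    print(f"{name}: {seconds * 1000:.1f} ms/frame")
```

When the timed stage runs on a GPU through PyTorch, kernels are launched asynchronously, so calling `torch.cuda.synchronize()` before reading the clock is generally needed for the numbers to be meaningful.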

@lucasjinreal

@fabro66 Were you able to run in real time with a detector (not from GT boxes), for example with yolov5 and a pose model?

@fabro66

fabro66 commented Jan 24, 2021

@jinfagang It can reach 16fps on a GTX1060 when I combine hrnet_w32_wholebody_256×192_dark and yolov3 (from mmdetection) to estimate whole-body keypoints.

@lucasjinreal

@fabro66 I tested with a yolov5s detector and a shufflenetv2 pose model trained on COCO; the speed is about 7 fps with 2 people in the frame on a GTX 1080 Ti.

Why is it so slow?

@innerlee
Contributor

If you are interested, please try to profile it and post the result, something like #344 (comment)

@lucasjinreal

@innerlee I don't know how to use cProfile. Do you get the same performance when testing a normal video with more than 2 people in it?

@innerlee
Contributor

Step 1. Use cProfile to run the script for a period of time, say, 30 seconds.
image
Step 2. Visualize the result with snakeviz.
image

Refer to the instruction in #73 (comment) for more details
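
A minimal sketch of the two steps, assuming the standard-library `cProfile` and `pstats` modules and the `snakeviz` command-line viewer. `process_frame` and `slow_draw` are hypothetical placeholders for the demo script's per-frame work, and the output filename is arbitrary.

```python
import cProfile
import io
import os
import pstats
import tempfile


def slow_draw(n):
    """Placeholder for an expensive visualization step."""
    return sum(i * i for i in range(n))


def process_frame():
    """Placeholder for one iteration of the demo loop."""
    return slow_draw(20_000)


# Step 1: collect a profile while the loop runs.
profiler = cProfile.Profile()
profiler.enable()
for _ in range(50):
    process_frame()
profiler.disable()

# Save the raw stats so they can be opened with: snakeviz <path>
prof_path = os.path.join(tempfile.gettempdir(), "demo.prof")
profiler.dump_stats(prof_path)

# Step 2 (text alternative to snakeviz): print the top entries
# sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

An unmodified script can be profiled the same way from the shell with `python -m cProfile -o demo.prof your_script.py`, followed by `snakeviz demo.prof`.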

@lucasjinreal

@innerlee Thanks. Do you have any insights about it?

image

@innerlee
Contributor

Please expand the shared_transformation section by clicking on it.

image

@lucasjinreal

@innerlee
image

@innerlee
Contributor

Please click on shared_transform.py; it should then show more detail at the bottom level.

Cropping the image is not what I meant.

@lucasjinreal

@innerlee ok.....

@innerlee
Contributor

@jinfagang If you haven't deleted the profiling record, please post the result so that the bottleneck is visualized

@haseeb33

@jinfagang Can you please share the yolov5 config file for mmdet_model? I trained a yolov5s model independently and want to use it as the detection model for pose estimation inference, but I am having difficulty modifying the config file.
Thanks in advance.

@lucasjinreal

@haseeb33 You can take a look at this repo: https://github.com/jinfagang/yolov7. It provides various YOLO models configured in pure Python instead of YAML config files. It also supports an end-to-end keypoints model.

@haseeb33

@jinfagang Thank you very much!

@Tau-J Tau-J closed this as completed Apr 24, 2023