[test_bgp_stress_link_flap] case hung sometimes due to memory exhaust #14163

Merged
merged 16 commits into from
Aug 29, 2024

Conversation


@lipxu lipxu commented Aug 19, 2024

Description of PR

Summary:
Fixes # (issue)
28852952
fix for #14076

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

The case is flaky.
Sometimes it runs for a long time without responding, especially on KVM devices.
Based on the current logs, this appears to be related to memory resource limits.
The case creates many threads to flap the neighbors, which can exhaust memory on KVM devices and on physical devices with little memory.
The available logs show no obvious memory leak.

How did you do it?

The case is a link flap stress test; it creates one thread per interface to flap.
1: Increase the delay time for KVM.
2: Test only one interface on KVM.
3: Use a single thread to flap all interfaces on the fanout.
4: Correct the neighbor host.
6: Add a stop event and a timeout to the thread function to ensure the thread exits.
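
The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names `flap_interfaces`, `run_flap`, `flap_once`, and `is_kvm`, as well as the delay values, are hypothetical and chosen only for the example. It shows one worker thread walking all interfaces, a `threading.Event` used as a stop signal, and a bounded `join` so the caller never waits forever.

```python
import threading
import time


def flap_interfaces(interfaces, flap_once, stop_event, delay, deadline):
    """Flap every interface in turn from one worker thread.

    The loop ends when stop_event is set or the deadline passes, so the
    thread is guaranteed to exit even if the caller forgets to stop it.
    """
    while not stop_event.is_set() and time.monotonic() < deadline:
        for name in interfaces:
            if stop_event.is_set():
                break
            flap_once(name)  # hypothetical callback: shut and re-enable one port
        # Event.wait acts as a sleep that returns early once stop is requested.
        stop_event.wait(delay)


def run_flap(interfaces, flap_once, is_kvm, duration, join_timeout=5.0):
    """Run the flap loop for about `duration` seconds.

    Returns the tested interfaces and whether the worker thread exited.
    """
    # Assumed policy for illustration: KVM gets one interface and a longer delay.
    targets = list(interfaces[:1]) if is_kvm else list(interfaces)
    delay = 0.05 if is_kvm else 0.01

    stop_event = threading.Event()
    deadline = time.monotonic() + duration
    worker = threading.Thread(
        target=flap_interfaces,
        args=(targets, flap_once, stop_event, delay, deadline),
        daemon=True,
    )
    worker.start()

    worker.join(duration)      # let the stress loop run
    stop_event.set()           # ask the worker to stop
    worker.join(join_timeout)  # bounded wait instead of blocking forever
    return targets, not worker.is_alive()
```

With a single worker, the number of threads no longer grows with the number of interfaces, which is the property the PR relies on to keep memory use bounded on small devices.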

How did you verify/test it?

Ran the case locally and verified it using elastictest:
https://elastictest.org/scheduler/testplan/66c69c4008761ba27f76ed5d
https://elastictest.org/scheduler/testplan/66c69c2708761ba27f76ed5b
https://elastictest.org/scheduler/testplan/66cdafd1bd14ce56b2e820f7

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@lipxu lipxu changed the title from "fix bgp neighbor incorrect in case test_bgp_stress_link_flap" to "use one thread for fanout link flap in case test_bgp_stress_link_flap" Aug 20, 2024
@lipxu lipxu changed the title from "use one thread for fanout link flap in case test_bgp_stress_link_flap" to "[test_bgp_stress_link_flap] use one thread for fanout link flap" Aug 20, 2024
@lipxu lipxu requested a review from wangxin August 22, 2024 13:24
wangxin previously approved these changes Aug 23, 2024
@lipxu lipxu marked this pull request as draft August 23, 2024 07:14
@mssonicbld

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing tests/bgp/test_bgp_stress_link_flap.py

fix end of files.........................................................Passed
check yaml...............................................................Passed
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/bgp/test_bgp_stress_link_flap.py:110:121: E501 line too long (125 > 120 characters)
tests/bgp/test_bgp_stress_link_flap.py:157:1: E303 too many blank lines (3)
tests/bgp/test_bgp_stress_link_flap.py:225:5: E303 too many blank lines (2)
tests/bgp/test_bgp_stress_link_flap.py:234:5: E303 too many blank lines (2)
tests/bgp/test_bgp_stress_link_flap.py:249:5: E303 too many blank lines (2)
...
[truncated extra lines, please run pre-commit locally to view full check results]

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run
    the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
    sudo pip install pre-commit
  3. Go to repository root folder
  4. Install the pre-commit hooks:
    pre-commit install
  5. Use pre-commit to check staged file:
    pre-commit
  6. Alternatively, you can check committed files using:
    pre-commit run --from-ref <commit_id> --to-ref <commit_id>

@lipxu lipxu marked this pull request as ready for review August 28, 2024 00:34
@lipxu lipxu changed the title from "[test_bgp_stress_link_flap] use one thread for fanout link flap" to "[test_bgp_stress_link_flap] case hung sometimes due to memory exhaust" Aug 29, 2024
@wangxin wangxin merged commit e3f1ae8 into sonic-net:master Aug 29, 2024
16 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Sep 2, 2024
…sonic-net#14163)

Signed-off-by: xuliping <[email protected]>
@mssonicbld

Cherry-pick PR to 202405: #14378

StormLiangMS pushed a commit that referenced this pull request Sep 2, 2024
mssonicbld pushed a commit that referenced this pull request Sep 2, 2024
…#14163)

hdwhdw pushed a commit to hdwhdw/sonic-mgmt that referenced this pull request Sep 20, 2024
…sonic-net#14163)

hdwhdw pushed a commit to hdwhdw/sonic-mgmt that referenced this pull request Sep 20, 2024