Fix flaky tests #882

ckbedwell · 2024-07-23T09:31:01Z

Problem

With the new CheckForm testing meta framework which added significantly to our testing coverage and by extension our testing overhead, we were starting to see increasingly flaky tests in the CI/CD pipelines as well as when running them locally (I had recently resorted to getting a fan to cool my laptop when running them as it was heating up significantly...).

Our testing approach is 100% the correct way we should be doing testing so we shouldn't shy away from writing long-running integration tests as the alternatives are either:

moving the same tests to an e2e framework which can be equally (if not more so) flaky and will take magnitudes longer to run and get feedback.
writing a ton of smaller unit tests that are often too closely coupled with the implementation and slow down developer velocity.

Solution

I noticed the flaky tests always seemed to be the ones involving the TCP section of requests (HTTP, gRPC, TCP) and that they took a long time to run and then timed out. As jest tests run in separate worker pools it often seemed like when these TCP tests were running they could cause other tests running at the same time to error out randomly too -- presumably because they were hogging so many CPU resources the other tests were suffering.

Recently in another PR, I noticed that our schema validations are running multiple times per onChange event when a user is typing. It is still something worth investigating at a future date but it makes no discernible impact to real users utilising our application.

However it occurred to me that the TCP fields require typing in valid PEM certificates and keys. The testing key we use is 1750 characters long. If the validation is running 4 or 5 times per character stroke each of these tests is having to validate the input ~10,000 times on top of all the additional computations the input / validation would trigger.

The solution is in the checkform testing helper to override the default user.type method and actually turn it into a focus / paste the content so it only triggers the input onChange handlers once. I was considering only making this change to the TCP fields with the large inputs but I couldn't think of a good reason why we shouldn't add the override at this level (there's an argument we should even add it to our custom render function in test/render?).

I've also done a little bit of cleanup and removed the custom jest.timeout declarations and ensured the waitFor methods in tests wait much longer than their default 1 second to ensure our test flakiness is kept to a minimum.

Before

Test Suites: 2 failed, 79 passed, 81 total
Tests:       2 failed, 4 skipped, 398 passed, 404 total
Snapshots:   0 total
Time:        126.822 s
Ran all test suites.

Had to have a fan running not to overheat my computer 🥵

After

Test Suites: 81 passed, 81 total
Tests:       4 skipped, 400 passed, 404 total
Snapshots:   0 total
Time:        85.987 s
Ran all test suites.
✨  Done in 86.58s.

No more fan 😎

VikaCep

You're a hero! 👏 👏 👏

fix: reimplement user.type as actually pasting in the meta framework

4e1f915

ckbedwell requested a review from VikaCep July 23, 2024 09:31

ckbedwell requested a review from a team as a code owner July 23, 2024 09:31

ckbedwell changed the title ~~Fix flakey tests~~ Fix flaky tests Jul 23, 2024

VikaCep approved these changes Jul 23, 2024

View reviewed changes

ckbedwell merged commit 84db7c4 into main Jul 23, 2024
5 checks passed

ckbedwell deleted the fix/flakey-tests branch July 23, 2024 12:52

VikaCep mentioned this pull request Jul 23, 2024

Fix tests in browser checks #885

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix flaky tests #882

Fix flaky tests #882

ckbedwell commented Jul 23, 2024

VikaCep left a comment

Fix flaky tests #882

Fix flaky tests #882

Conversation

ckbedwell commented Jul 23, 2024

Problem

Solution

VikaCep left a comment

Choose a reason for hiding this comment