Improve reliability of the regression detection workflow #758

Open

tgregg opened this issue Feb 28, 2024 · 3 comments

Comments

@tgregg
Contributor

tgregg commented Feb 28, 2024

Currently, the performance regression detection workflow executed via GitHub Actions is unreliable due to high variance in its results. We suspect we can drive down this variance by running the regression detection workflows on hardware that we can guarantee is reserved for them, and that executes them serially.

Below are some options that have been discussed:

  1. Configure GitHub Actions to submit the jobs to reserved AWS hardware that we control (a minimal workflow sketch follows this list). This is the more complicated of the two options listed here, but it has the benefit of not shifting any burden onto the PR requester.
  2. Change the workflow so that it verifies reports uploaded by the requester of a PR, adding a build phase that executes the regression detection workflow (before/after runs) on the requester's hardware. This is the simpler option, but it has the drawback that submitting a PR becomes a bit more onerous.
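
For option 1, a minimal sketch of what the job configuration might look like, assuming a hypothetical self-hosted runner label (`ion-perf`) for the reserved hardware and a workflow-level concurrency group to keep runs serial; none of these names come from the actual workflow:

```yaml
# Hypothetical excerpt of the regression detection workflow (option 1).
# The runner label "ion-perf" and the script path are placeholders.
name: performance-regression

on: pull_request

# Serialize regression runs so only one executes on the reserved hardware at a time.
concurrency:
  group: performance-regression
  cancel-in-progress: false

jobs:
  detect-regression:
    # Route the job to the reserved, self-hosted runner instead of a shared GitHub-hosted one.
    runs-on: [self-hosted, ion-perf]
    steps:
      - uses: actions/checkout@v4
      - name: Run before/after benchmarks
        run: ./regression/run-before-after.sh   # placeholder for the existing detection script
```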
@popematt
Contributor

In #746, I was having a hard time reproducing my results consistently even though I was always running the tests serially using the same hardware. I had other processes running at the time (which would also be the case for option 2), but I was running the tests in a single JVM using a single core of my 8-core M1 Pro CPU. It's unclear to me whether HotSpot optimizations are applied deterministically (i.e. will two runs of the same program with the same inputs result in the same HotSpot optimizations), and that may be a confounding factor here.

TL;DR: dedicated hardware certainly can't hurt test reliability, but it might not improve it either.

@tgregg
Contributor Author

tgregg commented Feb 28, 2024

GitHub Actions runner in AWS CodeBuild: https://docs.aws.amazon.com/codebuild/latest/userguide/action-runner.html
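
If we go the CodeBuild route, the linked docs have the job select a CodeBuild project via the `runs-on` label. A rough sketch, where the project name `ion-java-regression` is a placeholder and the label pattern follows the linked documentation as I read it:

```yaml
jobs:
  detect-regression:
    # "ion-java-regression" is a hypothetical CodeBuild project name.
    runs-on: codebuild-ion-java-regression-${{ github.run_id }}-${{ github.run_attempt }}
    steps:
      - uses: actions/checkout@v4
      - name: Run before/after benchmarks
        run: ./regression/run-before-after.sh   # placeholder for the existing detection script
```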

@jobarr-amzn
Contributor

The regression detector came up after a point release caused a substantial performance regression, right?
How substantial was that? Are we over-tuned here? Are we trying to detect any regression at all, or only to prevent a disastrous regression?

Have we considered some approach like JProffa, which measures bytecodes executed instead of wall-clock time?

If contention is having an impact, could we try to control for it by making both halves of the comparison run concurrently? That would make contention even worse, but it ought to affect both sides of the split evenly.
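
To make that last idea concrete, a rough sketch of a single workflow step that launches both halves at the same time; the jar paths and report filenames are hypothetical, and it assumes the benchmarks are JMH-based (so they accept `-rf`/`-rff` for JSON report output):

```yaml
      - name: Run baseline and candidate concurrently (sketch)
        run: |
          # Jar paths are hypothetical. The point is that both halves experience the
          # same contention because they run at the same time.
          java -jar baseline/benchmarks.jar  -rf json -rff baseline.json &
          java -jar candidate/benchmarks.jar -rf json -rff candidate.json &
          wait
          # baseline.json and candidate.json would then feed the existing comparison logic.
```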
