3.6.0-rc1: Controller panics due to metrics round-tripper data race #13637
Comments
👀 I didn't know you could cc @isubasinghe as we've talked about races many times before (e.g. #10807 (comment)) and @Joibel regarding metrics
Yep,
Upon further testing, it is specifically the metrics RoundTripper that causes the panic, but the logs RoundTripper alone produces a separate data race.
Similarly, in addition to the data race in the original post, running with just the metrics RoundTripper produces additional data races.
The smoking gun has been identified; I will make a PR as soon as the tests finish running.
Pre-requisites
:latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
What happened? What did you expect to happen?
These two RoundTripper wrappers, instantiated here, introduce a data race.
To demonstrate, apply this modification to the Makefile
Then execute
make test-functional E2E_ENV_FACTOR=2
(make sure to show all logs from make start).
The stack trace of the data race follows below.
This PR adds E2E tests that currently fail but pass with those RoundTripper wrappers commented out. When the race detector is not enabled, the workflow controller intermittently panics during those tests while cleaning up multiple daemon pods in rapid succession, as follows:
Both the data race and the panic still occur when all changes in that PR other than the additional tests are reverted.
P.S. I observed several other data races when I enabled the race detector. I strongly suggest using race-detector-enabled binaries in CI, as there may be several more latent issues hiding.
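For readers unfamiliar with this class of bug, here is a minimal, generic sketch of a metrics-style RoundTripper wrapper that is safe under concurrent use. It is not the Argo Workflows code, and this report does not state which shared state actually races. The names stubTransport, countingRoundTripper, and runConcurrent are hypothetical and exist only for illustration. The sketch relies on the documented net/http requirement that RoundTrip be safe for concurrent use by multiple goroutines, so any state a wrapper mutates has to be synchronized.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
)

// stubTransport is a hypothetical stand-in for a real transport.
// It never touches the network and always answers 200 OK.
type stubTransport struct{}

func (stubTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: req}, nil
}

// countingRoundTripper is a hypothetical metrics wrapper. The counter is
// shared by every goroutine using the client, so it is updated atomically.
type countingRoundTripper struct {
	count int64 // accessed only through sync/atomic
	next  http.RoundTripper
}

func (c *countingRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	atomic.AddInt64(&c.count, 1) // a plain c.count++ here would be a data race
	return c.next.RoundTrip(req)
}

// runConcurrent issues n requests from n goroutines through one shared
// client and returns the number of requests the wrapper counted.
func runConcurrent(n int) int64 {
	rt := &countingRoundTripper{next: stubTransport{}}
	client := &http.Client{Transport: rt}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := client.Get("http://example.invalid/")
			if err == nil {
				resp.Body.Close()
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&rt.count)
}

func main() {
	fmt.Println(runConcurrent(100))
}
```

Running a program like this with go run -race should report no races. Swapping the atomic update for an unsynchronized increment is the kind of change the race detector is designed to flag, which is why race-enabled builds in CI tend to surface bugs like this early.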
Version(s)
ce7f9bf
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
N/A
Logs from the workflow controller
Logs from your workflow's wait container