Tests are run prematurely, before services start working. #873

Open
piotrminkina opened this issue Jan 24, 2024 · 5 comments

piotrminkina commented Jan 24, 2024

Hello,

Consider the following k8s manifests, please:

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: stefanprodan
  namespace: default
spec:
  interval: 15m
  type: oci
  url: oci://ghcr.io/stefanprodan/charts
---
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 15m
  chart:
    spec:
      chart: podinfo
      version: 6.5.4
      sourceRef:
        kind: HelmRepository
        name: stefanprodan
  releaseName: podinfo
  test:
    enable: true
  values:
    fullnameOverride: podinfo
    probes:
      startup:
        enable: true

Unfortunately, installing podinfo with this configuration fails, because the tests run before the Pod reports that it is ready to handle requests.

The helm-controller logs show:

{"level":"info","ts":"2024-01-24T17:15:53.541Z","msg":"running 'test' action with timeout of 5m0s","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"e294060e-e52c-45d7-8f91-2972960a8514"}
{"level":"info","ts":"2024-01-24T17:15:57.961Z","msg":"release is in a failed state: release has test in failed phase","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"e294060e-e52c-45d7-8f91-2972960a8514"}
{"level":"error","ts":"2024-01-24T17:15:57.972Z","msg":"Reconciler error","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"e294060e-e52c-45d7-8f91-2972960a8514","error":"terminal error: exceeded maximum retries: cannot remediate failed release"}

The Pods in the namespace are in the following state:

NAME                       READY   STATUS    RESTARTS   AGE
podinfo-grpc-test-7mwyr    0/1     Error     0          8s
podinfo-5d6694644d-xgsbp   0/1     Running   0          8s

Logs from Pod podinfo-grpc-test-7mwyr:

timeout: failed to connect service "podinfo.default:9999" within 1s

Could you implement this so that testing only starts once all services report that they are ready to handle traffic?

Regards
Piotr Minkina

hiddeco (Member) commented Jan 24, 2024

I would think such an implementation would need to be added on the chart side, as part of the actual testing logic, since this problem is not unique to the controller itself; it would also happen if you ran helm test right after a helm upgrade.
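
A minimal sketch of what that could look like as a chart-side test hook (illustrative only; the image, port, health endpoint, and retry budget are assumptions, not taken from the podinfo chart):

apiVersion: v1
kind: Pod
metadata:
  name: podinfo-http-test
  annotations:
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: wait-and-test
      image: curlimages/curl:8.5.0
      command:
        - sh
        - -c
        - |
          # Retry for up to ~60s so the test tolerates a slow startup probe
          # instead of failing on the first refused connection.
          for i in $(seq 1 30); do
            curl -fsS http://podinfo.default:9898/healthz && exit 0
            sleep 2
          done
          exit 1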

piotrminkina (Author) commented Jan 24, 2024

As I read in the Chart Tests documentation, I must wait for all pods to become active before running tests. Since we are talking about declarative application of Helm charts, the controller should do the waiting.

The help for the helm install command says:

--wait           if set, will wait until all Pods, PVCs, Services, and minimum number of Pods of a Deployment, StatefulSet, or ReplicaSet are in a ready state before marking the release as successful. It will wait for as long as --timeout
--wait-for-jobs  if set and --wait enabled, will wait until all Jobs have been completed before marking the release as successful. It will wait for as long as --timeout

Sounds promising, so I added these parameters to the helm install command and ran the tests immediately after helm install returned control:

$ helm install podinfo oci://ghcr.io/stefanprodan/charts/podinfo --version 6.5.4 --set probes.startup.enable=true --wait --wait-for-jobs && helm test podinfo
Pulled: ghcr.io/stefanprodan/charts/podinfo:6.5.4
Digest: sha256:a961643aa644f24d66ad05af2cdc8dcf2e349947921c3791fc3b7883f6b1777f
NAME: podinfo
LAST DEPLOYED: Wed Jan 24 19:56:28 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
1. Get the application URL by running these commands:
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl -n default port-forward deploy/podinfo 8080:9898
NAME: podinfo
LAST DEPLOYED: Wed Jan 24 19:56:28 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE:     podinfo-grpc-test-ibama
Last Started:   Wed Jan 24 19:56:29 2024
Last Completed: Wed Jan 24 19:56:33 2024
Phase:          Failed
NOTES:
1. Get the application URL by running these commands:
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl -n default port-forward deploy/podinfo 8080:9898
Error: 1 error occurred:
	* pod podinfo-grpc-test-ibama failed

Well, unfortunately the effect is the same. The tests were run before the application Pods reported ready to receive traffic. I think it is simply a problem with the --wait parameter; it seems it is not working as it should. What do you think @hiddeco?
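
For comparison, one way to take the timing out of Helm's hands entirely is to wait on the Deployment rollout explicitly before invoking the tests (a sketch using the names from above):

helm install podinfo oci://ghcr.io/stefanprodan/charts/podinfo \
  --version 6.5.4 --set probes.startup.enable=true --wait --wait-for-jobs
# blocks until the Deployment reports the desired number of ready replicas
kubectl -n default rollout status deployment/podinfo --timeout=5m
helm test podinfo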

stefanprodan (Member) commented:

You need to set replicas to 2; Helm has a bug where it doesn't wait for a single pod to be ready.
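
For the HelmRelease from the issue description, that workaround would look roughly like the values fragment below (assuming the podinfo chart exposes the replica count as replicaCount):

values:
  fullnameOverride: podinfo
  replicaCount: 2  # assumed value name; two replicas so Helm's --wait behaves as expected
  probes:
    startup:
      enable: true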

piotrminkina (Author) commented:

@stefanprodan Is this bug you mention reported somewhere? Is anyone fixing it? Indeed, increasing the replicas to 2 made Helm wait until the Pod was ready. Thanks! As a result, the grpc-test test executed correctly, although I don't know why the jwt-test test ended with an error and left no log behind (the exit code was 1).

stefanprodan (Member) commented:

Is this bug you mention reported somewhere? Is anyone fixing it?

It's somewhere in the Helm repo, reported several years ago.
