Tests are run prematurely, before services start working. #873

Open
piotrminkina opened this issue Jan 24, 2024 · 5 comments

piotrminkina commented Jan 24, 2024

Hello,

Consider the following k8s manifests, please:

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: stefanprodan
  namespace: default
spec:
  interval: 15m
  type: oci
  url: oci://ghcr.io/stefanprodan/charts
---
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 15m
  chart:
    spec:
      chart: podinfo
      version: 6.5.4
      sourceRef:
        kind: HelmRepository
        name: stefanprodan
  releaseName: podinfo
  test:
    enable: true
  values:
    fullnameOverride: podinfo
    probes:
      startup:
        enable: true

Unfortunately, installing podinfo with this configuration fails, because the tests run before the Pod reports that it is ready to handle requests.

The helm-controller logs show:

{"level":"info","ts":"2024-01-24T17:15:53.541Z","msg":"running 'test' action with timeout of 5m0s","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"e294060e-e52c-45d7-8f91-2972960a8514"}
{"level":"info","ts":"2024-01-24T17:15:57.961Z","msg":"release is in a failed state: release has test in failed phase","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"e294060e-e52c-45d7-8f91-2972960a8514"}
{"level":"error","ts":"2024-01-24T17:15:57.972Z","msg":"Reconciler error","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"e294060e-e52c-45d7-8f91-2972960a8514","error":"terminal error: exceeded maximum retries: cannot remediate failed release"}

The Pods in the namespace are in the following state:

NAME                       READY   STATUS    RESTARTS   AGE
podinfo-grpc-test-7mwyr    0/1     Error     0          8s
podinfo-5d6694644d-xgsbp   0/1     Running   0          8s

Logs from Pod podinfo-grpc-test-7mwyr:

timeout: failed to connect service "podinfo.default:9999" within 1s

Could you implement this so that testing only starts once all services report that they are ready to handle traffic?

Regards
Piotr Minkina

hiddeco (Member) commented Jan 24, 2024

I would think such an implementation would need to be added on the chart side, as part of the actual testing logic, since this problem is not unique to the controller itself; it would also happen if you ran helm test right after a helm upgrade.
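
A minimal sketch of what that could look like as a chart-side test hook (illustrative only; the image, port, health endpoint, and retry budget are assumptions, not taken from the podinfo chart):

apiVersion: v1
kind: Pod
metadata:
  name: podinfo-http-test
  annotations:
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: wait-and-test
      image: curlimages/curl:8.5.0
      command:
        - sh
        - -c
        - |
          # Retry for up to ~60s so the test tolerates a slow startup probe
          # instead of failing on the first refused connection.
          for i in $(seq 1 30); do
            curl -fsS http://podinfo.default:9898/healthz && exit 0
            sleep 2
          done
          exit 1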

piotrminkina (Author) commented Jan 24, 2024

As I read in the Chart Tests documentation, I must wait for all pods to become active before running tests. Since we are talking about declarative application of Helm charts, the controller should do the waiting.

The help for the helm install command says:

--wait           if set, will wait until all Pods, PVCs, Services, and minimum number of Pods of a Deployment, StatefulSet, or ReplicaSet are in a ready state before marking the release as successful. It will wait for as long as --timeout
--wait-for-jobs  if set and --wait enabled, will wait until all Jobs have been completed before marking the release as successful. It will wait for as long as --timeout

Sounds promising, so I added these parameters to the helm install command and ran the tests immediately after helm install returned control:

$ helm install podinfo oci://ghcr.io/stefanprodan/charts/podinfo --version 6.5.4 --set probes.startup.enable=true --wait --wait-for-jobs && helm test podinfo
Pulled: ghcr.io/stefanprodan/charts/podinfo:6.5.4
Digest: sha256:a961643aa644f24d66ad05af2cdc8dcf2e349947921c3791fc3b7883f6b1777f
NAME: podinfo
LAST DEPLOYED: Wed Jan 24 19:56:28 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
1. Get the application URL by running these commands:
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl -n default port-forward deploy/podinfo 8080:9898
NAME: podinfo
LAST DEPLOYED: Wed Jan 24 19:56:28 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE:     podinfo-grpc-test-ibama
Last Started:   Wed Jan 24 19:56:29 2024
Last Completed: Wed Jan 24 19:56:33 2024
Phase:          Failed
NOTES:
1. Get the application URL by running these commands:
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl -n default port-forward deploy/podinfo 8080:9898
Error: 1 error occurred:
	* pod podinfo-grpc-test-ibama failed

Well, unfortunately the effect is the same. The tests were run before the application Pods reported ready to receive traffic. I think it is simply a problem with the --wait parameter; it seems it is not working as it should. What do you think @hiddeco?
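
For comparison, one way to take the timing out of Helm's hands entirely is to wait on the Deployment rollout explicitly before invoking the tests (a sketch using the names from above):

helm install podinfo oci://ghcr.io/stefanprodan/charts/podinfo \
  --version 6.5.4 --set probes.startup.enable=true --wait --wait-for-jobs
# blocks until the Deployment reports the desired number of ready replicas
kubectl -n default rollout status deployment/podinfo --timeout=5m
helm test podinfo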

stefanprodan (Member) commented:

You need to set replicas to 2; Helm has a bug where it doesn't wait for a single pod to be ready.
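
For the HelmRelease from the issue description, that workaround would look roughly like the values fragment below (assuming the podinfo chart exposes the replica count as replicaCount):

values:
  fullnameOverride: podinfo
  replicaCount: 2  # assumed value name; two replicas so Helm's --wait behaves as expected
  probes:
    startup:
      enable: true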

piotrminkina (Author) commented:

@stefanprodan Is this bug you mention reported somewhere? Is anyone fixing it? Indeed, increasing the replicas to 2 made Helm wait until the Pod was ready. Thanks! As a result, the grpc-test test executed correctly, although I don't know why the jwt-test test ended with an error and left no log behind (the exit code was 1).

stefanprodan (Member) commented:

Is this bug you mention reported somewhere? Is anyone fixing it?

It's somewhere in the Helm repo, reported several years ago.
