
opentelemetry-operator manager crashes during instrumentation injection attempt #3303

sergeykad opened this issue Sep 24, 2024 · 4 comments

sergeykad commented Sep 24, 2024

Component(s)

auto-instrumentation

What happened?

Description

opentelemetry-operator manager crashes

Steps to Reproduce

  1. Install the opentelemetry-operator on a Kubernetes cluster.
  2. Restart a pod that carries the following annotation (a fuller reproduction sketch follows the list):

     annotations:
       instrumentation.opentelemetry.io/inject-java: "true"
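
For reference, a minimal reproduction sketch might look like the manifest below. The namespace matches the one seen in the debug log, while the Deployment name and image are illustrative placeholders; the key detail is that the annotation sits on the pod template metadata, which is what the operator's pod webhook inspects when the pod is recreated.

    # Hypothetical reproduction manifest; the Deployment name and image are placeholders.
    # Apply with: kubectl apply -f repro.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-java-service
      namespace: optimus
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: example-java-service
      template:
        metadata:
          labels:
            app: example-java-service
          annotations:
            instrumentation.opentelemetry.io/inject-java: "true"
        spec:
          containers:
            - name: app
              image: example-registry/example-java-service:latest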

Expected Result

A sidecar is added to the pod and the service is instrumented with OpenTelemetry.

Actual Result

The opentelemetry-operator manager crashes with the log output shown below.

Kubernetes Version

1.25

Operator version

v0.109.0

Collector version

v0.69.0

Environment information

Environment

OS: Rocky Linux 9.3

Log output

{"level":"INFO","timestamp":"2024-09-24T13:57:31Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ConfigMap"}
{"level":"INFO","timestamp":"2024-09-24T13:57:31Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceAccount"}
{"level":"INFO","timestamp":"2024-09-24T13:57:31Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Service"}
{"level":"INFO","timestamp":"2024-09-24T13:57:31Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Deployment"}
{"level":"INFO","timestamp":"2024-09-24T13:57:31Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.DaemonSet"}
{"level":"INFO","timestamp":"2024-09-24T13:57:31Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.StatefulSet"}
{"level":"INFO","timestamp":"2024-09-24T13:57:31Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Ingress"}
{"level":"INFO","timestamp":"2024-09-24T13:57:31Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v2.HorizontalPodAutoscaler"}
{"level":"INFO","timestamp":"2024-09-24T13:57:31Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.PodDisruptionBudget"}
{"level":"INFO","timestamp":"2024-09-24T13:57:31Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceMonitor"}
{"level":"INFO","timestamp":"2024-09-24T13:57:31Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.PodMonitor"}
{"level":"INFO","timestamp":"2024-09-24T13:57:31Z","message":"Starting Controller","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector"}
{"level":"INFO","timestamp":"2024-09-24T13:57:32Z","logger":"collector-upgrade","message":"no instances to upgrade"}
{"level":"DEBUG","timestamp":"2024-09-24T13:57:32Z","logger":"controller-runtime.certwatcher","message":"certificate event","event":"CHMOD     \"/tmp/k8s-webhook-server/serving-certs/tls.key\""}
{"level":"INFO","timestamp":"2024-09-24T13:57:32Z","logger":"controller-runtime.certwatcher","message":"Updated current TLS certificate"}
{"level":"DEBUG","timestamp":"2024-09-24T13:57:32Z","logger":"controller-runtime.certwatcher","message":"certificate event","event":"CHMOD     \"/tmp/k8s-webhook-server/serving-certs/tls.crt\""}
{"level":"INFO","timestamp":"2024-09-24T13:57:32Z","logger":"controller-runtime.certwatcher","message":"Updated current TLS certificate"}
{"level":"INFO","timestamp":"2024-09-24T13:57:36Z","message":"Starting workers","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","worker count":1}
{"level":"INFO","timestamp":"2024-09-24T13:57:36Z","message":"Starting workers","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","worker count":1}
{"level":"DEBUG","timestamp":"2024-09-24T13:58:52Z","logger":"controller-runtime.certwatcher","message":"certificate event","event":"CHMOD     \"/tmp/k8s-webhook-server/serving-certs/tls.key\""}
{"level":"INFO","timestamp":"2024-09-24T13:58:52Z","logger":"controller-runtime.certwatcher","message":"Updated current TLS certificate"}
{"level":"DEBUG","timestamp":"2024-09-24T13:58:52Z","logger":"controller-runtime.certwatcher","message":"certificate event","event":"CHMOD     \"/tmp/k8s-webhook-server/serving-certs/tls.crt\""}
{"level":"INFO","timestamp":"2024-09-24T13:58:52Z","logger":"controller-runtime.certwatcher","message":"Updated current TLS certificate"}
{"level":"DEBUG","timestamp":"2024-09-24T13:59:26Z","message":"annotation not present in deployment, skipping sidecar injection","namespace":"optimus","name":""}
{"level":"DEBUG","timestamp":"2024-09-24T13:59:26Z","message":"injecting Java instrumentation into pod","otelinst-namespace":"optimus","otelinst-name":"instrumentation"}

Additional context

There are no additional log messages. The manager just disappears.


jaronoff97 commented Sep 25, 2024

Does the manager pod report any reason for its crash, OOMKilled maybe? I haven't been able to reproduce this.
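
A rough way to check that, assuming a default install (the namespace, label selector, and container name below may differ depending on how the operator was deployed), is to look at the last terminated state Kubernetes recorded for the manager container:

    # Assumed namespace and label selector; adjust for your installation.
    OPERATOR_NS=opentelemetry-operator-system
    POD=$(kubectl -n "$OPERATOR_NS" get pods \
      -l app.kubernetes.io/name=opentelemetry-operator -o name | head -n1)

    # Prints the reason and exit code recorded for the previous run of the
    # manager container (e.g. OOMKilled, or Error plus an exit code).
    kubectl -n "$OPERATOR_NS" get "$POD" \
      -o jsonpath='{.status.containerStatuses[?(@.name=="manager")].lastState.terminated}'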

sergeykad commented

There was no reason reported at all; the pod simply died and a new one started.
It looks like something crashed during instrumentation injection, since that is the last log message and the sidecar was never added.

I performed a similar deployment on Minikube and it works fine, but it crashes on our production Kubernetes cluster.
If there is an option to enable more detailed logs, or some other test I can run, I will try it.
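
One thing that may be worth capturing right after a crash (the namespace below is assumed for a default install) is the recent event stream, plus the previous container log if the manager container restarted in place rather than the whole pod being replaced; either can record a termination reason or a panic that the live log misses:

    # Assumed namespace and label selector; adjust for your installation.
    OPERATOR_NS=opentelemetry-operator-system

    # Recent events sometimes record evictions, OOM kills, or failed probes
    # even when the container log ends abruptly.
    kubectl -n "$OPERATOR_NS" get events --sort-by=.lastTimestamp | tail -n 20

    # If the manager container restarted inside the same pod, its previous
    # log may end with a Go panic and stack trace.
    kubectl -n "$OPERATOR_NS" logs -l app.kubernetes.io/name=opentelemetry-operator \
      -c manager --previous --tail=50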

jaronoff97 commented

You can follow the guide at https://github.com/open-telemetry/opentelemetry-operator/blob/main/DEBUG.md for how to enable debug logs. Is it possible the operator doesn't have permission to perform the mutation?

sergeykad commented

We have already added --zap-log-level debug, as seen in the attached log. If any additional parameters would help, we will add them.

The operator grants itself the required permissions, so that is probably not the problem. We use the default resources.

The only possible cause I can think of is that the cluster has no direct Internet access, although it can pull Docker images through the configured Docker proxy.
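
For completeness, the Instrumentation resource named in the debug log (namespace optimus, name instrumentation) records which Java agent image the operator is configured to inject; since the nodes pull that image rather than the operator itself, it may be worth confirming the spec and that the image is reachable through the proxy:

    # Namespace and name taken from the "injecting Java instrumentation into pod"
    # debug line above; prints the configured Java agent image, among other settings.
    kubectl -n optimus get instrumentation instrumentation -o yaml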
