feat(K8s): Updating the K8s troubleshooting section #18495

Open
wants to merge 13 commits into
base: develop
Original file line number Diff line number Diff line change
Expand Up @@ -374,5 +374,5 @@ WHERE clusterName = '_MY_CLUSTER_NAME_'
```

<Callout variant="tip">
If you still can't see control plane data, try the solution described in [Kubernetes integration troubleshooting: Not seeing data](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/kubernetes-integration-troubleshooting-not-seeing-data/).
If you still can't see Control Plane data, check out [this troubleshooting page](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/not-seeing-data).
</Callout>
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ redirects:
freshnessValidatedDate: never
---

To generate verbose logs and get version and configuration information, follow the steps below. For troubleshooting help, see [Not seeing data](/docs/integrations/host-integrations/troubleshooting/kubernetes-integration-troubleshooting-not-seeing-data) or [Error messages](/docs/integrations/host-integrations/troubleshooting/kubernetes-integration-troubleshooting-error-messages).
To generate verbose logs and get version and configuration information, follow the steps below. For troubleshooting help, see [Not seeing data](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/not-seeing-data) or [Error messages](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/error-messages).

If you're using version 2 of the integration, see [Kubernetes logs in version 2](/docs/kubernetes-pixie/kubernetes-integration/advanced-configuration/k8s-version2/overview/#logs-version2).

Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
title: Kubernetes integration errors v2
title: Kubernetes integration errors (version 2)
type: troubleshooting
tags:
- Integrations

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
---
title: "Missing nodes for version 2"
type: troubleshooting
tags:
- Integrations
- Kubernetes integration v2
- Troubleshooting
redirects:
- /docs/kubernetes-pixie/kubernetes-integration/advanced-configuration/k8s-version2/troubleshooting
metaDescription: Some troubleshooting tips if you're not seeing data show up for New Relic's Kubernetes integration.
freshnessValidatedDate: never
---

## Problem

You [deployed the infrastructure agent](/docs/infrastructure/infrastructure-monitoring/get-started/choose-infra-install-method/) and completed the [Kubernetes installation procedure](/install/kubernetes/), but not all nodes show up.

## Solution

Follow these steps:

1. Confirm that you can schedule the infrastructure agent on each node by running this command:

```shell
kubectl describe daemonset newrelic-infra
```

2. Confirm that the time on all nodes is accurate. Nodes whose clocks are more than 2 minutes ahead or behind don't show up in the cluster explorer. Use the following NRQL query to check the clock offset of each node:

```sql
FROM K8sNodeSample
SELECT latest(nr.ingestTimeMs - timestamp) / 1000 AS 'Clock offset seconds'
FACET nodeName LIMIT max SINCE 1 DAY AGO
```

3. [Retrieve the logs from the infrastructure agent](/docs/kubernetes-pixie/kubernetes-integration/advanced-configuration/k8s-version2/overview/#logs-version2) on the nodes that do not appear in the cluster explorer and confirm there are no [error messages](/docs/kubernetes-pixie/kubernetes-integration/advanced-configuration/k8s-version2/errors/).
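
The first two checks above can be sketched as small shell helpers. This is a minimal sketch, not part of the integration: the function names `daemonset_gap` and `clock_offset_ok` are hypothetical, and the commented `kubectl` line assumes the DaemonSet is named `newrelic-infra` and lives in your current namespace.

```shell
# Hypothetical helper: compare the desired and ready pod counts of a DaemonSet.
# Both numbers come from the DaemonSet status fields
# desiredNumberScheduled and numberReady.
daemonset_gap() {
  desired=$1
  ready=$2
  if [ "$ready" -lt "$desired" ]; then
    echo "missing $((desired - ready)) of $desired agent pods"
  else
    echo "all $desired agent pods ready"
  fi
}

# Hypothetical helper: succeed when a clock offset (in seconds, as returned
# by the NRQL query) is within the 2-minute window.
clock_offset_ok() {
  offset=${1#-}        # strip a leading minus sign to get the absolute value
  offset=${offset%.*}  # drop the fractional part
  [ "${offset:-0}" -le 120 ]
}

# Possible usage against a live cluster (not run here):
#   set -- $(kubectl get daemonset newrelic-infra \
#     -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')
#   daemonset_gap "$1" "$2"
```

A gap between the two counts points to nodes where the agent pod couldn't be scheduled or isn't ready yet, which is where to look first.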
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
title: Upgrade from v2
title: Upgrade from version 2
tags:
- Integrations
- Kubernetes integration v2
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
---
title: Error messages
type: troubleshooting
tags:
- Integrations
- Kubernetes integration
- Troubleshooting
metaDescription: 'Some of the more common error messages found in the infrastructure agent logs for New Relic Kubernetes integration.'
redirects:
- /docs/integrations/kubernetes-integration/troubleshooting/kubernetes-integration-troubleshooting-error-messages
- /docs/integrations/host-integrations/troubleshooting/kubernetes-integration-troubleshooting-error-messages
freshnessValidatedDate: 2024-09-02
---

You may see error messages in your terminal while installing the Kubernetes integration, or in your New Relic infrastructure logs after the integration is installed.

These are the most common error messages:

* [Error sending events](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/error-sending-events)
* [Failed to discover kube-state-metrics](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/failed-discover-kube)
* [Invalid New Relic license](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/invalid-nr-license)
* [Installation error due to Dockerhub and registry.k8s.io](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/installation-error-dockerhub-registry)
* [Pod is not starting](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/pod-not-starting)
* [Repo newrelic not found](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/repo-newrelic-not-found)
* [Unable to connect to the server](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/unable-connect-server)






Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
---
title: 'Error sending events'
type: troubleshooting
tags:
- Integrations
- Kubernetes integration
- Troubleshooting
metaDescription: Some troubleshooting tips if you receive an error when sending events.
freshnessValidatedDate: 2024-09-02
---

## Problem

The agent can't connect to the New Relic servers and you see an error like the following in the logs of the `agent` or `forwarder` containers:

```shell
2018-04-09T18:16:35.497195185Z time="2018-04-09T18:16:35Z" level=error
msg="metric sender can't process 1 times" error="Error sending events:
Post https://api.newrelic.com/metrics/events/bulk:
net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
```

## Solution

Depending on the exact nature of the error, the message in the logs may differ. To address this problem, see the [New Relic networks documentation](/docs/new-relic-solutions/get-started/networks/#infrastructure) and the [Troubleshooting New Relic infrastructure agent networking issues](https://github.com/newrelic/infrastructure-agent/blob/master/docs/network_troubleshooting.md) GitHub page.
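
As a first connectivity check, you can try the endpoint from a machine or pod on the same network and interpret the `curl` exit code. The sketch below is illustrative only: `explain_curl_exit` is a hypothetical helper, and the URL in the commented usage is copied from the log example above, so your agent may use a different endpoint.

```shell
# Hypothetical helper: translate common curl exit codes into a likely cause.
explain_curl_exit() {
  case "$1" in
    0)     echo "connected" ;;
    6)     echo "DNS resolution failed" ;;
    7)     echo "connection refused or blocked" ;;
    28)    echo "timed out" ;;
    35|60) echo "TLS problem" ;;
    *)     echo "curl failed with exit code $1" ;;
  esac
}

# Possible usage (not run here; the URL is taken from the log example):
#   curl -sS -o /dev/null --max-time 10 https://api.newrelic.com/metrics/events/bulk
#   explain_curl_exit $?
```

A timeout (exit code 28) matches the `Client.Timeout exceeded` message in the log and usually points to a firewall or proxy between the cluster and New Relic.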

Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
---
title: 'Failed to discover kube-state-metrics'
type: troubleshooting
tags:
- Integrations
- Kubernetes integration
- Troubleshooting
metaDescription: Some troubleshooting tips if kube-state-metrics is not found.
freshnessValidatedDate: 2024-09-02
---

## Problem

The Kubernetes integration requires `kube-state-metrics`. If this is missing, you'll see an error like the following in the `nrk8s-ksm` container logs:

```shell
time="2022-06-21T09:12:20Z" level=error msg="retrieving scraper data: retrieving ksm data: discovering KSM endpoints: timeout discovering endpoints"
```

## Solution

This error usually has one of the following causes:

* `kube-state-metrics` has not been deployed into the cluster.
* `kube-state-metrics` is deployed using a custom deployment.
* There are multiple versions of `kube-state-metrics` running and the Kubernetes integration is not finding the correct one.

The Kubernetes integration automatically detects `kube-state-metrics` in your cluster. By default, it looks for the label `app.kubernetes.io/name=kube-state-metrics` across all namespaces.


<Callout variant="tip">
You can change the discovery behavior in the `ksm.config` of the [Helm chart](https://github.com/newrelic/nri-kubernetes/blob/main/charts/newrelic-infrastructure/values.yaml) values.
</Callout>
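
To see what the default discovery would find, you can count the pods that carry the default label. This is a rough sketch: `ksm_diagnosis` is a hypothetical helper, and a count above one can also just mean several replicas of the same deployment.

```shell
# Hypothetical helper: interpret how many pods match the default
# kube-state-metrics label.
ksm_diagnosis() {
  case "$1" in
    0) echo "no pod carries the default kube-state-metrics label" ;;
    1) echo "exactly one kube-state-metrics pod found" ;;
    *) echo "$1 kube-state-metrics pods found; discovery may pick the wrong one" ;;
  esac
}

# Possible usage against a live cluster (not run here):
#   count=$(kubectl get pods --all-namespaces \
#     -l app.kubernetes.io/name=kube-state-metrics --no-headers 2>/dev/null | wc -l)
#   ksm_diagnosis "$((count))"
```

A count of zero means either `kube-state-metrics` is not deployed or a custom deployment uses different labels.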
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
---
title: 'Installation error due to Dockerhub and registry.k8s.io'
type: troubleshooting
tags:
- Integrations
- Kubernetes integration
- Troubleshooting
metaDescription: Some troubleshooting tips if you have an installation error due to Dockerhub and registry.k8s.io.
freshnessValidatedDate: 2024-09-02
---

## Problem

The installation fails because it can't pull container images from the [New Relic Docker Hub](https://hub.docker.com/u/newrelic) or from Google's [`registry.k8s.io`](https://github.com/kubernetes/registry.k8s.io).


## Solution

Check that you've added their domains to your allow list. The installation pulls the container images from these locations. You can [test connectivity to `registry.k8s.io`](https://kubernetes.io/blog/2023/03/10/image-registry-redirect/#how-can-i-check-if-i-am-impacted) to find the extra Google registry domains to add to your allow list. `registry.k8s.io` usually redirects to a registry domain based on your region, for example `asia-northeast1-docker.pkg.dev`.
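
One way to find the regional domain is to look at where a request to `registry.k8s.io` is redirected and extract the host name. The sketch below makes assumptions: `url_host` is a hypothetical helper, and the image path in the commented usage (`pause`, tag `3.9`) is only an example of a request that is expected to be redirected.

```shell
# Hypothetical helper: print the host part of a URL, without scheme,
# path, or port.
url_host() {
  rest=${1#*://}      # drop the scheme
  rest=${rest%%/*}    # drop the path
  echo "${rest%%:*}"  # drop the port, if any
}

# Possible usage (not run here; the image path is an assumption):
#   location=$(curl -sI https://registry.k8s.io/v2/pause/manifests/3.9 \
#     | tr -d '\r' | awk 'tolower($1) == "location:" {print $2}')
#   url_host "$location"
```

The printed host is the domain to add to your allow list alongside `registry.k8s.io` itself.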

Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
---
title: 'Invalid New Relic license'
type: troubleshooting
tags:
- Integrations
- Kubernetes integration
- Troubleshooting
metaDescription: Some troubleshooting tips if the New Relic license is invalid.
freshnessValidatedDate: 2024-09-02
---

## Problem

You are getting this error in the logs of the `agent` or `forwarder` containers:

```shell
2018-04-09T14:20:17.750893186Z time="2018-04-09T14:20:17Z" level=error
msg="metric sender can't process 0 times" error="InventoryIngest: events
were not accepted: 401 401 Unauthorized Invalid license key."
```

## Solution

Make sure you're using a valid <InlinePopover type="licenseKey"/>.
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
---
title: 'Pod is not starting'
type: troubleshooting
tags:
- Integrations
- Kubernetes integration
- Troubleshooting
metaDescription: Some troubleshooting tips if the Pod is not starting.
freshnessValidatedDate: 2024-09-02
---

## Problem

You get the output error `nrk8s-kubelet pod is not starting` when you follow the guided installation.

## Solution

This error indicates that the `nrk8s-kubelet` pod didn't start within 5 minutes, so the installation script failed with a timeout.

In this case, you can run this command to see the pod's status and restarts:

```bash
kubectl get pods -o wide -n newrelic | grep nrk8s-kubelet
```

Check the following:

* If the pod is in `ImagePullBackOff` status, check that your network allows image pulls from the [required domains](/docs/new-relic-solutions/get-started/networks).


* If the pod is in `Pending` or `ContainerCreating` status, run these commands to look for possible causes in the [debug logs](/docs/kubernetes-pixie/kubernetes-integration/advanced-configuration/get-logs-version/#verbose-logging):

```bash
# Use the full pod name reported by the previous command; the name below is only its prefix.
kubectl logs newrelic-bundle-nrk8s-kubelet -n newrelic
kubectl logs newrelic-bundle-nrk8s-kubelet -n newrelic -c kubelet
```
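
The decision above can be sketched as a small helper that maps the pod status to the next step. This is only an illustration: `next_step` is a hypothetical function, and the commented usage assumes the status is the third column of the default `kubectl get pods` output.

```shell
# Hypothetical helper: map a pod status to the troubleshooting step
# described above.
next_step() {
  case "$1" in
    ImagePullBackOff)          echo "check registry access" ;;
    Pending|ContainerCreating) echo "read debug logs" ;;
    Running)                   echo "pod is running" ;;
    *)                         echo "unrecognized status: $1" ;;
  esac
}

# Possible usage against a live cluster (not run here):
#   kubectl get pods -n newrelic --no-headers | grep nrk8s-kubelet \
#     | awk '{print $3}' | while read -r status; do next_step "$status"; done
```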
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
---
title: 'Repo newrelic not found'
type: troubleshooting
tags:
- Integrations
- Kubernetes integration
- Troubleshooting
metaDescription: Some troubleshooting tips if the newrelic repo is not found.
freshnessValidatedDate: 2024-09-02
---

## Problem

You see this error message during your [Kubernetes integration installation](/install/kubernetes/) with Helm or a manifest:

```shell
repo newrelic not found
```

## Solution

Add the `newrelic` repo to your Helm repositories by running this command:

```shell
helm repo add newrelic https://helm-charts.newrelic.com
```
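
If you want to confirm whether the repo is already registered before adding it, you can scan the output of `helm repo list`. The following is a minimal sketch, and `has_repo` is a hypothetical helper rather than a Helm command.

```shell
# Hypothetical helper: read `helm repo list` output on stdin and succeed
# if a repo with the given name is listed (the header row is skipped).
has_repo() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Possible usage (not run here):
#   helm repo list 2>/dev/null | has_repo newrelic \
#     || helm repo add newrelic https://helm-charts.newrelic.com
```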
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
---
title: 'Unable to connect to the server'
type: troubleshooting
tags:
- Integrations
- Kubernetes integration
- Troubleshooting
metaDescription: Some troubleshooting tips if you're having issues with the networking connection.
freshnessValidatedDate: 2024-09-02
---

## Problem

You get this error when you're following the guided install:

```shell
Unable to connect to the server: dial tcp [7777:777:7777:7777:77::77]:443: i/o timeout
```

## Solution

This indicates that you're experiencing a network connection issue between the Kubernetes client and the Kubernetes API server. Make sure your Kubernetes client can connect to your Kubernetes API server before running the guided install again.
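
To see which API server address the client is failing to reach, you can pull it out of the error message. This is a small sketch under assumptions: `api_endpoint_from_error` is a hypothetical helper, and it only recognizes messages in the `dial tcp ...: i/o timeout` form shown above.

```shell
# Hypothetical helper: read an error line on stdin and print the address
# between "dial tcp " and ": i/o timeout". Prints nothing if the line
# doesn't match that form.
api_endpoint_from_error() {
  sed -n 's/.*dial tcp \(.*\): i\/o timeout.*/\1/p'
}

# Possible usage (not run here):
#   kubectl cluster-info 2>&1 | api_endpoint_from_error
```

Once you know the address, you can check whether it's reachable from your machine, for example through your VPN or firewall rules, before running the guided install again.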


Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ freshnessValidatedDate: 2023-08-02

## Problem [#problem]

You have completed the [installation procedure](/install/kubernetes/) for New Relic's Kubernetes integration with Helm for the `nri-bundle`, but our Helm templates are not respecting some [global values](https://github.com/newrelic/helm-charts/tree/master/charts/nri-bundle#values) in your `values.yaml`.
You've installed [New Relic's Kubernetes integration](/install/kubernetes/?dropdown1=helm) with Helm for the `nri-bundle`, but our Helm templates are not respecting some [global values](https://github.com/newrelic/helm-charts/tree/master/charts/nri-bundle#values) in your `values.yaml`.

## Solution [#solution]
