Loki on kubernetes as datasource timeout #3133

Open
fgionghi opened this issue Jul 16, 2024 · 2 comments


fgionghi commented Jul 16, 2024

What happened?

This is not actually a bug, but due to how Loki operates on Kubernetes, this problem took me several hours to debug. I would like to share my solution to help others avoid the same issue.

As I discovered, the first thing CrowdSec does when connecting to Loki is check whether Loki is ready via its /ready endpoint.
Loki on Kubernetes does not expose /ready (the endpoint its readinessProbe uses) outside the cluster, so CrowdSec fails to reach Loki even though Loki is functioning correctly.

The /ready endpoint is not exposed because, by default, Loki on Kubernetes sits behind loki-gateway, an NGINX server that proxies API requests under /loki/api/v1 and effectively blocks everything else. Indeed, pointing CrowdSec directly at the Loki pod works.

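To resolve this issue when using the official Helm chart, add the following snippet to your values (adjusting `loki.monitoring.svc.cluster.local` to your own Loki service name and namespace):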

```yaml
gateway:
  nginxConfig:
    serverSnippet: |-
      location = /ready {
        proxy_pass http://loki.monitoring.svc.cluster.local:3100$request_uri;
      }
```

What did you expect to happen?

Since Loki was working, I expected CrowdSec to reach it without problems.
I only found the cause, and learned that CrowdSec queries /ready before anything else, by checking the NGINX logs. It would be helpful if CrowdSec logged more information about this readiness check.

How can we reproduce it (as minimally and precisely as possible)?

Point CrowdSec at a Loki instance hosted on Kubernetes and exposed through an ingress.

Anything else we need to know?

No response

Crowdsec version

```console
$ cscli version
version: v1.6.2-debian-pragmatic-amd64-16bfab86
Codename: alphaga
BuildDate: 2024-05-31_09:18:01
GoVersion: 1.22.2
Platform: linux
libre2: C++
User-Agent: crowdsec/v1.6.2-debian-pragmatic-amd64-16bfab86-linux
Constraint_parser: >= 1.0, <= 3.0
Constraint_scenario: >= 1.0, <= 3.0
Constraint_api: v1
Constraint_acquis: >= 1.0, < 2.0
```

OS version

No response

Enabled collections and parsers

No response

Acquisition config

```console
$ cat /etc/crowdsec/acquis.yaml
---
source: loki
log_level: info
url: https://loki.mydomain
limit: 1000
query: |
  {host="reverse-proxy"}
auth:
  username: x
  password: y
labels:
  type: gelf-nginx
---
source: loki
log_level: debug
url: https://loki.mydomain
limit: 1000
query: |
  {unit="ssh.service", instance="ssh-jump"} | json | line_format `{{.SYSLOG_TIMESTAMP}}{{._HOSTNAME}} {{.SYSLOG_IDENTIFIER}}[{{._PID}}]: {{.MESSAGE}}`
auth:
  username: x
  password: y
labels:
  type: syslog
```

Config show

No response

Prometheus metrics

No response

Related custom configs versions (if applicable) : notification plugins, custom scenarios, parsers etc.

No response

fgionghi added the kind/bug (Something isn't working) label Jul 16, 2024

@fgionghi: Thanks for opening an issue, it is currently awaiting triage.

In the meantime, you can:

  1. Check Crowdsec Documentation to see if your issue can be self resolved.
  2. You can also join our Discord.
  3. Check Releases to make sure your agent is on the latest version.

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

LaurenceJJones (Contributor) commented:

Linking #2828 for this comment

> I discovered the issue and that CrowdSec tries /ready before anything else by checking the NGINX logs. It would be helpful if CrowdSec provided more information about this in its logs.

Maybe we could add a configuration option to disable the /ready endpoint check, though it would only be useful for the k8s ingress case. Let us think about this, but thank you for your report and for the guide on how others can work around it.
