
bouncer metrics are adding significant load on OpenWrt router #377

Closed
ne20002 opened this issue Jul 21, 2024 · 4 comments

Comments


ne20002 commented Jul 21, 2024

What happened?

When updating the bouncer to version 0.0.29-rc3 on my OpenWrt router (BananaPi R3), I realized that the bouncer adds significant load on the router when metrics are enabled. Usually the router has a very low load; it has a powerful processor.
The metrics alone add 10-20% load on all four CPUs, on a device that is usually under 4%.

What did you expect to happen?

I'd like metrics collection not to add this much additional load to the device.

How can we reproduce it (as minimally and precisely as possible)?

Install OpenWrt on BPi R3 with luci-app-crowdsec-bouncer and enable metrics.

Anything else we need to know?

Most routers are devices with limited power, but metrics are important. Even before metrics were enabled in the bouncer, the firewall rules already counted blocked packets and bytes. It therefore seems that the load must be caused by some other value, or by the process of collecting the metrics itself.

I suggest providing a setting for a limited set of metrics that only includes the number of dropped packets and bytes. None of the Go internals are really necessary to know during normal operation.

The number of banned IPs should already be available on the LAPI. I don't know how this number is calculated or whether it may be the cause of the load (counting the IPs in the set?), but this should be tested, and if counting the elements in the sets is not causing the load, it could also be added to the limited set of metrics.

Also: it seems as if the metrics are continuously collected.

Maybe something like this may be a solution:

  • disabling metrics disables the continuous collection of metrics (as it is now)
  • provide the /metrics endpoint nevertheless
  • if /metrics is called, collect the metrics on demand (maybe with only the limited set available if disabled), and/or
  • add an optional parameter to the /metrics endpoint defining the set of metrics to collect (limited, full)

This would enable Prometheus to collect only the needed metrics and would prevent unnecessary load on the device. By choosing the scrape interval on the caller's side, the load can also be reduced.
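
As a rough illustration of the on-demand idea (not the bouncer's actual code): with the Go Prometheus client, a custom prometheus.Collector only computes its values when the /metrics endpoint is scraped, so no background ticker is needed. The metric names, the port, and the readFirewallCounters helper are made up for this sketch.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// onDemandCollector computes metrics only when Prometheus scrapes /metrics,
// instead of running a background ticker.
type onDemandCollector struct {
	droppedPackets *prometheus.Desc
	droppedBytes   *prometheus.Desc
}

func newOnDemandCollector() *onDemandCollector {
	return &onDemandCollector{
		droppedPackets: prometheus.NewDesc("fw_bouncer_dropped_packets", "Dropped packets", nil, nil),
		droppedBytes:   prometheus.NewDesc("fw_bouncer_dropped_bytes", "Dropped bytes", nil, nil),
	}
}

func (c *onDemandCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.droppedPackets
	ch <- c.droppedBytes
}

// Collect is called by the registry on each scrape; the firewall counters
// are only read here, on demand.
func (c *onDemandCollector) Collect(ch chan<- prometheus.Metric) {
	packets, bytes := readFirewallCounters() // hypothetical helper
	ch <- prometheus.MustNewConstMetric(c.droppedPackets, prometheus.CounterValue, packets)
	ch <- prometheus.MustNewConstMetric(c.droppedBytes, prometheus.CounterValue, bytes)
}

// readFirewallCounters stands in for whatever mechanism the bouncer uses
// (iptables-save, nftables counters, ...).
func readFirewallCounters() (float64, float64) {
	return 0, 0
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(newOnDemandCollector())
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":60601", nil))
}
```

With a collector like this registered, the scrape interval configured on the Prometheus side directly controls how often the counters are read.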

version

remediation component version:

$ crowdsec-firewall-bouncer --version
0.0.29-rc3

crowdsec version

crowdsec version:

$ crowdsec --version
1.6.4

OS version

# On Linux:
$ cat /etc/os-release
OpenWrt 23.05.4
$ uname -a
Linux BPI-R3-eth1 5.15.162 #0 SMP Mon Jul 15 22:14:18 2024 aarch64 GNU/Linux

@ne20002 ne20002 changed the title bouncer metrics are adding reasonable load on OpenWrt router bouncer metrics are adding significant load on OpenWrt router Jul 21, 2024

blotus commented Jul 22, 2024

Hello,

We are reworking the way we collect metrics and adding more metrics in this PR: #365 (the goal is to also provide those metrics in cscli metrics, and to have more granular data about which decision source blocks what).

We have optimized the way we collect metrics for both nftables (no more calls to the nft binary) and iptables (a single call to iptables-save). Would you mind trying the PR to see if you still see such an impact?
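
For readers curious what the single-call iptables approach can look like, here is a sketch of the general idea (not the PR's actual code): with iptables-save -c, every rule line is prefixed with [packets:bytes], so one invocation is enough to read all counters. The ipset name crowdsec-blacklists is assumed for the example.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// collectCounters runs iptables-save once and sums the packet/byte counters
// of every rule that references the given ipset. With -c, each rule line is
// prefixed with "[packets:bytes]".
func collectCounters(setName string) (packets, bytesDropped uint64, err error) {
	out, err := exec.Command("iptables-save", "-c").Output()
	if err != nil {
		return 0, 0, err
	}
	scanner := bufio.NewScanner(bytes.NewReader(out))
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "[") || !strings.Contains(line, setName) {
			continue
		}
		var p, b uint64
		// counters look like "[123:4567] -A INPUT -m set --match-set ... -j DROP"
		if _, err := fmt.Sscanf(line, "[%d:%d]", &p, &b); err != nil {
			continue
		}
		packets += p
		bytesDropped += b
	}
	return packets, bytesDropped, scanner.Err()
}

func main() {
	p, b, err := collectCounters("crowdsec-blacklists") // assumed set name
	if err != nil {
		panic(err)
	}
	fmt.Printf("dropped: %d packets, %d bytes\n", p, b)
}
```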


ne20002 commented Jul 22, 2024

Hi @blotus
Thank you for the response. Unfortunately I don't have a Go dev environment running, nor any experience with Go.
But I see that you already added an 'only compute metrics when requested' commit to the PR.


blotus commented Sep 17, 2024

Hello,

I've just merged #365, and it should be released this week.

There are some significant changes in how we collect metrics:

  • Reduced ticker usage: metrics are now only collected when Prometheus makes a request to the endpoint, and every ~20 minutes so we can send some statistics to crowdsec for display in cscli and the web console.
  • In iptables (or ipset) mode, the counter values are now fetched with iptables-save, which is faster and easier to parse.
  • In nftables mode, we now use counters embedded in the rules we add, which removes the need to call the nft binary and is much faster (see the sketch after this list).
  • Because of those changes, metrics are always collected at least once every 20 minutes, even if the prometheus endpoint is disabled (from our testing, it takes about 100-200 ms to collect the metrics with more than 100k banned IPs; this will of course vary with the CPU, but it should be almost invisible).
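
As a minimal sketch of the embedded-counter approach, assuming the github.com/google/nftables library is used (this is an illustration, not necessarily the bouncer's code): a counter expression is attached to the DROP rule when it is created, and reading it back later is a single netlink query instead of an exec of the nft binary. The table and chain names are invented for the example.

```go
package main

import (
	"fmt"

	"github.com/google/nftables"
	"github.com/google/nftables/expr"
)

func main() {
	c, err := nftables.New()
	if err != nil {
		panic(err)
	}

	// Hypothetical table/chain names for the example.
	table := c.AddTable(&nftables.Table{Family: nftables.TableFamilyINet, Name: "crowdsec-demo"})
	chain := c.AddChain(&nftables.Chain{
		Name:     "input",
		Table:    table,
		Type:     nftables.ChainTypeFilter,
		Hooknum:  nftables.ChainHookInput,
		Priority: nftables.ChainPriorityFilter,
	})

	// Rule with an embedded counter: the kernel updates packets/bytes itself.
	c.AddRule(&nftables.Rule{
		Table: table,
		Chain: chain,
		Exprs: []expr.Any{
			&expr.Counter{},
			&expr.Verdict{Kind: expr.VerdictDrop},
		},
	})
	if err := c.Flush(); err != nil {
		panic(err)
	}

	// Later, read the counters back over netlink, without calling nft.
	rules, err := c.GetRules(table, chain)
	if err != nil {
		panic(err)
	}
	for _, r := range rules {
		for _, e := range r.Exprs {
			if counter, ok := e.(*expr.Counter); ok {
				fmt.Printf("dropped: %d packets, %d bytes\n", counter.Packets, counter.Bytes)
			}
		}
	}
}
```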

I'm going to close the issue, but feel free to reopen it if you still see huge CPU usage after the release.

@blotus blotus closed this as completed Sep 17, 2024