Metrics-scraper: SQLite usage is not thread safe #9452

Open
foslage opened this issue Sep 10, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

What happened?

When metrics-scraper runs UpdateDatabase() or CullDatabase() at the exact moment a client is querying metrics, the function receives a database is locked (5) (SQLITE_BUSY) error and aborts.

As a result, UpdateDatabase() or CullDatabase() returns without closing its transaction correctly. Every later attempt to open a new transaction then fails with a SQL logic error: cannot start a transaction within a transaction (1) error.

This effectively renders metrics-scraper unusable and results in kubernetes-dashboard only showing outdated metrics or not showing metrics at all.

What did you expect to happen?

I expected metrics-scraper to handle concurrent queries correctly.

How can we reproduce it (as minimally and precisely as possible)?

  1. Build metrics-scraper from current master (a12c809) and start it on a cluster with a few pods in it (my cluster had ~5 nodes and ~700 pods):
     git clone https://github.com/kubernetes/dashboard.git kubernetes-dashboard
     cd kubernetes-dashboard/modules/metrics-scraper
     go build
     # use a metric resolution of 1 second to increase the odds of the error occurring
     rm -f /tmp/metrics.db* && ./metrics-scraper --kubeconfig /path/to/kubeconfig --metric-resolution 1s
  2. Run bombardier or a similar tool to generate some metrics query requests:
     git clone https://github.com/codesenberg/bombardier.git
     cd bombardier
     go build
     ./bombardier -c 1 -d 5s http://localhost:8000/api/v1/dashboard/namespaces/[path to some metric that actually exists]

Anything else we need to know?

A good solution would be to update database.go to close transactions correctly in case of errors.

A great solution would be to also make SQLite thread safe as described in mattn/go-sqlite3 #209. This would allow clients to query metrics while UpdateDatabase() or CullDatabase() are active.
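One mitigation that is commonly suggested for go-sqlite3 locking errors is to limit the `database/sql` pool to a single connection with `SetMaxOpenConns(1)`. Whether that is the exact approach meant by the referenced issue is an assumption on my part, and note that it makes concurrent callers wait for each other rather than run in parallel. The sketch below uses a hypothetical `fakeDriver` in place of a real SQLite driver so that it runs with the standard library only. It records how many connections were opened and the peak number of statements executing at once.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"sync"
	"time"
)

// fakeDriver is a stand-in for a real SQLite driver (assumption for this sketch).
type fakeDriver struct {
	mu                    sync.Mutex
	opens, inFlight, peak int
}
type fakeConn struct{ d *fakeDriver }
type fakeStmt struct{ d *fakeDriver }

func (d *fakeDriver) Open(string) (driver.Conn, error) { return d.Connect(context.Background()) }
func (d *fakeDriver) Driver() driver.Driver            { return d }
func (d *fakeDriver) Connect(context.Context) (driver.Conn, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.opens++
	return &fakeConn{d}, nil
}

func (c *fakeConn) Prepare(string) (driver.Stmt, error) { return &fakeStmt{c.d}, nil }
func (c *fakeConn) Close() error                        { return nil }
func (c *fakeConn) Begin() (driver.Tx, error)           { return nil, errors.New("not implemented") }

func (s *fakeStmt) Close() error  { return nil }
func (s *fakeStmt) NumInput() int { return -1 }
func (s *fakeStmt) Query([]driver.Value) (driver.Rows, error) {
	return nil, errors.New("not implemented")
}
func (s *fakeStmt) Exec([]driver.Value) (driver.Result, error) {
	d := s.d
	d.mu.Lock()
	d.inFlight++
	if d.inFlight > d.peak {
		d.peak = d.inFlight
	}
	d.mu.Unlock()
	time.Sleep(time.Millisecond) // simulate a statement that takes a while
	d.mu.Lock()
	d.inFlight--
	d.mu.Unlock()
	return driver.RowsAffected(1), nil
}

// run issues n concurrent statements through a pool limited to one
// connection and reports connections opened, peak concurrency and failures.
func run(n int) (opens, peak, failures int) {
	d := &fakeDriver{}
	db := sql.OpenDB(d)
	defer db.Close()
	db.SetMaxOpenConns(1) // callers queue for the single connection

	var wg sync.WaitGroup
	errs := make(chan error, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := db.Exec("UPDATE"); err != nil {
				errs <- err
			}
		}()
	}
	wg.Wait()

	d.mu.Lock()
	defer d.mu.Unlock()
	return d.opens, d.peak, len(errs)
}

func main() {
	opens, peak, failures := run(8)
	fmt.Printf("opens=%d peak=%d failures=%d\n", opens, peak, failures)
}
```

With the limit in place, the eight goroutines share one connection and never execute statements at the same time, so a lock conflict between connections cannot occur in this model.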

Until this bug is fixed, a possible workaround is to use the --db-file parameter to get SQLite to use a shared cache. This can be done using the helmfile's values.yaml:

metricsScraper:
  containers:
    args:
      - --db-file
      - file:/tmp/metrics.db?cache=shared
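The value passed to --db-file above is a SQLite URI filename: a `file:` prefix, the database path, and a `cache=shared` query parameter. The following sketch only illustrates that structure using Go's `net/url`. The helper name `describeDSN` is hypothetical, and SQLite's own URI parser may differ in details.

```go
package main

import (
	"fmt"
	"net/url"
)

// describeDSN splits a file: URI into its file path and the value of its
// "cache" query parameter.
func describeDSN(dsn string) (path, cache string, err error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "file" {
		return "", "", fmt.Errorf("not a file: URI: %q", dsn)
	}
	return u.Path, u.Query().Get("cache"), nil
}

func main() {
	path, cache, err := describeDSN("file:/tmp/metrics.db?cache=shared")
	fmt.Println(path, cache, err) // /tmp/metrics.db shared <nil>
}
```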

What browsers are you seeing the problem on?

Chrome, Safari, Microsoft Edge, Firefox, Others

Kubernetes Dashboard version

7.5.0

Kubernetes version

v1.30.3

Dev environment

$ go version
go version go1.22.6 linux/amd64
$ node --version
v22.4.1
