Document Auto-vacuum Starvation #59
Oh hey @mikehale 👋 Thanks for raising this concern. Out of curiosity, how frequently are you manually vacuuming, how long does it take to complete, and are there any related settings you would want to share? We have an internal maintenance process architecture where we could fairly trivially add a feature to have the leader initiate a vacuum on a specific schedule, but we would need to be careful to provide the right configs for it and probably not enable it by default.
When I left we were vacuuming once every 5 minutes, and IIRC it took less than a second to complete. We had tuned the autovacuum settings, but obviously those don't come into play with a manual vacuum. Our main concern was about affecting the performance of the rest of the database, through either a lock or increased I/O load. In practice neither of those manifested as issues, and I believe that is partly because of the consistently low overhead due to the regular and frequent vacuuming. The trade-off of potentially higher I/O every 5 minutes vs. randomly not having autovacuum run for many hours turned out to be a good one for us.
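For reference, per-table autovacuum tuning of the kind mentioned above is typically done with Postgres storage parameters. A hypothetical sketch (the table name and thresholds are illustrative, not the poster's actual settings):

```sql
-- Make autovacuum trigger much more aggressively on a high-churn jobs table.
-- Values are illustrative; tune them against your own dead-tuple rates.
ALTER TABLE que_jobs SET (
  autovacuum_vacuum_scale_factor = 0,     -- ignore table-size scaling
  autovacuum_vacuum_threshold    = 1000,  -- vacuum after ~1000 dead tuples
  autovacuum_vacuum_cost_delay   = 0      -- don't throttle I/O for this table
);
```

Note that, as pointed out above, these settings only affect autovacuum; a manually issued `VACUUM` ignores them.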
We got bit by this. Was pretty frustrating to debug.
@dyeje can you add any context around what you were seeing, what your workload is like, jobs/sec or day, etc?
Postgres major version too please. I would've also expected that many of the B-tree optimizations would've helped slow down degenerate table bloat compared to those old days at Heroku. |
It was admittedly an extreme case, but I think the autovacuum processes had been working on very large tables for over 24 hours. That, coupled with a heavily thrashed jobs table, caused a noticeable (though not fatal) degradation in job acquisition performance.
I think we implemented this after your time at Heroku. We finally realized that the two Postgres autovacuum workers would sometimes both end up working on tables that took hours to complete. That kept them from vacuuming the que_jobs table, which would lead to poor performance. We ended up adding a clock process task to manually vacuum our high-churn tables at regular intervals, and we haven't seen poor job selection performance since.
It might be worthwhile to add something similar to what Que has to your docs, to mitigate this potential issue for River users.