Document Auto-vacuum Starvation #59
Oh hey @mikehale 👋 Thanks for raising this concern. Out of curiosity, how frequently are you manually vacuuming, how long does it take to complete, and are there any related settings you would want to share? We have an internal maintenance process architecture where we could fairly trivially add a feature to have the leader initiate a vacuum on a specific schedule, but we would need to be careful to provide the right configs for it and probably not enable it by default.
When I left we were vacuuming once every 5 minutes, and IIRC it took less than a second to complete. We had tuned the autovacuum settings, but obviously those don't come into play with a manual vacuum. Our main concern was about affecting the performance of the rest of the database, through either a lock or increased I/O load. In practice neither of those manifested as issues, and I believe that is partly because of the consistently low overhead due to the regular and frequent vacuuming. The trade-off of potentially higher I/O every 5 minutes vs. randomly not having autovacuum run for many hours turned out to be a good one for us.
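For reference, per-table autovacuum tuning of the kind mentioned above is typically done with Postgres storage parameters. A hypothetical sketch (the table name and thresholds are illustrative, not the poster's actual settings):

```sql
-- Make autovacuum trigger much more aggressively on a high-churn jobs table.
-- Values are illustrative; tune them against your own dead-tuple rates.
ALTER TABLE que_jobs SET (
  autovacuum_vacuum_scale_factor = 0,     -- ignore table-size scaling
  autovacuum_vacuum_threshold    = 1000,  -- vacuum after ~1000 dead tuples
  autovacuum_vacuum_cost_delay   = 0      -- don't throttle I/O for this table
);
```

Note that, as pointed out above, these settings only affect autovacuum; a manually issued `VACUUM` ignores them.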
We got bit by this. Was pretty frustrating to debug.
@dyeje can you add any context around what you were seeing, what your workload is like, jobs/sec or day, etc?
Postgres major version too please. I would've also expected that many of the B-tree optimizations would've helped slow down degenerate table bloat compared to those old days at Heroku. |
It was admittedly an extreme case, but I think the autovacuum processes had been working on very large tables for over 24 hours. That, coupled with a heavily thrashed jobs table, caused a noticeable (though not fatal) degradation in job acquisition performance.
I think we implemented this after your time at Heroku. We finally realized that the two Postgres autovacuum workers would sometimes both end up working on tables that took hours to complete. That kept them from vacuuming the que_jobs table, which would lead to poor performance. We ended up adding a clock process task to manually vacuum our high-churn tables at regular intervals, and we haven't seen poor job selection performance since.
It might be worthwhile to add something similar to what Que has to your docs, to mitigate this potential issue for River users.