
Multiple runs for big tables #43

Open
wenq1 opened this issue Jan 10, 2022 · 3 comments

Comments

@wenq1

wenq1 commented Jan 10, 2022

For large tables (~30 GB), is it possible to run pgcompacttable until, say, 10% is done, do a manual VACUUM and REINDEX, and then run pgcompacttable again so that it only does the remaining 90% of the work?

@dk98126

dk98126 commented May 13, 2023

@Melkij
Hello! Could you please answer this question?

What happens if I stop the script? Does it start all over again and process every single page, or does it somehow know the last processed page and resume from there?

In other words, can I run the script multiple times? In my production environment I can't just leave it running, because it would run for 5 days.

@alexius2
Member

Hello. If you stop pgcompacttable and run it again, it will start from the beginning: run VACUUM, check bloat, and then try to move data from tail pages if enough bloat is left.
During the VACUUM phase it may truncate tail pages that were cleaned up in the previous run, so it won't have to process them again. pgcompacttable also runs VACUUM periodically after a certain number of processed pages.

If other transactions prevent VACUUM from acquiring the lock it needs to truncate those pages, pgcompacttable will process them again, but this will be faster because they should already be empty.

So in some cases, yes, it's OK to run and stop it multiple times to process a big table; eventually it should finish. If the table is not that big but pgcompacttable runs slowly, there may be many bloated indexes, which are better compacted before data movement using the --initial-reindex option. There is also the --delay-ratio option, which can speed up the process (if there are enough resources).
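A possible invocation combining the options mentioned above might look like this. Note that --initial-reindex and --delay-ratio come from the comment itself, but the database/table selection flags and the delay value shown here are assumptions for illustration; check pgcompacttable's own help output for the exact option names and defaults in your version:

```shell
# Hypothetical run: reindex bloated indexes first, then move data.
# -d/-t (database/table selection) and the delay value are assumed
# for illustration; verify against pgcompacttable's help/man output.
pgcompacttable -d mydb -t mytable --initial-reindex --delay-ratio 1
```

A lower delay ratio shortens the pauses between processing rounds, which is the "speed up" trade-off referred to above: faster compaction at the cost of more load on the server.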

pgcompacttable won't clean up after itself if stopped, and will leave its helper function in the database (this shouldn't affect anything). It may be dropped using this query:

select 'drop function ' || proname || '(text, text, integer, integer, integer);' from pg_proc where proname ~'pgcompact_clean_pages';

(it returns the query or queries to run to remove the pgcompacttable functions).
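One way to both generate and execute those DROP statements in a single step is psql's \gexec meta-command (available since PostgreSQL 9.6), which runs each row of the preceding query's result as a statement. The database name here is a placeholder:

```shell
# Generate the DROP FUNCTION statements and execute them in one pass.
# "mydb" is a placeholder - substitute your database name.
psql -d mydb <<'SQL'
select 'drop function ' || proname || '(text, text, integer, integer, integer);'
from pg_proc where proname ~ 'pgcompact_clean_pages'
\gexec
SQL
```

This avoids copy-pasting the generated queries back into the session by hand.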

@dk98126

dk98126 commented May 18, 2023

@alexius2
Thanks for the detailed answer!

If the table is not that big but pgcompacttable runs slowly, there may be many bloated indexes, which are better compacted before data movement using the --initial-reindex option.

Based on your experience, is the --initial-reindex option also good for big tables, not just small ones?
Our production table is 190 GB and its biggest index is about 70 GB (total index size is 483 GB).

I'm interested in how this option will affect performance. I know that pgcompacttable doesn't really affect our database performance now; I'm just a little scared to turn on the --initial-reindex option...

Thank you in advance!
