Refactoring of Downloaders to add incremental data file update #60

Open: wants to merge 1 commit into base: master

@Ziver Ziver commented May 10, 2020

This refactors the downloader into an iterator-like model in which the data is written to the output file incrementally. On larger repos it can take hours to download all the commits, and if something failed or you needed to abort the downloader, you would previously lose all the data downloaded so far. This should now be better, as the last downloaded chunk should already have been written to the output file.

I have not checked, but this should probably also improve memory usage, since the commits no longer all need to be held in memory until the downloader has finished.
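A minimal sketch of the iterator-style model described above, not the code in this PR. It assumes a hypothetical `fetch_page(start, page_size)` callable that returns a list of commit dicts (empty when there is nothing left) and a JSON-lines output file; both are illustrative choices and not taken from the repo.

```python
import json
from typing import Callable, Dict, Iterator, List

# Hypothetical type: fetch_page(start, page_size) -> list of commit dicts.
FetchPage = Callable[[int, int], List[Dict]]


def iter_chunks(fetch_page: FetchPage, page_size: int = 100) -> Iterator[List[Dict]]:
    """Yield one page of commits at a time until an empty page comes back."""
    start = 0
    while True:
        page = fetch_page(start, page_size)
        if not page:
            return
        yield page
        start += len(page)


def download_to_file(fetch_page: FetchPage, path: str, page_size: int = 100) -> int:
    """Append each chunk to `path` as soon as it arrives; return commits written."""
    written = 0
    with open(path, "a", encoding="utf-8") as out:
        for chunk in iter_chunks(fetch_page, page_size):
            for commit in chunk:
                out.write(json.dumps(commit) + "\n")
            # Push the chunk out of Python's buffer before fetching the next one,
            # so an abort loses at most the chunk currently in flight.
            out.flush()
            written += len(chunk)
    return written
```

Only one chunk is held in memory at a time, and if `fetch_page` raises partway through, the chunks already written stay in the file.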

I also removed the extra layer inside the Ssh downloader and made the Legacy downloader just another independent downloader class. This was only to simplify the structure; it was a bit hard to keep track of the layers when coming into the repo.

My main goal with this is to add a diff-only option so that the downloader only downloads new commits from Gerrit. That way the existing output file can be incrementally updated at regular intervals, and a failed download can be continued without re-downloading all the commits.
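One way the diff-only idea could look, as a rough sketch under stated assumptions: the output file is JSON lines, every record was written as a complete line, and each commit carries a unique `id` field. The function names and the `id` field are hypothetical. The sketch filters on the client side; a real implementation would preferably ask Gerrit only for newer changes.

```python
import json
import os
from typing import Dict, Iterable, Set


def load_known_ids(path: str, id_field: str = "id") -> Set:
    """Collect the ids already present in an existing output file."""
    known: Set = set()
    if not os.path.exists(path):
        return known
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                known.add(json.loads(line)[id_field])
            except (json.JSONDecodeError, KeyError, TypeError):
                # Skip lines that do not parse or lack the id field.
                continue
    return known


def append_new_commits(commits: Iterable[Dict], path: str, id_field: str = "id") -> int:
    """Append only commits whose id is not yet in `path`; return how many were added."""
    known = load_known_ids(path, id_field)
    added = 0
    with open(path, "a", encoding="utf-8") as out:
        for commit in commits:
            if commit[id_field] in known:
                continue
            out.write(json.dumps(commit) + "\n")
            known.add(commit[id_field])
            added += 1
    return added
```

Running `append_new_commits` repeatedly against the same file updates it incrementally, and running it after an aborted download picks up where the file left off.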

@holmari holmari (Owner) commented Feb 15, 2021

@Ziver I'm really sorry for never getting back to you. I love Gerrit but unfortunately I don't use it at work anymore, and so I have not been able to maintain this tool. I really appreciate your changes here but I can't merge them in since I can't actively maintain this tool. I updated the README with a note regarding a rewrite of this tool, which I pushed to GitHub yesterday.

@Ziver, in case you're interested, the Gerrit implementation of data fetching + analysis computation is up for grabs and shouldn't be tons of work - the GitHub data download code is about 400 lines, and the GitHub-specific analysis is about 600. Let me know, I'd be happy to walk you through the code in case you were curious to pick it up :-)
