
Investigate Heroku connection speed #210

Open
hancush opened this issue Aug 26, 2024 · 4 comments

hancush commented Aug 26, 2024

Imports periodically time out due to very slow connections to Heroku. Heroku Postgres databases are colocated with many other project databases on a shared Postgres server. Email Heroku support, and if this issue cannot be addressed on their end, consider migrating the database to RDS.

@hancush hancush self-assigned this Aug 26, 2024

hancush commented Sep 9, 2024

Heroku support claims no issues on the Postgres side. Opened a ticket with GitHub: https://support.github.com/ticket/personal/0/2990677

Another option to consider is making a larger runner available for imports, though the cost is higher: https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#per-minute-rates


hancush commented Sep 10, 2024

Quick response from GitHub support. tl;dr - Resources (and many other things) can indeed vary between runs:


Hi hannah, 

Thank you for reaching out to GitHub! Yes, the available computing resources can indeed vary between runners on GitHub Actions, which could explain the variability in job performance you're seeing. Here are a few key factors that might affect the speed of your GitHub Actions jobs: 

  • GitHub-hosted runners are virtual machines that run on shared infrastructure. This means that resource allocation (CPU, memory, disk I/O, etc.) can vary depending on the workload of the underlying hosts and how busy the GitHub Actions service is at the time of your job. If a runner happens to be on a more heavily loaded host, your job might experience reduced performance.
  • Network conditions between the GitHub Actions runner and your Heroku Postgres database can vary significantly. A job running on a runner that is geographically farther from Heroku's servers or experiencing network congestion could suffer from increased latency or reduced bandwidth, leading to slower data import speeds.
  • Even if Heroku claims no ongoing issues, performance can still vary based on the load on your specific database instance or underlying hardware. For instance, if there are many concurrent queries or operations, it could cause slower response times for your import jobs.
  • The rate at which your import job executes could be influenced by limits on CPU or memory usage, or throttling mechanisms both on GitHub Actions and Heroku. For instance, if your job exceeds certain usage limits, it could face rate limiting that impacts its throughput.
  • Changes in the GitHub Actions runner environment or available software dependencies might also introduce variability. Different versions of tools or libraries could impact performance, although this would likely be less pronounced unless there are specific optimizations or regressions. 

I hope this helps! Please let us know if you have any questions, or if we can help with anything else at the moment.

 Cheers,
James


fgregg commented Sep 10, 2024

okay, so a few thoughts to explore:

  1. we try the more powerful runners. if we did the cheapest step-up, that could run around $100 / month (assuming 3 hours per year-import)
  2. we could set up a self-hosted runner, which we've done before and which would likely be cheaper than the beefier, native github runners
  3. within github actions, we could detect that we're on a slow environment and restart the action (see the sketch after this list): https://github.com/orgs/community/discussions/67654#discussioncomment-8038649
  4. we could split the import job into smaller chunks. right now we are splitting them into year chunks, but we could split them into 6-month or 3-month chunks
  5. we could rewrite the import so there is less over-the-network communication (more batching). the import code used to be batchier, but that led to memory problems when we were running the import on a heroku instance. in a github action we could make a different memory/time tradeoff.
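
one way to implement the check in option 3, as a rough sketch: benchmark the runner at the start of the job and exit nonzero if it looks slow, leaving the restart to a retry step or a re-run. the threshold and synthetic workload here are placeholders, not measured values:

```python
# Rough sketch of option 3: benchmark the runner at the start of the job
# and fail fast if it is unusually slow, so a retry (e.g., a re-run step
# or workflow-level retry) can land on a healthier host. The threshold
# and workload below are placeholders to calibrate against normal runs.
import sys
import time

THRESHOLD_SECONDS = 2.0  # placeholder cutoff; tune from healthy-run timings

start = time.monotonic()
_ = sum(i * i for i in range(5_000_000))  # small synthetic CPU workload
elapsed = time.monotonic() - start

print(f"runner benchmark: {elapsed:.2f}s")
if elapsed > THRESHOLD_SECONDS:
    # Nonzero exit fails this step; the workflow can then restart the job.
    sys.exit(1)
```

a round-trip query against the Heroku database would probably be a better proxy than pure CPU here, since the slowness we're seeing is on the connection.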


hancush commented Sep 10, 2024

@fgregg I definitely think a batchier job would be the most cost-effective and least complex option in the long term.
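
For illustration, a minimal sketch of what option 5's batchier write path could look like, assuming psycopg2; the DSN, table, and column names are hypothetical stand-ins, and batch_size is the memory-for-round-trips knob:

```python
import itertools

import psycopg2
from psycopg2.extras import execute_values

# Hypothetical DSN, table, and columns -- stand-ins for the real import.
DSN = "postgresql://user:pass@example.com/dbname"
INSERT_SQL = "INSERT INTO payments (vendor, amount, issued) VALUES %s"


def batched_import(rows, batch_size=10_000):
    """Write rows in large batches to cut per-row network round trips."""
    conn = psycopg2.connect(DSN)
    try:
        with conn:  # commits on success, rolls back on error
            with conn.cursor() as cur:
                it = iter(rows)
                while True:
                    batch = list(itertools.islice(it, batch_size))
                    if not batch:
                        break
                    # One round trip per batch instead of one per row;
                    # batch_size is the memory/latency tradeoff knob.
                    execute_values(cur, INSERT_SQL, batch)
    finally:
        conn.close()
```

Since the job now runs in a GitHub Action rather than on a memory-constrained Heroku dyno, batch_size could be pushed much higher than the old batchy import could tolerate.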
