Designating Pages for more regular spot checks #114

Open
danielballan opened this issue Aug 28, 2018 · 6 comments
danielballan (Contributor) commented Aug 28, 2018

On the analyst call, it was noted that one page was dropping offline intermittently, more often than we usually see.

It might be useful to designate certain Pages for more regular spot checks (e.g. hourly). Maybe this is something to fit into the incipient work on a custom scraper.

Mr0grog (Member) commented Aug 30, 2018

This should just be an extension of edgi-govdata-archiving/web-monitoring-processing#172. What’s the status with that, @weatherpattern? Are you planning to come back to it, or does it need someone else to take it over?

Mr0grog (Member) commented Aug 30, 2018

Unless I’m misunderstanding what you’re getting at here, @danielballan.

danielballan (Contributor, Author) commented

I was too vague here; let me try again.

Short of building a full scraper (or in addition to building a full scraper) we might run a service that regularly polls the response code for a set of important Pages. It wouldn't do anything about the content. The goal is to get better time resolution on the frequency and duration of outages than we can get from Versionista or IA or likely any service that is pulling down and storing (or comparing) content.
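To make the idea concrete, here is a minimal sketch of what such a poller could look like. The page URLs, polling interval, and log path are all hypothetical, and the `requests` dependency is an assumption, not something this repo has settled on:

```python
"""Hypothetical status-code poller sketch.

Polls a fixed set of important page URLs on an interval and records only the
HTTP status code plus a timestamp, never the page content, so outage frequency
and duration can be reconstructed later at finer resolution.
"""
import csv
import time
from datetime import datetime, timezone

import requests

# Hypothetical list of "important" pages to spot-check.
IMPORTANT_PAGES = [
    "https://example.gov/page-that-keeps-dropping",
    "https://example.gov/another-important-page",
]

POLL_INTERVAL_SECONDS = 3600  # e.g. hourly
LOG_PATH = "spot_check_log.csv"


def check(url):
    """Return the response status code, or None if the request failed entirely."""
    try:
        return requests.head(url, timeout=30, allow_redirects=True).status_code
    except requests.RequestException:
        return None


def main():
    while True:
        now = datetime.now(timezone.utc).isoformat()
        with open(LOG_PATH, "a", newline="") as log_file:
            writer = csv.writer(log_file)
            for url in IMPORTANT_PAGES:
                writer.writerow([now, url, check(url)])
        time.sleep(POLL_INTERVAL_SECONDS)


if __name__ == "__main__":
    main()
```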

Mr0grog (Member) commented Sep 13, 2018

Aaaahhhhhhhhhhhhhhh, makes sense.

Mr0grog (Member) commented Sep 13, 2018

Maybe treat it kinda like we do the IA healthcheck: run it with cron, have it pick up a manifest of URLs to check from disk, then notify Sentry if any return 400+ status codes (or don’t respond at all).
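A rough sketch of what that cron job could look like, under assumptions not settled here: the manifest path, the `SENTRY_DSN` environment variable, and the `requests`/`sentry_sdk` dependencies are all placeholders.

```python
#!/usr/bin/env python
"""Hypothetical cron-driven spot check.

Reads a manifest of URLs (one per line) from disk, requests each one, and
sends a message to Sentry for any URL that returns a 400+ status code or
does not respond at all. Intended to be run by cron, e.g. hourly.
"""
import os

import requests
import sentry_sdk

sentry_sdk.init(dsn=os.environ.get("SENTRY_DSN"))

MANIFEST_PATH = os.environ.get("SPOT_CHECK_MANIFEST", "spot_check_urls.txt")


def main():
    with open(MANIFEST_PATH) as manifest:
        urls = [line.strip() for line in manifest
                if line.strip() and not line.startswith("#")]

    for url in urls:
        try:
            status = requests.get(url, timeout=30).status_code
        except requests.RequestException as error:
            sentry_sdk.capture_message(f"Spot check: no response from {url} ({error})")
            continue
        if status >= 400:
            sentry_sdk.capture_message(f"Spot check: {url} returned HTTP {status}")


if __name__ == "__main__":
    main()
```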

stale bot commented Mar 12, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

stale bot added the stale label Mar 12, 2019
Mr0grog added the idea label Mar 14, 2019
stale bot removed the stale label Mar 14, 2019