
Stop/resume #2

Open
f1ames opened this issue Nov 29, 2014 · 2 comments
f1ames (Contributor) commented Nov 29, 2014

I think I saw it in the roadmap.
It would be nice if you could stop and then resume roboto so it doesn't start over from the beginning/startUrls. I think it could be achieved via de/serialization, so when you stop and start again it loads its previous state.

jculvey (Owner) commented Nov 29, 2014

Yeah, this is really lacking right now.

I've been a little torn over how to implement this. In the long term I think it would be cool if there was some sort of admin UI where you could view previous crawl results, start and stop new crawls, and maybe even do a little configuration.

That might be a little heavyweight for some people, so having a simple pause/resume from the command line would be nice.

How does this change sound?

In the crawler you can configure a queue file:

var roboto = require('roboto');

var crawler = new roboto.Crawler({
  startUrls: [
    "https://news.ycombinator.com/",
  ],
  queueFile: '/var/foo'  // crawl state gets flushed here periodically
});

Then, the url frontier and set of seen urls will periodically be serialized and flushed out to the file as json.
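Roughly, the flush could look something like the sketch below. The internal names here (_urlFrontier, _seenUrls, flushQueueFile) are just placeholders for whatever the crawler keeps in memory, not roboto's current API:

var fs = require('fs');

// Dump the in-memory crawl state to the configured queue file as json.
function flushQueueFile(crawler) {
  var state = {
    frontier: crawler._urlFrontier,        // urls still waiting to be fetched
    seen: Object.keys(crawler._seenUrls)   // urls already visited
  };
  fs.writeFileSync(crawler.queueFile, JSON.stringify(state));
}

// Flush every 30 seconds while the crawl is running.
setInterval(function() {
  flushQueueFile(crawler);
}, 30 * 1000);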

f1ames (Contributor, Author) commented Nov 30, 2014

Well, I had a very similar idea: you configure a queue file and the crawler periodically serializes the data necessary for resuming.
The flow I was thinking of (a rough code sketch follows the list):

  • if you don't define queueFile it works like current version
  • if you define queueFile it checks if it exists and if it's empty
    • if it's empty, crawler starts from the beginning
    • if it's not empty, crawler deserializes data and starts from this point

If the crawler is done, it removes the queueFile so the next run starts from the beginning.
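A rough sketch of that startup/shutdown flow (again, the state shape and internal names are assumptions, not how roboto actually stores things):

var fs = require('fs');

// On startup: resume from the queue file if it exists and is non-empty,
// otherwise fall back to startUrls as in the current version.
function loadOrStartFresh(crawler) {
  if (!crawler.queueFile || !fs.existsSync(crawler.queueFile)) {
    return;
  }
  var contents = fs.readFileSync(crawler.queueFile, 'utf8');
  if (contents.trim().length === 0) {
    return;
  }
  var state = JSON.parse(contents);
  crawler._urlFrontier = state.frontier;
  crawler._seenUrls = {};
  state.seen.forEach(function(url) {
    crawler._seenUrls[url] = true;
  });
}

// When the crawl finishes, remove the queue file so the next run starts over.
function onCrawlDone(crawler) {
  if (crawler.queueFile && fs.existsSync(crawler.queueFile)) {
    fs.unlinkSync(crawler.queueFile);
  }
}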

nsakovich pushed a commit to nsakovich/roboto that referenced this issue Dec 24, 2015
WEBCLI-824 Add caching support in Devcenter crawler