Commit
Add recursive parameter to allow crawling recursively (#26)
Crawl up to a depth of `r` by choosing a link at random from the page
Update README
1 parent b29b8d6, commit a917a11
Showing 5 changed files with 79 additions and 11 deletions.
@@ -9,9 +9,15 @@ Usage
 ---

 ```
-usage: run.js [-h] [-v] -b BINARY -o OUTPUT -u URL [URL ...]
-              [-e EXISTING_PROFILE] [-p PERSIST_PROFILE] [-s {up,down}]
-              [-t SECS] [--debug {none,debug,verbose}]
+$ npm run crawl -- -h
+
+> [email protected] crawl
+> node ./built/run.js
+
+usage: run.js [-h] [-v] -b BINARY [-r RECURSIVE_DEPTH] -o OUTPUT -u URL
+              [URL ...] [-e EXISTING_PROFILE] [-p PERSIST_PROFILE]
+              [-s {up,down}] [-t SECS] [--debug {none,debug,verbose}] [-i]
+              [-a USER_AGENT] [--proxy-server URL] [-x JSON_ARRAY]

 CLI tool for crawling and recording websites with PageGraph

@@ -20,9 +26,13 @@ Optional arguments:
   -h, --help            Show this help message and exit.
   -v, --version         Show program's version number and exit.
   -b BINARY, --binary BINARY
-                        Path to the PageGraph-enabled build of Brave.
+                        Path to the PageGraph enabled build of Brave.
+  -r RECURSIVE_DEPTH, --recursive-depth RECURSIVE_DEPTH
+                        If provided, choose a link at random on page and do
+                        another crawl to this depth. Default: 1 (no
+                        recursion).
   -o OUTPUT, --output OUTPUT
-                        Path to write graphs to.
+                        Path (directory) to write graphs to.
   -u URL [URL ...], --url URL [URL ...]
                         The URLs(s) to record, in desired order (currently
                         only crawls the first URL)
@@ -38,4 +48,12 @@ Optional arguments:
   -t SECS, --secs SECS  The dwell time in seconds. Defaults: 30 sec.
   --debug {none,debug,verbose}
                         Print debugging information. Default: none.
+  -i, --interactive     Suppress use of Xvfb to allow interaction with
+                        spawned browser instance
+  -a USER_AGENT, --user-agent USER_AGENT
+                        Override the browser's UserAgent string to USER_AGENT
+  --proxy-server URL    Use an HTTP/SOCKS proxy at URL for all navigations
+  -x JSON_ARRAY, --extra-args JSON_ARRAY
+                        Pass JSON_ARRAY as extra CLI argument to the browser
+                        instance launched
 ```
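
The help text for `-r` describes the new behaviour as choosing a link at random on the page and crawling again up to the given depth, with a default of 1 meaning no recursion. A minimal TypeScript sketch of that walk follows, under stated assumptions: the names `crawlPath`, `pickRandom`, and `LinkFetcher` are hypothetical and not taken from this repository, link extraction is stubbed with an in-memory map, and the loop is synchronous for brevity, whereas a real crawl would presumably wait on the browser.

```typescript
// Hypothetical names throughout; this is an illustration, not the project's code.
type LinkFetcher = (url: string) => string[];

// Pick one element uniformly at random. `rand` is injectable so a run can be
// made deterministic; it is expected to return a value in [0, 1).
function pickRandom<T>(items: T[], rand: () => number = Math.random): T | undefined {
  if (items.length === 0) return undefined;
  const index = Math.min(items.length - 1, Math.floor(rand() * items.length));
  return items[index];
}

// Visit `startUrl`, then follow one randomly chosen link per page until
// `depth` pages have been visited or the current page has no links.
// A depth of 1 visits only the start page, matching the documented default.
function crawlPath(
  startUrl: string,
  depth: number,
  getLinks: LinkFetcher,
  rand: () => number = Math.random,
): string[] {
  const visited: string[] = [];
  let current: string | undefined = startUrl;
  while (current !== undefined && visited.length < depth) {
    visited.push(current);
    current = pickRandom(getLinks(current), rand);
  }
  return visited;
}

// Stub site used in place of real link extraction (assumption for the example).
const site: Record<string, string[]> = {
  "https://example.com/": ["https://example.com/a", "https://example.com/b"],
  "https://example.com/a": ["https://example.com/c"],
  "https://example.com/b": [],
  "https://example.com/c": [],
};
const getLinks: LinkFetcher = (url) => site[url] ?? [];

// Depth 1: only the start page is visited.
console.log(crawlPath("https://example.com/", 1, getLinks));
// Depth 3 with a fixed `rand` of 0: the first link on each page is followed.
console.log(crawlPath("https://example.com/", 3, getLinks, () => 0));
```

Making the random source a parameter keeps the sketch testable; the walk also stops early when a page has no links, which is one plausible way to handle dead ends but is not confirmed by the help text.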