Template & guide for repotting sites as static copies with Httrack and some Ruby tools on Mac.
Inspired by "Repotting Old Digital Humanities Projects: Two Test Cases" by Matt Miller.
- Homebrew
- Httrack (install with `$ brew install httrack`)
- Html-Tidy (install with `$ brew install tidy-html5`)
- Ruby >= 2.4 with bundler
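
Before starting, it can help to confirm that the prerequisites are actually on your `PATH`. The following is a minimal sketch, not part of this template: it assumes the tools are exposed under the command names `brew`, `httrack`, `tidy`, `ruby`, and `bundle`, and the function name `check_prereqs` is purely illustrative.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: <name>" line per absent tool and
# returns non-zero if at least one tool is missing.
check_prereqs() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Command names are assumptions based on the install commands in this guide.
check_prereqs brew httrack tidy ruby bundle || echo "Install the missing tools before continuing."
```

This only checks that each command exists; it does not verify versions, so the Ruby >= 2.4 requirement still needs a manual look at `ruby --version`.
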
- Create a new repo from this template. The new repo name should be `<projectname>-repotted`.
- Clone the repo to your local machine, `cd` into it, then run `$ bundle install`.
- Copy the site using Httrack. The command is `$ httrack --verbose --clean <url> -O docs`, where `<url>` is the full, publicly accessible URL of the website you want to copy. `./docs` is where the copied site will go. (Note: do not add a trailing slash to the URL.)
- Httrack unfortunately adds extra files and unnecessary hierarchies. To clean it up, open the newly generated `docs` directory and delete `hts-cache`, `blackblue.gif`, `fade.gif`, and `index.html`.
- Depending on the structure of your "old pot" URL, you'll have a series of hierarchical folders mirroring the structure of the URL (e.g., `www`, `nyu.edu`, `projects`, etc.). Find the lowermost folder with all the site files and copy them directly into `./src`. Then delete the empty directories left over.
- Run `$ ruby lib/check-links.rb src`. This will flag any broken links within the site (external links are not checked). It's totally up to you what to do with this information and whether or not to fix anything!
- Run `$ ruby lib/check-html.rb src`. This will flag any HTML errors within the copied site. If there are no major errors, feel free to skip the tidying step that follows and go straight to renaming the README.
- Run `$ ruby lib/tidy-html.rb src`. This will attempt to automatically "tidy" some HTML errors. You can check the HTML again using `$ ruby lib/check-html.rb src` to see if tidying worked. Again, it's totally up to you what to do with this information and whether or not to fix anything!
- When you're done copying and tidying, rename this `README.md` file to `instructions.md`.
- Next, fill out the `README-template.md` and rename it to `README.md`.
- Add, commit, and push your changes (including the site in `docs`).
- Test the site using GitHub Pages by going to `settings` > `github pages` and setting the source to `main` > `docs`.
- After a minute, go to the live GitHub Pages site to test it.
- If everything is good, copy the contents of `docs` into your "new pot" host, e.g., an NYU web hosting account.
- Et voilà!
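
The two clean-up steps above (deleting Httrack's extra files, then pulling the site files up out of the nested folders) can also be done from the terminal. This is a minimal sketch under stated assumptions, not a script shipped with this template: the function names `clean_httrack_extras` and `flatten_site` are hypothetical, and the example path in the comments is a placeholder for whatever folder hierarchy your own URL produces.

```shell
# Remove the extra files Httrack leaves at the top of the mirror directory.
# Usage: clean_httrack_extras <mirror_dir>
clean_httrack_extras() {
  local mirror_dir="$1"
  rm -rf "$mirror_dir/hts-cache"
  rm -f "$mirror_dir/blackblue.gif" "$mirror_dir/fade.gif" "$mirror_dir/index.html"
}

# Copy everything inside the lowermost site folder (hidden files included)
# directly into the destination directory, creating it if needed.
# Usage: flatten_site <site_root> <dest_dir>
flatten_site() {
  local site_root="$1"
  local dest_dir="$2"
  mkdir -p "$dest_dir"
  cp -R "$site_root"/. "$dest_dir"/
}

# Example (hypothetical path; substitute the hierarchy Httrack created for you):
#   clean_httrack_extras docs
#   flatten_site docs/www.example.edu/projects/mysite src
```

Because `flatten_site` copies rather than moves, the original nested folders are left in place; remove them by hand once you have confirmed the copy looks right.
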