
Static website backup

Downloads website files with wget.

This is a simple script that scrapes the website with wget in order to maintain a static fallback. The script requires pretty permalinks unless it is used on a single-page site.

The script runs three download steps (sketched after the list):

  1. Download all files on the website using wget, waiting 1 second between requests. The script ignores any file with a query parameter unless that parameter is ?ver.
  2. Download the website's 404 page. This fails if the website has a page at 404.html for some reason.
  3. Download any extra URLs specified in extra-urls.txt.
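
A minimal sketch of what these download steps might look like as shell commands follows. SITE_URL, OUT_DIR, and the not-found path are illustrative placeholders rather than values taken from the actual script, and the real wget flags may differ.

```bash
#!/usr/bin/env bash
# Hedged sketch of the three download steps. SITE_URL, OUT_DIR, and the
# not-found path are illustrative placeholders, not values from the
# actual script, and the real wget flags may differ.

SITE_URL="https://example.org"   # hypothetical: the live website
OUT_DIR="site"                   # hypothetical: download destination

# 1. Mirror the site, waiting 1 second between requests and rejecting any
#    URL whose query string is anything other than ?ver (the negative
#    lookahead needs a wget build with PCRE support).
wget --mirror --page-requisites --no-parent \
     --wait=1 \
     --regex-type=pcre --reject-regex='\?(?!ver=)' \
     --directory-prefix="$OUT_DIR" \
     "$SITE_URL"

# 2. Capture the site's 404 page by requesting a path that should not exist
#    and keeping the error body; a real page already living at 404.html
#    would interfere with this step.
wget --content-on-error -O "$OUT_DIR/404.html" \
     "$SITE_URL/this-page-should-not-exist" || true

# 3. Fetch any extra URLs listed (one per line) in extra-urls.txt.
wget --wait=1 --directory-prefix="$OUT_DIR" --input-file=extra-urls.txt
```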

The download is followed by three post-processing steps (sketched after the list):

  1. Remove the remaining query parameters (only ?ver survives the download step) from the downloaded files using .github/bin/cleanup-querystrings.py.
  2. Use sed to replace the website's URL with the GitHub Pages URL in all files.
  3. Minify all HTML files using minify.
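
The post-processing steps could look roughly like the following sketch. Again, OUT_DIR, SITE_URL, and PAGES_URL are placeholders, and the cleanup script's command-line interface is assumed rather than documented here.

```bash
# Hedged sketch of the three post-processing steps. OUT_DIR, SITE_URL, and
# PAGES_URL are placeholders; the cleanup script's real command-line
# interface is assumed, not documented here.

OUT_DIR="site"
SITE_URL="https://example.org"            # hypothetical live URL
PAGES_URL="https://ftcunion.github.io"    # GitHub Pages URL

# 1. Strip the remaining ?ver query strings from the downloaded files
#    (assumed invocation of the repository's cleanup helper).
python3 .github/bin/cleanup-querystrings.py "$OUT_DIR"

# 2. Rewrite the live URL to the GitHub Pages URL in every file.
find "$OUT_DIR" -type f -exec sed -i "s|$SITE_URL|$PAGES_URL|g" {} +

# 3. Minify every HTML file in place (assuming the tdewolff/minify CLI).
find "$OUT_DIR" -type f -name '*.html' \
     -exec sh -c 'minify -o "$1" "$1"' _ {} \;
```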

After these steps, the files are deployed to GitHub Pages: ftcunion.github.io.
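
One hedged sketch of how the deploy step could be done from a shell script is below; the target repository and the push-based mechanism are assumptions, and the actual deployment may be handled by a CI workflow instead.

```bash
# Assumed deploy step: push the scraped files to the repository backing
# ftcunion.github.io. The remote and branch are hypothetical, and the
# real deployment may be handled by a CI workflow instead.
OUT_DIR="site"
DEPLOY_REPO="git@github.com:ftcunion/ftcunion.github.io.git"  # hypothetical

git clone --depth 1 "$DEPLOY_REPO" deploy
rsync -a --delete --exclude '.git' "$OUT_DIR"/ deploy/
cd deploy
git add -A
git commit -m "Update static backup" || echo "Nothing new to deploy"
git push
```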