Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

Lightweight Static Website Generator

Summary
I describe a lightweight method for generating a static website, which I currently use (as of June 2013) for my personal site. The ideas in this article could inform the design of future static website generators.

As of June 2013, my personal website (www.pgbovine.net) contains almost 150 articles. In addition to writing those articles, I've also created an authoring workflow that allows me to:

  • create and edit articles as plain-text files on my computer (rather than writing on some clunky Web-based blogging interface),

  • compile those text files into static HTML that I can view locally (without setting up a web server),

  • maintain intricate control of the "look-and-feel" of my site rather than using pre-made blogging templates,

  • incorporate blog-like metadata such as creation/modification dates, article summaries, and category tags,

  • and easily push my site live to a low-cost static web hosting service.

I looked into static site generators such as Jekyll but concluded that what I wanted to accomplish was simple enough to write myself. The main advantage of "rolling my own" rather than using an existing tool is that I can more readily customize the code to my liking. It would also likely have been hard to get an existing tool to reproduce the "look-and-feel" of my current website.

Editing plain-text files

To start a new article, I simply create a new text file with a name of the form raw-*.txt. For example, the raw text file for this article is called raw-lightweight-website-generator.txt.

Each article starts with a header of metadata in YAML format. Here is the header for this article:

---
title: Lightweight Static Website Generator
tags:
  - computing
  - programming
created: 2013-06-15
modified: 2013-06-15
summary: >
  I describe a lightweight method for generating a static website, which
  I currently use (as of June 2013) for my personal site.
---

Note that the header defines metadata such as the article's title, summary, creation and modification dates, and blog-like tags. I prefer YAML over other structured formats such as XML or JSON because it's easy to write, easy to read, and easy for programs to parse (YAML parsers are available for many programming languages).

After the header comes the body of the article, which is written in Markdown, a popular lightweight text format that compiles into HTML. Writing my article in Markdown (rather than raw HTML) feels just like writing ordinary text with some occasional style markup.

In sum, the combination of YAML and Markdown allows me to keep article metadata and content together in a single file, which I can edit using any text editor (I mostly use Vim).
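As a rough illustration of the single-file layout, here is a minimal Python sketch that separates the YAML header from the Markdown body. It is not the actual code behind this site; the function name split_front_matter is hypothetical, and the sketch assumes the header is delimited by lines containing only "---", as in the example above.

```python
def split_front_matter(text):
    """Split a raw article into (header_text, body_text).

    Assumes the file begins with a line containing only '---' and that
    the header ends at the next such line.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        # No header found: treat the whole file as body text.
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            header = "".join(lines[1:i])
            body = "".join(lines[i + 1:])
            return header, body
    raise ValueError("header opened with '---' but was never closed")


# A shortened version of the example header, followed by a one-line body.
raw = """---
title: Lightweight Static Website Generator
created: 2013-06-15
---

Body text in *Markdown*.
"""

header, body = split_front_matter(raw)
```

The header string could then be handed to a YAML parser (for example, PyYAML's yaml.safe_load) and the body string to a Markdown compiler.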

Compiling to HTML

I wrote a collection of Python scripts to compile raw text files (raw-*.txt) into HTML pages like the one you're currently viewing. In general, the scripts:

  1. Parse the YAML headers of all raw-*.txt files using PyYAML to build metadata for the entire website (e.g., how many and which articles are tagged with each category tag).

  2. Run Markdown to compile article contents into HTML.

  3. Wrap the article HTML in a header, footer, and right-hand sidebar.

  4. Use the site-wide metadata (from the YAML parse in step 1) to generate right-hand sidebar contents with sections such as "Newest articles" and "Categories".

  5. Generate "Related pages" listings at the footer, again using site-wide metadata.

  6. Update the summary page for each category, such as the page listing all articles tagged as teaching.
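The site-wide metadata used in steps 1, 4, 5, and 6 can be sketched as follows. This is a hedged illustration rather than the real scripts: it assumes each parsed header is a dict with "title", "tags", and "created" keys (mirroring the example header), and the function names and sample articles are hypothetical.

```python
from collections import defaultdict


def build_tag_index(articles):
    """Map each tag to the articles carrying it, newest first."""
    index = defaultdict(list)
    for article in articles:
        for tag in article.get("tags", []):
            index[tag].append(article)
    for tagged in index.values():
        tagged.sort(key=lambda a: a["created"], reverse=True)
    return dict(index)


def newest_articles(articles, n=5):
    """Return the n most recently created articles (for a sidebar)."""
    return sorted(articles, key=lambda a: a["created"], reverse=True)[:n]


def related_pages(article, index):
    """For each of the article's tags, list the titles of other articles with that tag."""
    return {
        tag: [a["title"] for a in index[tag] if a is not article]
        for tag in article.get("tags", [])
    }


# Hypothetical metadata, as it might look after parsing three headers.
# ISO-format date strings sort chronologically; date objects would too.
articles = [
    {"title": "Article A", "tags": ["programming"], "created": "2013-06-15"},
    {"title": "Article B", "tags": ["programming", "teaching"], "created": "2013-05-01"},
    {"title": "Article C", "tags": ["teaching"], "created": "2013-06-01"},
]

index = build_tag_index(articles)
sidebar = newest_articles(articles, n=2)
footer = related_pages(articles[1], index)
```

Here index would feed the "Categories" sidebar section and the per-category summary pages, sidebar the "Newest articles" section, and footer the "Related pages" listing.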

Finally, I wrap the Python scripts in a Makefile to enable more efficient incremental builds.
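The core idea behind an incremental build is a timestamp comparison: a page is regenerated only if its source is newer than its output. The Python sketch below shows that check in isolation. It is an illustration of the principle that make applies, not the contents of the Makefile, and the file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(source_path, output_path):
    """Return True if the output is missing or older than its source."""
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(output_path)


# Demonstration in a throwaway directory with explicitly set timestamps.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raw-example.txt")
    out = os.path.join(tmp, "example.htm")
    with open(src, "w") as f:
        f.write("example source\n")

    missing = needs_rebuild(src, out)   # output does not exist yet

    with open(out, "w") as f:
        f.write("<p>example output</p>\n")
    os.utime(src, (1000, 1000))         # source older than output
    os.utime(out, (2000, 2000))
    fresh = needs_rebuild(src, out)

    os.utime(src, (3000, 3000))         # source edited after the build
    stale = needs_rebuild(src, out)
```

One caveat: because sidebars and "Related pages" listings depend on site-wide metadata, editing one article's header can affect other pages, so a real build needs dependencies beyond this one-to-one check.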

Note that all of this compilation happens locally on my computer without requiring any Internet access. Thus, I can edit my site on a long plane ride or while waiting at the doctor's office.

(Note: The main reason I haven't made these scripts public is that they're hard-coded for my own website. So unless you want your site to look exactly like mine, my scripts probably won't be useful for you.)

Live site push

To publish, I use Unison to push my entire site live to my web hosting service (WebFaction). Since my site is simply a collection of static files, I can use the ultra-efficient Nginx web server.

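I've raved about Unison for many years. In particular, using Unison here rather than ordinary rsync allows me to edit my site both locally and on the server (if I need to ssh in to make a quick fix) and keep the two copies synchronized in both directions. Also, rsync by default leaves behind files on the destination that I have deleted locally (unless it is run with the --delete option), whereas Unison propagates deletions as part of a normal sync.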

Running Unison to push my site live also provides a layer of "sanity checks," since it shows me exactly which files have been updated since the last sync. For instance, I recently tweaked my Python scripts to customize the HTML title only for guest articles and rebuilt my entire site. When I ran Unison, it showed me that, as expected, only the guest article HTML files had been updated since the last sync:

changed  ----> after-high-school-guest-article.htm
changed  ----> asian-parents-backlash-guest-article.htm
changed  ----> asian-parents-communication-guest-article.htm
changed  ----> back-pain-guest-article.htm
changed  ----> overbearing-indian-father-guest-article.htm
changed  ----> overcoming-overbearing-parents-guest-article.htm
changed  ----> tiger-child-guest-article.htm
changed  ----> tiger-cub-guest-article.htm
changed  ----> undergrad-research-guest-article.htm

This provides a much-needed sanity check; if other HTML files had been affected by my change, then I would be alarmed.
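The same check can be expressed mechanically. The Python sketch below takes a list of changed file names (entered by hand here; extracting them from Unison's output is not shown) and reports any that fall outside an expected pattern. The function name unexpected_changes is hypothetical.

```python
from fnmatch import fnmatchcase


def unexpected_changes(changed_files, expected_pattern):
    """Return the changed files that do NOT match the expected pattern."""
    return [name for name in changed_files
            if not fnmatchcase(name, expected_pattern)]


# A few of the file names from the listing above.
changed = [
    "after-high-school-guest-article.htm",
    "tiger-cub-guest-article.htm",
    "undergrad-research-guest-article.htm",
]

surprises = unexpected_changes(changed, "*-guest-article.htm")
```

An empty surprises list means the change touched only the guest articles; any other file name in the list would be a reason to investigate before syncing.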

Putting it all together

In sum, I've made it super-easy for me to create and edit website articles, which increases the chances that I will post more frequently.

Here's my current three-step workflow:

  1. create and edit raw-*.txt files in any text editor,

  2. run make to compile my entire website,

  3. and run Unison to push my updated site to WebFaction.

Postscript: Version Control

One more thing: I use Git for version control of all of my raw-*.txt files. Read "Lightweight File Versioning and Synchronization with Git and Unison" for more details.

One great property of Git is that all of the version control metadata is cleanly kept within a .git/ sub-directory. Thus, my entire website is rooted in a single self-contained directory, which makes it (and its version history) trivial to back up with tar or zip.
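For instance, a backup along these lines could be scripted with Python's standard tarfile module. This is a minimal sketch under assumed names: backup_site, the "website" directory, and the archive name are all hypothetical, and the demonstration runs against a throwaway directory rather than a real site.

```python
import os
import tarfile
import tempfile


def backup_site(site_dir, archive_path):
    """Write a gzip-compressed tarball of site_dir, including its .git/ directory."""
    top_level = os.path.basename(os.path.normpath(site_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(site_dir, arcname=top_level)
    return archive_path


# Demonstration: build a tiny stand-in site, archive it, and list the archive.
with tempfile.TemporaryDirectory() as tmp:
    site = os.path.join(tmp, "website")
    os.makedirs(os.path.join(site, ".git"))
    with open(os.path.join(site, "raw-example.txt"), "w") as f:
        f.write("example\n")

    # The archive is written outside the site directory so it does not include itself.
    archive = backup_site(site, os.path.join(tmp, "website-backup.tar.gz"))
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
```

Because the .git/ sub-directory sits inside the archived directory, the resulting tarball carries the full version history along with the current files.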

Created: 2013-06-15
Last modified: 2013-06-20