
Torta: Generating Mixed-Media GUI and Command-Line App Tutorials Using Operating-System-Wide Activity Tracing

research paper summary
Torta: Generating Mixed-Media GUI and Command-Line App Tutorials Using Operating-System-Wide Activity Tracing. Alok Mysore and Philip J. Guo. ACM Symposium on User Interface Software and Technology (UIST), 2017.
Tutorials are vital for helping people perform complex software-based tasks in domains such as programming, data science, system administration, and computational research. However, it is tedious to create detailed step-by-step tutorials for tasks that span multiple interrelated GUI and command-line applications. To address this challenge, we created Torta, an end-to-end system that automatically generates step-by-step GUI and command-line app tutorials by demonstration, provides an editor to trim, organize, and add validation criteria to these tutorials, and provides a web-based viewer that can validate step-level progress and automatically run certain steps. The core technical insight that underpins Torta is that combining operating-system-wide activity tracing and screencast recording makes it easier to generate mixed-media (text+video) tutorials that span multiple GUI and command-line apps. An exploratory study on 10 computer science teaching assistants (TAs) found that they all preferred the experience and results of using Torta to record programming and sysadmin tutorials relevant to classes they teach rather than manually writing tutorials. A follow-up study on 6 students found that they all preferred following the Torta tutorials created by those TAs over the manually-written versions.
@inproceedings{MysoreUIST2017,
 author = {Mysore, Alok and Guo, Philip J.},
 title = {Torta: Generating Mixed-Media GUI and Command-Line App Tutorials Using Operating-System-Wide Activity Tracing},
 booktitle = {Proceedings of the 28th Annual ACM Symposium on User Interface Software and Technology},
 series = {UIST '17},
 year = {2017},
 publisher = {ACM},
 address = {New York, NY, USA},
}

Complex software tasks often require intricate coordination across multiple GUI and command-line tools. For instance, if you want to start building a modern full-stack web app, you may need to first install Node.js and the npm package manager, run a slew of npm commands to configure a custom toolchain with a CSS preprocessor and a JavaScript code bundler, adjust OS environment variables to detect all required library dependencies and execution paths, customize your IDE to hook up to that toolchain, install and configure web browser extensions for debugging, and set up a pipeline to deploy code to production servers. All these acrobatic contortions must be done before you can even write a simple “hello world” web app! (This example was made in 2017. I'm sure that details will drastically change in the future, but the underlying complexity will undoubtedly remain.)

The same kinds of arcane command-line (and GUI) BS afflict data scientists, computational researchers, system administrators, and anyone else who has to work with computers in a non-trivial way.

To help novices learn to do these tasks, experts create step-by-step tutorials in one of two ways:

  • Hand-written: They can create a written tutorial by painstakingly enumerating all steps, describing shell commands, expected outputs, and side effects, and taking and annotating screenshots to demonstrate GUI-based actions. This process is very tedious and time-consuming for the creator (e.g., it's easy to forget or gloss over certain steps!) but can lead to high-quality tutorials for learners if done well. However, it has the drawback of not capturing motion-based actions that are helpful for GUI apps. Which brings us to ...

  • Screencast videos: They can create a video tutorial by simply demonstrating the requisite actions on their computer while narrating by voice. This has the advantage of being much easier for creators and also capturing motion-based actions. But videos are much harder for learners to navigate and search, and learners can't copy-and-paste shell commands and filesystem metadata from video clips like they can with hand-written tutorials. Videos are also much harder to edit later.

What if we could combine the best of both worlds? To try to do so, we created a macOS app called Torta (Transparent Operating-system Recording for Tutorial Acquisition) that makes it easy to create and consume mixed-media tutorials that contain the best properties of hand-written and screencast video formats. Here's how Torta works:

  1. The tutorial creator first demonstrates the intended actions on their computer by running shell commands, launching GUI applications, and interacting with application windows just like they would normally do. Torta automatically records a screencast video of their desktop along with a timestamped trace of OS-level activity that includes filesystem modifications, shell commands, window positions, and keystrokes. From this single demonstration, Torta generates a mixed-media tutorial that hierarchically segments the screencast video by foreground GUI windows, executed commands, and versions of saved files. It displays each segment as an individual step on a tutorial webpage. (A rough sketch of this segmentation idea appears right after this list.)

  2. However, since this initial demonstration likely contains redundancies or errors due to the difficulty of recording a pristine and error-free video demo in one take, Torta provides a user interface for editing tutorials prior to publishing. The tutorial editor UI uses data from both the recorded screencast video and OS-level activity traces to allow creators to compress and summarize portions of the tutorial, add textual annotations, insert file path templates that generalize the tutorial's contents across machines, and add checkpoints for viewers to validate their progress.

  3. Torta-generated tutorials (“Tortorials”) are simply ordinary webpages that mix text and video, so people can consume them just like any web-based tutorial. Tortorials are also hierarchical, so users can zoom in to view more details on demand. If someone wants interactive feedback as they are following along, they can optionally install a Torta viewer app on their computer. Doing so enables them to use an augmented tutorial viewer that provides checkpoints to validate their progress at each step. The viewer app can also automatically run certain steps for the user. (A sketch of how such checkpoint validation might look appears below the schematic.)
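
To make the segmentation idea in step 1 a bit more concrete, here is a minimal Python sketch of one way it could work. This is not Torta's actual implementation; the event format, field names, and segment_trace function are all hypothetical. The idea is to split the recording into steps at each foreground-window change and attach shell commands and file saves to the enclosing step as sub-steps:

from dataclasses import dataclass, field

# Hypothetical trace records; the paper does not publish Torta's real format.
@dataclass
class TraceEvent:
    timestamp: float   # seconds since the recording started
    kind: str          # "window_focus", "shell_command", or "file_save"
    detail: str        # app name, command line, or saved file path

@dataclass
class TutorialStep:
    app: str           # foreground GUI app for this segment
    start: float       # segment start time within the screencast
    end: float         # segment end time within the screencast
    sub_steps: list = field(default_factory=list)   # commands run, files saved

def segment_trace(events, recording_length):
    """Split a recording into steps wherever the foreground window changes,
    attaching shell-command and file-save events to the enclosing step."""
    steps = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "window_focus":
            if steps:
                steps[-1].end = ev.timestamp   # close the previous segment
            steps.append(TutorialStep(app=ev.detail, start=ev.timestamp,
                                      end=recording_length))
        elif steps:   # ignore events before the first window comes into focus
            steps[-1].sub_steps.append((ev.kind, ev.detail))
    return steps

# Example with a made-up trace of a short demo:
trace = [
    TraceEvent(0.0, "window_focus", "Terminal"),
    TraceEvent(4.2, "shell_command", "npm install"),
    TraceEvent(30.5, "window_focus", "TextEdit"),
    TraceEvent(41.0, "file_save", "~/project/app.js"),
]
for step in segment_trace(trace, recording_length=60.0):
    print(step.app, step.start, step.end, step.sub_steps)

Treating commands and file saves as sub-steps within each window-level step matches the hierarchical structure shown in the schematic below.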

This schematic shows the step-by-step structure of a hypothetical Tortorial demonstrating a sequence of web browser, terminal, and text editor actions. The sub-steps within each step represent shell command invocations and file save events.
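
Steps 2 and 3 above hinge on file path templates and progress checkpoints, which can also be pictured with a short sketch. Again, everything here is illustrative rather than taken from the paper: the checkpoint fields, the ${PROJECT_DIR} variable, and the validate_checkpoint helper are made-up names. The general idea is that a template recorded on the creator's machine gets filled in with values from the viewer's machine and then checked, optionally after auto-running the step's command:

import os
import string
import subprocess

# Hypothetical checkpoint description; Torta's real editor and viewer use
# their own representation, which the paper describes only at a high level.
CHECKPOINT = {
    "description": "Project dependencies installed",
    # A file path template lets the same tutorial generalize across machines:
    # the viewer app substitutes its own value for ${PROJECT_DIR}.
    "expected_path": "${PROJECT_DIR}/node_modules",
    # A command that the viewer app could optionally run for this step.
    "auto_run": "npm install",
}

def resolve_template(template, variables):
    """Fill in ${VAR} placeholders with values from the viewer's machine."""
    return string.Template(template).substitute(variables)

def validate_checkpoint(checkpoint, variables, auto_run=False):
    """Return True if the expected file or directory now exists, optionally
    running the step's command first (the 'run this step for me' case)."""
    if auto_run and checkpoint.get("auto_run"):
        subprocess.run(checkpoint["auto_run"], shell=True, check=False)
    path = resolve_template(checkpoint["expected_path"], variables)
    return os.path.exists(path)

# Example: a viewer on a different machine supplies its own project directory.
ok = validate_checkpoint(CHECKPOINT, {"PROJECT_DIR": os.path.expanduser("~/my-app")})
print("checkpoint passed" if ok else "checkpoint not yet satisfied")

Because the creator's absolute paths are replaced by a template, the same checkpoint can be validated on machines with very different directory layouts, which is the point of step 2's "generalize the tutorial's contents across machines."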

In sum, Torta points toward a future where making complex software tutorials becomes as simple as interacting normally with the desired applications and adding some annotations afterward. For tutorial creators, Torta provides the best of both modalities—the fluid ease of demonstrating a set of computer actions in-situ, and the detailed rigor of writing text-based tutorials. And for tutorial consumers, Torta allows them to browse hierarchically at the level of detail suitable for their needs and to get step-by-step feedback on their incremental progress.


Read the full paper for details:

Torta: Generating Mixed-Media GUI and Command-Line App Tutorials Using Operating-System-Wide Activity Tracing. Alok Mysore and Philip J. Guo. ACM Symposium on User Interface Software and Technology (UIST), 2017.
Created: 2017-10-03
Last modified: 2017-10-03