
Codemotion: Expanding the Design Space of Learner Interactions with Computer Programming Tutorial Videos

research paper summary
Codemotion: Expanding the Design Space of Learner Interactions with Computer Programming Tutorial Videos. Kandarp Khandwala and Philip J. Guo. ACM Conference on Learning at Scale, 2018.
Love them or hate them, videos are a pervasive format for delivering online education at scale. They are especially popular for computer programming tutorials since videos convey expert narration alongside the dynamic effects of editing and running code. However, these screencast videos simply consist of raw pixels, so there is no way to interact with the code embedded inside of them. To expand the design space of learner interactions with programming videos, we developed Codemotion, a computer vision algorithm that automatically extracts source code and dynamic edits from existing videos. Codemotion segments a video into regions that likely contain code, performs OCR on those segments, recognizes source code, and merges together related code edits into contiguous intervals. We used Codemotion to build a novel video player and then elicited interaction design ideas from potential users by running an elicitation study with 10 students followed by four participatory design workshops with 12 additional students. Participants collectively generated ideas for 28 kinds of interactions such as inline code editing, code-based skimming, pop-up video search, and in-video coding exercises.
@inproceedings{KhandwalaLAS2018,
 author = {Khandwala, Kandarp and Guo, Philip J.},
 title = {Codemotion: Expanding the Design Space of Learner Interactions with Computer Programming Tutorial Videos},
 booktitle = {Proceedings of the Fifth Annual ACM Conference on Learning at Scale},
 series = {L@S '18},
 year = {2018},
 isbn = {978-1-4503-5886-6},
 location = {London, United Kingdom},
 pages = {57:1--57:10},
 articleno = {57},
 numpages = {10},
 url = {http://doi.acm.org/10.1145/3231644.3231652},
 doi = {10.1145/3231644.3231652},
 acmid = {3231652},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {computer programming, screencasts, tutorial videos},
}

Screencast videos are now a really popular format for teaching computer programming since they can give learners a chance to virtually look over an expert's shoulder as they write, run, and debug code. For example, this lecture video from Harvard's CS50 introductory course shows the instructor live coding in both C and Python in his IDE and narrating his thought processes aloud:

Zillions of these videos now exist across YouTube, livestreams, MOOCs, and coding tutorial websites. They can get really long, especially if the instructor is demonstrating how to build a realistic app from scratch with all the nitty-gritty coding details; the Harvard video above is over 2 hours long. Since videos simply consist of pixels with no semantic information about the underlying code, it's hard for learners to navigate through them or to find which parts are most relevant to their learning goals. What better video player interfaces could we build if we could automatically extract the code present within these videos?

To play with this idea, we created Codemotion, a computer vision algorithm that takes existing videos and automatically extracts the source code within them, along with how that code morphed over time (the motion of code!). Our algorithm handles visual features commonly found in screencast videos, such as multiple editor panes in split views (like the C and Python editors in the Harvard video above), GUI windows moving around and getting resized, and differing color schemes. It essentially reverse-engineers the video to reconstruct the instructor's exact code edits over time, to the extent that the code is visible on-screen.
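To give a concrete feel for this style of pipeline, here is a minimal Python sketch, assuming OpenCV 4.x (cv2) and pytesseract are installed. It is my own simplified illustration, not the paper's actual implementation: it samples frames from a screencast, detects regions that changed between samples, and runs OCR on those regions. The sampling rate, thresholds, and function names are arbitrary choices of mine; a real system would need the more careful code-region segmentation and code recognition described in the paper.

import cv2          # assumed dependency: OpenCV 4.x
import pytesseract  # assumed dependency: Python wrapper for Tesseract OCR

def extract_code_text(video_path, sample_every_n_frames=30):
    """Rough sketch: return (frame_index, text) pairs for screen regions
    that changed between sampled frames; not the paper's algorithm."""
    cap = cv2.VideoCapture(video_path)
    prev_gray = None
    extracted = []
    frame_index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_index % sample_every_n_frames == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev_gray is not None:
                # Pixels that changed since the previous sampled frame.
                diff = cv2.absdiff(gray, prev_gray)
                _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
                # OpenCV 4.x returns (contours, hierarchy).
                contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                               cv2.CHAIN_APPROX_SIMPLE)
                for c in contours:
                    x, y, w, h = cv2.boundingRect(c)
                    if w * h < 500:  # skip tiny changes such as cursor blinks
                        continue
                    text = pytesseract.image_to_string(gray[y:y+h, x:x+w])
                    if text.strip():
                        extracted.append((frame_index, text.strip()))
            prev_gray = gray
        frame_index += 1
    cap.release()
    return extracted

Codemotion itself goes much further (handling split editor panes, moving windows, and merging edits into intervals), but this captures the basic detect-then-OCR loop.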

Our first prototype interface

What can we do with the data that Codemotion gives us? For starters, we built a prototype mixed-media (text+video) tutorial viewer that segments the input video into chunks based on intervals of related code edits. Check out this two-minute video (meta!) to see it in action:

Here's a screenshot of our prototype:

  1. The UI first uses the extracted code edits to split the original video into intervals based on groups of closely-related edits (a simplified sketch of this grouping appears after this list). Each mini-video represents the times when the instructor was coding up one specific component of their demo. The corresponding portion of the speech transcript (YouTube can automatically generate these) appears above each mini-video.
  2. As each mini-video plays, the code that's being written in that video gets displayed in a text editor alongside that video. The user can copy-paste the code to play with it themselves.
  3. If there are multiple code editor panes present in a given video, the user can choose which one gets displayed in the UI.
  4. The user can search for code (or any text) within the video, and string matches get highlighted in each mini-video.
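To make steps 1 and 4 more concrete, here is a small Python sketch of one way to merge timestamped code snapshots into contiguous intervals and then search those intervals for a query string. This is my own simplification for illustration, not the paper's actual segmentation algorithm or UI code; the gap threshold, data layout, and function names are assumptions.

def group_edits_into_intervals(edits, max_gap_seconds=20.0):
    """edits: list of (timestamp_seconds, code_text) sorted by time.
    Merge edits that occur close together into one interval, keeping
    the latest code snapshot for each interval."""
    intervals = []
    for ts, code in edits:
        if intervals and ts - intervals[-1]["end"] <= max_gap_seconds:
            intervals[-1]["end"] = ts
            intervals[-1]["code"] = code
        else:
            intervals.append({"start": ts, "end": ts, "code": code})
    return intervals

def search_intervals(intervals, query):
    """Return the intervals whose extracted code contains the query string."""
    return [iv for iv in intervals if query in iv["code"]]

# Example with made-up timestamps and code snapshots:
edits = [(10.0, "def add(a, b):"),
         (14.0, "def add(a, b):\n    return a + b"),
         (95.0, "print(add(2, 3))")]
intervals = group_edits_into_intervals(edits)    # two intervals: ~10-14s and ~95s
matches = search_intervals(intervals, "return")  # matches only the first interval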

Learner-generated interface designs

While we think our prototype interface is neat (after all, we designed it!), we acknowledge that we're not the target audience; learners are the true audience for the video-viewing improvements we'd like to make. Thus, we wanted to get them to help us come up with new interface ideas.

To do so, we ran four participatory design workshop sessions with three computer science students in each one. We showed them what Codemotion was capable of so that we could get them to brainstorm (both individually and in a group) what kinds of new interfaces they'd like to see us develop.

The students collectively came up with 28 distinct interface ideas; 21 of those ideas were generated by more than one student, and 17 were independently generated by students in different sessions. Here's a preview of some of those ideas:

  • Overlay an editable version of the code directly on top of the original video. This will let viewers copy-paste, annotate with their own notes, file bug reports at exact points if there are typos or other errors, and maybe even run the code.
  • Instead of a video-centric interface, make a code-centric interface where all the extracted code is shown in an IDE. When the user hovers over specific parts of the code, that will pop up the corresponding part of the video where the instructor was writing and explaining that code.
  • When the user searches for a particular code construct, stitch together a bunch of related videos that feature the queried code so that the user can see multiple independent demonstrations of experts working with that code.
  • Generate active learning activities based on the code that's being written within the video. For instance, before showing what the instructor actually codes up, give the viewer a quiz (multiple choice or fill-in-the-blank) to see whether they can predict what will happen before watching what the instructor really does (a toy sketch of this idea appears below).
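As a toy illustration of that last idea (my own sketch, not something described in the paper), a tool could take the code that will appear in an upcoming segment, blank out a single token, and ask the viewer to predict it before playing the segment. The function below and its token heuristic are hypothetical.

import random
import re

def make_fill_in_the_blank(code_snippet, seed=None):
    """Toy sketch: blank out one word-like token; return (question, answer)."""
    rng = random.Random(seed)
    tokens = re.findall(r"[A-Za-z_]\w+", code_snippet)  # identifiers/keywords
    if not tokens:
        return code_snippet, None
    answer = rng.choice(tokens)
    # Naive substring replacement; a real tool would tokenize the code properly.
    question = code_snippet.replace(answer, "____", 1)
    return question, answer

q, a = make_fill_in_the_blank("for i in range(10):\n    total += i")
# One possible output: q == "for i ____ range(10):\n    total += i", a == "in"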

Parting thoughts

In some imaginary ideal future, everyone would record computer programming tutorials using specialized monitoring tools (like Torta!) that provide detailed metadata about the constituent source code, edit histories, outputs, and provenance, so that these tutorials are not simply raw pixels stuck inside video files. But in our current reality, screencast videos are still one of the most convenient and pervasive ways to record computer-based tutorials, so zillions of them now exist on YouTube and within MOOCs. This paper's contribution works toward helping learners unlock the insights hidden within those pixels.


Read the full paper for details:

Codemotion: Expanding the Design Space of Learner Interactions with Computer Programming Tutorial Videos. Kandarp Khandwala and Philip J. Guo. ACM Conference on Learning at Scale, 2018.