Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

Ten years and nearly ten million users: my experience being a solo maintainer of open-source software in academia

Summary
I've maintained Python Tutor (pythontutor.com) for ten years so far and served nearly ten million users, largely by not following well-known best practices for open-source software maintenance.

Ten years ago, in December 2009, I started working on Python Tutor as a side project in grad school. It's a web-based tool that allows users to write code in their browser, see it visualized step-by-step, and get live help from volunteers. Fast-forward to today ... this tool has gotten nearly ten million total users so far from over 180 countries. Every day, over 10,000 people use it to run and visualize over 100,000 pieces of code in Python, C, C++, Java, JavaScript, TypeScript, and Ruby. And tens of thousands of people around the world have used its live chat mode to get real-time help from total strangers on the internet!

For the full origin story, I highly recommend checking out Python Tutor: The First Three Years. Most of the heavy development work happened during those first three formative years (2010–2013), mostly when I was a student with more free time. Since 2013 I've been pursuing a tenure-track assistant professor career path where I'm mainly evaluated on research papers (plus grants, teaching, and other academic work). Thus, I've had very little time for Python Tutor, since working on it won't directly get me tenure or other job promotions. The two biggest new features I've added since 2013 were live chat mode and support for six non-Python languages (see history.txt). But for the most part, I've just been trying to maintain it and handle its year-to-year user growth.

This article is about how I've managed to juggle Python Tutor with my day job as a professor. What's unique about my situation is that I'm the only one maintaining this project. I don't have some magical software development team or secret cabal of students working on it; it's just me. To my knowledge, Python Tutor is the most widely used piece of open-source software that's maintained by one single assistant professor who is concurrently working toward earning tenure by publishing research papers. So why do I spend time on it when my day job of teaching hundreds of students and managing a 4-to-8-member research lab already consumes all my time? Three main reasons*:

  • Python Tutor's large international user base provides a unique platform for my research. This work has collectively led to almost a dozen research papers and several major grants so far.
  • Most professors stopped coding long ago and turned into full-time research managers and teachers; continuing to write code (however little and crappy!) keeps my technical chops sharp and attuned to the challenges of modern software development, web design, and DevOps workflows. These insights have inspired new research ideas that are unrelated to Python Tutor, which have led to several more papers (e.g., Porta).
  • Pride. I started this project so I want to see it through for as long as I can!

(*One might assume that another reason I work on it is that I use it in my own teaching. However, since I don't teach introductory programming, I've never used it in my classes, except for a little bit to show JavaScript basics in a web programming class!)

Here's how I've been able to keep this project going all these years: by not following well-known best practices for open-source development. In short, the only way I've been able to keep Python Tutor alive is by being a bad software developer and open-source maintainer! Here we go ...

I hyperfocus on one single use case

Python Tutor does only one thing*:

Python Tutor is designed to imitate what an instructor in an introductory programming class draws on the blackboard. It's meant to illustrate small pieces of self-contained code that runs for not too many steps. After all, an instructor can't write hundreds of lines of code, draw hundreds of data structures and pointers, or walk through hundreds of execution steps on the board! Also, code in introductory classes usually doesn't access external libraries. If your code can't fit on a blackboard or a single presentation slide, it's probably too long to visualize effectively in Python Tutor.

Hyperfocusing on this single use case gives me great clarity in deciding what code to write, given the very limited time I have for this project. It also guards against feature creep and FOMO.

*It also has a live help queue and chat environment so people can tutor one another, but that supports the single use case above.
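The "runs for not too many steps" constraint can be made concrete with a step cap. The following is a minimal sketch of a step-capped tracer built on Python's standard sys.settrace hook. It is an illustration under assumptions, not Python Tutor's actual implementation. The names trace_code, StepLimitExceeded, and USER_FILENAME are hypothetical, and so is the MAX_STEPS cap of 1000. At each executed line of the user's snippet, the tracer records the line number and a snapshot of the local variables. Once the cap is hit it stops, so a runaway loop yields a truncated trace instead of hanging.

```python
import sys
from types import FrameType
from typing import Any

# Hypothetical step cap: code too long to "fit on a blackboard" gets cut off here.
MAX_STEPS = 1000
# Hypothetical filename tag used to tell the user's snippet apart from library code.
USER_FILENAME = "<user_code>"


class StepLimitExceeded(BaseException):
    """Raised by the tracer when the cap is hit.

    Subclasses BaseException so a user's `except Exception:` can't swallow it.
    """


def trace_code(source: str, max_steps: int = MAX_STEPS) -> tuple[list[dict[str, Any]], bool]:
    """Run `source`, recording one step per executed line.

    Returns (steps, hit_limit). Each step holds the line number and a repr()
    snapshot of that frame's locals taken just before the line runs.
    Exceptions raised by the user's own code propagate to the caller.
    """
    steps: list[dict[str, Any]] = []

    def tracer(frame: FrameType, event: str, arg: Any):
        # Ignore frames that don't come from the user's snippet (e.g. stdlib calls).
        if frame.f_code.co_filename != USER_FILENAME:
            return None
        if event == "line":
            if len(steps) >= max_steps:
                raise StepLimitExceeded()
            snapshot = {
                name: repr(value)
                for name, value in frame.f_locals.items()
                if not name.startswith("__")  # hide __builtins__ and friends
            }
            steps.append({"line": frame.f_lineno, "locals": snapshot})
        return tracer  # keep receiving line events inside this frame

    code = compile(source, USER_FILENAME, "exec")
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(code, {})  # fresh globals dict, so nothing persists between runs
    except StepLimitExceeded:
        return steps, True
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return steps, False


if __name__ == "__main__":
    trace, hit_limit = trace_code("x = 1\ny = x + 1\n")
    for step in trace:
        print(step)
    print("hit limit:", hit_limit)
```

Making the limit part of the tracer, rather than a wall-clock timeout, keeps the trace itself bounded. A visualization of a few hundred steps is already about as much as anyone would walk through on a blackboard.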

I (mostly*) don't listen to user requests

Yes, I know that's ironic since I preach user-centered design in my day job when teaching UX and HCI classes. But if I had listened to the deafening noise of incoming user requests over the years, Python Tutor would've grown into bloatware that's impossible for a single person to maintain. Most notably, users want:

  • to create persistent accounts and profiles (nope! see below on why a stateless design is wonderful)
  • social features to incentivize help requesters and volunteers, like user reviews, reputation points, or gamification (nope!)
  • programmer assistance features found in IDEs (nope!)
  • integration with GitHub, Dropbox, Google Drive, etc. (nope!)
  • to turn Python Tutor into an LMS (Learning Management System) so instructors can create custom programming lessons and track student progress (nope! nope! nope!)
  • for me to customize some aspect of Python Tutor for their own class's needs (nope! in the early days I built custom versions for a few colleagues, but that clearly doesn't scale!)

None of these ideas have anything to do with what makes Python Tutor unique (see hyperfocus), so they'd be huge distractions for me to implement, test, and support as a solo maintainer.

*I actually write down user requests in my wishlist or unsupported features docs. But I don't implement most of them. That said, I always prioritize security reports and bug reports that show Python Tutor displaying something incorrectly.

I (mostly*) refuse to even talk to users

For the first few years I linked to my email address whenever users triggered an error on the site. That way, they could easily email me bug reports. That was really useful for a while, but after the tool stabilized and most major bugs got fixed, I began receiving fewer useful emails and more annoying ones. 99.99% of users are great, but with millions of users I inevitably run into some who make entitled demands or are otherwise abusive. All it took was a few bad experiences for me to totally shut down this channel.

Now whenever users trigger an error on the site, I display a link to my unsupported features doc, which includes sternly worded lines like: “If you don't get a reply from me, assume your issue will NOT be addressed. Please don't email me multiple times.”

In practice, I rarely reply to emails or GitHub issues, even if the sender sounds sincere. I've found that once you start being nice, some people will keep asking for more and then get angry when you don't keep volunteering your time. Silence is the only scalable solution. I know that not making myself available feels unwelcoming and not inclusive; but I'm paid in my day job to serve students at my own university, not strangers on the internet who are using free software that I maintain in my spare time.

*Of course, I've found great value in talking to users whom I personally know or who were referred to me by colleagues.

I don't do any marketing or community outreach

I don't actively try to get users for Python Tutor; it was a fluke that usage skyrocketed during the MOOC and online learning explosion of the early 2010s, and I've been riding that organic growth since then. Thus, I don't spend any time doing publicity or community outreach work via social media, tech blogs, IRC, ICQ (what?), Slack, Discord, discussion forums, email newsletters, mailing lists, giving tech industry or open-source talks, or on any other medium. Heck, I didn't even pick a good brand name ... most people don't know that this tool isn't just for Python :)

I keep everything stateless

Python Tutor is a stateless web app, which means that the server doesn't remember anything about users: no accounts, no profiles, no saved code, no progress tracking, no nothing! Being stateless makes my coding, deployment, and server maintenance stories all much simpler, which is critical since I'm terrible at sysadmin stuff! I have no databases to maintain, no spam to filter, and no worries about many kinds of security vulnerabilities (e.g., those involving SQL injection or stealing user data). Being stateless also makes it easy to find cheap web hosting, to horizontally scale by replicating on more servers, and to simply reboot or re-image my server whenever something goes wrong, all without worrying about backing up or migrating user data. For instance, I've had a nasty undiagnosed memory leak for years now (maybe due to how I'm misconfiguring Docker), which I sweep under the rug by just rebooting my servers several times per day with a cron job :0
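In a stateless design, everything needed to reproduce a visualization has to travel with the request itself. One common way to get shareable links without a database is to pack the user's code into the URL. The following sketch assumes that approach purely for illustration. The endpoint https://example.com/visualize, the parameter names code and lang, and the helpers make_share_url and parse_share_url are all hypothetical; they do not describe how Python Tutor actually builds its links.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint; example.com is a reserved placeholder domain.
BASE_URL = "https://example.com/visualize"


def make_share_url(code: str, language: str = "python3") -> str:
    """Pack everything needed to reproduce a visualization into the URL itself.

    Nothing is written to a server-side database, so any replica can serve the
    request and rebooting or re-imaging a server loses nothing.
    """
    # urlencode percent-escapes newlines, '&', '#', '+', '%', etc. in the code.
    return BASE_URL + "?" + urlencode({"code": code, "lang": language})


def parse_share_url(url: str) -> dict[str, str]:
    """Recover the request parameters from a share URL (inverse of make_share_url)."""
    query = urlparse(url).query
    # keep_blank_values so an empty code box round-trips as "" instead of vanishing.
    params = parse_qs(query, keep_blank_values=True)
    return {name: values[0] for name, values in params.items()}


if __name__ == "__main__":
    snippet = "x = [1, 2]\nprint(x + [3])  # a & b"
    url = make_share_url(snippet)
    print(url)
    print(parse_share_url(url)["code"] == snippet)
```

The trade-off is that URLs have practical length limits. That is tolerable only because the single supported use case is small, blackboard-sized snippets.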

I don't worry about performance or reliability

Python Tutor is fast enough for common use cases. For boring technical reasons, the non-Python visualizers run slower and less reliably (especially C, C++, and Java). Of course faster would be better, but I've got no time to optimize! Also, it's reliable enough for the most part, but I have a disclaimer on the site that says: “This service is free with no technical support and no quality guarantees of any kind.” If people don't like it, they can have their money back!

I use super old and stable technologies

Since I'm in this for the long haul, I stick with technologies that are time-tested and not likely to go away soon. My tech stack doesn't look at all impressive on a resume, so I'd rather not embarrass myself by revealing it :) I've seen several waves of fads come and go in terms of web development stacks over the past ten years that Python Tutor has been around. (Funny enough, sticking with tried-and-true old tech actually helps with performance and reliability, even though I claim not to care much about them!)

I don't make it easy for others to use my code

My code is open source on GitHub, but I don't spend any effort making it easy for others to use it. If you want to figure it out yourself, have fun! Installation and configuration instructions are nightmarishly hard to write, maintain, and update, especially for complex web apps like Python Tutor that have lots of moving parts on both the front-end and back-end. Everyone who wants to install it has a subtly different setup on their server, which is impossible for me to account for (“why doesn't step 5 work for me on Ubuntu 18.04a.rc2 with gcc 9.38.427.alpha-beta and Webpack 0.28.392.823.ahhhh?!?”). That's why one of the bullet points in my unsupported features doc is:

I can't provide technical support for users who want to install Python Tutor on their own computers/servers.

Yes, I know that it would be ideal if I could provide free tech support for everyone who wants to set up their own in-house custom version of Python Tutor, but no amount of money in the world could make me want to go through that pain.

Finally, I don't let other people contribute code

My unsupported features doc also states:

I don't have time to review any outside code contributions or GitHub pull requests. Feel free to fork the code.

Not allowing external contributors instantly eliminates the mountains of work required to coordinate, motivate, and incentivize others to contribute, and to ensure that their code is up to the quality standards that I expect. It also eliminates potential conflicts around governance models, software licenses, patents, people's employers claiming ownership, etc.

Not having collaborators also means that my code can be super-messy and not well-documented; I only document it well enough for future-me to make sense of it, not for anyone else.

In the past I've actually tried to incorporate contributions from students or remote collaborators, but for the most part that never worked out since I didn't have the mental energy to make sure it was worthwhile for everyone. Again, this isn't my day job. Also, I've found that the best students don't want to work on a professor's crufty ten-year-old codebase! Not very motivating. They'd rather make their own new software, which I encourage! (Note that some students have built research experiments atop Python Tutor and published papers on them, but they haven't contributed to the actual production codebase.)

That said, several people in the past have contributed useful patches that I've adapted and incorporated into the codebase (see acknowledgments), but those are rare. Most significantly, David Pritchard and Will Gwozdz wrote the entire Java visualizer.

Some might argue that an N=1 solo maintainer project like mine isn't truly “open source” since there's no community of code contributors. But whateves.

Uninspirational Parting Thoughts

I think it was a total fluke that I've been able to keep Python Tutor going as a solo maintainer all these years even though my day job doesn't directly incentivize it. Sometimes grad students ask me how they can turn their Ph.D. research into widely used open-source software (even though Python Tutor wasn't part of my Ph.D. research ... it was a side project). I think they're expecting some inspirational words, but my blunt response is that this is an unrealistic goal. If it happens to work out, then go with it ... but don't expect it. Doing good research is hard enough as is; trying to simultaneously build something that lots of people can use is doubly hard and will cause way too much stress.



Created: 2019-11-16
Last modified: 2019-11-16