June 2012 (perspective of a Ph.D. student)
Year Five: Production
The Ph.D. Grind
At the beginning of my fifth year (September 2010), I still had nothing to include in my (nonexistent) dissertation. By now, most of my classmates had published at least one dissertation-worthy first-author conference paper. Since I didn't have any dissertation-worthy papers yet (the IncPy paper was still under review), I was afraid that it would take me seven or eight total years to graduate.
Within the next twelve months, though, I would publish four conference papers and one workshop paper (all as the first author), thereby paving a clear path for my graduation. Without a doubt, my fifth year was my most productive of grad school. I was relentlessly focused.
In the middle of summer 2010, progress on IncPy was steady, and I was on track to submit a paper by the September deadline. However, I knew that IncPy would not be substantive enough for an entire dissertation. So besides working towards the upcoming paper deadline, I also spent some time thinking about my next project idea.
I wish I could say that my solo brainstorming sessions were motivated by a true love for the pure essence of academic scholarship. But the truth was that I was driven by straight-up fear: I was afraid of not being able to graduate within a reasonable time frame, so I pressured myself to come up with new ideas that could potentially lead to publications. I was all too aware that it might take two to three years for a paper to get accepted for publication, so if I wanted to graduate by the end of my sixth year, I would need to submit several papers this year and pray that at least two get accepted. I felt rushed because my fellowship lasted only until the end of this year. After my funding expired, I would need to either find grant funding from a professor and face all of the requisite restrictions (e.g., working on Klee again), or else become a perpetual teaching assistant and delay my graduation even further. Time was running out.
On July 29, 2010, almost exactly one year after I conceived the initial idea for IncPy, I came up with a related idea, again inspired by real problems that computational researchers encounter while performing data analysis. I observed that because researchers write computer programs in an ad-hoc “sloppy” style, their programs often crash for silly reasons without producing any analysis results, thus leading to table-pounding frustration. My insight was that by altering the run-time environment (interpreter) of the Python programming language, I could eliminate all of those crashes and allow their sloppy programs to produce partial results rather than no results. I named this proposed modification to the Python interpreter “SlopPy,” which stands for Sloppy Python.
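To make the "partial results rather than no results" idea concrete, here is a minimal sketch in ordinary Python. It is not how SlopPy itself works: SlopPy changes the interpreter, whereas this sketch only imitates the effect at the library level. The names `NA`, `sloppy_map`, and `parse_ratio` are hypothetical and exist only for illustration.

```python
class NA:
    """Placeholder for a value that could not be computed (hypothetical name)."""

    def __init__(self, error):
        self.error = error  # keep the exception so it can be inspected later

    def __repr__(self):
        return f"NA({type(self.error).__name__})"


def sloppy_map(func, records):
    """Apply func to every record; on failure, store an NA and keep going."""
    results = []
    for record in records:
        try:
            results.append(func(record))
        except Exception as exc:  # deliberately broad: the goal is not to crash
            results.append(NA(exc))
    return results


def parse_ratio(line):
    # Hypothetical analysis step: "numerator,denominator" -> float ratio.
    num, den = line.split(",")
    return float(num) / float(den)


# Two of these four input lines are malformed in different ways.
lines = ["10,2", "7,0", "oops", "9,3"]
results = sloppy_map(parse_ratio, lines)

# The analysis still yields the values that could be computed.
good = [r for r in results if not isinstance(r, NA)]
print(results)
print(good)
```

A normal run of `parse_ratio` over these lines would stop at the second record. Here the failures are kept alongside the successes, so a researcher can see both what was computed and what went wrong.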
Although SlopPy and IncPy are very different ideas, I implemented both by altering the behavior of the Python interpreter. The nearly one thousand hours I had spent over the past year hacking (modifying) the Python interpreter for IncPy gave me confidence that I could implement SlopPy fairly easily. It took me only two months to create a basic working prototype, run some preliminary experiments, and submit a paper to a second-tier conference. I aimed for that conference both because its deadline was conveniently timed and also because I didn't think that SlopPy was a “big” enough idea to get accepted in a top-tier conference.
By October 2010, I had two papers under submission. At this point, I had given up all hope of getting a job as a professor, since I still had not published a single paper for my dissertation; competitive computer science faculty job candidates have already published a few acclaimed first-author papers by this time in their Ph.D. careers. Unless a miracle struck, I would not be able to get a respectable research university job, so I aimed to do only enough work to graduate without worrying about how my resume would appear.
I received the opposite of a miracle: Both my IncPy and SlopPy paper submissions were rejected. I was disappointed but not too shocked, since I had already grown accustomed to paper rejections by this time. There were lots of legitimate criticisms of my work, so I felt that addressing them would strengthen my future resubmissions.
Most notably, I had unwisely framed the pitch for IncPy in a way that led to my paper being reviewed by researchers in a subfield that wasn't as “friendly” to my research philosophy. In theory, technical papers should be judged on their merit alone, but in reality, reviewers each have their own unique subjective tastes and philosophical biases. So I drastically rewrote my introductory pitch with the aim of getting more amicable reviewers and then resubmitted to a second-tier conference to further improve its chances of acceptance. My plan worked, and the IncPy conference paper was accepted—albeit with lukewarm reviews—on my second submission attempt in early 2011.
I later revised and resubmitted my SlopPy paper to a workshop that was being held together with the conference where I would be presenting IncPy. This strategy worked well since it was far easier to get a paper accepted in a workshop than in a conference. Also, Dawson wouldn't need to pay much extra for me to attend the workshop since I was already going to the conference to present IncPy. As expected, the SlopPy workshop paper was accepted, and although it didn't “count” as a standalone dissertation contribution, at least it was better than no publication; I hoped to incorporate this paper as a smaller chapter in my dissertation to supplement the more substantive chapters, which would all derive from conference papers.
Back in October 2010, right after submitting the IncPy and SlopPy papers, I asked Dawson what it would take for me to graduate in the next few years. Predictably, he replied that I needed publications as proof of the legitimacy of my work. He did have one concrete suggestion, though: Another dissertation-worthy project would be for me to combine my interest in Python with Klee-like ideas that he loved in order to create an automated bug-finding tool for Python programs. Since I wasn't keen on returning to Klee in any form, I discarded his suggestion and continued thinking about possible extensions to IncPy and SlopPy that could make for a follow-up paper submission.
By this time, a nascent dissertation theme was beginning to form in my head: Both IncPy and SlopPy were software tools to improve the productivity of computational researchers. Thus, to think of my next project idea, I returned to identifying problems computational researchers faced in their work and then designing new tools to address those problems.
Specifically, I noticed that researchers edit and execute their Python programs dozens to hundreds of times per day while running computational experiments; they repeat this process for weeks or months at a time before making a significant discovery. I had a hunch that recording and comparing what changed between program executions could be useful for debugging problems and obtaining insights. To facilitate such comparisons, I planned to extend IncPy to record details about which pieces of code and data were accessed each time a Python program executes, thereby maintaining a rich history of a computational experiment. I also thought it would be cool for researchers to share these experiment histories with colleagues so that they can learn from what worked and didn't work during experimental trials.
My gut told me that some ideas along these lines could be innovative and publishable, but I couldn't quite form a crisp research pitch yet; my thoughts were all fuzzy and squishy. I felt stuck, so I sought another meeting with Fernando, whom I first met during my fourth year when I introduced IncPy and gave a talk at his UC Berkeley lab group meeting. Fernando fit me into his schedule, and our one-hour meeting planted the seeds for my next project idea.
As soon as I mentioned my ideas about extending IncPy to record Python-based experiment histories, Fernando launched into a passionate sermon about a topic that I had never heard of but was fascinated by: reproducible research.
One of the cornerstones of experimental science is that colleagues should be able to reproduce anyone's research findings to verify, compare against, and build upon them. In the past decade, more and more scientists in diverse fields have been writing computer programs to analyze data and make scientific discoveries. Thousands of scientific papers published each year are filled with quantitative findings backed by numbers, graphs, and tables. However, the unspoken shame in modern science is that it's nearly impossible to reproduce or verify any of those findings, since the original computer code and data sets used to produce those findings are rarely available. As a result, lots of papers containing egregious errors—due to both honest mistakes and outright fraud—have gone unchallenged, sometimes resulting in scientific claims that have led to human deaths. In recent years, reform-minded scientists such as Fernando have been trying to raise awareness for the importance of reproducible research in the computational sciences.
Why is reproducibility so hard to achieve in practice? A few ultra-competitive scientists purposely hide their computer code and data to fend off potential rivals, but the majority are willing to share code and data upon request. The main technical barrier, though, is that simply obtaining someone's code and data isn't enough to rerun and reproduce their experiments. Everyone's code needs a highly-specific environment in which to run, and the environments on any two computers—even those with the same operating system—differ in subtle and incompatible ways. Thus, if you send your code and data to colleagues, they probably won't be able to rerun your experiments.
Fernando liked my IncPy experiment history recording idea because it could also record information about the software environment where an experiment originally occurred. Then researchers who use Python can send their code, data, and environment to colleagues who want to reproduce their experiments. I came out of that meeting feeling pumped that I had found a concrete application for my idea. The reproducible research motivation seemed compelling enough to form the storyline for a second independent IncPy-based paper submission and dissertation contribution.
As I was jotting down more detailed notes, a moment of extreme clarity struck: Why limit this experiment recording to only Python programs? With some generalizations to my idea, I could make a tool that enables the easy reproduction of computational experiments written in any programming language! Still in a mad frenzy, I sketched out the design for a new tool named “CDE,” which stands for Code, Data, and Environment.
When I told my idea to Dawson, he responded favorably but challenged me to think even bigger: Why limit CDE to targeting only scientists' code? Why not make it a general-purpose packaging tool for all kinds of software programs? Those were wise words. A variety of software creators and distributors—not only scientists—have trouble getting other people to run their programs due to the same environment incompatibility problem, which is affectionately known as “dependency hell.” Dependency hell is especially widespread on Linux-based operating systems due to the multitude of semi-incompatible Linux variants that people use; programs that run on one person's Linux computer are unlikely to run on someone else's slightly different Linux computer. With some adjustments to my original idea, CDE could enable anybody to package up their Linux programs so that others can run them without worrying about these environment mismatches. I felt thrilled that CDE could potentially alleviate the decades-old problem of dependency hell on Linux.
As my usual reality check, I scoured the Web for related work, looking for both research prototypes and production-quality tools with similar functionality. To my relief, there wasn't much prior work, and CDE stood out from the sparse competition in two important ways: First, I designed CDE to be much easier to use than similar tools. As a user, you create a self-contained code, data, and environment package by simply running the program that you want to package. Thus, if you can run a set of programs on your Linux computer, then CDE enables others to rerun those same programs on their Linux computers without any environment installation or configuration. Second, the technical mechanism that CDE employs—a technique called system call redirection—enables it to be more reliable than related tools in a variety of complex, real-world use cases.
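The redirection idea can be illustrated with a small sketch. This is an assumption-laden toy, not CDE's implementation: the real tool works at the level of system calls, while the code below only imitates the path rewriting in user-level Python on a POSIX-style file system. The function names and the `package-root` directory name are hypothetical.

```python
import os
import shutil
import tempfile


def redirect(path, package_root):
    """Map an absolute path onto its copy inside the package directory."""
    absolute = os.path.abspath(path)
    return os.path.join(package_root, absolute.lstrip(os.sep))


def record_access(path, package_root):
    """Packaging run: copy a file the program touched into the package."""
    target = redirect(path, package_root)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copy2(path, target)
    return target


def open_redirected(path, package_root, mode="r"):
    """Replay run: open the packaged copy instead of the host's file."""
    return open(redirect(path, package_root), mode)


# Demo inside a throwaway directory so nothing on the real system is touched.
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "host", "data.txt")
    os.makedirs(os.path.dirname(original))
    with open(original, "w") as f:
        f.write("experiment input\n")

    package_root = os.path.join(workdir, "package-root")
    record_access(original, package_root)

    os.remove(original)  # simulate a second machine that lacks the file

    # The program still asks for the original path; the lookup is rewritten.
    with open_redirected(original, package_root) as f:
        replayed = f.read()

print(replayed)
```

The point of the design is that the packaged program keeps using the same paths it always used. Only the lookup changes, which is why no installation or configuration is needed on the receiving machine.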
At this point, CDE existed only as a collection of notes and design sketches, but I sensed its potential when I realized that it was conceptually simpler, easier to use, and more reliable than all other tools in existence. A part of me was shocked and paranoid: Why hasn't anybody else implemented this before?!? This idea seems so obvious in retrospect! One possible reason I dreaded was that nobody had previously built something like CDE because it was impossible to get the details right to make it work effectively in practice. Maybe it was one of those ideas that looked good on paper but wasn't practically feasible. I figured that there was no better way to find out than to try implementing CDE myself.
Over three intense weeks spanning October and November 2010, I super-grinded on creating the first version of CDE. As I suspected, although the research idea behind CDE was straightforward, there were many grimy programming-related contortions required to get CDE working on real Linux programs. I lived and breathed CDE for those weeks, forgetting everything else in my life. I programmed day and night, often dreaming in my sleep about the intricate details that my code had to wrestle with. Every morning, I would wake up and jump straight to programming, feeling scared that this would finally be the day when I hit an insurmountable obstacle proving that it was, in fact, impossible to get CDE to work. But the days kept passing, and I kept getting closer to my first milestone: demonstrating how CDE allows me to transfer a sophisticated scientific program between two Linux computers and reproduce an experiment without hassle.
I felt ecstatic when, after three weeks of coffee-fueled fully-immersed grinding, I finally got CDE working on my scientific program demo. At that point, I knew that CDE had the potential to work on many kinds of real-world Linux programs if I kept testing and improving its code. I made a ten-minute video demo introducing CDE, created a project website containing the video and a downloadable copy of CDE, and then emailed the website link to some friends. Unbeknownst to me, one of my friends posted the following blurb about CDE on Slashdot, a popular online computer geek forum:
A Stanford researcher, Philip Guo, has developed a tool called CDE to automatically package up a Linux program and all its dependencies (including system-level libraries, fonts, etc!) so that it can be run out of the box on another Linux machine without a lot of complicated work setting up libraries and program versions or dealing with dependency version hell. He's got binaries, source code, and a screencast up. Looks to be really useful for large cluster/cloud deployments as well as program sharing.
Within 24 hours, the Slashdot forum thread had hundreds of messages, and I began receiving dozens of emails from Linux enthusiasts around the world who downloaded and tried CDE, including gems such as: “i just wanted to tell u that U ROCK! i'm really impressed to see this idea working. I will promote the usage of it on my linux community hear [sic] in Tijuana, Mexico.” These unfiltered, off-the-cuff compliments from actual users meant more to me than any fellow researcher praising my previous ideas or papers.
From a research standpoint, my mission was now accomplished: I successfully built an initial prototype of CDE and demonstrated that it worked on a realistic example use case. The common wisdom in most applied engineering fields is that research prototypes such as CDE only serve to demonstrate the feasibility of novel ideas. The job of a researcher is to create prototypes, experimentally evaluate their effectiveness, write papers, and then move on to the next idea. As a researcher, it's foolish to expect people to use your prototypes as though they were real products; if your ideas are good, then professional engineers might adapt them into their company's future products. At best, a few other research groups might use your prototypes as the basis for building their own prototypes and then write papers citing yours (e.g., over a dozen other university research groups have extended the Klee tool and written papers about their improvements). But it's almost unheard of for non-researchers to use research prototypes in their daily work. In sum, the purpose of academic research is to produce validated ideas, not polished products.
Thus, the wise course of action at the time would have been to submit a paper on CDE and then move on to generating a new idea, implementing a new prototype, submitting a new paper, and repeating until I had enough content to fill up a dissertation. I did submit and publish two conference papers on CDE (a short introductory paper and a longer follow-up paper). But rather than moving on to a new project idea as a prudent researcher would, I dedicated most of my fifth year to turning CDE into a production-quality piece of software.
I had an urge to make CDE useful for as many people as possible. I didn't want it to languish as yet another shoddy research prototype that barely worked well enough to publish papers. I knew my efforts to polish up CDE wouldn't be rewarded by the research community and might even delay my graduation since I could've spent that time developing new dissertation project ideas. But I didn't care. Since I was still funded by fellowships for the rest of the year, I had full freedom to spend my time as a pro bono software maintainer rather than as a traditional researcher tied to grant funding.
Back in my fourth year, I desperately wanted people to use IncPy, so that's why I felt thrilled to get three measly users. Even though almost nobody ended up using IncPy, my irrational desire to make it into a real-world tool led me to reach out to Fernando at UC Berkeley, and it was Fernando who inspired me to create CDE. Now at the beginning of my fifth year in November 2010—within a few days of having my video demo appear on the popular Slashdot website—CDE already had dozens of users and the potential for a lot more. Judging from early email feedback, I realized that I had created something that people wanted to use in a variety of settings I hadn't originally predicted. In short, CDE struck a chord with all sorts of Linux users who were tired of suffering from dependency hell.
I spent the majority of my fifth year fixing hundreds of bugs to make CDE work on a dizzying array of complex Linux programs; polishing up the documentation, user manual, and FAQ to make it easier to use; exchanging emails and even a few phone calls with users from around the world; and giving numerous talks and sending “marketing” emails to attract new users.
At present (summer 2012), CDE has been downloaded and used by over 10,000 people. I've received hundreds of emails from users with feedback, new feature requests, bug reports, and cool anecdotes. Although this isn't a large number of users for a commercial software product, it's extremely large for a free and open-source research tool being maintained by a single grad student.
Here are some kinds of people who have emailed me thank-you notes and anecdotes about how they used CDE to eliminate Linux dependency hell in their daily work:
Those few months were by far the most enjoyable period of my Ph.D. years, even though I knew that none of my software maintenance activities would contribute towards my dissertation. After the initial success of CDE, I no longer cared if my graduation was delayed by a year or more due to lack of additional publications; I got so much satisfaction from knowing that a piece of software I had invented could improve many people's computing experiences.
CDE also enabled me to achieve one of my long-time nerd dreams: to give a Tech Talk at Google that was broadcast online on YouTube. Since the beginning of grad school, I loved watching Google Tech Talks online on a wide range of academic subjects. I dreamed of the day when I could give such a talk, but I didn't get my hopes up since it seemed like Google employees invited only famous professors and engineers—not unknown grad students—to give these talks.
One day while scouring the Web for projects related to CDE, I serendipitously noticed that my former summer 2007 Google internship manager had recently published a paper in a reproducible research workshop. I emailed him to advertise CDE as a tool for facilitating reproducible research and to ask whether his colleagues might be interested in trying it. To my pleasant surprise, he responded with a talk invitation ending in a winking smiley face: “I took a look at your tool. Looks interesting enough! Would you be interested in giving a Tech Talk about it here at Google? I would certainly help organizing and advertising. You never know how many attendees you get, could be 100, could be none ;-)”
I spent more time preparing for my Google Tech Talk than for any previous talk, since I knew that it would be recorded. My talk went quite well, and afterwards a Google engineering manager (whom I had never met) pulled me aside to ask more detailed follow-up questions. It turned out that he was interested in alleviating this Linux dependency hell problem within the company, which is why he loved my talk. He offered me an internship where I could spend the upcoming summer working on CDE at Google.
I was flattered by his offer and took some time to deliberate. Professors in my department usually discourage late-stage Ph.D. students from doing internships, since they want students to focus on finishing their dissertations. Also, at the time of my offer, I hadn't yet published any first-author papers for my dissertation (several papers were under review), so I was afraid that leaving Stanford for the summer might give Dawson the impression that I wasn't serious about trying to publish and graduate. However, my gut intuition was that this was a unique and well-timed opportunity that I couldn't turn down: I would be paid a great salary to spend my summer continuing to work on my own open-source software project. In contrast, almost all interns—including myself back in 2007—were expected to work on internal company projects assigned by their managers. I talked to Dawson about my conflicting feelings, and he was quite supportive, so I accepted the internship offer.
I spent a super-chill summer of 2011 at Google dedicating almost all of my workdays to improving CDE, getting new users, and finding creative uses within Google. For part of the summer, I worked closely with another Google engineer who found CDE useful for his own work, which was a great impetus for me to fix additional bugs and to improve the documentation. By this point, I was no longer developing new CDE-related research ideas: I was just sweating the details to continue making it work better. I finally stopped working full-time on CDE after my summer internship ended and my sixth year of Ph.D. began.
Out of the five projects that comprised my dissertation, CDE was my favorite since it was a simple, elegant idea that turned into a practical tool with over 10,000 users. It was by far the least sophisticated from a research standpoint, but it was the most satisfying to work on due to its real-world relevance.
Back at the beginning of my fifth year—long before the IncPy, SlopPy, and CDE papers had been published—I hatched a backup plan in case my own projects failed. I cold-emailed Jeff, a new assistant professor in my department who shared some of my research interests, to ask whether he was open to collaborating on a project that might contribute to my dissertation. The two key “selling points” I mentioned were that I had my own funding and that I wanted to aim for a top-tier paper submission deadline for a conference that he liked. In exchange, he needed to serve on my thesis committee.
As expected, Jeff took me up on my offer. It was a great deal for him since our motivations were well-aligned: I was a senior student who needed to publish to graduate, and he was an assistant professor who needed to publish to earn tenure. Even better, he didn't need to fund me from his grants. And best of all, I was open to working on whatever project he wanted, since my primary solo projects already gave me the independence that I craved.
I was hedging my bets with this plan: If my IncPy, SlopPy, and CDE projects couldn't get published, then at least I would still have a “legitimate” professor-sanctioned project with Jeff, who was now one of my thesis committee members. Jeff and I decided that the best strategy was for me to build upon an interactive data reformatting tool called Wrangler that one of his other students had created the previous year.
Towards the end of my fifth year, I took a break from CDE and spent 2.5 months creating some new extensions to Wrangler. My enhanced version was called “ProWrangler,” which stands for Proactive Wrangler. After implementing the ProWrangler prototype and evaluating its efficacy with controlled user testing on fellow students, I wrote up a paper submission to a top-tier HCI conference with the help of Jeff and the other creators of the original Wrangler tool.
In the midst of my summer 2011 Google internship, I received the happy news that our ProWrangler paper had been accepted with great reviews. By far the biggest contributor to our success was Jeff's amazing job at writing both our paper's introduction and the interpretation of our evaluation results. Our user testing had failed to show the productivity improvement effects that we originally hoped to see, so I was afraid that our paper would be rejected for sure. But miraculously, Jeff's technical writing and argument framing skills turned that near-defeat into a surprise victory. The reviewers loved how we honestly acknowledged the failures of our evaluation and extracted valuable insights from them. Without a doubt, our paper would have never been accepted if not for Jeff's rhetorical expertise. He had a lot of practice, though. Back when he was a Ph.D. student, Jeff published 19 papers mostly in top-tier conferences, which is five to ten times more than typical computer science Ph.D. students. That's the sort of intensity required to get a faculty job at a top-tier university like Stanford.
Throughout my fifth year, I had to carefully split my time between developing new ideas, implementing prototypes, and submitting, revising, and resubmitting papers for four projects—IncPy, SlopPy, CDE, and ProWrangler—whose relevant conference submission deadlines were spread throughout the year. Even though I spent a lot of time nurturing CDE, I had to switch to focusing on other projects whenever deadlines arose. By summer 2011, all four projects were successfully published, usually after several rounds of paper revisions. I felt relieved that my intricate planning had paid off and that a full dissertation now seemed almost within reach.
Copyright © 2012 Philip Guo