
Year Two: Inception

The Ph.D. Grind

My summer internship at Google was a welcome break from research. The work was low-stress, and I had fun socializing with fellow interns. By the end of the summer, I had recuperated enough from my previous year's burnout to make a fresh start at being a Ph.D. student again.

At the end of the summer, I wrote an email to Dawson reaffirming my desire to pursue my personal interests while simultaneously acknowledging the need to do legitimate publishable research: “I've realized from this summer and my previous work experiences that it's going to be really hard for me to push ahead with a Ph.D. project unless I feel a strong sense of ownership and enthusiasm about it, so I really want to work to find the intersection of what I feel passionate about and what is actually deemed 'research-worthy' by professors and the greater academic community.”

I planned to continue working with Dawson on the empirical software measurement project that we had discussed at the end of my first year. However, I had a sense that this project might be risky because it was not within his main areas of expertise or interest; Klee was still his top priority. Therefore, I wanted to hedge my bets by looking for another project to work on concurrently, hoping that at least one of them would succeed. I'll describe my main project with Dawson later in this chapter, but first I'll talk about my other project.

~

Right before starting my second year of the Ph.D. program in September 2007, I took a one-week vacation to Boston to visit college friends. Since I was in the area, I emailed a few MIT professors whom I knew from my undergraduate days to ask for their guidance. When they met with me, they all told me roughly the same thing: Be proactive in talking with professors to find research topics that are mutually interesting, and no matter what, don't just hole up in isolation. This simple piece of advice, repeatedly applied over the next five years, would ultimately lead me to complete my Ph.D. on a happy note.

I immediately took this advice to heart while I was still in Boston. I cold-emailed (sent an unsolicited email to) an MIT computer science professor named Rob to politely request a meeting with him. In this initial email, I briefly introduced myself as a recent MIT alum and current Stanford Ph.D. student who wanted to build tools to improve the productivity of computer programmers. Since I knew that Rob was also interested in this research area, I hoped that he would respond favorably rather than marking my email as spam. Rob graciously met with me for an hour in his office, and I pitched a few project proposals to get his feedback. He seemed to like them, so I grew encouraged that my ideas were at least somewhat acceptable to a professor who worked in this research area. Unfortunately, I wouldn't be able to work with Rob since I was no longer an MIT student. At the end of our meeting, Rob suggested that I talk to a Stanford computer science professor named Scott to see if I could sell him on my ideas.

When I returned to Stanford, I cold-emailed Scott to set up an appointment to chat. I came into the meeting prepared with notes about three specific ideas and pitched them in the following format:

  1. What's the problem?
  2. What's my proposed solution?
  3. What compelling experiments can I run to demonstrate the effectiveness of my solution?

My friend Greg, one of Rob's Ph.D. students, taught me the importance of the third point—thinking in terms of experiments—when proposing research project ideas. Professors are motivated by having their names appear on published papers, and computer science conference papers usually need strong experiments to get accepted for publication. Thus, it's crucial to think about experiment design at project inception time.

Although none of my specific ideas won Scott over, he still wanted to work with me to develop a project related to my general interests. At the time, he was an assistant (pre-tenure) professor who had been at Stanford for only three years, so he was eager to publish more papers in his quest to earn tenure. Since I had a fellowship, Scott didn't need to fund me from his grants, so there was no real downside for him.

~

Scott specialized in a pragmatic subfield of computer science called HCI (Human-Computer Interaction). In contrast to many other subfields, the HCI methodology for doing research centers on the needs of real people. Here is how HCI projects are typically done:

  1. Observe people to find out what their real problems are.
  2. Design innovative tools to help alleviate those problems.
  3. Experimentally evaluate the tools to see if they actually help people.

Since I wanted to create tools to help improve the productivity of programmers, Scott first suggested that I observe some programmers at work in their natural habitat to discover what real problems they were facing. In particular, Scott was intrigued by how modern-day programmers write code using an assortment of programming languages and rely heavily on Web search and copying and pasting of code snippets. The previous few decades of research on programmer productivity tools had assumed that programmers work exclusively with a single language in a homogeneous environment, which is a grossly outdated assumption. By observing what problems modern-day programmers faced, I might be able to design new tools to suit their needs.

Now that Scott had provided a high-level goal, I set out to find some professional programmers whom I could observe at work. First, I tried to drum up some leads at Google, since I had just interned there and my former manager agreed to forward my email solicitation to his colleagues. I quickly received a few polite rejections, since nobody wanted to deal with possible intellectual property issues arising from a non-employee looking at their code. I then emailed a dozen friends at various startup companies near Stanford, figuring that they would not be subject to the same constraints as programmers at a big company. Unfortunately, they were even less accommodating since they had a greater need for secrecy due to competitive reasons. Also, they were far busier than their big-company counterparts and thus unwilling to indulge the intellectual fancy of some random graduate student.

My last-ditch attempt was to try to observe programmers at Mozilla, the nonprofit software development foundation that makes the popular Firefox web browser. Since Mozilla's software projects were all open-source, I figured that they would not be afraid of an outsider coming in to watch their programmers work. Rather than cold-emailing (I didn't even know who to email!), I decided to drive to the Mozilla headquarters and walk in the front door. I accosted the first person I saw and introduced myself. He generously gave me the names and email addresses of two Mozilla Foundation leaders who might be interested in such a research collaboration. I cold-emailed both of them later that day and was amazed that one responded favorably. Unfortunately, that was the last I heard from him; he never responded to my requests to follow up.

In retrospect, I'm not surprised that my workplace shadowing attempts failed, since I had absolutely nothing to offer these professional programmers; I would just be disrupting their workday. Fortunately, a few years later I managed to observe a different set of (nonprofessional) programmers—fellow graduate students doing programming for scientific research—who welcomed my light intrusions and were more than happy to talk to me about their working environments. Those interviews would end up directly inspiring my dissertation work.

~

Since I had no luck in shadowing professional programmers, I decided to look within Stanford for opportunities. When I saw a flyer announcing an annual computer programming competition being held soon in my department, I immediately cold-emailed the organizer. I pitched him on the idea of having me observe students during the competition, and he happily agreed.

Although a student programming competition was not indicative of real-world programming environments, at least it was better than nothing. Joel, one of Scott's students who entered the Ph.D. program one year before me, wanted to come observe as well. Joel and I spent an entire Saturday morning watching a few students participating in the four-hour competition. It was a pretty boring sight, and we didn't end up learning much from the notes we took. However, this experience gave us the idea to plan a more controlled laboratory study.

At this point, Scott decided to have Joel and me team up to create a lab study together rather than working on separate projects. Since Joel and I shared similar research interests, Scott could reduce his management overhead by having us join forces. I was happy to let Joel take the lead on the lab study design, since I was concurrently working on another project with Dawson.

Joel, Scott, and I designed a lab study where we asked Stanford students to spend 2.5 hours programming a simple Web-based chat room application from scratch. They were allowed to use any resources they wanted, most notably searching the Web for existing code and tutorials. We recruited 20 students as participants, most of them coming from the Introduction to HCI course that Scott was teaching at the time.

Over the next few weeks, Joel and I sat in our department's basement computer lab for 50 hours (2.5 hours x 20 students) observing study participants and recording a video feed of the lab's computer monitor. Watching the students at work was engaging at first but quickly grew tedious as we observed the same actions and mistakes over and over again. We then spent almost as much time watching replays of the recorded videos and marking occurrences of critical events. Finally, we analyzed our notes and the marked events to garner insights about what kinds of problems these students faced when working on a simple yet realistic real-world programming task.

We wrote up our lab study findings and submitted our paper to a top-tier HCI conference. Three months later, we found out that our paper was accepted with great reviews and even nominated for a Best Paper Award. Joel was listed as the first author on this paper, and I was the second author. Almost all computer science papers are coauthored by multiple collaborators, and the order in which authors are listed actually matters. The author whose name appears first is the project leader (e.g., Joel) who does more work than all subsequently listed authors and thus deserves most of the credit. All other authors are project assistants—usually younger students (e.g., me) or distant colleagues—who contributed enough to warrant their names being on the paper. Ph.D. students often list their advisor (e.g., Scott) as the last author, since the advisor helps with idea formulation, project planning, and paper writing.

Since I wasn't the first author on this paper, it didn't contribute towards my dissertation; however, this experience taught me a great deal both about how to do research and about how to write research papers. Most importantly, I felt satisfied to see this project conclude successfully with a prestigious top-tier conference publication, in stark contrast to the embarrassing Klee paper rejection from my first year.

Joel continued down this research path by turning our paper into the first contribution of his dissertation. Over the next few years, he built several tools to help programmers overcome some of the problems we identified in that initial study and published papers describing those tools. In the meantime, Scott didn't actively recruit me to become his student, so I never thought seriously about switching advisors. It seemed like my interests were too similar to Joel's, and Joel was already carving out a solid niche for himself in his subfield. Thus, I focused my efforts on my main project with Dawson, since he was still my advisor. Progress was not as smooth on that front, though.

~

Recall that at the end of my first year, I started exploring research ideas in a subfield called empirical software measurement—specifically, trying to measure software quality by analyzing the development history of software projects. It turned out that Dawson was also interested in this topic, so we continued working together on it throughout my second year. His main interest was in building new automated bug-finding tools (e.g., Klee), but he had a side interest in software quality in general. He was motivated by research questions such as:

  • If a large software project has, say, 10 million lines of code, which portions of that code are crucial to the project, and which are not as important?

  • What factors affect whether a section of code is more likely to contain bugs? For example, does code that has been recently modified by novices contain more bugs? What about code that has been modified by many people over a short span of time?

  • If an automated bug-finding tool finds, say, 1,000 possible bugs, which ones are likely to be important? Programmers have neither the time nor energy to triage all 1,000 bug reports, so they must prioritize accordingly.

I investigated these kinds of questions by analyzing data sets related to the Linux kernel software project. I chose to study Linux since it was the largest and most influential open-source software project at the time, with tens of thousands of volunteer programmers contributing tens of millions of lines of code over the span of two decades. The full revision control history of Linux was available online, so that became my primary source of data. A project's revision control history contains a record of all modifications made to all code files throughout that project's lifetime, including when each modification was made, and more importantly, by whom. To correlate project activity with bugs, I obtained a data set from Dawson's software bug-finding company containing 2,000 bugs that one of its bug-finding tools had found in Linux. Of course, those were not the only bugs in Linux, but it was the only accessible data source with the information I needed to investigate our research questions.

My daily workflow consisted of writing computer programs to extract, clean up, reformat, and analyze the data from the Linux revision control history and those 2,000 bug reports. To help obtain insights, I taught myself the basics of quantitative data analysis, statistics, and data visualization techniques. As I worked, I kept a meticulous log of my experimental progress in a research lab notebook, noting which trials did and did not work. Every week or so, I would meet with Dawson to present my findings. Our meetings usually consisted of me showing him printouts of graphs or data tables that my analyses had generated, followed by him making high-level suggestions such as, “Wow, this part of the graph looks weird, why is that? Split the data up in this way and dig deeper.” Years later, I learned that this working style was fairly common amongst computational researchers in a variety of academic fields; for my dissertation, I created tools to eliminate common inefficiencies in this pervasive type of workflow. However, back at that time, I had no such long-term visions; I just wanted to make interesting discoveries and get them published.
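To give a concrete sense of what this kind of mining workflow looks like, here is a minimal sketch in Python of the general approach, assuming a local clone of the Linux repository and a hypothetical CSV export of the bug reports. This is not my original code; the file names, column names, and output format are illustrative assumptions only.

  # Hypothetical sketch: mine per-file change history from a Git repository and
  # join it against a CSV of bug reports (columns assumed: "file,bug_id").
  import csv
  import subprocess
  from collections import defaultdict

  def mine_history(repo_path):
      """Return {file: (num_commits, num_distinct_authors)} from `git log`."""
      out = subprocess.run(
          ["git", "-C", repo_path, "log", "--name-only",
           "--pretty=format:@@%ae"],           # mark each commit with its author's email
          capture_output=True, text=True, check=True).stdout
      commits = defaultdict(int)
      authors = defaultdict(set)
      current_author = None
      for line in out.splitlines():
          if line.startswith("@@"):
              current_author = line[2:]
          elif line.strip():                   # a file touched by this commit
              commits[line] += 1
              authors[line].add(current_author)
      return {f: (commits[f], len(authors[f])) for f in commits}

  def bug_counts(csv_path):
      """Return {file: number of bug reports filed against that file}."""
      counts = defaultdict(int)
      with open(csv_path, newline="") as f:
          for row in csv.DictReader(f):
              counts[row["file"]] += 1
      return counts

  if __name__ == "__main__":
      history = mine_history("linux")          # path to a local clone (assumed)
      bugs = bug_counts("bug_reports.csv")     # hypothetical exported bug data
      # Print files ranked by bug count alongside their churn statistics:
      # the kind of table one might bring to a weekly advisor meeting.
      for path, n in sorted(bugs.items(), key=lambda kv: -kv[1])[:20]:
          n_commits, n_authors = history.get(path, (0, 0))
          print(f"{path}\tbugs={n}\tcommits={n_commits}\tauthors={n_authors}")

In practice, most of the effort in this kind of work goes into the unglamorous cleaning and reformatting steps before any analysis can run, which is exactly the tedium that motivated the advisor meetings described above.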

~

Dawson and I had a lot of trouble getting our results published. Throughout the year, we made two paper submissions that were both rejected. It would take another full year before this work finally got published as a shorter-length, second-tier conference paper, which held almost no prestige and didn't “count” as a contribution to my dissertation. But by then, I didn't care because I had already moved on to other projects.

The underlying cause of our publication troubles was that we were not “insiders” in the empirical software measurement subfield (sometimes also called empirical software engineering) to which our project belonged. At the time Dawson and I started working in this subfield, teams of researchers from dozens of universities and corporate research labs were already doing similar work. Dawson and I were severely outgunned by the competition, which consisted of professors and research scientists specializing in empirical software measurement who were advising armies of Ph.D. students to do the heavy number crunching. These people were hungry to publish a ton of papers, since lots of them were young professors who wanted to earn tenure. They were also experts in the statistical methodologies, framing of related work, and “marketing pitches” required to get these sorts of papers accepted. Most importantly, they frequently served on program committees and as external reviewers for the relevant conferences, so they knew exactly what it took to write publishable papers in this subfield.

Recall that each paper submission gets peer-reviewed by three to five volunteer experts—usually professors or research scientists—who critique its merit. If reviewers deem a paper worthy of publication, then it gets published; otherwise, authors must revise and resubmit at a later date. The purpose of peer review is to ensure that all published papers are up to a certain level of acceptable quality, as determined by the scholarly community. This vetting process is absolutely necessary, since there needs to be some arbiters of “goodness” to filter out crackpot claims. However, the peer-review process is inherently imperfect since, despite their best efforts at being impartial, reviewers are human beings with their own subjective tastes and biases.

Since conferences usually accept less than 20 percent of paper submissions, if reviewers get a bad first impression when reading a paper, then they are likely to reject it. Dawson and I were not specialists in the empirical software measurement subfield, so we weren't able to “pitch” our paper submissions in a way that appealed to reviewers' expectations. Thus, we repeatedly got demoralized by negative reviews such as, “In the end, I find I just don't have much confidence in the results the authors present. There are two sources of doubt: I don't trust their interpretation of the measures, and they don't use very effective statistical techniques.” In the cutthroat world of academic publishing, simply being passionate about a topic is nowhere near sufficient for success; one must be well-versed in the preferences of senior colleagues in a particular subfield who are serving as paper reviewers. In short, our data sets were not as good, our techniques were not as refined, and our results and presentation style were less impressive than what the veterans in this subfield expected.

In contrast, my paper submission with Scott and Joel was far more successful because Scott was an insider who had previously published and reviewed many papers in the HCI conference where we submitted our paper. Of course, being an insider didn't mean that our paper was scrutinized any less rigorously, since that would be unfair. However, Scott could leverage his experience to present our project's motivations and findings in a way that was the most palatable to those sorts of reviewers, thereby raising our paper's chances of acceptance.

~

By the end of my second year of Ph.D. (June 2008), I was growing frustrated by my lack of compelling results and overwhelmed by the flurry of papers being published in the empirical software measurement subfield. I still hadn't been able to publish my own findings and realized that I couldn't effectively compete with the veterans: Since neither Dawson nor I knew what those paper reviewers wanted to see, I sensed that trying to publish in this subfield would be a continual uphill struggle. And since publishing was a prerequisite for graduation, I had to find another project as soon as possible, or else I wouldn't be able to earn a Ph.D. As I prepared to begin my third year at Stanford, I was desperate to cling to anything that had a reasonable chance of producing publishable results. And that's when I returned to the project that had haunted me since my first year—Klee.



