Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

Year One: Downfall

The Ph.D. Grind

In the summer of 2006, several months prior to starting my Ph.D. at Stanford, I thought about ideas for research topics that I felt motivated to pursue. In general, I wanted to create innovative tools to help people become more productive when doing computer programming (i.e., improving programmer productivity). This area of interest arose from my own programming experiences during summer internships: Since my assigned day-to-day work wasn't mentally stimulating, I spent a lot of time in my cubicle reflecting on the inefficiencies in the computer programming process at the companies where I worked. I thought it would be neat to work on research that helps alleviate some of those inefficiencies. More broadly, I was interested in research that could help other types of computer users—not only professional programmers—become more productive. For example, I wanted to design new tools to assist scientists who are analyzing and graphing data, system administrators who are customizing server configurations, or novices who are learning to use new pieces of software.

Although I had these vague high-level interests back then, I was still many years away from being able to turn them into legitimate publishable research projects that could form a dissertation. To graduate with a Ph.D. from the Stanford Computer Science Department, students are expected to publish two to four related papers as the first (lead) author and then combine those papers into a book-length technical document called a dissertation. A student is allowed to graduate as soon as a three-professor thesis committee approves their dissertation. Most students in my department take between four and eight years to graduate, depending on how quickly they can publish.

At new student orientation in September 2006, professors in my department encouraged all incoming Ph.D. students to find an advisor as soon as possible, so my classmates and I spent the first few months chatting with professors to try to find a match. The advisor is the most important member of a student's thesis committee and has the final say in approving a student to graduate. In my field, advisors are responsible for providing funding for their students (usually via research grants) and working with them to develop ideas and to write papers. I met with a few professors, and the one whose research interests and style seemed most closely related to mine was Dawson, so I chose him as my advisor.

When I arrived on campus, Dawson was a recently tenured professor who had been at Stanford for the past eight years; professors usually earn tenure (a lifetime employment guarantee) if they have published enough notable papers in their first seven years on the job. Dawson's main research interest was in building innovative tools that could automatically find bugs (errors in software code) in complex pieces of real-world software. Over the previous decade, Dawson and his students had built several tools that were able to find far more bugs than any of their competitors. Their research techniques were so effective that they created a successful startup company to sell software bug-finding services based on those techniques. Although I somewhat liked Dawson's projects, what appealed more to me was that his research philosophy matched my own: He was an ardent pragmatist who cared more about achieving compelling results than about demonstrating theoretical “interestingness” for the sake of appearing scholarly.

During my first meeting with Dawson, he seemed vaguely interested in my broader goals of making computer usage and programming more productive. However, he made it very clear that he wanted to recruit new students to work on an automatic bug-finding tool called Klee that his grant money was currently funding. (The tool has had several names, but I will call it “Klee” for simplicity.) From talking with other professors and senior Ph.D. students in my department, I realized it was the norm for new students to join an existing grant-funded research project rather than to try creating their own original project right away. I convinced myself that automatically finding software bugs was an indirect way to make programmers more productive, so I decided to join the Klee project.

When I started working on Klee in December 2006, Dawson was supervising five other students who were already working on it. The project leader, Cristi, was a third-year Ph.D. student who, together with Dawson, built the original version of Klee. Dawson, Cristi, and a few other colleagues had recently coauthored and published their first paper describing the basic Klee system and demonstrating its effectiveness at finding new kinds of bugs. That paper was well-received by the academic community, and Dawson wanted to keep up the momentum by publishing a few follow-up papers. Note that it's possible to publish more than one paper on a particular research project (i.e., follow-up papers), as long as each paper contains new ideas, improvements, and results that are different enough from the previous ones. The paper submission deadline for the next relevant top-tier conference was in March 2007, so the Klee team had four months to create enough innovations beyond the original paper to warrant a new submission.

~

Before I continue my story, I want to briefly introduce how academic papers are peer-reviewed and published. In computer science, the most prestigious venues for publishing papers are conferences. Note that in many other academic disciplines, journals are the most prestigious venues, and the word “conference” means something quite different. The computer science conference publication process works roughly as follows:

  1. Each conference issues a call for papers with a list of topics of interest and a specific submission deadline.

  2. Researchers submit their papers by that deadline. Each conference typically receives 100 to 300 paper submissions, and each paper contains the equivalent of 30 to 40 pages of double-spaced text.

  3. The conference program committee (PC), consisting of around 20 expert researchers, splits up the submitted papers and reviews them. Each paper is reviewed by three to five people, who are either PC members or volunteer external reviewers solicited by PC members. The review process takes about three months.

  4. After everyone on the PC is done with their reviews, the PC meets and decides which papers to accept and which to reject based on reviewer preferences.

  5. The PC emails all authors to notify them of whether their papers have been accepted or rejected and attaches the written reviews to the notification emails.

  6. Authors of accepted papers attend the conference to give a 30-minute talk on their paper. All accepted papers are then archived online in a digital library.

A prestigious top-tier conference accepts 8 to 16 percent of submitted papers, and a second-tier conference accepts 20 to 30 percent. Due to these relatively low acceptance rates, it's not uncommon for a paper to be rejected, revised, and resubmitted several times before being accepted for publication—a process that might take several years. (A paper can be submitted to only one conference at a time.)

~

After Dawson made it clear that he wanted to aim for that particular March 2007 top-tier conference submission deadline, he told me what the other five students were currently working on and gave options for tasks that I could attempt. I chose to use Klee to find new bugs in Linux device drivers. A device driver is a piece of software code that allows the operating system to communicate with a hardware peripheral such as a mouse or keyboard. The Linux operating system (similar to Microsoft Windows or Apple Mac OS) contains thousands of device drivers to connect it with many different kinds of hardware peripherals. Bugs in device driver code are both hard to find using traditional means and also potentially dangerous, because they can cause the operating system to freeze up or crash.

Dawson believed that Klee could find new bugs that no automated tool or human being had previously found within the code of thousands of Linux device drivers. I remember thinking that although new Linux device driver bugs could be cool to present in a paper, it wasn't clear to me how these results constituted a real research contribution. From my understanding, I was going to use Klee to find new bugs—an application of existing research—rather than improving Klee in an innovative way. Furthermore, I couldn't see how my project would fit together with the other five students' projects into one coherent paper submission in March. However, I trusted that Dawson had the high-level paper writing strategy in his head. I had just joined the project, so I didn't want to immediately question these sorts of professor-level decisions. I was given a specific task, so I wanted to accomplish it to the best of my abilities.

~

I spent the first four months of my Ph.D. career painstakingly getting Klee to analyze the code of thousands of Linux device drivers in an attempt to find new bugs. Although my task was conceptually straightforward, I was overwhelmed by the sheer amount of grimy details involved in getting Klee to work on device driver code. I would often spend hours setting up the delicate experimental conditions required for Klee to analyze a particular device driver only to watch helplessly as Klee crashed and failed due to bugs in its own code. When I reported bugs in Klee to Cristi, he would try his best to address them, but the sheer complexity of Klee made it hard to diagnose and fix its multitude of bugs. I'm not trying to pick on Klee specifically: Any piece of prototype software developed for research purposes will have lots of unforeseen bugs. My job was to use Klee to find bugs in Linux device driver code, but ironically, all I ended up doing in the first month was finding bugs in Klee itself. (Too bad Klee couldn't automatically find bugs in its own code!) As the days passed, I grew more and more frustrated doing what I felt was pure manual labor—just trying to get Klee to work—without any intellectual content.

This was the first time in my life that I had ever felt hopelessly overwhelmed by a work assignment. In the past, my summer internship projects were always manageable, and although lots of schoolwork in college was challenging, there was always a correct answer waiting to be discovered. If I didn't understand something in class, then teaching assistants and more advanced students would be available to assist. Even when doing research as an undergraduate, I could always ask my mentor (who was then a fourth-year Ph.D. student) to help me, since I worked on relatively simple problems that he usually knew how to solve. The stakes were also lower as an undergraduate research assistant, since research was only a small fraction of my daily schedule. If I was stuck on a research task, then I could instead focus on classwork or hang out with friends. My college graduation didn't depend on excelling in research. However, now that I was a Ph.D. student, research was my only job, and I wouldn't be able to earn a degree unless I succeeded at it. My mood was inextricably tied to how well I was progressing every day, and during those months, progress was painfully slow.

I was now treading in unfamiliar territory, so it was much harder to seek help than during my undergraduate years when answers were clear-cut. Since I was the only person trying to use Klee on device driver code, my colleagues were unable to provide any guidance. Dawson gave high-level strategic advice from time to time, but like all tenured professors, his role was not to be “fighting in the trenches” alongside his students. It was our job to figure out all of the intricate details required to produce results—in my case, to find new bugs in Linux device drivers that nobody had previously found. Professors love to repeat the refrain, “If it's already been done before, then it wouldn't be research!” For the first time, I viscerally felt the meaning of those words.

Despite my daily feelings of hopelessness, I kept on telling myself: I'm just getting started here, so I should be patient. I didn't want to appear weak in front of my advisor or colleagues, especially because I was the youngest student in Dawson's group. So I trudged forward day after day for over 100 consecutive days, fixing Klee-related problems as they arose and then inevitably encountering newer and nastier obstacles in my quest to find those coveted Linux device driver bugs.

During almost every waking moment, I was either working, thinking about work, or agonizing over how I was stuck on obscure technical problems at work. Unlike a regular nine-to-five job (e.g., my summer internships) where I could leave my work at the office and chill every night in front of the television, research was emotionally and mentally all-consuming. I found it almost impossible to shut off my brain and relax in the evenings, which I later discovered was a common ailment afflicting Ph.D. students. Sometimes I even had trouble sleeping due to stressing about how my assigned task was unbelievably daunting. There was no way to even fathom taking a break because there was so much work to do before the paper submission deadline in March.

In the midst of all of this manual labor, I tried to come up with some semi-automated ways to make my daily grind less painful. I discussed a few preliminary ideas with Dawson, but we ultimately concluded that there was no way to avoid such time-consuming grinding if we wanted Klee to find bugs in Linux device drivers. I had to tough it out for a few more months until we submitted the paper.

My rational brain understood that experimental research in science and engineering fields often involves a tremendous amount of unglamorous, grungy labor to produce results. Ph.D. students, especially first- and second-years, are the ones who must bear the brunt of the most tedious labor; it's what we are paid to do. In a typical research group, the professor and senior Ph.D. students create the high-level project plans and then assign the junior students to grind on making all of the details work in practice. First- and second-year students are rarely able to affect the overall direction of the group's project. Even though I fully accepted my lowest rank on the pecking order, my emotional brain still took a huge beating during those first few months because the work was so damn hard and unrewarding.

~

After two months of grinding, I began to win some small victories. I got Klee working well enough to find my first few bugs in the smallest device drivers. To confirm whether those bugs were real (as opposed to false positives due to limitations of Klee), I sent emails describing each potential bug to the Linux programmers who created those drivers. Several driver creators confirmed that I had indeed found real bugs in their code. I was very excited when I received those email confirmations, since they were my first small nuggets of external validation. Even though I wasn't doing groundbreaking new research, I still felt some satisfaction knowing that my efforts led to the discovery of new bugs that were difficult to find without a tool such as Klee.

My morale improved a bit after those first few bug confirmations on the smallest device drivers, so I set my sights on trying to get Klee to work on larger, more complex drivers. However, the new technical problems that arose in subsequent weeks became unbearable and almost drove me to the point of burnout. Here is a summary of the difficulties: Klee can effectively find bugs only in software that contains fewer than approximately 3,000 lines of code (written in the C language). The smallest Linux device drivers contain about 100 lines of code, so they are well within Klee's capabilities. Larger drivers have about 1,000 lines of code but are intricately connected to 10,000 to 20,000 lines of code in other parts of the Linux operating system. The resulting combination is far beyond Klee's capabilities to analyze, since it's impossible for Klee to “surgically extract” the code of each device driver and analyze its 1,000 lines in isolation. I made various attempts to reduce the number of these external connections (called dependencies), but doing so required several days of intricate customized manual effort for each driver.

I met with Dawson to express my exasperation at the daunting task that I was now facing. It seemed absurd to have to spend several days to get Klee working with each new device driver. Not only was it physically wearing me out, but it wasn't even research! What would I write about in our paper—that I had spent nearly 1,000 hours of manual labor getting Klee to work on device drivers without obtaining any real insights? That wasn't a research contribution; it just sounded foolish. I also began to panic because there were only five weeks left until the paper submission deadline, and Dawson had not yet mentioned anything about our group's paper writing strategy. It usually takes at least four weeks to write up a respectable paper submission, especially when there are six students involved in the project who need to coordinate their efforts.

Several days after our meeting, Dawson came up with a plan for improving Klee to overcome the dependency problems I was facing. The new technique that he invented, called underconstrained execution (abbreviated “UC”), might allow Klee to “surgically extract” the Linux device driver code from the 10,000 to 20,000 lines of surrounding external code and thus analyze the drivers in isolation. He immediately set out to work with a senior student to incorporate the UC technique into Klee; they called the improved version Klee-UC. Even though I was exhausted and almost burned-out, I was glad that my struggles at least motivated Dawson to invent a brand-new idea with the potential to become a worthy research contribution.

Dawson and the other student spent the next few weeks working on Klee-UC. In the meantime, they told me to keep trying to find Linux device driver bugs the old-fashioned manual way. They planned to show the effectiveness of Klee-UC by re-finding the bugs that I had found manually using regular Klee. The argument they wanted to make in the paper submission was that instead of having a Ph.D. student (me!) tediously spend a few days setting up Klee to find each bug, Klee-UC could automatically find all of those bugs in a matter of minutes without any setup effort.

After grinding furiously for a few more weeks, I was ultimately able to get the original Klee to analyze 937 Linux device drivers and discover 55 new bugs (32 of which were confirmed by each respective driver's creator via email). I then had to set up the fledgling Klee-UC tool to analyze those same 937 drivers, which was even more tricky because Dawson and the other student were in the process of implementing (programming) Klee-UC while I was trying to analyze drivers with it. Thankfully, Klee-UC was indeed able to re-find most of those bugs, so at least we had some research contribution and results to write up for our paper submission.

There was one huge problem, though. By the time we got those favorable results, there were only three days left until the paper submission deadline, and nobody had even begun writing the paper yet. In that tiny amount of time, it's physically impossible to write, edit, and polish a paper submission that stands any chance of being accepted to a top-tier computer science conference. But we still tried. In the final 72 hours before the deadline, Dawson and five of us students (one had dropped out of the project by now) camped out at the office for two consecutive all-nighters to finish up the experiments and to write the paper. All of us students knew in the back of our minds that there was absolutely no way that this paper would get accepted, but we followed Dawson's lead and obediently marched forward.

We ended up submitting an embarrassing jumble of text filled with typos, nonsensical sentence fragments, graphics without any explanations, and no concluding paragraphs. It was a horrid mess. At that moment, I had no idea how I would ever complete a Ph.D. if it meant working in this terrible and disorganized manner. As expected, three months later our paper reviews came back overwhelmingly negative, filled with scathing remarks such as, “The [program committee] feels that this paper is blatantly too sloppy to merit acceptance; please do not submit papers that are nowhere near ready for review.”

~

Right after this ordeal, I applied for and accepted a summer internship at Google, since I desperately longed for a change of environment. The internship wasn't at all relevant to my research interests, but I didn't care. I just wanted to get away from Stanford for a few months.

It was now April 2007, and there were still ten weeks left until my internship started in June. I had no idea what I could work on, but I wanted to get as far away as possible from my previous four months of dealing with Klee and Linux drivers. I didn't care if we ended up never revising and resubmitting our failed paper (we didn't); I just wanted to escape from those memories. But since I had accumulated nearly 1,000 hours of experience using Klee and it was the only project Dawson cared about, I figured that it was a wise starting point for developing new project ideas. Thus, I talked to Dawson about ideas for using Klee in unconventional ways beyond simply finding bugs.

However, I quickly realized that I didn't need to be bound by Klee at all since I was funded by the NDSEG fellowship, not by Dawson's grants. In contrast, all of Dawson's other students had no choice but to continue working on Klee since they were funded by his Klee-related grants. So I kept Dawson as my advisor, but I left the Klee project and set out to create my own research project from scratch.

Why didn't I “go solo” sooner? Because even though my fellowship theoretically gave me the financial freedom to pursue whatever research direction I wanted, I knew that I still needed the support of some advisor in order to eventually graduate. Dawson was clearly interested in having all of his new students work on Klee, so I spent four months as a “good soldier” grinding on Klee rather than arrogantly demanding to do my own project from the beginning. Besides, if I had chosen another advisor, I would still need to prove myself by initially working on their projects. There was no way to avoid paying my dues.

I spent the next ten weeks daydreaming of my own research ideas in a complete vacuum without talking to anyone. Since I had such a negative initial experience working in a research group for the past few months, I now wanted to be left alone to think for myself. Dawson was fine with my absence, since he wasn't funding me through his grants.

I lived in complete isolation, mentally burned-out yet still trying to make some gradual progress. Every single day, I tried reading several computer science research papers and taking notes to get inspired to think of my own creative ideas. But without proper guidance or context, I ended up wasting a lot of time and not extracting any meaningful insights from my readings. I also rode my bicycle aimlessly in the neighborhoods around campus in futile attempts to think of new research ideas. Finally, I procrastinated more than I had ever done in my life thus far: I watched lots of TV shows, took many naps, and wasted countless hours messing around online. Unlike my friends with nine-to-five jobs, there was no boss to look over my shoulder day to day, so I let my mind roam free without any structure in my life.

Although my research brainstorming was largely unfocused, my thoughts slowly gravitated towards ideas related to the following question: How can we empirically measure the quality of software? This was one of my broad research interests prior to starting my Ph.D., inspired by my encounters with low-quality software during engineering internships. However, the problem with dreaming up ideas in a vacuum back then was that I lacked the experience necessary to turn those ideas into real research projects. Having full intellectual freedom was actually a curse, since I was not yet prepared to handle it.

Although I was interested in developing new ways to measure software quality, I acknowledged that it was only a fuzzy dream with no grounding in formal research methodologies that the academic community would deem acceptable. If I tried to pursue this project on my own, then I would be yet another quack outsider spouting nonsense. There was no way I could possibly get those ideas published in a top-tier or even second-tier conference, and if I couldn't get my work published, then I wouldn't be able to graduate. I no longer had lofty dreams of becoming a tenured professor: I just wanted to figure out some way to eventually graduate.

I hardly talked to anybody during those ten solitary weeks—not even friends or family. There was no point in complaining, since nobody could understand what I was going through at the time. My friends who were not in Ph.D. programs thought that I was merely “in school” and taking classes like a regular student. And the few friends I had made in my department were equally depressed with their own first-year Ph.D. struggles—most notably, the shock of being thrown head-first into challenging, open-ended research problems without the power to affect the high-level direction of their assigned projects. Here we were, talented young computer scientists voluntarily working on tasks that were both excruciatingly difficult and seemingly pointless, all while earning one-fourth as much salary as our friends in the corporate working world. It was so sad that it was perversely funny. However, I didn't feel that group whining would be productive, so I kept silent. I avoided coming to the Computer Science Department building to work, since I dreaded running into colleagues. I was afraid that they would inevitably ask me what I was working on, and I didn't have a respectable answer to give. Instead, I preferred hiding out in libraries and coffee shops.

In retrospect, going solo so early during grad school was a terrible decision. Contrary to romanticized notions of a lone scholar sitting outside sipping a latte and doodling on blank sheets of notebook paper, real research is never done in a vacuum. There need to be solid intellectual, historical, and sometimes even physical foundations (e.g., laboratory equipment) for developing one's innovations. The wiser course of action during those weeks would have been to talk to Dawson more frequently and to actively seek out collaborations with other professors or senior students. But back then, I was so burned-out and frustrated with the traditional pecking order of group-based research (putting new Ph.D. students through the meat grinder on the most unglamorous work) that I recoiled and went off on my own.

~

Towards the end of my ten weeks of isolation—right before I started my summer internship at Google—I emailed Dawson a blurb from a technical blog post I had recently read and reflected upon. That blog post made me think about measuring software quality by analyzing patterns in how programmers edit code throughout the lifetimes of software projects. To my pleasant surprise, Dawson shot back a quick reply saying that he had a side interest in these sorts of measurement techniques, especially in how they might assist automatic bug-finding tools such as Klee.

I grew hopeful when I learned about Dawson's interest in an area that I also found interesting, since he might be able to help make my ideas more substantive. I jotted down some notes on my newly proposed empirical software measurement project with the intention of returning to it after my summer internship. Thus, I ended the first year of my Ph.D. on a somewhat optimistic note after four months of traumatic Klee grinding followed by ten weeks of aimless meandering.



Copyright © 2012 Philip Guo