Tips for performing computationally-based experimental research
September 2008 (perspective of a Ph.D. student)
In this article, I describe tips for researchers, especially Ph.D. students, who perform any type of research that involves running experiments on computers. I aim to offer concrete advice that makes the daily grind of computationally-based experimental research more productive, effective, and enjoyable.
In December 2017, I dug up this incomplete article draft from my old documents archive. I wrote this draft at the start of the third year of my Ph.D., nearly a decade ago. I think the overall principles remain relevant today, even if some of the technical details are old-ish. If I were writing this article today, I'd talk more about using Jupyter Notebooks, the modern R tidyverse ecosystem, GitHub for version control, and Dropbox as an always-on backup and version control system, but those technologies weren't in common use in 2008.
On a meta note, I found it striking that so early on in grad school, I was already reflecting heavily on the challenges of research programming that I was personally facing. These struggles would directly inform the design of the five tools that eventually formed my Ph.D. dissertation a few years later.
Finally, a lot of this is relevant to data science, except of course that word wasn't in common use back in 2008!
TODO: link to more concrete examples (give at least one for each bullet point)
Motivation for this article
In many areas of modern science and engineering, researchers who are not trained as computer scientists or programmers find themselves having to use computer-based tools to perform their research. While they are often experts in their respective fields, they are not experts at computer systems or programming (nor should anyone expect them to be), so they often do not know about the most effective ways to leverage the power of computers to aid in their research. This article describes a collection of tips I have accumulated from doing computationally-based experimental research over the past few years, which aims to help other researchers make better use of computational resources in their work.
While doing web searches for related articles prior to writing this one, I actually found none that provided the type of advice that I wanted to present. I found many online advice guides for how to excel as a Ph.D. student or researcher, but those mostly focused on high-level 'career advice' such as how to develop research ideas, apply for grants, publish papers, collaborate in teams, communicate and sell projects, etc. This article is definitely not one of these 'Ph.D. student advice' guides (it would be presumptuous and non-credible for me to write such a guide since I'm still in the midst of my Ph.D.). Instead, I aim to focus solely on helping people optimize aspects of the daily grind spent sitting in front of the computer.
Experimental research involves lots of trial and error: exploring paths that mostly turn out to be unfruitful. Thus, it's important to document the lessons learned during each trial, both to maximize your chances of success in the future and to avoid repeating mistakes. Even more important than writing down notes is being able to find the relevant notes later when you need them.
Automating routine tasks
A sight that greatly frustrates me is seeing my friends, who are extremely smart people, spend lots of time doing repetitive, grungy tasks on the computer, especially when grinding on research. I know that their time is far more valuable than the computer's time. Computers are great at performing well-specified, boring, repetitive tasks, and they don't grow jaded from the drudgery. Humans are meant to do smart things, and computers are meant to do dumb things.
If you find yourself repeating the same steps on the computer over and over, it's time to seriously consider learning how to write programs (commonly called scripts) to automate those tasks. If you aren't sure whether your particular task can be automated, chances are that it can, so ask the nearest computer expert how to do it.
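To give a flavor of what such a script can look like, here is a minimal Python sketch. It assumes a purely hypothetical setup: a folder named results containing CSV files, each with a numeric column named runtime. Those names, and the function names summarize_column and summarize_directory, are made up for illustration; your own task will likely look different, but the pattern of looping over files rather than opening each one by hand carries over.

```python
import csv
from pathlib import Path


def summarize_column(csv_path, column):
    """Return (row count, mean) for one numeric column of one CSV file."""
    with open(csv_path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    mean = sum(values) / len(values) if values else float("nan")
    return len(values), mean


def summarize_directory(data_dir, column):
    """Summarize every .csv file in data_dir, in sorted filename order."""
    results = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        results[path.name] = summarize_column(path, column)
    return results


if __name__ == "__main__":
    # Hypothetical layout: a "results" folder of CSVs with a "runtime" column.
    summary = summarize_directory("results", "runtime")
    for name, (count, mean) in summary.items():
        print(f"{name}: {count} rows, mean runtime = {mean:.3f}")
```

Once something like this exists, re-running the summary after every new batch of experiments is a single command instead of an afternoon of copying numbers between files.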
Long-running computational jobs
Organizing experimental data
Asking your local computer experts for help
Optimizing your use of time
Data backup and version control
Example of a computational tool suite
These are computational tools that I often use in my own research.