Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

Data Science Workflow: Overview and Challenges

During my Ph.D., I created tools for people who write programs to obtain insights from data. Millions of professionals in fields ranging from science, engineering, business, finance, public policy, and journalism, as well as numerous students and computer hobbyists, all perform this sort of programming on a daily basis.

Shortly after I wrote my dissertation in 2012, the term "Data Science" started appearing everywhere. Some industry pundits call data science the "sexiest job of the 21st century." And universities are putting tremendous funding into new Data Science Institutes.

I now realize that data scientists are one main target audience for the tools that I created throughout my Ph.D. However, that job title was not as prominent back when I was in grad school, so I didn't mention it explicitly in my dissertation.

What do data scientists do at work, and what challenges do they face?

This post provides an overview of the modern data science workflow, adapted from Chapter 2 of my Ph.D. dissertation, Software Tools to Facilitate Research Programming.


[Read the rest of this article at the CACM blog]

Created: 2013-10-30
Last modified: 2013-10-30
Related pages tagged as research:
Related pages tagged as programming: