Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

Sloppy Python: Using Dynamic Analysis to Automatically Add Error Tolerance to Ad-Hoc Data Processing Scripts

research paper summary
Sloppy Python: Using Dynamic Analysis to Automatically Add Error Tolerance to Ad-Hoc Data Processing Scripts. Philip J. Guo. International Workshop on Dynamic Analysis (WODA), 2011.
Programmers and data analysts get frustrated when their long-running data processing scripts crash without producing results, due to either bugs in their code or inconsistencies in data sources. To alleviate this frustration, we developed a dynamic analysis technique that guarantees scripts will never crash: It converts all uncaught exceptions into special NA (Not Available) objects and continues executing rather than crashing. Thus, imperfect scripts will run to completion and produce partial results and an error log, which is more informative than simply crashing with no results. We implemented our technique as a "Sloppy" Python interpreter that automatically adds error tolerance to existing scripts without any programmer effort or run-time slowdown.
@inproceedings{GuoSlopPy2011,
 author = {Guo, Philip J.},
 title = {{Sloppy Python}: Using Dynamic Analysis to Automatically Add Error Tolerance to Ad-hoc Data Processing Scripts},
 booktitle = {Proceedings of the Ninth International Workshop on Dynamic Analysis},
 series = {WODA '11},
 year = {2011},
 isbn = {978-1-4503-0811-3},
 location = {Toronto, Ontario, Canada},
 pages = {35--40},
 numpages = {6},
 url = {http://doi.acm.org/10.1145/2002951.2002961},
 doi = {10.1145/2002951.2002961},
 acmid = {2002961},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {data processing, fault tolerance, scripting},
}

(This summary was adapted from the SlopPy project home page, which was created in late 2010.)

SlopPy (Sloppy Python) is a modified Python interpreter that ensures your scripts will never crash.

Whenever SlopPy encounters an uncaught exception, instead of crashing the script, it will create a special NA ("Not Available") object, make that the result of the current expression, and continue executing normally. Whenever an NA object appears in an expression, SlopPy propagates it according to special rules. For example, all unary and binary operations involving NA will return NA.
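The propagation rule can be illustrated in ordinary Python. This is only a minimal sketch, not SlopPy's actual implementation: SlopPy does this inside the interpreter, whereas the `NA` class and the `sloppy_eval` helper below are hypothetical names made up for this example, and only a handful of arithmetic operators are covered.

```python
class NA:
    """Hypothetical stand-in for an NA ("Not Available") value."""

    def __init__(self, exc=None):
        self.exc = exc  # the exception that produced this NA, if any

    def __repr__(self):
        return "<NA>"

    # Any unary or binary operation involving NA yields NA again.
    def _propagate(self, *_operands):
        return self

    __add__ = __radd__ = _propagate
    __sub__ = __rsub__ = _propagate
    __mul__ = __rmul__ = _propagate
    __truediv__ = __rtruediv__ = _propagate
    __neg__ = __pos__ = __abs__ = _propagate


def sloppy_eval(thunk):
    """Evaluate thunk(); turn an exception into an NA instead of raising it."""
    try:
        return thunk()
    except Exception as exc:
        return NA(exc)


ratio = sloppy_eval(lambda: 1 / 0)  # ZeroDivisionError becomes an NA
total = 5 + ratio * 100             # NA propagates through both operations
negated = -total                    # unary operations propagate it too
print(ratio, total, negated)
```

In this sketch the original exception stays attached to the NA value (`ratio.exc`), which is one plausible way to keep enough information around for an error log.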

SlopPy allows imperfect scripts to finish executing and produce partial results (and a log of all exceptions), which can be more informative than simply crashing at the first uncaught exception. SlopPy is a drop-in replacement for the Python 2.6 interpreter, so it should work seamlessly with all of your existing scripts and third-party libraries, with no run-time slowdown.

How can SlopPy be useful for me?

If you've written Python scripts that run for at least a few minutes, then you've probably encountered the following annoyance:

  1. You start executing a long-running script on your machine.
  2. You switch to working on another task or go home for the evening.
  3. When you return to check on your script, you see that it crashed at the first uncaught exception without producing any useful results.

Now you need to edit your script to fix that bug and then re-execute it. It might take anywhere from a few minutes to several hours before your script gets past the point where it originally crashed, and then it will likely crash again with another exception. It might take a few rounds of debugging and re-executing before the script successfully finishes running and produces results.

SlopPy allows your buggy script to finish running on the first attempt, produce partial results, and show you all uncaught exceptions (not just the first one). Partial results are usually more informative than no results at all, and you can also try to patch up all exceptions in one round of edits rather than addressing them one at a time.
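To make "partial results plus a log of all exceptions" concrete, here is a rough sketch of what you would otherwise write by hand in a plain Python script. SlopPy provides this behavior without any changes to your code; the `process_all` function and the sample records are hypothetical and exist only for illustration.

```python
def process_all(records, transform):
    """Apply transform to every record, collecting results and a failure log."""
    results = []
    error_log = []
    for index, record in enumerate(records):
        try:
            results.append(transform(record))
        except Exception as exc:
            # Record the failure and keep going instead of crashing.
            results.append(None)
            error_log.append((index, type(exc).__name__, str(exc)))
    return results, error_log


# Two of these four made-up records cannot be converted to a float.
records = ["3.5", "oops", "7", None]
results, error_log = process_all(records, float)

print(results)
for index, exc_name, message in error_log:
    print(f"record {index}: {exc_name}: {message}")
```

A single run reports both bad records, so both can be fixed in one round of edits rather than one crash at a time.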

In sum, SlopPy allows you to write sloppy scripts in a 'quick-and-dirty' manner without worrying about error handling, which can speed up your iteration cycle when prototyping.

Can you show me a quick demo?

Sure, this 6-minute screencast demonstrates SlopPy's basic capabilities:


Read the full paper for details:

Sloppy Python: Using Dynamic Analysis to Automatically Add Error Tolerance to Ad-Hoc Data Processing Scripts. Philip J. Guo. International Workshop on Dynamic Analysis (WODA), 2011.