Email analysis scripts for mbox mailbox files
September 2006 (perspective of a master's student)
Many email programs (e.g., Mozilla Thunderbird, Eudora, Horde Webmail) store messages in files conforming to the mbox format. I have written some Python scripts to extract and organize email header information (e.g., senders, recipients, subjects, and dates) from mbox files. I used these scripts to collect the data for the graphs in my article, MIT: A Life in Emails. I have only used them for this one application, so they are not thoroughly tested. Use them at your own risk!
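To give a rough idea of what this kind of header extraction looks like, here is a minimal sketch built on the mailbox and email.utils modules from Python's standard library. It is an illustration only, not the scripts themselves: the function name extract_headers and the layout of the returned records are assumptions made for this example.

```python
import mailbox
from email.utils import getaddresses, parsedate_to_datetime


def _addresses(message, header_names):
    """Collect the bare email addresses found in the given headers."""
    values = []
    for name in header_names:
        # get_all returns every occurrence of a header, or the default if absent.
        values.extend(str(v) for v in message.get_all(name, []))
    # Drop empty results, which getaddresses can produce for malformed input.
    return [addr for _, addr in getaddresses(values) if addr]


def extract_headers(mbox_path):
    """Return one dict of basic header fields per message in an mbox file.

    Hypothetical helper for illustration; the record layout is an assumption.
    """
    # create=False makes a missing file an error instead of creating it.
    box = mailbox.mbox(mbox_path, create=False)
    records = []
    try:
        for message in box:
            date_header = message.get("Date")
            date = None
            if date_header is not None:
                try:
                    date = parsedate_to_datetime(str(date_header))
                except (TypeError, ValueError):
                    # Leave unparseable dates as None rather than failing.
                    date = None
            records.append({
                "senders": _addresses(message, ["From"]),
                "recipients": _addresses(message, ["To", "Cc"]),
                "subject": str(message.get("Subject", "")),
                "date": date,
            })
    finally:
        box.close()
    return records
```

Calling extract_headers on the path of an mbox file (for example, a Thunderbird folder file) should return a list with one record per message, which can then be counted or grouped by sender, recipient, or date.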
Because I analyzed large mbox files containing thousands of messages, I broke up my analysis into two stages to improve efficiency:
I'm not enthusiastic enough right now to provide more detailed explanations and examples, because I figure that if you want to use this code, you probably have a pretty good idea of what you are doing. Best of luck with your mbox hacking, and email me if you come up with any interesting analyses based on the framework provided by these scripts!