Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

Python programming idioms

Summary
Here are some idioms that I regularly use when programming in Python. I've tested these idioms in Python 2.4, except where a later version is noted. Let me know if you have any suggestions or bug reports.

Strings

Replacements for simple regexps

Python has no syntactic sugar for defining and using regular expressions, so I often use these predicates as lightweight replacements:

>>> s = "Hello, my friend"
>>> s.startswith('Hello')   # emulates '^Hello'
True
>>> s.endswith('world')     # emulates 'world$'
False
>>> 'my' in s
True
>>> s.lower()
'hello, my friend'
>>> s.lower().startswith('hello') # case-insensitive
True
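
These predicates are easy to wrap in small helper functions. The sketch below assumes Python 2.5 or later, where startswith and endswith also accept a tuple of alternatives; the function names is_greeting and is_python_file are made up for illustration:

```python
def is_greeting(text):
    """Return True if text starts with a known greeting, ignoring case."""
    # startswith accepts a tuple of prefixes (Python 2.5 and later)
    return text.lower().startswith(('hello', 'hey'))


def is_python_file(filename):
    """Return True if filename ends in '.py' or '.pyw'."""
    # endswith accepts a tuple of suffixes as well
    return filename.endswith(('.py', '.pyw'))
```

For anything more complicated than a fixed prefix, suffix, or substring, the re module is still the right tool.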

Interpolating variables within strings

Python doesn't allow you to interpolate variables within strings like Perl/PHP/sh do (e.g., "hi, my name is $NAME"), but you can emulate this feature by using a syntax similar to sprintf for C:

>>> NAME = 'Philip'
>>> # pardon the ugly comma syntax for singleton tuple
>>> 'hi, my name is %s' % (NAME,) 
'hi, my name is Philip'
>>> AGE = 25
>>> # %s for string, %d for integer, etc., like sprintf
>>> "hi, i'm %s and i'm %d years old" % (NAME, AGE)
"hi, i'm Philip and i'm 25 years old"

Dictionaries

Dict constructor using string keys

A convenient way to construct a dict where the keys are strings:

>>> d = dict(name="James", age=25, weight=165.5)
>>> d
{'age': 25, 'name': 'James', 'weight': 165.5}

Dict constructor using list of tuples

A convenient way to construct a dict out of a list of two-element tuples, where each first element becomes a key and each second element becomes its corresponding value:

>>> l = [('name','James'),('age',25),('weight',165.5)]
>>> d = dict(l)
>>> d
{'age': 25, 'name': 'James', 'weight': 165.5}
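
Since zip produces exactly this kind of list of pairs, the same constructor can also build a dict from two parallel lists. A minimal sketch (the variable names are only for illustration):

```python
keys = ['name', 'age', 'weight']
values = ['James', 25, 165.5]

# zip pairs up keys[i] with values[i]; dict turns each pair into an entry
d = dict(zip(keys, values))
```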

Dicts as histograms

You can use a dict with integer values as a counter (histogram) for how many times a particular element shows up in a list:

>>> counts = {}
>>> names = ['Joe', 'Bill', 'Joe', 'Judy', 'Judy'] 
>>> for n in names:
...     if n not in counts:
...             counts[n] = 0
...     counts[n] += 1
... 
>>> print counts
{'Judy': 2, 'Bill': 1, 'Joe': 2}

Now, you might find it ugly to initialize counts[n] = 0 for names that haven't yet been added to counts. If you use Python 2.5 or above, then you can use defaultdict to construct a dict whose default value is the integer 0, thus allowing you to forgo that initialization step:

>>> from collections import defaultdict
>>> counts = defaultdict(int) # default int value is 0
>>> names = ['Joe', 'Bill', 'Joe', 'Judy', 'Judy'] 
>>> for n in names:
...     counts[n] += 1
... 
>>> print counts
defaultdict(<type 'int'>, {'Judy': 2, 'Bill': 1, 'Joe': 2})
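
If defaultdict isn't available (e.g., on Python 2.4), another way to skip the explicit initialization is the dict get method with a default value. A small sketch, where histogram is a hypothetical helper name:

```python
def histogram(items):
    """Count how many times each element appears in items."""
    counts = {}
    for item in items:
        # get returns 0 when item has not been seen yet
        counts[item] = counts.get(item, 0) + 1
    return counts
```

The result is a plain dict, so it prints without the defaultdict wrapper.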

Sets for efficient membership tests

Before the set datatype was introduced, Python programmers used dicts with unused values to store unique keys and to test for membership.

Nowadays, sets enable you to efficiently test for membership using the in operator, and also conveniently remove duplicates:

>>> s = set()
>>> s.add('hello')
>>> s.add('world')
>>> s
set(['world', 'hello'])
>>> s.add('hello')
>>> s
set(['world', 'hello'])
>>> 'hello' in s
True
>>> 'world' in s
True
>>> 'bye' in s
False

Of course, you can perform typical set theoretic operations (e.g., union, intersection, subset) as well.
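
Here is a brief sketch of those operations, using two small example sets (the variable names are made up for illustration):

```python
a = set(['hello', 'world'])
b = set(['world', 'bye'])

union = a | b            # elements in either set
common = a & b           # elements in both sets
only_in_a = a - b        # elements in a but not in b
is_subset = set(['hello']).issubset(a)
```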

Iteration

Unpacking tuples during iteration

When iterating through a collection (e.g., list) of tuples, one would typically write

>>> for elt in coords_lst:
...     x = elt[0]
...     y = elt[1]
...     # do something with x and y

where each element elt is a tuple with x as the first element and y as the second element. Here is a more concise alternative:

>>> for (x, y) in coords_lst:
...     # do something with x and y

When iterating over coords_lst, each tuple element is unpacked into x and y components. An added benefit is that a run-time error (typically a ValueError) will be raised if any element of coords_lst doesn't contain exactly two components.

You can also do this for nested data structures:

>>> for (photo_name, (width, height)) in photos_lst:
...     # do something with photo_name, width, height

Here each element is a pair where the first component is photo_name and the second component is itself a pair of width and height.
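
To make this concrete, here is a runnable sketch with made-up data; photos_lst, areas, and unpack_pairs are illustrative names, not part of any library:

```python
photos_lst = [('beach.jpg', (800, 600)), ('dog.jpg', (640, 480))]

areas = {}
for (photo_name, (width, height)) in photos_lst:
    # the nested pattern unpacks both levels in one step
    areas[photo_name] = width * height


def unpack_pairs(coords_lst):
    """Copy a list of pairs; raises ValueError on a wrongly-sized element."""
    return [(x, y) for (x, y) in coords_lst]
```

Calling unpack_pairs([(1, 2), (3, 4, 5)]) fails with a ValueError at the three-element tuple, which is the error-checking benefit described above.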

Iterating with indices

If you miss old-fashioned for loops using integer indices, here is a simple way to get them back using enumerate:

>>> l = ['a', 'b', 'c', 'd', 'e']
>>> for (i, elt) in enumerate(l):
...     print i, elt
... 
0 a
1 b
2 c
3 d
4 e

enumerate will generate a zero-based integer index alongside each element in the list (or other iterable object) you pass it. It's more efficient than emulating a C for loop like so:

>>> for i in range(5):
...     print i, l[i]
... 
0 a
1 b
2 c
3 d
4 e
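
Because enumerate accepts any iterable, it also works in places where indexing with l[i] would not, such as generators. A small sketch (indices_of is a hypothetical helper):

```python
def indices_of(target, items):
    """Return the zero-based positions at which target occurs in items."""
    return [i for (i, elt) in enumerate(items) if elt == target]
```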

Iterating over dict keys and values in parallel

Use iteritems to iterate over all keys and values of a dict in parallel:

>>> counts = {'Judy': 2, 'Bill': 1, 'Joe': 2}
>>> for (name, count) in counts.iteritems():
...     print name, count
... 
Judy 2
Bill 1
Joe 2
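
Note that iteritems exists only in Python 2; in Python 3 it was removed and items plays the same role. The sketch below uses items, which runs on both (on Python 2 it builds a temporary list of pairs), to invert the counts dict. The name names_by_count is made up for illustration:

```python
counts = {'Judy': 2, 'Bill': 1, 'Joe': 2}

# group names by how many times they occurred
names_by_count = {}
for (name, count) in counts.items():
    names_by_count.setdefault(count, []).append(name)
```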

Parallel and staggered iteration

Use zip to iterate over several lists in parallel:

>>> names = ['John', 'Mark', 'Jen', 'Sarah']
>>> ages = [24, 32, 14, 30]
>>> for (name, age) in zip(names, ages):
...     print name, 'is', age, 'years old'
... 
John is 24 years old
Mark is 32 years old
Jen is 14 years old
Sarah is 30 years old

When the lists are of different lengths, zip stops after the shortest list ends.

zip can take more than two lists:

>>> registered = [True, False, False, True]
>>> for (name,age,r) in zip(names, ages, registered):
...     print '%s (%d) registered: %d' % (name,age,r)
... 
John (24) registered: 1
Mark (32) registered: 0
Jen (14) registered: 0
Sarah (30) registered: 1

For iterating over adjacent elements of a single list ('staggered' iteration), pass a slice of the list itself to zip:

>>> l = [10, 15, 17, 18, 17, 14]
>>> l
[10, 15, 17, 18, 17, 14]
>>> l[1:]  
[15, 17, 18, 17, 14]
>>> for (cur,next) in zip(l, l[1:]):
...     delta = next - cur
...     print next, '-', cur, '=', delta
... 
15 - 10 = 5
17 - 15 = 2
18 - 17 = 1
17 - 18 = -1
14 - 17 = -3

The above snippet calculates differences between adjacent elements of the list.
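
The same staggered zip fits naturally into a list comprehension if you want to collect the differences rather than print them. A minimal sketch (adjacent_deltas is a hypothetical name):

```python
def adjacent_deltas(values):
    """Return the differences between each element and its predecessor."""
    # zip stops at the shorter argument, so the last element has no partner
    return [nxt - cur for (cur, nxt) in zip(values, values[1:])]
```

A list with fewer than two elements simply yields an empty result.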

Printing

Printing to a file or other stream

print is very convenient because it's polymorphic, judiciously inserts spaces, and terminates with a newline. Wouldn't it be nice to be able to redirect the output of print to a file or another stream? Here is how to do so:

>>> f = open('out.txt','w')
>>> print >> f, 'hello world'
>>> print >> f, 'my age is', 25
>>> f.close()

After specifying the output stream and adding a comma, you can write the rest of the print line as usual. The above looks much cleaner than the alternative:

>>> f = open('out.txt','w')
>>> f.write('hello world\n')
>>> f.write('my age is ' + str(25) + '\n')
>>> f.close()

Here is how to print to stderr:

>>> import sys
>>> print >> sys.stderr, 'hello world'
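
If you are on Python 3, where print is a function rather than a statement, the print >> f syntax is gone and the equivalent is the file keyword argument. A small sketch, assuming Python 3 and using io.StringIO as a stand-in for a real file so that nothing is written to disk:

```python
import io
import sys

buf = io.StringIO()  # stands in for open('out.txt', 'w')

# Python 3 equivalent of: print >> f, 'hello world'
print('hello world', file=buf)
print('my age is', 25, file=buf)

# Python 3 equivalent of: print >> sys.stderr, 'hello world'
print('hello world', file=sys.stderr)

contents = buf.getvalue()
```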

Sorting

Creating a new sorted list

Use sorted to create a new sorted list out of another list:

>>> l = ['Laura', 'Joey', 'Bobby', 'Jake', 'Al']
>>> sorted(l)
['Al', 'Bobby', 'Jake', 'Joey', 'Laura']
>>> # original list unaffected
>>> l
['Laura', 'Joey', 'Bobby', 'Jake', 'Al']

sorted works with any object that supports iteration (called an iterable), like a set:

>>> s = set([4,3,1,5,8])
>>> s
set([8, 1, 3, 4, 5])
>>> sorted(s)
[1, 3, 4, 5, 8]

In-place stable sort using custom keys

When sorting a list whose elements aren't primitives (e.g., a list of tuples or dicts), you often want to sort by comparing one particular component (called a key) of each element. For instance, given a list where each element is a (name, age) pair:

>>> l = [('Pat',25),('Jane',21),('Sam',17),('Al',17)]

you might want to sort by either name or age. To sort (in-place) by name, use the optional key parameter to sort:

>>> l.sort(key=lambda e : e[0])
>>> l
[('Al', 17), ('Jane', 21), ('Pat', 25), ('Sam', 17)]

The lambda defines an anonymous function taking one parameter and returning the component to use as the key in the sort (here, e[0]).

To sort by age, supply e[1] as the key:

>>> l = [('Pat',25),('Jane',21),('Sam',17),('Al',17)]
>>> l.sort(key=lambda e : e[1])
>>> l
[('Sam', 17), ('Al', 17), ('Jane', 21), ('Pat', 25)]

Since the sort is stable, it preserves the relative order of elements whose keys compare equal when doing subsequent sorts. This can be useful when you want to sort by multiple keys. Let's say you wanted to transform list l so that it's sorted by age, and then alphabetically by name when their ages are identical. To do so, first sort by name (secondary key), then by age (primary key):

>>> l = [('Pat',25),('Jane',21),('Sam',17),('Al',17)]
>>> l.sort(key=lambda e : e[0])
>>> # sorted alphabetically by name
>>> l
[('Al', 17), ('Jane', 21), ('Pat', 25), ('Sam', 17)]
>>> l.sort(key=lambda e : e[1])
>>> # sorted by age, with ties broken alphabetically
>>> l
[('Al', 17), ('Sam', 17), ('Jane', 21), ('Pat', 25)]

Note that you need to perform the sorts in ascending order of key priority (lowest priority first), which might initially seem counter-intuitive (see LSD Radix Sort).
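
When every key is sorted in the same direction, an alternative is a single sort whose key function returns a tuple, since tuples compare component by component. The sketch below checks that it gives the same result as the two-pass approach (people, by_age_then_name, and two_pass are illustrative names):

```python
people = [('Pat', 25), ('Jane', 21), ('Sam', 17), ('Al', 17)]

# one pass: primary key (age) first, secondary key (name) second
by_age_then_name = sorted(people, key=lambda e: (e[1], e[0]))

# two passes, lowest-priority key first, relying on stability
two_pass = list(people)
two_pass.sort(key=lambda e: e[0])
two_pass.sort(key=lambda e: e[1])
```

The two-pass version remains handy when the keys need different directions, since each pass can take its own reverse argument.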

Other

Two-ended numeric comparisons

When doing numeric comparisons, you can write the following:

>>> if 18 <= age < 25:

instead of the more verbose alternative:

>>> if (18 <= age) and (age < 25):
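
As a runnable sketch, here is the chained form inside a small function (is_young_adult is a made-up name):

```python
def is_young_adult(age):
    """Return True if age is at least 18 and less than 25."""
    # equivalent to (18 <= age) and (age < 25), but age is evaluated only once
    return 18 <= age < 25
```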

Programmatically accessing global and local variables

You can access the values of global variables and local variables (within a function) as string keys in dicts.

Globals:

>>> name = 'Philip'
>>> age = 25
>>> globals()
{'__builtins__': <module '__builtin__' (built-in)>, 
 '__name__': '__main__', 
 'age': 25, 
 '__doc__': None, 
 'name': 'Philip'}

Locals:

>>> def foo():
...     x = 5
...     y = 'python rules'
...     print locals()
... 
>>> foo()
{'y': 'python rules', 'x': 5}

The primary advantage of accessing variables' values using string dict keys is that you can generate those strings programmatically (e.g., by concatenation).
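
One place where this comes in handy is string formatting: the % operator also accepts a dict on its right-hand side, with %(key)s placeholders naming the keys, so passing locals() gets you something close to Perl-style interpolation. A minimal sketch (describe is a hypothetical function):

```python
def describe(name, age):
    """Format a sentence by looking up local variables by name."""
    # locals() maps 'name' and 'age' to the argument values
    return "%(name)s is %(age)d years old" % locals()
```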

Output and parse human-readable data files

The Python pickle module is useful for serializing data to save in files, but unfortunately pickled files are not human-readable.

Sometimes I want my Python scripts to print out human-readable plaintext files, but also to allow other Python scripts to easily parse those files to do further processing (without writing custom parsing code).

One trick for doing so is to directly print out Python data structures (i.e., lists, tuples, dicts) to a file, and then use eval to parse the text into data structures. This trick takes advantage of the fact that Python outputs and accepts data structures in the exact same textual format. (The JSON file format is based on a similar concept, borrowing its syntax from JavaScript literals.)

For example, say that you dumped a bunch of tuples to a file in the following format (saved as people.txt):

('Alex', 24, True)
('Joey', 18, False)
('Jane', 37, True)
('Barbara', 18, True)

That file looks pretty human-readable to me; it looks almost like a .csv (comma-separated values) file. It's easy to parse this file using eval:

>>> for line in open('people.txt'):
...     x = eval(line)
...     print 'Name:', x[0], 'Age:', x[1], x[2]
... 
Name: Alex Age: 24 True
Name: Joey Age: 18 False
Name: Jane Age: 37 True
Name: Barbara Age: 18 True

eval conveniently turns each line into a tuple with 3 elements.

If you want your fields to be named, then print out dicts:

{'name': 'Alex', 'age': 24, 'registered': True}
{'name': 'Joey', 'age': 18, 'registered': False}
{'name': 'Jane', 'age': 37, 'registered': True}
{'name': 'Barbara', 'age': 18, 'registered': True}

and parse each line with eval, just as before:

>>> for line in open('people.txt'):
...     d = eval(line)
...     print d['name'], d['age'], d['registered']
... 
Alex 24 True
Joey 18 False
Jane 37 True
Barbara 18 True
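
Since eval will execute any expression it is given, a more cautious variant for files you don't fully control is ast.literal_eval, which accepts only literals (strings, numbers, tuples, lists, dicts, booleans, None). This sketch assumes Python 2.6 or later, where ast.literal_eval is available, and uses a list of strings in place of the lines of people.txt:

```python
import ast

lines = [
    "('Alex', 24, True)\n",
    "{'name': 'Joey', 'age': 18, 'registered': False}\n",
]

# literal_eval parses Python literals but refuses function calls and the like
records = [ast.literal_eval(line.strip()) for line in lines]

# round trip: repr produces text that literal_eval can read back
round_trip = ast.literal_eval(repr(records[0]))
```

A line such as "__import__('os')" is rejected with a ValueError instead of being executed.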

Dynamically reloading modules

I like to write the following string at the bottom of all of my Python source files (use the """ notation to define a multi-line string):

"""
import myfile; from myfile import *
reload(myfile); from myfile import *
"""

replacing myfile with the filename of the particular Python file (without the .py extension).

Whenever I want to test out functions I've defined in that file in the Python interactive prompt, I can simply copy-and-paste the first line into the prompt:

>>> import myfile; from myfile import *

This will load all the functions (and global variables) in myfile.py into the prompt and allow me to call them directly using their real names (without using the myfile. prefix). Doing so allows me to test and debug my functions in an interactive prompt.

The real power comes from running the second line:

>>> reload(myfile); from myfile import *

This will reload the updated contents of myfile.py so that I can call my new versions of these functions. Thus, I can quickly test out improved versions of my functions within the prompt while I edit them in myfile.py. You can run that reload line as many times as you like during a session.

This programming style of interleaving coding with interactive execution can be extremely productive when, say, you are testing a function that expects a complex input. For instance, if I have a function that takes as input a huge data structure that takes 15 minutes to construct, I don't want to wait 15 minutes between each cycle of testing my function. I can spend 15 minutes constructing that data structure within an interactive prompt, then use the reload trick to incrementally improve, test, and debug my function without having to re-construct that data structure over and over again.
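
On Python 3, reload is no longer a builtin; it lives in the importlib module. Here is a sketch of the same trick under that assumption, using the standard textwrap module as a stand-in for myfile:

```python
import importlib
import textwrap  # stand-in for myfile

# Python 3 equivalent of: reload(myfile)
textwrap = importlib.reload(textwrap)

# re-import names so they refer to the freshly reloaded definitions
# (at the interactive prompt you would write: from myfile import *)
from textwrap import fill
```

As before, the re-import after reloading is what rebinds the bare names to the new versions of the functions.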

In addition, see Code Like a Pythonista: Idiomatic Python for a more authoritative and detailed collection of idioms.

Created: 2007-11-10
Last modified: 2009-03-21