IncPy: Automatic memoization for Python

What is IncPy?

IncPy (Incremental Python) is an enhanced Python interpreter that speeds up script execution times by automatically memoizing (caching) the results of long-running function calls and then re-using those results rather than re-computing, when safe to do so.

When you first run your script with the IncPy interpreter, it might be ~20% slower since IncPy needs to determine which functions are safe and worthwhile to memoize, what their dependencies are, and then memoize their results to a persistent cache. After you make some edits to your script, subsequent runs can be much faster since IncPy can skip calls to functions whose dependencies are still satisfied and load their results directly from the cache. That way, IncPy can greatly speed up your iteration and debugging cycles.

IncPy is designed to be a drop-in replacement for the Python 2.6 interpreter, so it should work seamlessly with all of your existing scripts and 3rd-party libraries. You don't need to learn any new language features or programming idioms to get its benefits.

How can IncPy be useful for me?

If you've written Python scripts that run for at least a few minutes, then you've probably encountered the following dilemma:

  1. Simple code, but slow runs: If you keep your code relatively simple, then it takes unnecessarily long to re-execute after you make minor edits to your code, since the Python interpreter re-executes your entire script, even those parts that have not been affected by your edits.
  2. Complicated code + temp data files, but faster runs: If you write extra caching code to save intermediate results to disk (and later load them from disk), then subsequent runs of your script can be much faster. However, now your code is more complicated, and you need to manage those temporary data files that your script has generated.
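To make option 2 concrete, here is a minimal sketch of the kind of hand-written caching code it refers to. The names (slow_analysis, load_or_compute, CACHE_FILE) are made up for illustration and have nothing to do with IncPy itself:

```python
import os
import pickle
import tempfile

# Hypothetical cache location; a real script would pick its own path.
CACHE_FILE = os.path.join(tempfile.gettempdir(), 'slow_analysis.cache.pickle')

def slow_analysis(n):
    # Stand-in for a computation that really takes minutes.
    return sum(i * i for i in range(n))

def load_or_compute(n):
    # Re-use the saved result if the temp file already exists ...
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, 'rb') as f:
            return pickle.load(f)
    # ... otherwise compute it and save it for the next run.
    result = slow_analysis(n)
    with open(CACHE_FILE, 'wb') as f:
        pickle.dump(result, f)
    return result
```

Note that this version silently returns a stale result if you later change n or the body of slow_analysis, so you also have to remember to delete the temp file by hand. That bookkeeping is exactly the extra burden described above.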

As your project progresses, you might end up writing a collection of ad-hoc scripts, each reading some input files, munging the data, and writing intermediate results out to temporary files that other scripts then read and munge.

For instance, the above diagram shows the Makefile that I created during a summer internship in which I wrote dozens of Python scripts to munge software bug database and employee personnel data. Each rectangle represents a Python script, each ellipse represents a data file (shaded ellipses represent the final results of my analyses), and each arrow shows a script either reading from or writing to a data file. To speed up execution times, I re-factored my scripts to load and save several layers of intermediate datasets (white ellipses), so that I could tweak portions of my analyses and not have to wait for the entire workflow to re-execute. As a consequence, my code got more bloated, and I also had to keep track of over a dozen intermediate data files. I realized from this experience that an enhanced Python interpreter could automatically do all of this caching and dependency management, so that's when I set out to create IncPy.

By running your scripts with IncPy rather than the regular Python interpreter, you can keep your code simple while still getting the benefits of faster execution times. In particular:

How does IncPy differ from other approaches to memoization?

Can you show me a quick demo?

Sure, this 6-minute screencast demonstrates some of IncPy's basic capabilities:

Can you show me a small code example?

Here's an example data analysis script and a graphical representation of data flow through its functions:

MULTIPLIER = 3  # global variable

# each call runs for 1 minute
def processQueries(queryFilename):
  dat = []
  conn = openDatabase('master.db')
  for sqlQuery in open(queryFilename):
    queryRes = conn.query(sqlQuery)
    dat.append(queryRes)
  return dat

def calculateStats(lst):
  res = ... # run for 1 minute calculating stats
  return res * MULTIPLIER

def transformAndOutputStats(statsLst, outFilename):
  outf = open(outFilename, 'w')
  for e in statsLst:
    newStats = ... # run for 1 minute
    outf.write(newStats)
  outf.close()
 

for i in range(10):
  inFilename = 'queries.' + str(i) + '.txt'
  queriesDat = processQueries(inFilename)
  stats = calculateStats(queriesDat)
  outFilename = 'output.' + str(i) + '.txt'
  transformAndOutputStats(stats, outFilename)

The inputs to this script are 10 text files containing database queries (named queries.0.txt through queries.9.txt), and its outputs are corresponding files named output.0.txt through output.9.txt. The initial run of the script takes 30 minutes (1 minute for each function call x 3 calls per input file x 10 files).

During the initial run, IncPy automatically memoizes the arguments, return values, and dependencies for all invocations of the 3 functions. With the cache now populated, subsequent runs can be much faster than the original 30 minutes. For instance:

How can I learn more about IncPy?

This conference paper reflects the state of IncPy circa late-2010:

Our earlier workshop paper reflects the state of IncPy circa December 2009:

For my research, I'm actively looking for new users to evaluate the effectiveness of IncPy, so I'd be happy to create a custom installation for your machine and to provide technical support. Feel free to email me, Philip Guo, at:

if you have any questions or requests.


Some gory details:

  1. Installation
  2. Getting started
  3. How IncPy works
  4. Limitations
  5. Helping my research
  6. Appendix: FAQ

How can I download and install IncPy?

IncPy is a modified version of the Python 2.6.3 interpreter. I've successfully installed IncPy on Mac OS X (10.4 and 10.6) and Linux (Ubuntu 8.04 LTS). Unfortunately, it might not work on Windows since it makes some POSIX system calls (but I haven't tried yet, so it might actually work). I want to make it easy for people to start using IncPy, but I haven't yet had time to create reliable one-click installers for all supported operating systems.

Compiling IncPy from source code

To get the most recent version of IncPy, you must download and compile its source code. If you don't want to go through this hassle, please send me an email at:

and I will try my best to compile a custom version for your computer and to guide you through the setup process.

The IncPy source code resides in a public GitHub code repository. You can check out the latest copy and compile using these commands:

git clone git://github.com/pgbovine/IncPy.git
cd IncPy
./configure
make

The configure step creates a new incpy.config configuration file in your home directory (if one doesn't already exist). You can use that file to customize IncPy's functionality.

Dependencies

Mac OS X: If you install the 'Xcode developer tools' and 'X11' packages from your installation DVD, then you should have all of the software required to compile IncPy. It's also a good idea to install the GNU readline library before compiling IncPy, so that your Python interactive prompt behaves more pleasantly.

Linux: The software needed to compile IncPy might already come pre-installed, but in case it isn't, here are some useful packages to install (these names are for Debian-based distros, but it should be easy to look up the corresponding names in other package management systems):

sudo apt-get install libc6-dev g++ gcc libreadline-dev

It's normal for warning messages like this one to appear when you're compiling Python:

Failed to find the necessary bits to build these modules:
_bsddb             bsddb185           dbm
dl                 gdbm               imageop
sunaudiodev
To find the necessary bits, look in setup.py in detect_modules() for the module's name.

It just means that certain Python modules cannot be compiled for your machine, but as long as you see an executable named python (or python.exe on Mac OS X) in the IncPy directory, the build was successful.

Running IncPy for the first time

After a successful compile, there should be an executable named python (or python.exe on Mac OS X) in the IncPy directory. When you execute that program, you should see an interactive Python prompt like the following:

$ IncPy/python.exe 
IncPy: An auto-memoizing Python interpreter that enables incremental recomputation
Created by Philip Guo (pg@cs.stanford.edu)

Python 2.6.3 (r263:75183, Apr 19 2010, 20:58:42) 
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Working with 3rd-party libraries

IncPy is designed to work seamlessly with all 3rd-party libraries, extensions, and tools (e.g., NumPy, SciPy, matplotlib, IPython), as long as they are compatible with Python 2.6. You shouldn't need to re-compile any libraries or extension code.

All you need to do is to set the PYTHONPATH environment variable so that IncPy knows where your libraries and extensions are installed (alternatively, you can prepend the path onto the sys.path variable from within your Python script).
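If you'd rather use the sys.path alternative, a short sketch looks like this. The directory name is a placeholder that I made up; substitute wherever your libraries actually live:

```python
import sys

# Hypothetical location of your 3rd-party libraries; substitute your own.
LIB_DIR = '/path/to/site-packages'

# Prepend so that this directory is searched before the default locations.
if LIB_DIR not in sys.path:
    sys.path.insert(0, LIB_DIR)
```

These lines need to run before the first import of any library that lives in that directory.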

You can install 3rd-party libraries in a variety of ways, but if you're affiliated with a university, I highly recommend downloading a free academic version of the Enthought Python Distribution. It's a fantastic one-click installer containing Python 2.6 and over 75 useful libraries.

After installing the Enthought Python Distribution on my Mac OS X 10.6 computer, I can give IncPy access to all of its installed libraries by setting PYTHONPATH to the appropriate location and then starting up IncPy:

export PYTHONPATH=/Library/Frameworks/Python.framework/Versions/6.1/lib/python2.6/site-packages
~/IncPy/python.exe

For faster performance, it's a good idea to add these libraries to your ignore list in incpy.config along with the standard library:

# ignoring Python standard library
ignore = /Users/pgbovine/IncPy/Lib/

# ignoring Enthought Python Distribution library code
ignore = /Library/Frameworks/Python.framework/Versions/6.1/lib/python2.6/site-packages/

If you're on a 64-bit machine and want to compile a 32-bit x86 IncPy binary (e.g., to interoperate with already-installed 32-bit 3rd-party libraries), you can run this modified configure command before compiling:

# for Mac OS X:
./configure CC="gcc -arch i386" CXX="g++ -arch i386"

# for Linux:
./configure CC="gcc -m32" CXX="g++ -m32"

Please let me know if you have trouble getting 3rd-party libraries working with IncPy.

How do I get started using IncPy?

There's no new user interface to learn! Just run the Python executable that you compiled in the IncPy/ sub-directory. IncPy should behave like a regular Python interpreter, except that it will automatically memoize the results of long-running functions to disk and re-use those results in subsequent runs rather than re-computing them.

Toy example

For example, suppose you wrote the following toy script called analysis.py:

def mungeFile(filename):
  dat = {}
  for line in open(filename):
    person, count = line.split(':')
    if person not in dat:
      dat[person] = 0
    dat[person] += int(count)
  return dat  

res = mungeFile('students.txt') # runs for 1 minute

When you run this script for the first time, you must wait for 1 minute for the mungeFile function to finish processing students.txt and return a dictionary to its caller. At that time, IncPy memoizes the argument and return value of mungeFile to disk, storing them in a sub-directory called incpy-cache/. IncPy also creates a log file called incpy.log in your current directory, with the following contents:

=== 2010-04-21 16:04:21 START | TIME_LIMIT 1 sec | IGNORE []
MEMOIZED mungeFile [analysis.py] | runtime 60000 ms
=== 2010-04-21 16:05:21 END

The only event in the log is that the mungeFile function ran for 1 minute (60,000 milliseconds) and had its results memoized. (I'll explain what the TIME_LIMIT and IGNORE fields are in the next section.)

When you run this same script again, it will terminate almost instantly since IncPy can skip the call to mungeFile and load its results directly from incpy-cache/. The incpy.log for this run looks like:

=== 2010-04-21 16:15:29 START | TIME_LIMIT 1 sec | IGNORE []
SKIPPED mungeFile [analysis.py] | lookup time 0 ms | original runtime 60000 ms
=== 2010-04-21 16:15:29 END

This log shows that mungeFile was skipped and that it took 0 milliseconds to look up and retrieve its results from the cache. If this function had returned a larger data structure, then the look-up time would likely be higher (but you still get a performance improvement as long as the look-up time doesn't exceed the original running time).

At this time, you can add additional code after the call to mungeFile (e.g., for plotting the data or doing further analysis), and that code will get to run immediately rather than having to wait for 1 minute for mungeFile to re-execute every time you run the script.

Ok, let's say that you modify the file students.txt and then re-run the script. Unfortunately, IncPy must now delete the memoized results for mungeFile, since they depend on the original contents of students.txt and are probably now incorrect since students.txt has changed. Thus, this run will again take a full minute, and its incpy.log will look like this:

=== 2010-04-21 16:56:17 START | TIME_LIMIT 1 sec | IGNORE []
FILE_READ_DEPENDENCY_BROKEN mungeFile [analysis.py] | students.txt changed
MEMOIZED mungeFile [analysis.py] | runtime 60000 ms
=== 2010-04-21 16:57:17 END

If you want to manually clear the entire cache, you can simply delete the incpy-cache/ sub-directory. (I have yet to add support for clearing individual cache entries.)

The incpy.log in your current working directory only contains information about the most recent run. In addition, IncPy combines the log files from all runs into a single incpy.aggregate.log file in your home directory.
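If you want a rough idea of how much time the cache saved you, you could scan a log for SKIPPED entries. The sketch below assumes the line format shown in the example logs above (which may change between IncPy versions), and the helper name estimate_ms_saved is my own invention, not part of IncPy:

```python
import re

# Matches lines like:
# SKIPPED mungeFile [analysis.py] | lookup time 0 ms | original runtime 60000 ms
SKIPPED_RE = re.compile(
    r'^SKIPPED (\S+) \[(.+?)\] \| lookup time (\d+) ms \| original runtime (\d+) ms')

def estimate_ms_saved(log_lines):
    """Sum (original runtime - lookup time) over all SKIPPED entries."""
    saved = 0
    for line in log_lines:
        m = SKIPPED_RE.match(line.strip())
        if m:
            saved += int(m.group(4)) - int(m.group(3))
    return saved

# The second example log from this section, used as sample input.
sample = [
    '=== 2010-04-21 16:15:29 START | TIME_LIMIT 1 sec | IGNORE []',
    'SKIPPED mungeFile [analysis.py] | lookup time 0 ms | original runtime 60000 ms',
    '=== 2010-04-21 16:15:29 END',
]
print(estimate_ms_saved(sample))
```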

The incpy.config configuration file

IncPy looks for a configuration file in your home directory named incpy.config (and won't run if it doesn't find one). Here are the options you can specify in that file:

Here is an example incpy.config file:

# ignoring Python standard library
ignore = /Users/pgbovine/IncPy/Lib/

# 5-second time limit:
time_limit = 5

Note that if you specify a directory to ignore, then code in all files in that directory and in all of its sub-directories is ignored.
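To illustrate the semantics of the example file above, here is a small sketch that parses the two options it uses and applies the "directory and all sub-directories" rule. This is only my illustration of the behavior described here, not IncPy's actual parser; parse_config and is_ignored are hypothetical names:

```python
import os

def parse_config(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    ignore_paths = []
    time_limit = 1  # seconds; the default mentioned elsewhere on this page
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, value = [s.strip() for s in line.split('=', 1)]
        if key == 'ignore':
            ignore_paths.append(value)   # 'ignore' may appear several times
        elif key == 'time_limit':
            time_limit = int(value)
    return ignore_paths, time_limit

def is_ignored(filename, ignore_paths):
    """True if filename is an ignored file or lies under an ignored directory."""
    filename = os.path.normpath(filename)
    for p in ignore_paths:
        p = os.path.normpath(p)
        if filename == p or filename.startswith(p + os.sep):
            return True
    return False
```

Comparing against the directory path plus a trailing separator keeps a sibling such as /Users/pgbovine/IncPy/Library.py from being mistaken for something inside /Users/pgbovine/IncPy/Lib/.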

Annotating functions

For finer-grained control over memoization, IncPy also allows you to annotate individual functions by specifying options in their docstrings (a string literal that appears at the beginning of the function body). Here are the annotations it currently supports:

incpy.memoize

To force IncPy to always memoize calls to a particular function, put the string incpy.memoize in its docstring. For example, we can implement the cliched Fibonacci sequence example using this annotation:

def fib(n):
  '''incpy.memoize'''
  if n == 0 or n == 1:
    return 1
  else:
    return fib(n-1) + fib(n-2)

Normally, the fib function will not be memoized, since each call takes far less than 1 second, and IncPy only memoizes calls that take a macroscopic amount of time to complete (at least time_limit seconds, as specified in incpy.config). However, the incpy.memoize annotation forces it to be memoized. Note that even impure functions bearing this annotation will be memoized.

incpy.ignore

In addition to ignoring entire files or directories (by specifying ignore lines in incpy.config), you can ignore individual functions by including the string incpy.ignore in their docstrings. For example:

def func_to_ignore(x, y):
  '''This function produces too much junk to memoize

  incpy.ignore

  docstring can contain anything else too, lalala'''
  z = create_giant_list_from(x, y)
  return z

IncPy will not track dependencies or purity in ignored functions and will never memoize their return values.

One situation where you might want to ignore a function is if its return value is some huge data structure that's not worth memoizing (since it takes too long to memoize and also takes up too much disk space).

incpy.no_output

Sometimes your long-running functions will print out a bunch of debugging or 'progress bar'-style output to stdout or stderr, but you really don't want to capture all of that output in your cache; all you care about is their return values. If you annotate your functions with incpy.no_output, then IncPy won't track stdout/stderr contents:

def process_file():
  '''incpy.no_output'''
  total = 0
  for line in open('data.txt'):
    print line # debugging output that you don't care to memoize
    total += parse_and_analyze(line)

  return total

The first time you run this function, all lines in the data.txt input file will be printed to stdout (presumably for debugging or to track progress through the file). But when you re-run this function, the stdout output will not be 'replayed' (in fact, it was never saved to the cache); only the return value will be retrieved from the cache.
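Since annotations are just strings inside docstrings, checking for them amounts to a substring test. IncPy does this inside the interpreter; the sketch below is only a plain-Python analogy, and get_annotations is a hypothetical helper name:

```python
# The three annotation strings described in this section.
ANNOTATIONS = ('incpy.memoize', 'incpy.ignore', 'incpy.no_output')

def get_annotations(func):
    """Return the set of annotation strings found in func's docstring."""
    doc = func.__doc__ or ''   # functions without a docstring have __doc__ = None
    return set(a for a in ANNOTATIONS if a in doc)

def func_to_ignore(x, y):
    '''This function produces too much junk to memoize

    incpy.ignore

    docstring can contain anything else too, lalala'''
    return [x, y]

print(get_annotations(func_to_ignore))
```

Because the check is a substring match, the annotation can appear anywhere in the docstring alongside your normal documentation.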

How does IncPy work?

This section provides a brief (and somewhat-simplified) overview of how IncPy works. The input to IncPy is a Python script (and optional customizations in incpy.config), and its outputs are the results of running that script, memoized data in the incpy-cache/ sub-directory, and log files (incpy.log and incpy.aggregate.log).

What functions are worthwhile to memoize?

While your script is executing, IncPy records how much time each function invocation takes. Whenever a function takes longer than time_limit to run (1 second by default), IncPy will attempt to memoize its results. Thus, the vast majority of function invocations will not be memoized, since it's probably faster just to re-execute them rather than to save and load their results from the cache.

In rare circumstances, it might take longer to memoize a function invocation than to simply re-run it (e.g., if the function returns a large object that takes a long time to pickle and save to disk). If that occurs, then the function invocation will not be memoized; IncPy will instead log a warning message to incpy.log, so that you can choose to have IncPy ignore that function in the future.
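The time-threshold idea can be sketched in plain Python as a wrapper that only caches calls that ran for at least time_limit seconds. This is an in-memory analogy that I wrote for illustration (memoize_if_slow is a made-up name); IncPy itself does this inside the interpreter and stores results on disk:

```python
import time

TIME_LIMIT = 1.0  # seconds; mirrors the default time_limit described above

def memoize_if_slow(func, time_limit=TIME_LIMIT):
    cache = {}

    def wrapper(*args):
        # Arguments must be hashable for this simple dict-based cache.
        if args in cache:
            return cache[args]        # skip the call entirely
        start = time.time()
        result = func(*args)
        elapsed = time.time() - start
        if elapsed >= time_limit:     # only slow calls are worth caching
            cache[args] = result
        return result

    wrapper.cache = cache             # exposed so you can inspect what was cached
    return wrapper
```

Fast calls never enter the cache, so they cost nothing beyond one timing measurement.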

What functions are safe to memoize?

IncPy will only attempt to memoize function invocations that are side effect free and deterministic.

A function invocation is side effect free (a.k.a. pure) if it and all functions it calls never mutate a value that existed prior to its invocation (e.g., global variables and parameter contents). IncPy can automatically detect when a function violates this condition and mark it as impure and thus ineligible for memoization.

Here is an example of a function with side effects:

def munge_and_mutate(lst):
  result = ... # spend 1 minute processing lst and return a number
  lst.append(result)
  return result

In addition to returning a result, this function also mutates its input parameter lst. If IncPy were to skip this function and simply load its return value from the cache, then the mutation wouldn't be properly replayed. Thus, as soon as IncPy executes the append call, it marks munge_and_mutate as impure and ineligible for memoization because it mutated a value (lst) that existed prior to its invocation.
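Here is a small runnable demonstration of what would go wrong if such a function were memoized anyway. The naive_cached wrapper is a hypothetical stand-in for a memoizer that ignores purity, and sum(lst) stands in for the 1-minute computation:

```python
def munge_and_mutate(lst):
    result = sum(lst)      # stand-in for the 1-minute computation
    lst.append(result)     # side effect: mutates the caller's list
    return result

cache = {}

def naive_cached(lst):
    key = tuple(lst)       # key is computed before the call mutates lst
    if key not in cache:
        cache[key] = munge_and_mutate(lst)
    return cache[key]      # on a cache hit, the append never happens

a = [1, 2, 3]
naive_cached(a)            # cache miss: a is mutated to [1, 2, 3, 6]

b = [1, 2, 3]
naive_cached(b)            # cache hit: b is left as [1, 2, 3]
```

Both calls return 6, but only the first one leaves its argument in the state that the real function would have produced.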

An example of a non-deterministic function is one that queries a random number generator or the system clock. If such a function were skipped and its return value loaded from the cache, it would probably be incorrect. It's difficult to automatically detect non-determinism, so IncPy must be given a list of functions known to be non-deterministic. Currently I hard-code a list containing a few standard library functions; in the future, I plan to make that list customizable via incpy.config.

The one kind of non-deterministic function that IncPy can automatically detect is one that opens stdin; IncPy marks all functions on the stack as impure when stdin is opened, since user input is definitely non-deterministic.

What data is stored in the cache?

Each function gets its own cache (currently implemented as a sub-directory of pickle files within the incpy-cache/ directory). While a function is executing, IncPy records what it (and all functions it calls) prints to stdout and stderr. When it finishes executing, if it is still eligible for memoization, IncPy will save the following data in a new cache entry, stored as a pickle file:

How is the cache used to speed up future executions?

In a future call to that same function (either in the same script execution or during a future execution), if IncPy finds an entry in the cache that matches the values of arguments and global variables, then it will skip the call, print out the saved stdout and stderr contents, advance the seek offsets of all read files to their final locations, and return the saved return value to its caller. This perfectly emulates the original function invocation, except that it runs much faster. However, if IncPy cannot find a cache entry that matches the argument and global variable values, then it will simply execute the function normally (and create a new cache entry upon completion).

How are cache entries automatically deleted?

IncPy automatically deletes a cache entry when one of its dependencies gets broken, because the stored data is likely incorrect. If a file that the function has read or written has changed (indicated by modification time), then the cache entry will be deleted. If the bytecode for that function or for any function it calls has changed, then all cache entries for that function are deleted.
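The modification-time check for file dependencies can be sketched like this. CacheEntry is a hypothetical class that I made up for illustration; it is not IncPy's internal data structure:

```python
import os
import tempfile

class CacheEntry(object):
    def __init__(self, result, files):
        self.result = result
        # Remember each file's modification time when the entry is created.
        self.mtimes = dict((f, os.path.getmtime(f)) for f in files)

    def is_valid(self):
        # The entry is stale if any file it depends on is gone or was modified.
        for f, mtime in self.mtimes.items():
            if not os.path.exists(f) or os.path.getmtime(f) != mtime:
                return False
        return True

# Demo with a throwaway file.
fd, path = tempfile.mkstemp()
os.close(fd)
entry = CacheEntry(42, [path])
valid_before = entry.is_valid()
t = os.path.getmtime(path)
os.utime(path, (t + 10, t + 10))   # simulate an edit by bumping the mtime
valid_after = entry.is_valid()
os.remove(path)
```

A modification-time check is cheap but conservative: touching a file without changing its contents also invalidates the entry.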

If you want to manually clear the entire cache (like a 'make clean'), you can simply delete the incpy-cache/ sub-directory. (I have yet to add support for clearing individual cache entries, though.)

Trusting previously-cached results

If you invoke IncPy with the -T option, then it will never delete a function's cache, even when its dependencies have been broken (it will instead issue a warning to stderr and to incpy.log).

This "trust previously-cached results" mode is useful when you know that the code changes you just made should not affect the previously-cached results. For example, say you're writing a script to sequentially process N records in a dataset. Your script runs fine until it crashes on a record i somewhere in the middle of your dataset, since that record contains data that your script doesn't properly handle. With IncPy, the results from processing records 1 through (i - 1) have been memoized to disk, so if you re-run your script, it can just re-use those results. But since your script actually crashed, you will definitely modify it before re-running (to fix the bug). However, once your code has changed, IncPy must invalidate the cache entries for processing records 1 through (i - 1), since those results might no longer be valid. Thus, your script must start running again from record 1, which gives you no time savings. Using the -T option, though, IncPy simply trusts the previously-cached results, which lets your script skip the first (i - 1) records and resume processing at record i.

When you're first writing a new ad-hoc data processing script, it will likely crash at least a few times on records somewhere in the middle of your dataset due to quirks in the data format (sometimes after running fine for minutes or even hours). With this option, you can fix bugs and resume processing at the first failed record rather than always back at the beginning, which can eliminate lots of waiting time.

What are some of IncPy's limitations?

User-defined classes need to override == with something other than the default pointer equality test

IncPy makes extensive use of the Python == operator for comparing memoized argument and global variable values. If you want user-defined classes to work well with IncPy, then make sure each contains a valid __eq__ or __cmp__ method based on something other than pointer equality.

This isn't much of a limitation in practice, though. Overriding == is good programming style anyways, and even if you don't do it, then IncPy will still work fine but simply miss some opportunities for re-using memoized results.
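For instance, a user-defined class with value-based equality might look like the sketch below. Point is just an example class that I made up:

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Compare by contents rather than by object identity.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        return not self.__eq__(other)

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash((self.x, self.y))
```

With this definition, two separately constructed Point(1, 2) objects compare equal, so a memoized result for one can be matched against the other.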

Unpicklable data cannot be memoized

Since IncPy uses the Python pickle format to serialize data to disk, it cannot memoize functions whose arguments, return values, or global variable dependencies contain unpicklable data.

In reality, though, pretty much all Python data types that you might care to memoize can be pickled. One way to get around this limitation if it arises is by creating picklable proxy objects and then writing code to convert between the proxy and real objects. For convenience, IncPy automatically creates proxies for file handles, function objects, and sqlite cursor objects.
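As an illustration of the proxy idea, here is a sketch that converts an open read-only file handle (which cannot be pickled) into a plain dictionary (which can) and back again. The helpers to_proxy and from_proxy are hypothetical names of my own; this is not how IncPy's built-in file handle proxies are implemented:

```python
import os
import pickle
import tempfile

def to_proxy(f):
    # Capture just enough state to re-create the handle later.
    return {'name': f.name, 'offset': f.tell()}

def from_proxy(proxy):
    # Rebuild a real file handle positioned at the saved offset.
    f = open(proxy['name'], 'rb')
    f.seek(proxy['offset'])
    return f

# Demo with a throwaway file.
fd, path = tempfile.mkstemp()
os.write(fd, b'hello\nworld\n')
os.close(fd)

f = open(path, 'rb')
f.readline()                          # consume the first line
blob = pickle.dumps(to_proxy(f))      # the proxy pickles fine
f.close()

g = from_proxy(pickle.loads(blob))
rest = g.read()                       # continues where f left off
g.close()
os.remove(path)
```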

Cannot track dependencies or purity within non-Python extension code

IncPy cannot track dependencies in functions implemented in other languages (e.g., C or Fortran). Also, it cannot determine whether these functions are pure. These limitations are shared by any analysis that works purely on Python code. Fortunately, lots of non-Python extension functions (e.g., those in math libraries) are pure and have no external dependencies (library code is often pure and self-contained, or else it could get awkward to use).

The only practical way around this limitation is to annotate functions to indicate which arguments and global variables they mutate. I've started annotating some standard library functions, and in the future I plan to allow the user to make annotations in incpy.config.

Cannot track dependencies or purity within spawned non-Python sub-programs

IncPy only works on Python code, so if you launch sub-programs written in other languages, then IncPy cannot track what happens within those programs. However, if those sub-programs are written in Python (e.g., using the multiprocessing module) and run under IncPy, then of course it's possible to track what happens within them. (On certain operating systems, one could imagine augmenting IncPy with a utility like strace or DTrace to determine which files are read/written by spawned sub-programs originating from any programming language.)

How can I help you with your research?

Thanks for being so considerate; I thought you'd never ask!

IncPy is an active research project, so I'm currently looking for users to try it out and to give me feedback, complaints, and feature requests via email at:

If you end up using IncPy regularly in your work, the only piece of data that I'd like from you is the incpy.aggregate.log file in your home directory. This is a plain-text file that indicates how much time you saved by using IncPy rather than re-running your entire script after every edit. IncPy does not collect or transmit any information about you, your scripts, or your datasets.


Appendix: Grubby technical FAQ

Why didn't IncPy do what I expected it to do?

The first file you should investigate is the incpy.log file that IncPy creates in the current working directory after every invocation. The exact format of that log file is still in flux, but it should provide some insights into what IncPy did during its most recent invocation. Please feel free to email me at:

if you need me to explain what some of the log entries mean.

Some of my function calls aren't getting memoized because they call library code that performs supposedly-impure actions

Oftentimes library code performs actions that are technically impure (e.g., mutating a global variable) but are actually pure from the perspective of your script. For example, the built-in Python regular expression library keeps an internal cache of already-compiled regexps and mutates that cache whenever a new regexp is compiled; however, the act of compiling a regexp is a conceptually-pure operation.

To ignore all impure actions in library code so that your functions can be memoized, add the absolute path of the library's file(s) or directories to your incpy.config file as an 'ignore' option, like so:

# ignoring Python standard library
ignore = /Users/pgbovine/IncPy/Lib/

# ignoring Enthought Python Distribution library code
ignore = /Library/Frameworks/Python.framework/Versions/6.1/lib/python2.6/site-packages/

What should I do if IncPy can't memoize one of my function's arguments?

If an object doesn't properly override == or is unpicklable, then IncPy cannot memoize it (see the limitations sub-section). If the object belongs to a class you defined, then you can simply augment the class to fix this problem. However, if the object is from an extension library (e.g., CvMat objects in OpenCV), then you can't easily override == or make it picklable. One easy (but kludgy) workaround is to convert your object into one that can be memoized, pass that 'proxy object' as the function's argument, and then inside the function, convert it back into the original type (e.g., one user had to convert an OpenCV CvMat into a Python list, pass it into his function, and then convert the list back into a CvMat). Although this process is inefficient, it's worthwhile if the memoization benefits outweigh the conversion times.

IncPy seems to be taking a long time to memoize certain functions or to load their results from disk

IncPy uses the Python pickle mechanism to serialize/deserialize objects so that they can be stored on-disk. When IncPy memoizes a function, it must pickle its arguments, return value, and values of global variables that it has read. In general, the larger and more complex these objects are, the longer they will take to pickle (and unpickle), not to mention the more disk space they will use. Also, large and complex objects are more likely to not even be picklable, which forces IncPy to give up on trying to memoize the enclosing functions!

Thus, for optimal performance, I recommend refactoring your code so that the minimum amount of data needs to be pickled. For example, consider this sub-optimal code snippet:

imgBytes       = LoadImage('mona-lisa.bmp')
colorHistogram = getImageHistogram(imgBytes) # runs for 10 seconds

The call to getImageHistogram should be memoized, since it ran for 10 seconds. However, its argument (imgBytes) could be quite large since it represents the binary data of an entire image (perhaps it could be 10MB or even 100MB in size). Thus, the memo table entry would be at least the size of imgBytes, which can really slow down IncPy.

Instead, if we refactored the above code to wrap the desired functionality in an additional function ...

def getHistogramFromFilename(filename):
  imgBytes = LoadImage(filename)
  return getImageHistogram(imgBytes) # runs for 10 seconds

colorHistogram = getHistogramFromFilename('mona-lisa.bmp')

then IncPy can memoize getHistogramFromFilename much faster, since its argument is now simply a string (rather than a 10-100 MB blob of binary image data).

undefined symbol: _PyUnicodeUCS4_IsWhitespace error in NumPy

If this happens when you try to import NumPy:

Python 2.6.3 (r263:75183, Jul  2 2010, 20:39:01)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jonas/incpy/lib/python2.6/site-packages/numpy/__init__.py", line 132, in <module>
    import add_newdocs
  File "/home/jonas/incpy/lib/python2.6/site-packages/numpy/add_newdocs.py", line 9, in <module>
    from lib import add_newdoc
  File "/home/jonas/incpy/lib/python2.6/site-packages/numpy/lib/__init__.py", line 4, in <module>
    from type_check import *
  File "/home/jonas/incpy/lib/python2.6/site-packages/numpy/lib/type_check.py", line 8, in <module>
    import numpy.core.numeric as _nx
  File "/home/jonas/incpy/lib/python2.6/site-packages/numpy/core/__init__.py", line 5, in <module>
    import multiarray
ImportError: /home/jonas/incpy/lib/python2.6/site-packages/numpy/core/multiarray.so: undefined symbol: _PyUnicodeUCS4_IsWhitespace

Then it's likely to be a Unicode problem (NumPy was built using a different byte size for Unicode characters than IncPy); I've been told that configuring IncPy with the following option and then re-compiling solves the problem:

./configure --enable-unicode=ucs4

(Thanks to Eric Jonas for this tip)


IncPy is created and maintained by Philip Guo

Last updated: 2010-10-04