Tuesday, April 15, 2014

NumPy on PyPy - Status Update

Work on NumPy on PyPy continued in March, though at a lighter pace than in the previous few months. Progress was made on both the compatibility and speed fronts, and several behavioral issues reported to the bug tracker were resolved. The most significant of these was probably the correction of casting to built-in Python types. Previously, int/long conversions of numpy scalars holding values such as inf, nan, or 1e100 returned bogus results; now they raise an exception or return the correct value, as appropriate.
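To make "raise or return, as appropriate" concrete, here is a small sketch of the conversions in question. It uses plain Python floats so it runs without NumPy; the assumption is that the fixed numpy scalars follow the same rules as CPython's built-in float (on CPython, numpy's float64 is a subclass of float). The helper name describe_int_cast is hypothetical.

```python
import math

def describe_int_cast(x: float) -> str:
    """Report what int(x) does for a float value: return an int or raise."""
    try:
        return f"int({x!r}) = {int(x)}"
    except OverflowError:
        # Infinity has no integer equivalent.
        return f"int({x!r}) raises OverflowError"
    except ValueError:
        # NaN is not a number at all, so it cannot become an integer.
        return f"int({x!r}) raises ValueError"

for value in (math.inf, math.nan, 1e100):
    print(describe_int_cast(value))
```

Note that int(1e100) returns a large but exact integer (the nearest double to 10**100, not 10**100 itself), which is the "return" half of the behavior.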

On the speed front, the PyPy JIT was enhanced to support virtualizing the raw_store/raw_load memory operations used in numpy arrays. Further work remains in virtualizing alloc_raw_storage where possible, which will allow scalars to have storage yet still be virtualized in loops.
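For readers unfamiliar with these operations, the sketch below models a numpy array's storage as a flat raw buffer with typed loads and stores at byte offsets. The raw_store/raw_load functions here are hypothetical pure-Python stand-ins built on struct; the real ones are low-level RPython operations the JIT sees, not Python functions. The loop shows the kind of per-element temporary scalar that virtualization aims to keep out of the heap.

```python
import struct

ITEMSIZE = struct.calcsize("d")  # bytes per float64 element

def raw_store(storage: bytearray, offset: int, value: float) -> None:
    """Write one float64 at a byte offset (stand-in for RPython's raw_store)."""
    struct.pack_into("d", storage, offset, value)

def raw_load(storage: bytearray, offset: int) -> float:
    """Read one float64 at a byte offset (stand-in for RPython's raw_load)."""
    return struct.unpack_from("d", storage, offset)[0]

def scale_in_place(storage: bytearray, n: int, factor: float) -> None:
    # Each iteration loads an element into a temporary scalar and stores the
    # result back. A JIT that can virtualize these temporaries keeps them in
    # registers instead of allocating a heap object per iteration.
    for i in range(n):
        offset = i * ITEMSIZE
        raw_store(storage, offset, raw_load(storage, offset) * factor)

n = 4
storage = bytearray(n * ITEMSIZE)  # zero-filled raw storage for 4 doubles
for i in range(n):
    raw_store(storage, i * ITEMSIZE, float(i))
scale_in_place(storage, n, 2.0)
print([raw_load(storage, i * ITEMSIZE) for i in range(n)])  # [0.0, 2.0, 4.0, 6.0]
```

This model runs on any Python but gains nothing from the JIT work by itself; it only shows what "storage" and the load/store operations refer to.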

Aside from continued work on the compatibility and speed of existing code, we also hope to begin implementing the C-level components of NumPy such as mtrand, nditer, linalg, and so on. Several approaches could be taken to get this C-level code working, ranging from reimplementing it in RPython to interfacing with the existing C code through CFFI, where possible. The appropriate approach depends on many factors and will probably vary from module to module.
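As a rough illustration of the "interface with existing code" option, the sketch below calls an existing C function instead of rewriting it. CFFI is a third-party package, so to stay dependency-free this swaps in the standard library's ctypes, which follows the same idea of declaring a C signature and calling into a shared library; CFFI's own API differs (declarations are passed as C text to ffi.cdef and the library is opened with ffi.dlopen). It assumes a POSIX system, and hypot is just a simple stand-in, not a numpy routine.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("m") locates the C math library;
# if it returns None, CDLL(None) falls back to symbols already loaded into
# the interpreter process, which on Linux normally include libm.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double hypot(double x, double y).
libm.hypot.argtypes = [ctypes.c_double, ctypes.c_double]
libm.hypot.restype = ctypes.c_double

def c_hypot(x: float, y: float) -> float:
    """Call the existing C implementation rather than reimplementing it."""
    return libm.hypot(x, y)

print(c_hypot(3.0, 4.0))  # 5.0
```

The trade-off mirrors the one described above: wrapping reuses tested C code, while reimplementing in RPython lets the JIT see and optimize the logic.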

To try out PyPy + NumPy, grab a nightly PyPy and install our NumPy fork. Feel free to send comments or report issues on IRC, our mailing list, or the bug tracker. Thanks to the contributors to the NumPy on PyPy proposal for supporting this work.
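After installing, a quick way to confirm which interpreter and which numpy you are actually running is a check like the one below. It installs nothing; it assumes you have already set up the nightly PyPy and the fork, and the helper name describe_environment is hypothetical.

```python
import platform

def describe_environment() -> str:
    """Report the running interpreter and whether numpy can be imported."""
    impl = platform.python_implementation()  # e.g. "PyPy" or "CPython"
    try:
        import numpy
    except ImportError:
        return f"{impl}: numpy is not importable"
    # __file__ shows which numpy installation (e.g. the PyPy fork) was picked up.
    return f"{impl}: numpy {numpy.__version__} from {numpy.__file__}"

print(describe_environment())
```

Run it with the nightly pypy binary; if it reports CPython, the wrong interpreter is first on your PATH.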

7 comments:

Werner Beroux said...

Trying to install scipy on top gives me an error while compiling scipy/cluster/src/vq_module.c; is scipy not supported yet?

Anonymous said...

scipy is not supported. Sometimes scipy functions are in fact in numpy in which case you can just copy the code. Otherwise you need to start learning cffi.

Yichao Yu said...

You mentioned storage and scalar types. Is it related to this bug

vak said...

What is the status of incorporating a BLAS library?

Anonymous said...

How far along is running Pandas on PyPy? Will it be just a recompile once NumPy is ported, or will it be heavy work to port Pandas to PyPy after NumPy is done? Should I look for another solution rather than plan to run Pandas on PyPy?

Robert Voigtländer said...

Pandas on PyPy would indeed be very interesting for huge analysis runs.

Jami said...

Any news on the NumPy front? I check this blog for such stuff every week and also contributed to the funding drive.

I fully understand that developers skilled enough to work on such a project are hard to come by even with money, and NumPy support probably isn't the most technologically exciting aspect of PyPy.

Even just a few lines on the latest development or some milestones would show that the project is alive (although I fully understand that writing blog posts isn't everybody's favorite thing), along with some kind of summary of what shape the developers think the code is in. If you prefer coding to blogging, maybe a time-series graph for the numpypy-status page would also be nice (I keep checking it but can never remember what the state was the last time I checked). Maybe I can see if I can do a quick hack via e.g. archive.org for this.

I think a huge boost would also be to have even a hacky, temporary way to interface with Matplotlib and/or SciPy, as it's quite hard to do many practical analyses without them. I'd probably try to do my analyses in such an environment and perhaps even implement or fix at least the things that are my own itches. There was the 2011 hack, but it doesn't seem to be documented anywhere. I could live with (or even prefer, so it definitely won't become the permanent version) an ugly, slow, memory-hungry and unstable hack that spams stderr with insulting messages. But without any way of interfacing with the existing stuff, it's just too much work for the more complicated analyses.

I'm trying to track the http://bitbucket.org/pypy/numpy branch, but it's a bit hard to see the bigger picture just from the commits. Even just some tags and/or meta-issues could be helpful. I'm also a bit confused about where (repo-wise) the development is actually happening. There seem to be some fresh NumPy branches in the numpy tree. Is the micronumpy project dead, or has it been merged into the pypy/numpy branch?

PS. Please don't take this as harsh criticism. I prefer to just silently code away myself, too. This is just what would be nice to see for somebody eagerly waiting to use PyPy for numerical work.