
Faster loops for NumPy using multithreading and other tricks



Parallel NumPy seamlessly speeds up NumPy for large arrays (64K+ elements) with no change required to your existing NumPy code.

PNumPy supports Linux, Windows, and macOS with NumPy >= 1.18 on Python 3.6, 3.7, 3.8, and 3.9.

This first release speeds up NumPy binary and unary ufuncs such as add, multiply, isnan, abs, sin, log, sum, min, and many more. Other sped-up functions include sort, argsort, lexsort, arange, boolean indexing, and fancy indexing. In the near future we will speed up astype, where, putmask, and searchsorted.

Other packages that use numpy, such as scikit-learn or pandas, will also be sped up for large arrays.


License: MIT


pip install pnumpy

You can also install the latest development versions with

pip install


See the full documentation

To use the project:

import pnumpy as pn

Parallel NumPy speeds up NumPy silently under the hood. To see some benchmarks yourself run



To get a partial list of functions sped up run


To disable or enable pnumpy run


Additional Functionality

PNumPy provides additional routines such as pn.recarray_to_colmajor, which converts a NumPy record array to a column-major array in parallel and is useful for DataFrames. Other routines include pn.lexsort32, which performs an indirect sort using np.int32 indices instead of np.int64, consuming half the memory and running faster.
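
The memory claim for 32-bit sort indices can be checked with plain NumPy. The sketch below does not call pn.lexsort32 (its exact signature is not shown here); it uses np.lexsort and then casts the result, purely to compare the size of an int64 index array with an int32 one. The array names and sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
primary = rng.integers(0, 10, size=n)      # hypothetical primary sort key
secondary = rng.integers(0, 1000, size=n)  # hypothetical tie-breaker key

# np.lexsort treats the LAST key in the tuple as the primary key.
idx64 = np.lexsort((secondary, primary)).astype(np.int64)

# int32 indices are only valid while the array has fewer than 2**31 elements.
assert n < 2**31
idx32 = idx64.astype(np.int32)

print(idx64.nbytes, idx32.nbytes)  # the int32 index array is half the size
```

Casting after the fact, as done here, still allocates the 64-bit array first, so it only demonstrates the size difference of the result, not the speed or peak-memory benefit of producing 32-bit indices directly.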


PNumPy uses a combination of threads and 256-bit vector intrinsics to speed up calculations. By default, most operations use only 3 additional worker threads alongside the main Python thread, for a total of 4. Large arrays are divided into 16K chunks, and threads are assigned so as to maintain cache coherency. More threads are dynamically deployed for more CPU-intensive problems like np.sin. Users can customize threading. The example below shows how 4 threads can work together to quadruple the effective L2 cache size.
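
Separately from the project's own example, the chunk-and-thread idea described above can be sketched in plain Python and NumPy. This is not PNumPy's implementation, which is native code using vector intrinsics. It assumes "16K" means 16,384 elements, uses a standard-library thread pool of 4 workers rather than 3 workers plus the main thread, and the helper name chunked_apply is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

CHUNK = 16 * 1024  # assumption: chunk size counted in elements
N_THREADS = 4      # assumption: 4 pool workers stand in for 3 workers + main thread


def chunked_apply(ufunc, arr, chunk=CHUNK, n_threads=N_THREADS):
    """Apply a unary ufunc to a 1-D float array, one chunk per task."""
    out = np.empty_like(arr)

    def work(start):
        stop = min(start + chunk, arr.size)
        # Each task writes into its own disjoint slice of the output.
        ufunc(arr[start:stop], out=out[start:stop])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(work, range(0, arr.size, chunk)))
    return out


a = np.linspace(0.0, 10.0, 1_000_000)
result = chunked_apply(np.sin, a)
```

NumPy ufuncs release the GIL for non-object dtypes, so the chunks can run concurrently. Keeping each chunk small is what lets a thread's working set stay in its own cache.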


To cap the number of additional worker threads to 3 run


To disable or re-enable threading run


To disable or re-enable just the atop engine run



Q: If I type np.sort(a) where a is an array, will it be sped up?

A: If len(a) > 65536 and pnumpy has been imported, it will automatically be sped up.

Q: How is sort sped up?

A: PNumPy uses additional threads to divide up the sorting job. For example, it might perform an 8-way quicksort followed by a 4-way mergesort.

Q: How are scikit-learn and pandas sped up?

A: PNumPy's vector loops and threads will speed up any package that uses large NumPy arrays.
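
The divide-sort-merge idea from the sort answer above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not PNumPy's algorithm: it sorts 8 chunks with np.sort on a thread pool and then merges the sorted runs pairwise (8 to 4 to 2 to 1) instead of using a 4-way merge. The helper names merge_sorted and parallel_sort are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def merge_sorted(a, b):
    """Stable merge of two sorted 1-D arrays using searchsorted."""
    out = np.empty(a.size + b.size, dtype=np.result_type(a, b))
    # Final position of a[i] = i + number of b elements strictly smaller.
    pos_a = np.arange(a.size) + np.searchsorted(b, a, side="left")
    # Final position of b[j] = j + number of a elements smaller or equal.
    pos_b = np.arange(b.size) + np.searchsorted(a, b, side="right")
    out[pos_a] = a
    out[pos_b] = b
    return out


def parallel_sort(arr, n_chunks=8, n_threads=4):
    """Sort chunks concurrently, then merge the sorted runs in rounds."""
    chunks = np.array_split(arr, n_chunks)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        runs = list(pool.map(np.sort, chunks))
        while len(runs) > 1:
            pairs = [(runs[i], runs[i + 1]) for i in range(0, len(runs) - 1, 2)]
            merged = list(pool.map(lambda p: merge_sorted(*p), pairs))
            if len(runs) % 2:  # carry an unpaired run into the next round
                merged.append(runs[-1])
            runs = merged
    return runs[0]


rng = np.random.default_rng(0)
data = rng.integers(0, 1_000_000, size=200_000)
result = parallel_sort(data)
```

The result matches np.sort(data) for inputs without NaN values. Whether this sketch is faster than a single np.sort depends on the array size and the machine.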


To run all the tests, run:

python -m pip install pytest
python -m pytest tests


Source Distributions

No source distribution files available for this release.

Built Distributions

pnumpy-2.0.23-cp36-abi3-win_amd64.whl (415.0 kB)

Uploaded CPython 3.6+ Windows x86-64

pnumpy-2.0.23-cp36-abi3-manylinux2010_x86_64.whl (1.8 MB)

Uploaded CPython 3.6+ manylinux: glibc 2.12+ x86-64

pnumpy-2.0.23-cp36-abi3-manylinux1_x86_64.whl (1.8 MB)

Uploaded CPython 3.6+

pnumpy-2.0.23-cp36-abi3-macosx_10_14_x86_64.whl (363.9 kB)

Uploaded CPython 3.6+ macOS 10.14+ x86-64
