Bottleneck

Fast NumPy array functions written in Cython

Development Status
- 2 - Pre-Alpha
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Cython
- Python
Topic
- Scientific/Engineering

Project description

Bottleneck is a collection of fast NumPy array functions written in Cython:

===================== =======================================================
NumPy/SciPy           ``median, nanmedian, nanmin, nanmax, nanmean, nanstd,
                      nanargmin, nanargmax``
Functions             ``nanvar``
Moving window         ``move_nanmean``
Group by              ``group_nanmean``
===================== =======================================================

Let's give it a try. Create a NumPy array::

    >>> import numpy as np
    >>> arr = np.array([1, 2, np.nan, 4, 5])

Find the nanmean::

    >>> import bottleneck as bn
    >>> bn.nanmean(arr)
    3.0

Moving window nanmean::

    >>> bn.move_nanmean(arr, window=2)
    array([ nan, 1.5, 2. , 4. , 4.5])
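
To make the moving-window semantics concrete, here is a slow pure-NumPy sketch that reproduces the output above for 1d input. The name ``move_nanmean_reference`` is hypothetical, and the handling of an all-NaN window (left as NaN) is an assumption; this is not how Bottleneck implements it.

```python
import numpy as np


def move_nanmean_reference(arr, window):
    """Slow 1d reference for a trailing-window, NaN-aware mean (illustration only)."""
    arr = np.asarray(arr, dtype=float)
    out = np.full(arr.shape, np.nan)
    # The first window - 1 positions do not have a full window, so they stay NaN.
    for i in range(window - 1, arr.size):
        chunk = arr[i - window + 1 : i + 1]
        valid = chunk[~np.isnan(chunk)]
        # Assumption: a window holding only NaNs yields NaN.
        if valid.size:
            out[i] = valid.mean()
    return out


arr = np.array([1, 2, np.nan, 4, 5])
result = move_nanmean_reference(arr, window=2)
print(result)
```

Each output element is the mean of the non-NaN values in the window that ends at that position.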

Group nanmean::

    >>> label = ['a', 'a', 'b', 'b', 'a']
    >>> bn.group_nanmean(arr, label)
    (array([ 2.66666667, 4. ]), ['a', 'b'])
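
The grouping can be sketched in plain NumPy as well. The function below is a hypothetical reference, not Bottleneck's code; it assumes the labels are returned in sorted order and that a group with no non-NaN values yields NaN.

```python
import numpy as np


def group_nanmean_reference(arr, label):
    """Slow 1d reference: NaN-aware mean of the elements sharing each label."""
    arr = np.asarray(arr, dtype=float)
    label = np.asarray(label)
    # Assumption: unique labels are reported in sorted order.
    unique_labels = sorted(set(label.tolist()))
    means = []
    for name in unique_labels:
        values = arr[label == name]
        values = values[~np.isnan(values)]
        means.append(values.mean() if values.size else np.nan)
    return np.array(means), unique_labels


arr = np.array([1, 2, np.nan, 4, 5])
label = ['a', 'a', 'b', 'b', 'a']
means, names = group_nanmean_reference(arr, label)
print(means, names)
```

Group ``'a'`` averages 1, 2 and 5; group ``'b'`` ignores the NaN and averages only 4.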

Fast
====

Bottleneck is fast::

    >>> arr = np.random.rand(100, 100)
    >>> timeit np.nanmax(arr)
    10000 loops, best of 3: 99.6 us per loop
    >>> timeit bn.nanmax(arr)
    100000 loops, best of 3: 15.3 us per loop

Let's not forget to add some NaNs::

    >>> arr[arr > 0.5] = np.nan
    >>> timeit np.nanmax(arr)
    10000 loops, best of 3: 146 us per loop
    >>> timeit bn.nanmax(arr)
    100000 loops, best of 3: 15.2 us per loop
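
The ``timeit`` lines above use the IPython magic. Outside IPython, a comparable measurement can be sketched with the standard-library ``timeit`` module. The helper ``best_time`` is hypothetical, the Bottleneck import is treated as optional so the script still runs without it, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

try:
    import bottleneck as bn  # optional; the sketch still runs without it
except ImportError:
    bn = None

# Same setup as above: a 100x100 array with roughly half the values set to NaN.
arr = np.random.rand(100, 100)
arr[arr > 0.5] = np.nan


def best_time(func, repeat=3, number=1000):
    """Return the best per-call time in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


numpy_time = best_time(lambda: np.nanmax(arr))
print("np.nanmax: %.1f us per call" % (numpy_time * 1e6))

if bn is not None:
    bn_time = best_time(lambda: bn.nanmax(arr))
    print("bn.nanmax: %.1f us per call" % (bn_time * 1e6))
    # Speed ratio: NumPy time divided by Bottleneck time.
    print("speed ratio: %.2f" % (numpy_time / bn_time))
```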

Bottleneck comes with a benchmark suite that compares the performance of each
Bottleneck function with its NumPy/SciPy equivalent, where one exists. To run
the benchmark::

    >>> bn.bench(mode='fast')
    Bottleneck performance benchmark
        Bottleneck  0.2.0
        Numpy (np)  1.5.1
        Scipy (sp)  0.8.0
        Speed is NumPy or SciPy time divided by Bottleneck time
        NaN means one-third NaNs; axis=0 and float64 are used
    median vs np.median
            3.59  (10,10)
            2.43  (1001,1001)
            2.28  (1000,1000)
            2.16  (100,100)
    nanmedian vs local copy of sp.stats.nanmedian
          102.72  (10,10)      NaN
           94.34  (10,10)
           67.89  (100,100)    NaN
           28.52  (100,100)
            6.37  (1000,1000)  NaN
            4.41  (1000,1000)
    nanmax vs np.nanmax
            9.99  (100,100)    NaN
            6.12  (10,10)      NaN
            5.99  (10,10)
            5.88  (100,100)
            1.79  (1000,1000)  NaN
            1.76  (1000,1000)
    nanmean vs local copy of sp.stats.nanmean
           25.95  (100,100)    NaN
           12.85  (100,100)
           12.26  (10,10)      NaN
           11.89  (10,10)
            5.15  (1000,1000)  NaN
            3.17  (1000,1000)
    nanstd vs local copy of sp.stats.nanstd
           16.96  (100,100)    NaN
           15.75  (10,10)      NaN
           15.49  (10,10)
            9.51  (100,100)
            3.85  (1000,1000)  NaN
            2.82  (1000,1000)
    nanargmax vs np.nanargmax
            8.60  (100,100)    NaN
            5.65  (10,10)      NaN
            5.62  (100,100)
            5.44  (10,10)
            2.84  (1000,1000)  NaN
            2.58  (1000,1000)
    move_nanmean vs sp.ndimage.convolve1d based function
    window = 5
           19.52  (10,10)      NaN
           18.55  (10,10)
           10.56  (100,100)    NaN
            6.67  (100,100)
            5.19  (1000,1000)  NaN
            4.42  (1000,1000)

Faster
======

Under the hood Bottleneck uses a separate Cython function for each combination
of ndim, dtype, and axis. A lot of the overhead in bn.nanmax(), for example,
is in checking that the axis is within range, converting non-array data to an
array, and selecting the function to use to calculate the maximum.

You can get rid of the overhead by doing all this before you, say, enter
an inner loop::

    >>> arr = np.random.rand(10,10)
    >>> func, a = bn.func.nanmax_selector(arr, axis=0)
    >>> func
    <built-in function nanmax_2d_float64_axis0>

Let's see how much faster that runs::

    >>> timeit np.nanmax(arr, axis=0)
    10000 loops, best of 3: 24.9 us per loop
    >>> timeit bn.nanmax(arr, axis=0)
    100000 loops, best of 3: 4.97 us per loop
    >>> timeit func(a)
    100000 loops, best of 3: 2.13 us per loop

Note that ``func`` is faster than NumPy's non-NaN version of max::

    >>> timeit arr.max(axis=0)
    100000 loops, best of 3: 4.75 us per loop

So adding NaN protection to your inner loops comes at a negative cost!
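
For readers unsure what "NaN protection" buys, this small NumPy-only example contrasts the plain maximum with the NaN-aware one. It uses only standard NumPy calls and says nothing about Bottleneck's internals.

```python
import numpy as np

arr = np.array([[1.0, np.nan],
                [3.0, 2.0]])

# Plain max propagates NaN: any column containing a NaN reduces to NaN.
plain = arr.max(axis=0)

# NaN-aware max ignores NaNs and reduces over the remaining values.
protected = np.nanmax(arr, axis=0)

print(plain)
print(protected)
```

The second column holds a NaN, so ``arr.max`` reports NaN there while ``np.nanmax`` reports 2.0.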

Benchmarks for the low-level Cython version of each function::

    >>> bn.bench(mode='faster')
    Bottleneck performance benchmark
        Bottleneck  0.2.0
        Numpy (np)  1.5.1
        Scipy (sp)  0.8.0
        Speed is NumPy or SciPy time divided by Bottleneck time
        NaN means one-third NaNs; axis=0 and float64 are used
    median_selector vs np.median
           15.29  (10,10)
           14.19  (100,100)
            8.04  (1001,1001)
            7.32  (1000,1000)
    nanmedian_selector vs local copy of sp.stats.nanmedian
          352.08  (10,10)      NaN
          340.27  (10,10)
          185.56  (100,100)    NaN
          138.81  (100,100)
            8.21  (1000,1000)
            8.09  (1000,1000)  NaN
    nanmax_selector vs np.nanmax
           21.54  (10,10)      NaN
           19.98  (10,10)
           12.65  (100,100)    NaN
            6.82  (100,100)
            1.79  (1000,1000)  NaN
            1.76  (1000,1000)
    nanmean_selector vs local copy of sp.stats.nanmean
           41.08  (10,10)      NaN
           39.05  (10,10)
           31.74  (100,100)    NaN
           15.24  (100,100)
            5.13  (1000,1000)  NaN
            3.16  (1000,1000)
    nanstd_selector vs local copy of sp.stats.nanstd
           44.55  (10,10)      NaN
           43.49  (10,10)
           18.66  (100,100)    NaN
           10.29  (100,100)
            3.83  (1000,1000)  NaN
            2.82  (1000,1000)
    nanargmax_selector vs np.nanargmax
           17.91  (10,10)      NaN
           17.00  (10,10)
           10.56  (100,100)    NaN
            6.50  (100,100)
            2.85  (1000,1000)  NaN
            2.59  (1000,1000)
    move_nanmean_selector vs sp.ndimage.convolve1d based function
    window = 5
           55.96  (10,10)      NaN
           50.82  (10,10)
           11.77  (100,100)    NaN
            6.93  (100,100)
            5.56  (1000,1000)  NaN
            4.51  (1000,1000)

Slow
====

Currently only 1d, 2d, and 3d NumPy arrays with data type (dtype) int32,
int64, float32, and float64 are accelerated. All other ndim/dtype
combinations result in calls to slower, unaccelerated functions.
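
If you want to know ahead of time whether an array falls into the accelerated set listed above, a check along these lines would do. The helper ``is_accelerated`` is hypothetical (it is not part of Bottleneck) and simply encodes the ndim/dtype rule stated in this section.

```python
import numpy as np

# The combinations named in the text: 1d, 2d, 3d arrays of these four dtypes.
ACCELERATED_NDIM = (1, 2, 3)
ACCELERATED_DTYPES = tuple(
    np.dtype(t) for t in (np.int32, np.int64, np.float32, np.float64)
)


def is_accelerated(arr):
    """Return True if arr matches the ndim/dtype rule above (hypothetical helper)."""
    a = np.asarray(arr)
    return a.ndim in ACCELERATED_NDIM and a.dtype in ACCELERATED_DTYPES


print(is_accelerated(np.zeros((3, 3))))                 # 2d float64
print(is_accelerated(np.zeros((2, 2, 2, 2))))           # 4d float64
print(is_accelerated(np.zeros(3, dtype=np.float16)))    # 1d float16
```

Arrays that fail the check can still be passed to Bottleneck; per the text they are simply handled by the slower functions.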

License
=======

Bottleneck is distributed under a Simplified BSD license. Parts of NumPy,
SciPy and numpydoc, all of which have BSD licenses, are included in
Bottleneck. See the LICENSE file, which is distributed with Bottleneck, for
details.

URLs
====

=================== ========================================================
download            http://pypi.python.org/pypi/Bottleneck
docs                http://berkeleyanalytics.com/bottleneck
code                http://github.com/kwgoodman/bottleneck
mailing list        http://groups.google.com/group/bottle-neck
mailing list 2      http://mail.scipy.org/mailman/listinfo/scipy-user
=================== ========================================================

Install
=======

Requirements:

======================== ====================================================
Bottleneck               Python, NumPy 1.4.1+
Unit tests               nose
Compile                  gcc or MinGW
Optional                 SciPy 0.72+ (portions of benchmark)
======================== ====================================================

Directions for installing a *released* version of Bottleneck are given below.
Cython is not required since the Cython files have already been converted to
C source files. (If you obtained Bottleneck directly from the repository, you
will need to generate the C source files using the included Makefile, which
requires Cython.)

**GNU/Linux, Mac OS X, et al.**

To install Bottleneck::

    $ python setup.py build
    $ sudo python setup.py install

Or, if you wish to specify where Bottleneck is installed, for example inside
``/usr/local``::

    $ python setup.py build
    $ sudo python setup.py install --prefix=/usr/local

**Windows**

In order to compile the C code in Bottleneck you need a Windows version of the
gcc compiler. MinGW (Minimalist GNU for Windows) contains gcc and has been used
to successfully compile Bottleneck on Windows.

Install MinGW and add it to your system path. Then install Bottleneck with the
commands::

    python setup.py build --compiler=mingw32
    python setup.py install

**Post install**

After you have installed Bottleneck, run the suite of unit tests::

    >>> import bottleneck as bn
    >>> bn.test()
    <snip>
    Ran 13 tests in 41.756s
    OK
    <nose.result.TextTestResult run=11 errors=0 failures=0>

====================================================================================================
This is an old version. Click `here <http://pypi.python.org/pypi/Bottleneck>`_ for latest version
====================================================================================================

Download files

Source Distribution

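Bottleneck-0.2.0.tar.gz (566.2 kB), uploaded Dec 27, 2010

Hashes for Bottleneck-0.2.0.tar.gz:

=========== ====================================================================
Algorithm   Hash digest
=========== ====================================================================
SHA256      ``9081c3ea3efdbfc669e7f5a0868c127859404f9d9d5ff2045effe2984a98622b``
MD5         ``4af6abd74f841e44326d5a62270a6275``
BLAKE2b-256 ``d87d36ae564ffc7ce2375091ff2debf3a0d84f112d9e28f56d2e45dd9a2d78a2``
=========== ====================================================================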
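
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: the helper names are hypothetical, and the path passed to ``matches_expected`` is whatever location you saved the file to.

```python
import hashlib

# SHA256 digest listed above for Bottleneck-0.2.0.tar.gz.
EXPECTED_SHA256 = "9081c3ea3efdbfc669e7f5a0868c127859404f9d9d5ff2045effe2984a98622b"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, ``matches_expected("Bottleneck-0.2.0.tar.gz")`` should return ``True`` for an intact download saved under that name in the current directory.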