Fast N-dimensional aggregation functions with Numba

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
Topic
- Scientific/Engineering

Project description

Numbagg: Fast N-dimensional aggregation functions with Numba

Fast, flexible N-dimensional array functions written with Numba and NumPy's generalized ufuncs.

Currently accelerated functions:

Array functions: allnan, anynan, count, nanargmax, nanargmin, nanmax, nanmean, nanstd, nanvar, nanmin, nansum
Moving window functions: move_exp_nanmean, move_mean, move_sum

Note: Only functions listed here (exposed in Numbagg's top level namespace) are supported as part of Numbagg's public API.
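
To make the moving-window semantics concrete, here is a slow NumPy-only reference for a trailing-window mean along the last axis. It is a sketch, not Numbagg's implementation: the name reference_move_mean is hypothetical, and it assumes the Bottleneck-style convention in which positions with fewer than `window` values available are NaN.

```python
import numpy as np


def reference_move_mean(a, window):
    """Trailing-window mean along the last axis (hypothetical reference helper).

    Output position i holds the mean of a[..., i - window + 1 : i + 1].
    Positions without a full window are left as NaN (an assumed convention).
    """
    a = np.asarray(a, dtype=float)
    out = np.full(a.shape, np.nan)
    for i in range(window - 1, a.shape[-1]):
        out[..., i] = a[..., i - window + 1 : i + 1].mean(axis=-1)
    return out


# The first entry has no full window, so it stays NaN.
print(reference_move_mean([1.0, 2.0, 3.0, 4.0], window=2))
```

A loop like this is only useful for checking results on small inputs; the accelerated functions exist precisely because such Python-level loops are slow.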

Easy to extend

Numbagg makes it easy to write, in pure Python/NumPy, flexible aggregation functions accelerated by Numba. All the hard work is done by Numba's JIT compiler and NumPy's gufunc machinery (as wrapped by Numba).

For example, here is how we wrote nansum:

import numpy as np
from numbagg.decorators import ndreduce

@ndreduce
def nansum(a):
    # Accumulate every element that is not NaN; NaNs are skipped.
    asum = 0.0
    for ai in a.flat:
        if not np.isnan(ai):
            asum += ai
    return asum

You are welcome to experiment with Numbagg's decorator functions, but these are not public APIs (yet): we reserve the right to change them at any time.

We'd rather get your pull requests to add new functions into Numbagg directly!
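
Because the body of such a function is ordinary Python, its logic can be checked without Numba at all. The sketch below repeats the nansum loop without the decorator (the name nansum_kernel is hypothetical) and compares it with np.nansum. It illustrates only the reduction logic, not the compiled, axis-aware function that the decorator produces.

```python
import numpy as np


def nansum_kernel(a):
    """Undecorated copy of the loop: sum of all elements that are not NaN."""
    asum = 0.0
    for ai in a.flat:
        if not np.isnan(ai):
            asum += ai
    return asum


x = np.array([[1.0, np.nan, 2.0], [np.nan, 3.0, 4.0]])

# The plain-Python loop and NumPy's built-in agree on the NaN-skipping sum.
assert np.isclose(nansum_kernel(x), np.nansum(x))
```

Checking a new kernel this way against a NumPy equivalent is a cheap first test before proposing it in a pull request.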

Advantages over Bottleneck

Way less code. Easier to add new functions. No ad-hoc templating system. No Cython!
Fast functions still work for >3 dimensions.
axis argument handles tuples of integers.

Most of the functions in Numbagg (including our test suite) are adapted from Bottleneck's battle-hardened implementations. Still, Numbagg is experimental, and probably not yet ready for production.

Benchmarks

Initial benchmarks are quite encouraging. Numbagg/Numba performs comparably to (and slightly better than) Bottleneck's hand-written C:

import numbagg
import numpy as np
import bottleneck

x = np.random.RandomState(42).randn(1000, 1000)
x[x < -1] = np.nan

# timings with numba=0.41.0 and bottleneck=1.2.1

In [2]: %timeit numbagg.nanmean(x)
1.8 ms ± 92.3 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit numbagg.nanmean(x, axis=0)
3.63 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: %timeit numbagg.nanmean(x, axis=1)
1.81 ms ± 41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit bottleneck.nanmean(x)
2.22 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit bottleneck.nanmean(x, axis=0)
4.45 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit bottleneck.nanmean(x, axis=1)
2.19 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
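
The timings above come from IPython's %timeit. For a plain script, a similar comparison can be sketched with the standard library's timeit module. The helper name time_per_call is hypothetical, and np.nanmean is used here only as a stand-in so that the sketch runs with NumPy alone. Note that it reports the best observed time rather than the mean and standard deviation that %timeit prints.

```python
import timeit

import numpy as np


def time_per_call(func, *args, repeat=5, number=10, **kwargs):
    """Return the best observed time per call, in seconds (hypothetical helper)."""
    timer = timeit.Timer(lambda: func(*args, **kwargs))
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Same input as the benchmark above: roughly 16% of the entries become NaN.
x = np.random.RandomState(42).randn(1000, 1000)
x[x < -1] = np.nan

for axis in (None, 0, 1):
    seconds = time_per_call(np.nanmean, x, axis=axis, repeat=3, number=3)
    print(f"np.nanmean(x, axis={axis}): {seconds * 1e3:.2f} ms")
```

To time numbagg.nanmean or bottleneck.nanmean, pass them in place of np.nanmean. For JIT-compiled functions it is a reasonable precaution to call the function once before timing so that compilation is not counted.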

Our approach

Numbagg includes somewhat awkward workarounds for features missing from NumPy/Numba:

It implements its own cache for functions wrapped by Numba's guvectorize, because that decorator is rather slow.
It does its own handling of array transposes to handle the axis argument, which we hope will eventually be directly supported by all NumPy gufuncs.
It uses some terrible hacks to hide the out-of-bound memory access necessary to write gufuncs that handle scalar values with Numba.

We hope that the need for most of these will eventually go away. In the meantime, expect Numbagg to be tightly coupled to Numba and NumPy release cycles.
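
The transpose-based handling of the axis argument mentioned above can be illustrated with plain NumPy. This is a minimal sketch of the general idea, not Numbagg's actual code: the names reduce_over_axes and nansum_last_axis are hypothetical, and np.nansum stands in for a compiled core function that only knows how to reduce the last axis.

```python
import numpy as np


def reduce_over_axes(core, a, axis):
    """Apply `core`, which reduces only the last axis, over one or more axes."""
    a = np.asarray(a)
    axes = (axis,) if isinstance(axis, int) else tuple(axis)
    n_keep = a.ndim - len(axes)
    # Move the axes being reduced to the end; the others keep their order.
    moved = np.moveaxis(a, axes, tuple(range(n_keep, a.ndim)))
    # Collapse the trailing axes into one so `core` sees a single axis.
    flat = moved.reshape(moved.shape[:n_keep] + (-1,))
    return core(flat)


def nansum_last_axis(a):
    """Stand-in for a compiled core function that reduces the last axis."""
    return np.nansum(a, axis=-1)


x = np.random.RandomState(0).randn(2, 3, 4, 5)
x[x < -1] = np.nan

# A tuple of axes on a 4-dimensional array gives the same result as NumPy.
result = reduce_over_axes(nansum_last_axis, x, axis=(0, 2))
assert np.allclose(result, np.nansum(x, axis=(0, 2)))
```

The design keeps the core function simple, since it only ever sees one trailing axis, at the cost of a possible copy when the transposed array has to be reshaped.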

License

3-clause BSD. Includes portions of Bottleneck, which is distributed under a Simplified BSD license.

Release history

- 0.8.1 (Mar 7, 2024)
- 0.8.0 (Feb 3, 2024)
- 0.7.2 (Jan 31, 2024)
- 0.7.1 (Jan 19, 2024)
- 0.7.0 (Jan 17, 2024)
- 0.6.8 (Dec 19, 2023)
- 0.6.7 (Dec 8, 2023)
- 0.6.6 (Dec 6, 2023)
- 0.6.5 (Dec 4, 2023)
- 0.6.4 (Nov 29, 2023)
- 0.6.3 (Nov 20, 2023)
- 0.6.2 (Nov 14, 2023)
- 0.6.1 (Nov 13, 2023)
- 0.6.0 (Oct 23, 2023)
- 0.5.1 (Oct 20, 2023)
- 0.5.0 (Oct 17, 2023)
- 0.4.5 (Oct 14, 2023)
- 0.4.0 (Oct 13, 2023)
- 0.3.1 (Oct 7, 2023)
- 0.3.0 (Oct 7, 2023)
- 0.2.2 (Jan 4, 2023)
- 0.2.1 (May 31, 2021), this version
- 0.2.0 (May 31, 2021)
- 0.1 (Jan 28, 2019)
- 0.1-dev, pre-release (Oct 21, 2014)

Download files

Source Distribution

numbagg-0.2.1.tar.gz (20.6 kB)

Uploaded May 31, 2021 Source

Built Distribution

numbagg-0.2.1-py2.py3-none-any.whl (18.9 kB)

Uploaded May 31, 2021 Python 2 Python 3

Hashes for numbagg-0.2.1.tar.gz

Algorithm	Hash digest
SHA256	`c9534ce94ddfe97198c2a4b8a3179329678eb7d7628abef40feb5de9a1060ce9`
MD5	`20d0920e9af30a17b8493fc2ab54a353`
BLAKE2b-256	`4f60392b5130dcf976488ec034ac0a689b167e3111105d3860ed325e712ed48f`

Hashes for numbagg-0.2.1-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`e9f5fc42cd098075a43f04cd1977769f98f6423c9bb2cae0a3203b93c67c2f35`
MD5	`5a01f7012ae03113c324d28a038b3e6d`
BLAKE2b-256	`de0294c502051c3ecff1e5afaffa16016c532b439a9223cc882be85548b02a8e`