Fast simple 1D and 2D histograms
About
Sometimes you just want to compute simple 1D or 2D histograms with regular bins. Fast. No nonsense. Numpy’s histogram functions are versatile, and can handle for example non-regular binning, but this versatility comes at the expense of performance.
The fast-histogram mini-package aims to provide simple and fast histogram functions for regular bins that don’t compromise on performance. It doesn’t do anything complicated: it just implements a simple histogram algorithm in C and keeps it simple. The aim is to have functions that are fast but also robust and reliable. The result is a 1D histogram function here that is 7-15x faster than numpy.histogram, and a 2D histogram function that is 20-25x faster than numpy.histogram2d.
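To make the idea of regular binning concrete, here is a minimal pure-Python sketch of a 1D histogram with equal-width bins. It only illustrates the general technique and is not the package's C implementation; the function name `regular_histogram1d` and the choice to exclude values equal to `xmax` are assumptions made for this example.

```python
def regular_histogram1d(values, xmin, xmax, nbins):
    """Count values into nbins equal-width bins spanning [xmin, xmax)."""
    counts = [0] * nbins
    # Precompute the scale so each value needs one subtraction and one multiplication.
    scale = nbins / (xmax - xmin)
    for v in values:
        # Skip values outside the half-open range [xmin, xmax) (an assumption of this sketch).
        if v < xmin or v >= xmax:
            continue
        index = int((v - xmin) * scale)
        # Guard against floating-point rounding pushing the index up to nbins.
        if index >= nbins:
            index = nbins - 1
        counts[index] += 1
    return counts


counts = regular_histogram1d([0.1, 0.2, 0.55, 0.9, 1.5], xmin=0.0, xmax=1.0, nbins=4)
print(counts)  # [2, 0, 1, 1]
```

Because the bin index comes directly from arithmetic on the value, no search over a list of bin edges is needed, which is what makes regular bins cheaper to fill than arbitrary ones.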
To install:
pip install fast-histogram
or if you use conda you can instead do:
conda install -c conda-forge fast-histogram
The fast_histogram module then provides two functions: histogram1d and histogram2d:
from fast_histogram import histogram1d, histogram2d
Example
Here’s an example of binning 10 million points into a regular 2D histogram:
In [1]: import numpy as np
In [2]: x = np.random.random(10_000_000)
In [3]: y = np.random.random(10_000_000)
In [4]: %timeit _ = np.histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30)
935 ms ± 58.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [5]: from fast_histogram import histogram2d
In [6]: %timeit _ = histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30)
40.2 ms ± 624 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
(Note that the 10_000_000 literal requires Python 3.6 or later; use 10000000 instead in earlier versions.)
The version here is over 20 times faster! The following plot shows the speedup as a function of array size for the bin parameters shown above, as well as results for the 1D case, also with 30 bins:
[Figure: speedup as a function of array size for the 1D and 2D cases]
The speedup for the 2D case is consistently between 20-25x, and for the 1D case goes from 15x for small arrays to around 7x for large arrays.
Q&A
Why don’t the histogram functions return the edges?
Computing and returning the edges may seem trivial but it can slow things down by a factor of a few when computing histograms of 10^5 or fewer elements, so not returning the edges is a deliberate decision related to performance. You can easily compute the edges yourself if needed though, using numpy.linspace.
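For instance, the edges for a 1D histogram could be computed as in the sketch below. It assumes that `xmin`, `xmax` and `nbins` hold the same values passed as `range` and `bins`; these variable names and values are chosen purely for illustration.

```python
import numpy as np

xmin, xmax, nbins = -1.0, 2.0, 30

# nbins regular bins are delimited by nbins + 1 equally spaced edges.
edges = np.linspace(xmin, xmax, nbins + 1)

# Bin centers, if needed for plotting, are the midpoints of adjacent edges.
centers = 0.5 * (edges[:-1] + edges[1:])
```

For the 2D case the same call can be made once per axis, with each axis's own limits and number of bins.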
Doesn’t package X already do this, but better?
This may very well be the case! If this duplicates another package, or if it is possible to use Numpy in a smarter way to get the same performance gains, please open an issue and I’ll consider deprecating this package :)
One package that does include fast histogram functions (including in n-dimensions) and can compute other statistics is vaex, so take a look there if you need more advanced functionality!
Are the 2D histograms not transposed compared to what they should be?
There is technically no ‘right’ and ‘wrong’ orientation: here we adopt the convention which gives results consistent with Numpy, so:
numpy.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny])
should give the same result as:
fast_histogram.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny])
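As a quick check of what this convention means in practice, the sketch below uses only `numpy.histogram2d` (so it runs without this package installed) to show that the first axis of the returned array corresponds to `x` and the second to `y`. The sample point and bin counts are made up for illustration.

```python
import numpy as np

# A single point with a small x value and a large y value.
x = np.array([0.1])
y = np.array([0.9])

nx, ny = 2, 3
counts, xedges, yedges = np.histogram2d(
    x, y, range=[[0.0, 1.0], [0.0, 1.0]], bins=[nx, ny]
)

# The array has shape (nx, ny): x varies along axis 0, y along axis 1.
print(counts.shape)  # (2, 3)
# The point lands in the first x bin and the last y bin.
print(counts[0, 2])  # 1.0
```

If the array is displayed with a function that draws the first axis vertically, transposing it first (`counts.T`) is usually what is wanted.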
Why not contribute this to Numpy directly?
As mentioned above, the Numpy functions are much more versatile, so they could not be replaced by the ones here. One option would be to check in Numpy’s functions for cases that are simple and dispatch to functions such as the ones here, or add dedicated functions for regular binning. I hope we can get this in Numpy in some form or another eventually, but for now, the aim is to have this available to packages that need to support a range of Numpy versions.
Why not use Cython?
I originally implemented this in Cython, but found that I could get a 50% performance improvement by going straight to a C extension.
What about using Numba?
I specifically want to keep this package as easy as possible to install, and while Numba is a great package, it is not trivial to install outside of Anaconda.
Could this be parallelized?
This may benefit from parallelization under certain circumstances. The easiest solution might be to use OpenMP, but this won’t work on all platforms, so it would need to be made optional.
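One reason parallelization is plausible is that histogram counts are additive: the data can be split into chunks, each chunk binned independently, and the partial counts summed. The sketch below illustrates that structure at the Python level with `concurrent.futures`. It is not how the package works (OpenMP is only mentioned above as a possibility), and `bin_chunk` and `chunked_histogram` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor


def bin_chunk(chunk, xmin, xmax, nbins):
    """Histogram one chunk into nbins equal-width bins over [xmin, xmax)."""
    counts = [0] * nbins
    scale = nbins / (xmax - xmin)
    for v in chunk:
        if xmin <= v < xmax:
            # min() guards against rounding pushing the index up to nbins.
            counts[min(int((v - xmin) * scale), nbins - 1)] += 1
    return counts


def chunked_histogram(values, xmin, xmax, nbins, nchunks=4):
    """Bin chunks independently, then add the partial histograms together."""
    size = max(1, -(-len(values) // nchunks))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    totals = [0] * nbins
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda c: bin_chunk(c, xmin, xmax, nbins), chunks)
        # Sum the per-chunk counts bin by bin.
        for partial in partials:
            for i, n in enumerate(partial):
                totals[i] += n
    return totals


data = [i / 100 for i in range(100)]  # 0.00, 0.01, ..., 0.99
print(chunked_histogram(data, 0.0, 1.0, nbins=4))  # [25, 25, 25, 25]
```

In standard CPython, threads running pure-Python code like this will not give a speedup because of the global interpreter lock; the point is only the split-and-sum structure, which a compiled loop with per-thread counts could follow.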
Couldn’t you make it faster by using the GPU?
Almost certainly, though the aim here is to have an easily installable and portable package, and introducing GPUs is going to affect both of these.
Why make a package specifically for this? This is a tiny amount of functionality
Packages that need this could simply bundle their own C extension or Cython code to do this, but the main motivation for releasing this as a mini-package is to avoid making pure-Python packages into packages that require compilation just because of the need to compute fast histograms.
Can I contribute?
Yes please! This is not meant to be a finished package, and I welcome pull requests to improve things.