Fast simple 1D and 2D histograms

About

Sometimes you just want to compute simple 1D or 2D histograms with regular bins. Fast. No nonsense. Numpy’s histogram functions are versatile and can handle, for example, non-regular binning, but this versatility comes at the expense of performance.

The fast-histogram mini-package aims to provide simple and fast histogram functions for regular bins. It doesn’t do anything complicated: it just implements a simple histogram algorithm in C. The aim is to have functions that are fast but also robust and reliable. The result is a 1D histogram function that is 7-15x faster than numpy.histogram, and a 2D histogram function that is 20-25x faster than numpy.histogram2d.
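To give a feel for why regular bins allow a simpler algorithm, here is a minimal pure-Numpy sketch of the idea. It is not the package's C code: the function name regular_histogram1d, the parameter name value_range, and the choice to treat the range as half-open are assumptions made for illustration. Note that numpy.histogram includes the right-hand edge in the last bin, so the two can differ for values exactly equal to the upper limit.

```python
import numpy as np


def regular_histogram1d(x, bins, value_range):
    """Count values of x in `bins` equal-width bins spanning [xmin, xmax).

    Illustrative sketch only; the name and edge handling are assumptions.
    """
    xmin, xmax = value_range
    x = np.asarray(x, dtype=float)
    # Keep only values inside the half-open range [xmin, xmax).
    x = x[(x >= xmin) & (x < xmax)]
    # With regular bins, the bin index is one subtract and one multiply
    # per value; no search over bin edges is needed.
    norm = bins / (xmax - xmin)
    index = ((x - xmin) * norm).astype(np.intp)
    # Guard against rounding pushing a value just below xmax into index == bins.
    index = np.minimum(index, bins - 1)
    return np.bincount(index, minlength=bins).astype(float)


# 1.0 and -0.2 fall outside [0, 1) and are ignored; two values land in each bin.
counts = regular_histogram1d([0.0, 0.1, 0.5, 0.9, 1.0, -0.2], bins=2, value_range=(0, 1))
```

With non-regular bins, each value would instead need a search through the list of edges, which is part of what the extra versatility costs.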

To install:

pip install fast-histogram

or if you use conda you can instead do:

conda install -c conda-forge fast-histogram

The fast_histogram module then provides two functions: histogram1d and histogram2d:

from fast_histogram import histogram1d, histogram2d

Example

Here’s an example of binning 10 million points into a regular 2D histogram:

In [1]: import numpy as np

In [2]: x = np.random.random(10_000_000)

In [3]: y = np.random.random(10_000_000)

In [4]: %timeit _ = np.histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30)
935 ms ± 58.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: from fast_histogram import histogram2d

In [6]: %timeit _ = histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30)
40.2 ms ± 624 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

(Note that the 10_000_000 literal syntax requires Python 3.6 or later; use 10000000 instead in earlier versions.)

The version here is over 20 times faster! The following plot shows the speedup as a function of array size for the bin parameters shown above, as well as results for the 1D case, also with 30 bins:

[Figure: Comparison of performance between Numpy and fast-histogram]

The speedup for the 2D case is consistently between 20x and 25x, and for the 1D case goes from 15x for small arrays to around 7x for large arrays.

Q&A

Why don’t the histogram functions return the edges?

Computing and returning the edges may seem trivial but it can slow things down by a factor of a few when computing histograms of 10^5 or fewer elements, so not returning the edges is a deliberate decision related to performance. You can easily compute the edges yourself if needed though, using numpy.linspace.
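As a short sketch of computing the edges yourself, the following uses numpy.linspace with the same range and number of bins you would pass to the histogram function. The variable names and the specific values (xmin, xmax, nx) are chosen here for illustration only.

```python
import numpy as np

# Illustrative range and bin count; use whatever you pass to the histogram call.
xmin, xmax, nx = -1, 2, 30

# nx regular bins have nx + 1 edges.
x_edges = np.linspace(xmin, xmax, nx + 1)

# Bin centres, which are often what you want for plotting.
x_centers = 0.5 * (x_edges[:-1] + x_edges[1:])
```

For a 2D histogram, the same call can be repeated with the y range and the number of y bins.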

Doesn’t package X already do this, but better?

This may very well be the case! If this duplicates another package, or if it is possible to use Numpy in a smarter way to get the same performance gains, please open an issue and I’ll consider deprecating this package :)

One package that does include fast histogram functions (including in n dimensions) and can compute other statistics is vaex, so take a look there if you need more advanced functionality!

Are the 2D histograms not transposed compared to what they should be?

There is technically no ‘right’ and ‘wrong’ orientation - here we adopt the convention which gives results consistent with Numpy, so:

numpy.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny])

should give the same result as:

fast_histogram.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny])
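To make the convention concrete, here is a small sketch using only Numpy, whose orientation this package follows as described above. The data values and bin counts are made up for illustration.

```python
import numpy as np

# Three points with small x and large y.
x = np.array([0.1, 0.1, 0.1])
y = np.array([0.9, 0.9, 0.9])

nx, ny = 2, 3
hist, _, _ = np.histogram2d(x, y, range=[[0, 1], [0, 1]], bins=[nx, ny])

# hist.shape is (nx, ny): axis 0 follows x and axis 1 follows y, so all
# three points are counted in hist[0, 2] (first x bin, last y bin).
```

If you display the result as an image where rows correspond to y, you will typically want to transpose the array first (hist.T).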

Why not contribute this to Numpy directly?

As mentioned above, the Numpy functions are much more versatile, so they could not be replaced by the ones here. One option would be to check in Numpy’s functions for cases that are simple and dispatch to functions such as the ones here, or add dedicated functions for regular binning. I hope we can get this in Numpy in some form or another eventually, but for now, the aim is to have this available to packages that need to support a range of Numpy versions.

Why not use Cython?

I originally implemented this in Cython, but found that I could get a 50% performance improvement by going straight to a C extension.

What about using Numba?

I specifically want to keep this package as easy as possible to install, and while Numba is a great package, it is not trivial to install outside of Anaconda.

Could this be parallelized?

This may benefit from parallelization under certain circumstances. The easiest solution might be to use OpenMP, but this won’t work on all platforms, so it would need to be made optional.
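Any parallel version would rely on the fact that histograms over the same bins are additive: the histogram of the whole array is the sum of the histograms of its chunks. The sketch below illustrates that property only. It uses numpy.histogram as a stand-in and Python threads rather than OpenMP; the function name chunked_histogram and its parameters are hypothetical, and no speedup is claimed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunked_histogram(x, bins, value_range, n_chunks=4):
    """Sum per-chunk histograms computed over identical regular bins."""
    chunks = np.array_split(x, n_chunks)

    def partial(chunk):
        # Each worker bins its own chunk with the same bins and range.
        return np.histogram(chunk, bins=bins, range=value_range)[0]

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(partial, chunks))

    # Counts are additive across chunks.
    return np.sum(partials, axis=0)


rng = np.random.default_rng(0)
x = rng.random(100_000)
total = chunked_histogram(x, bins=30, value_range=(-1, 2))
```

Whether splitting the work pays off depends on the array size and the overhead of starting workers, which is why it would only help under certain circumstances.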

Couldn’t you make it faster by using the GPU?

Almost certainly, though the aim here is to have an easily installable and portable package, and introducing GPUs is going to affect both of these.

Why make a package specifically for this? This is a tiny amount of functionality

Packages that need this could simply bundle their own C extension or Cython code to do this, but the main motivation for releasing this as a mini-package is to avoid making pure-Python packages into packages that require compilation just because of the need to compute fast histograms.

Can I contribute?

Yes please! This is not meant to be a finished package, and I welcome pull requests to improve things.
