Very fast percentile calculation for small-integer dtypes via parallel histogram
Project description
fastpercentile: Memory-bandwidth-bound percentile for small-integer arrays
np.percentile is O(n) but in practice has a brutal constant factor — on a 1.7-billion-element uint16 volume it takes ~22 s, vs ~0.15 s for np.max. There is no good reason for percentile to be so much slower than max: both can be done in a single pass over the array.
For small-integer dtypes (int8, uint8, int16, uint16) the data only takes one of at most 65 536 distinct values, so a single parallel pass into a histogram captures everything you need to compute any percentile. After the histogram is built, walking the cumulative count to find the bin holding each requested rank costs essentially nothing.
This package implements that, in a few hundred lines of numba. On a 32-thread workstation it runs at DRAM bandwidth — about as fast as np.max, and ~300× faster than np.percentile:
np.max : 0.148 s
fastpercentile : 0.072 s <- four percentiles in one pass
np.percentile : 22.2 s
Auxiliary memory is ~16 MB regardless of input size (32 threads × one 65 536-bin local table each, plus a final reduced histogram), so it adds no measurable RAM pressure on top of the input.
Usage
import numpy as np
import fastpercentile
arr = np.random.randint(0, 65536, size=(305, 96, 69, 846), dtype=np.uint16)
# A scalar percentile
p99 = fastpercentile.percentile(arr, 99)
# Multiple percentiles in a single pass over the data
p1, p50, p99, p99_9 = fastpercentile.percentile(arr, [1, 50, 99, 99.9])
# Or just grab the histogram if you want to do something else with it
hist = fastpercentile.histogram(arr) # length 65536 for uint16
Results match numpy.percentile(arr, q) with the default 'linear' interpolation method (typically exact for integer inputs).
Supported dtypes
int8, uint8, int16, uint16. Floats and 32/64-bit integers are not supported because a direct histogram is not feasible for them — for those, use numpy.percentile or bottleneck.nanpercentile.
Memory layout
A 4D array loaded by pynrrd is typically Fortran-contiguous. fastpercentile walks raw memory, so it does not care about C vs F order — both are handled as a no-copy view. Arbitrarily-strided arrays fall back to a copy.
Installation
Option 1: pip install from PyPI:
pip install fastpercentile
Option 2: pip install directly from GitHub:
pip install git+https://github.com/jasper-tms/fastpercentile.git
Option 3: First git clone this repo and then pip install it from your clone:
cd ~/repos
git clone https://github.com/jasper-tms/fastpercentile.git
cd fastpercentile
pip install '.[dev]'
Notes on threading
fastpercentile uses every logical core on the machine by default (via numba.get_num_threads()). To limit it for a particular call, pass n_threads=N; to set it globally, use numba.set_num_threads(N) or the NUMBA_NUM_THREADS environment variable. On most systems the workload saturates DRAM bandwidth around nproc / 2 threads, so reserving a few cores for the rest of the machine costs little throughput.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file fastpercentile-0.1.0.tar.gz.
File metadata
- Download URL: fastpercentile-0.1.0.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2bba6e69d98bb6ee4f4c3489cb77f156b19f4571d1471b59359e3756d903754e
|
|
| MD5 |
1e9ae56c0ae351200381c38b43ac8088
|
|
| BLAKE2b-256 |
f56c836e3243d9112ec74e9edc74a32539b7a9087469710b9f78d453adada62e
|
File details
Details for the file fastpercentile-0.1.0-py3-none-any.whl.
File metadata
- Download URL: fastpercentile-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cfa78ea325844e000ede7623d1e50233983a3b92d11cf8713cfe2b449ecdffbc
|
|
| MD5 |
fddce044201e3e616df974ea5e34584a
|
|
| BLAKE2b-256 |
e604614d6269f2098342b981dfcf284792d60613db3ed33ac09ce6c61a5fa151
|