Very fast percentile calculation for integer and float dtypes via parallel histogram

These details have been verified by PyPI

Project links

Repository

GitHub Statistics

Maintainers

jasper-tms

These details have not been verified by PyPI

Project description

fastpercentile: As-fast-as-possible median & percentile calculation on integer and float arrays

There's no reason why median and percentile calculations should be any slower than a np.max() call, and yet, on a 1-billion-element numpy array:

np.max                    :  0.080 seconds
np.median                 :  5.529 seconds
np.percentile             :  8.878 seconds

This package provides optimized versions of median and percentile calculations on numpy arrays, giving ~100× faster speeds than np.percentile:

fastpercentile.median     :  0.083 seconds
fastpercentile.percentile :  0.084 seconds

Click to see Python code that you can run yourself to compare the speeds on your computer

import time
import numpy as np
import fastpercentile

# A 1-billion-element uint16 volume (takes up ~2 GB of RAM)
arr = np.random.randint(0, 65536, size=1_000_000_000, dtype=np.uint16)
qs = [1, 50, 99, 99.9]
fastpercentile.median(arr)  # Compile (happens once on first call) before measuring runtimes

commands_to_run = """
np.max(arr)
np.median(arr)
np.percentile(arr, qs)
fastpercentile.median(arr)
fastpercentile.percentile(arr, qs)
"""

for command in commands_to_run.strip().splitlines():
    start = time.perf_counter()
    result = eval(command)
    print(f'{command.split("(")[0]:26s}: {time.perf_counter() - start:6.3f} seconds, result {result}')

The speed of the optimized calculation is no longer limited by CPU processing speed, but instead by data transfer bandwidth between the RAM and the CPU. Therefore, this runs literally as fast as your hardware could possibly allow.

Algorithm

For small-integer dtypes (int8, uint8, int16, uint16) the data only takes one of at most 65536 distinct values, so a single parallel pass of counting how many times each value occurs (that is, building a histogram of value occurrences) gives everything needed to compute any percentile. After the histogram is built, walking the cumulative count to find the bin holding each requested rank costs essentially nothing. The whole thing is a few hundred lines of numba, and additional memory usage is only ~16 MB regardless of input size (32 threads × one 65536-bin local table each, plus a final reduced histogram), so it adds no measurable RAM pressure on top of the input.

Click to read how we run this algorithm 2 or 4 times in a row to handle 32-bit or 64-bit integers, respectively, without using much additional memory.

For 32- and 64-bit integers, a direct histogram over all possible values is infeasible (it would need 2**32 or 2**64 8-byte bins — 32 GB or ~150 exabytes), so they use a radix refinement instead: the first pass computes histograms on only the top 16 bits of each value, which localizes each requested percentile to a coarse bucket; later passes re-scan the array but only count the next 16 bits of the few elements inside those buckets, narrowing the answer 16 bits at a time. This costs one pass per 16-bit digit — two for 32-bit input, four for 64-bit — and keeps auxiliary memory at the same fixed 65536-bin scale. 32-bit arrays are ~4× slower and 64-bit arrays are ~16× slower to process than 16-bit data, which is still much faster than np.percentile. (Note that for 64-bit values above 2**53, the result carries the same float64 rounding error that numpy.percentile also has.)

Click to read how the same approach extends to float32 and float64.

IEEE 754 floating-point numbers have an order-preserving bijection to unsigned integers of the same width: reinterpret the raw bits as an unsigned integer, then set the sign bit for non-negative values and flip every bit for negative values. After that transform, plain unsigned ordering is exactly numeric float ordering, so the same radix refinement used for 32- and 64-bit integers resolves float percentiles too — the original value is recovered bit-exactly at the end and the linear interpolation is done in float64. NaNs are detected and skipped while the histogram is built (and counted, so percentile can propagate them like numpy while nanpercentile ignores them). float32 runs at the same speed as int32 (a 2-digit radix) and float64 at the speed of int64 (a 4-digit radix), so floats are slower than the single-pass 16-bit case but still several times faster than numpy.percentile (roughly 10× for float32 and 3× for float64 on large arrays).

There is one deliberate difference from numpy.percentile on non-finite data: when a requested rank is integral (no interpolation needed) but a neighboring value is infinite, numpy.percentile still evaluates its interpolation term and returns NaN (0 * inf), whereas fastpercentile returns the exact order statistic — which is what numpy.median returns in the same situation. This is the same spirit as the narrow-signed-integer case above, where we avoid a numpy overflow.

Usage

import numpy as np
import fastpercentile

arr = np.random.randint(0, 65536, size=(305, 96, 69, 846), dtype=np.uint16)

# A scalar percentile
p99 = fastpercentile.percentile(arr, 99)

# The median (50th percentile)
m = fastpercentile.median(arr)

# Multiple percentiles in a single pass over the data
p1, p50, p99, p99_9 = fastpercentile.percentile(arr, [1, 50, 99, 99.9])

# Reduce along one or more axes, just like numpy.  Pass a single int,
# a tuple of ints, or None (the default, which reduces everything).
per_frame = fastpercentile.percentile(arr, 99, axis=(1, 2, 3))  # shape (305,)
keep = fastpercentile.percentile(arr, 50, axis=0, keepdims=True)  # shape (1, 96, 69, 846)

# Floats work too (float32 and float64).  By default any NaN
# propagates, just like numpy; use the nan-aware variants to ignore
# NaNs instead.
floats = np.random.standard_normal((1000, 1000)).astype(np.float32)
floats[floats > 3] = np.nan
m = fastpercentile.median(floats)                 # NaN, because NaNs are present
m = fastpercentile.nanmedian(floats)              # ignores the NaNs
p = fastpercentile.nanpercentile(floats, [25, 75], axis=1)  # shape (2, 1000)

# Or just grab the histogram if you want to do something else with it.
# (Only works for 8-bit and 16-bit values because 32-bit and 64-bit
# histograms don't fit in memory.)
hist = fastpercentile.histogram(arr)  # length 65536 for uint16

Results match numpy.percentile(arr, q, axis=axis, keepdims=keepdims) with the default 'linear' interpolation method (typically exact for integer inputs). In fact, on narrow signed dtypes numpy.percentile can overflow during its interpolation and return a value outside the data range; fastpercentile interpolates in float64 and stays correct.

When you reduce along an axis, the work is parallelized across the resulting slices (or within each slice when there are only a few), so reductions like a per-frame percentile over an image stack stay fast. And for arrays small enough that a full histogram would be mostly empty, fastpercentile simply sorts instead, so it never pays the histogram's fixed cost on a tiny input.

Installation

Option 1: pip install from PyPI:

pip install fastpercentile

Option 2: pip install directly from GitHub:

pip install git+https://github.com/jasper-tms/fastpercentile.git

Option 3: First git clone this repo and then pip install it from your clone:

cd ~/repos
git clone https://github.com/jasper-tms/fastpercentile.git
cd fastpercentile
pip install .

Notes on threading

fastpercentile uses every logical core on the machine by default (via numba.get_num_threads()). To limit it for a particular call, pass n_threads=N; to set it globally, use numba.set_num_threads(N) or the NUMBA_NUM_THREADS environment variable. On most systems the workload saturates DRAM bandwidth around nproc / 2 threads, so reserving a few cores for the rest of the machine costs little throughput.

Project details

These details have been verified by PyPI

Project links

Repository

GitHub Statistics

Maintainers

jasper-tms

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

1.0.0

Jun 26, 2026

0.3.0

Jun 26, 2026

0.2.3

Jun 22, 2026

0.2.2

Jun 22, 2026

0.2.1

Jun 22, 2026

0.2.0

Jun 22, 2026

0.1.0

May 7, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastpercentile-1.0.0.tar.gz (29.2 kB view details)

Uploaded Jun 26, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

fastpercentile-1.0.0-py3-none-any.whl (19.3 kB view details)

Uploaded Jun 26, 2026 Python 3

File details

Details for the file fastpercentile-1.0.0.tar.gz.

File metadata

Download URL: fastpercentile-1.0.0.tar.gz
Upload date: Jun 26, 2026
Size: 29.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fastpercentile-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`ef38580540669a36f964bf515bd457ad7c5e0791cb94b96748a74ed04744f062`
MD5	`eaac1c3c35b6cf3cd707c0f3db6155d3`
BLAKE2b-256	`350c315e2b2faefddeafb08d8db34fb4a687368541da4c643d63b5a14c2de2ab`

See more details on using hashes here.

Provenance

The following attestation bundles were made for fastpercentile-1.0.0.tar.gz:

Publisher: publish.yml on jasper-tms/fastpercentile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fastpercentile-1.0.0.tar.gz
- Subject digest: ef38580540669a36f964bf515bd457ad7c5e0791cb94b96748a74ed04744f062
- Sigstore transparency entry: 1971235277
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: jasper-tms/fastpercentile@667226eeba63b953e08d70097c1ffeca9f8cdc73
- Branch / Tag: refs/heads/main
- Owner: https://github.com/jasper-tms
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@667226eeba63b953e08d70097c1ffeca9f8cdc73
- Trigger Event: push

File details

Details for the file fastpercentile-1.0.0-py3-none-any.whl.

File metadata

Download URL: fastpercentile-1.0.0-py3-none-any.whl
Upload date: Jun 26, 2026
Size: 19.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fastpercentile-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aba89b752784aa37cdb528b1d461475c9d5c7b6cafca906b372a293f7df03107`
MD5	`e988b0726914a50d5da15c409cc59735`
BLAKE2b-256	`0ae6324aa3defda0aeb629e32cb456a61cd06f40dee1b1fc057682701adb9489`

See more details on using hashes here.

Provenance

The following attestation bundles were made for fastpercentile-1.0.0-py3-none-any.whl:

Publisher: publish.yml on jasper-tms/fastpercentile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fastpercentile-1.0.0-py3-none-any.whl
- Subject digest: aba89b752784aa37cdb528b1d461475c9d5c7b6cafca906b372a293f7df03107
- Sigstore transparency entry: 1971235338
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: jasper-tms/fastpercentile@667226eeba63b953e08d70097c1ffeca9f8cdc73
- Branch / Tag: refs/heads/main
- Owner: https://github.com/jasper-tms
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@667226eeba63b953e08d70097c1ffeca9f8cdc73
- Trigger Event: push

fastpercentile 1.0.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Meta

Unverified details

Meta

Classifiers

Project description

fastpercentile: As-fast-as-possible median & percentile calculation on integer and float arrays

Algorithm

Usage

Installation

Notes on threading

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Meta

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance