Python implementation of the iopsystems h2 histogram, interoperable with Rezolus

h2histogram-py

A pure-Python implementation of the iopsystems h2 histogram.

h2histogram produces histograms with byte-for-byte identical bucketing to the Rust histogram crate, so histograms recorded here can be consumed by Rezolus — and, conversely, you can open a Parquet/Arrow column of h2histogram values produced by Rezolus and analyze it in Python.

What is an h2 histogram?

An h2 histogram quantizes values into buckets using two parameters:

  • grouping_power: each power of two is divided into 2^grouping_power buckets. This bounds the relative error at 2^-grouping_power (e.g. grouping_power=7 → ~0.78% error).
  • max_value_power: the largest representable value is 2^max_value_power - 1.

Values below 2^(grouping_power+1) are stored exactly (linear buckets of width 1); larger values fall into logarithmic buckets. This gives HDR-histogram-like guarantees with a simpler, faster bucket index computation. Rezolus records histograms with grouping_power=3 and max_value_power=64.
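
The index computation can be sketched in a few lines of plain Python. This illustrates the scheme described above and is not the library's source: the function names and their explicit parameters are chosen here for readability, and the layout (linear region first, then 2^grouping_power buckets per power of two) is assumed to mirror the Rust crate.

```python
def value_to_index(value: int, grouping_power: int, max_value_power: int) -> int:
    """Map a non-negative integer to its bucket index (illustrative sketch)."""
    if not 0 <= value < (1 << max_value_power):
        raise ValueError("value out of range for this configuration")
    cutoff_power = grouping_power + 1
    if value < (1 << cutoff_power):
        return value                      # linear region: one bucket per value
    power = value.bit_length() - 1        # floor(log2(value))
    log_segment = power - cutoff_power    # which power-of-two segment, from 0
    # Position inside the segment: drop the leading bit, keep the next
    # grouping_power bits.
    offset = (value - (1 << power)) >> (power - grouping_power)
    return (1 << cutoff_power) + (log_segment << grouping_power) + offset


def total_buckets(grouping_power: int, max_value_power: int) -> int:
    """Linear buckets plus 2**grouping_power buckets per remaining power of two."""
    return (max_value_power - grouping_power + 1) << grouping_power


def relative_error(grouping_power: int) -> float:
    """Worst-case bucket width relative to the bucket's lower bound."""
    return 2.0 ** -grouping_power


print(value_to_index(255, 7, 64))    # 255
print(value_to_index(1024, 7, 64))   # 512
print(total_buckets(7, 64))          # 7424
print(relative_error(7))             # 0.0078125
```

The only non-trivial operations are a bit-length lookup, a subtraction, and a shift, which is where the "simpler, faster" claim comes from. In real code, prefer Config.value_to_index from the library over a hand-rolled version.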

Install

pip install h2histogram            # core library (no dependencies)
pip install "h2histogram[parquet]"   # + pyarrow, for the Arrow/Parquet interop
pip install "h2histogram[numpy]"     # + numpy, for a vectorized bulk-record fast path

For local development from a checkout:

pip install -e ".[dev]"
pytest

Quick start

from h2histogram import Histogram

h = Histogram(grouping_power=7, max_value_power=64)
h.increment(42)
h.record(1000, count=5)
h.record_many([12, 15, 900, 1_000_000])   # bulk (uses numpy if available)

print(h.total_count())          # 10 (1 + 5 + 4)
p99 = h.percentile(0.99)        # a Bucket
print(p99.range, p99.midpoint)  # ((..lo.., ..hi..), midpoint estimate)

# Combine / reduce
other_h = Histogram(grouping_power=7, max_value_power=64)   # a second histogram, same config
merged = h.merge(other_h)       # element-wise sum (also: h + other_h)
coarse = h.downsample(4)        # fewer buckets, higher error, same total count
sparse = h.to_sparse()          # columnar (index, count) form for storage
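
To show what merging and downsampling do to the underlying bucket array, here is a rough standalone sketch on plain Python lists. It does not use the library. The helper names, the small 16-bit configuration, and the approach of re-bucketing each source bucket by its lower bound are assumptions made for illustration, so the library's own implementation may differ.

```python
def value_to_index(value: int, g: int) -> int:
    """Bucket index of `value` for grouping power `g` (illustrative)."""
    if value < (1 << (g + 1)):
        return value
    power = value.bit_length() - 1
    offset = (value - (1 << power)) >> (power - g)
    return (1 << (g + 1)) + ((power - g - 1) << g) + offset


def index_to_lower_bound(index: int, g: int) -> int:
    """Smallest value that lands in bucket `index` for grouping power `g`."""
    linear = 1 << (g + 1)
    if index < linear:
        return index
    segment, offset = divmod(index - linear, 1 << g)
    power = g + 1 + segment
    return (1 << power) + (offset << (power - g))


def merge(a: list[int], b: list[int]) -> list[int]:
    """Element-wise sum; both inputs must share one bucketing config."""
    if len(a) != len(b):
        raise ValueError("histograms have different bucket layouts")
    return [x + y for x, y in zip(a, b)]


def downsample(buckets: list[int], g: int, new_g: int, max_value_power: int) -> list[int]:
    """Re-bucket counts into a coarser layout with grouping power `new_g`."""
    if new_g >= g:
        raise ValueError("target grouping power must be smaller")
    out = [0] * ((max_value_power - new_g + 1) << new_g)
    for index, count in enumerate(buckets):
        if count:
            out[value_to_index(index_to_lower_bound(index, g), new_g)] += count
    return out


g, m = 3, 16                               # small config to keep the arrays short
size = (m - g + 1) << g
a, b = [0] * size, [0] * size
for v in (5, 100, 100, 40_000):
    a[value_to_index(v, g)] += 1
for v in (5, 7):
    b[value_to_index(v, g)] += 1

merged = merge(a, b)
coarse = downsample(merged, g, 1, m)
print(sum(merged), sum(coarse), len(merged), len(coarse))   # 6 6 112 32
```

Because finer buckets nest inside coarser ones, every source bucket maps to exactly one target bucket, which is why the total count survives downsampling unchanged.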

Fast repeated quantile queries

For a snapshot you'll query many times, convert to a CumulativeHistogram (the crate's CumulativeROHistogram). It stores non-zero buckets with cumulative counts, so percentiles are answered with a binary search, and it precomputes a midpoint-estimated mean:

c = h.to_cumulative()           # read-only; also SparseHistogram.to_cumulative()
c.percentile(0.99)              # O(log n) binary search -> Bucket (individual count)
c.mean()                        # midpoint-estimated mean, computed once
c.bucket_quantile_range(0)      # (lower, upper) quantile fraction of a stored bucket
for bucket, lo, hi in c.iter_with_quantiles():
    ...                         # each non-zero bucket with its quantile span
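
The idea behind the cumulative form can be sketched without the library. The class below is a hypothetical stand-in, not CumulativeHistogram itself. It assumes a non-empty histogram and a "first bucket whose running total reaches ceil(p × total)" rank convention, which may not match the library's exact rounding.

```python
from bisect import bisect_left
from itertools import accumulate
from math import ceil


class CumulativeSketch:
    """Non-zero buckets with running totals (hypothetical stand-in class)."""

    def __init__(self, indices: list[int], counts: list[int]):
        # Assumes at least one non-zero bucket, with indices in ascending order.
        self.indices = indices
        self.cumulative = list(accumulate(counts))  # running totals
        self.total = self.cumulative[-1]

    def percentile_index(self, p: float) -> int:
        """Bucket index holding the p-th quantile, 0.0 <= p <= 1.0."""
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be within [0, 1]")
        rank = max(1, ceil(p * self.total))         # 1-based target rank
        return self.indices[bisect_left(self.cumulative, rank)]

    def bucket_quantile_range(self, position: int) -> tuple[float, float]:
        """(lower, upper) quantile fractions covered by a stored bucket."""
        below = self.cumulative[position - 1] if position else 0
        return below / self.total, self.cumulative[position] / self.total


c = CumulativeSketch(indices=[3, 10, 50], counts=[1, 5, 4])
print(c.percentile_index(0.5))      # 10
print(c.percentile_index(0.99))     # 50
print(c.bucket_quantile_range(1))   # (0.1, 0.6)
```

Each query costs one binary search over the non-zero buckets, rather than a scan of the full dense array, which is what makes repeated queries against a fixed snapshot cheap.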

Reading histograms from a Rezolus Parquet file

Rezolus writes one row per sample interval. Histogram metrics are stored as a dense "{metric}:buckets" column, or a sparse "{metric}:bucket_indices" / "{metric}:bucket_counts" pair — all List<UInt64>.

from h2histogram.arrow import histogram_columns, read_histograms

# Discover histogram metrics in the file
for col in histogram_columns("rezolus.parquet"):
    print(col.name, col.kind)   # e.g. "syscall/read/latency standard"

# Read a metric's time series: one Histogram per row (None for missing rows)
series = read_histograms("rezolus.parquet", "syscall/read/latency")

for i, hist in enumerate(series):
    if hist is not None:
        print(i, hist.percentile(0.99).midpoint)

# Aggregate the whole recording (rows may be None, including the first one)
present = [hist for hist in series if hist is not None]
if present:
    total = present[0]
    for hist in present[1:]:
        total = total.merge(hist)
    print("overall p99:", total.percentile(0.99).midpoint)
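
The dense and sparse column layouts described above carry the same information. The following standalone sketch converts one row between the two using plain lists. The function names and the sample row are made up for illustration, and 496 is simply the bucket count implied by grouping_power=3 and max_value_power=64 under the layout assumed in this README.

```python
def sparse_to_dense(indices: list[int], counts: list[int], total_buckets: int) -> list[int]:
    """Expand parallel (index, count) lists into a full bucket array."""
    if len(indices) != len(counts):
        raise ValueError("indices and counts must have the same length")
    dense = [0] * total_buckets
    for index, count in zip(indices, counts):
        dense[index] += count
    return dense


def dense_to_sparse(buckets: list[int]) -> tuple[list[int], list[int]]:
    """Keep only the non-zero buckets, as parallel index and count lists."""
    pairs = [(i, c) for i, c in enumerate(buckets) if c]
    return [i for i, _ in pairs], [c for _, c in pairs]


# One row of a sparse pair, for a layout with 496 buckets (hypothetical data).
row_indices = [4, 17, 130]
row_counts = [2, 9, 1]

dense_row = sparse_to_dense(row_indices, row_counts, total_buckets=496)
print(len(dense_row), sum(dense_row))                            # 496 12
print(dense_to_sparse(dense_row) == (row_indices, row_counts))   # True
```

The sparse form pays off when most buckets are empty, which is common for latency histograms sampled over short intervals.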

The bucketing config is resolved from (in order): an explicit config=/grouping_power= argument, grouping_power/max_value_power recorded in the field metadata, inference from a dense column's bucket count, and finally the Rezolus defaults (grouping_power=3, max_value_power=64).
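
Inference from a dense column's bucket count works because, for a fixed max_value_power, each grouping power yields a distinct number of buckets. A minimal sketch of that step follows. The function name is hypothetical and the bucket-count formula is an assumption based on the layout described earlier, so the library's actual inference code may differ.

```python
from typing import Optional


def infer_grouping_power(bucket_count: int, max_value_power: int = 64) -> Optional[int]:
    """Return the grouping power whose layout has `bucket_count` buckets, if any."""
    for g in range(max_value_power):
        # Assumed layout: 2**(g+1) linear buckets, then 2**g per power of two.
        if (max_value_power - g + 1) << g == bucket_count:
            return g
    return None


print(infer_grouping_power(496))    # 3
print(infer_grouping_power(7424))   # 7
print(infer_grouping_power(1000))   # None
```

A result of None means the column length matches no valid layout for that max_value_power, in which case falling back to an explicit argument or to the defaults is the safer choice.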

Writing a Rezolus-compatible file

from h2histogram.arrow import write_histograms

write_histograms(
    "out.parquet",
    {"syscall/read/latency": series},   # {metric_name: [Histogram, ...]}
    timestamps=timestamps_ns,           # optional; one per row
    histogram_type="standard",          # or "sparse"
)

Files written this way match the metriken/Rezolus column layout and additionally record grouping_power/max_value_power in the field metadata so they are fully self-describing on read.

See the runnable examples in examples/.

API overview

Type                 Purpose
Config               Bucketing parameters; value_to_index, index_to_range, total_buckets, error
Histogram            Dense histogram; increment, record, record_many, percentile(s), merge, subtract, downsample, to_sparse, to_cumulative, from_buckets
SparseHistogram      Columnar (index, count) form; from_histogram, from_parts, to_dense, to_cumulative
CumulativeHistogram  Read-only cumulative form (crate's CumulativeROHistogram); binary-search percentile(s), mean, bucket_quantile_range, iter_with_quantiles
Bucket               A bucket's count and inclusive [start, end] range, plus midpoint/width
h2histogram.arrow    Read/write the Rezolus Arrow/Parquet layout

Correctness

The bucketing math is verified against the exact assertions from the Rust crate's own unit tests (src/config.rs), and the NumPy bulk-record fast path is checked against the scalar path across the full u64 range. Run pytest to see for yourself.

License

MIT — see LICENSE.

