TDigest-rs

Simple Python package to compute TDigests, implemented in Rust.

Introduction

TDigest-rs is a Python library with a Rust backend that implements the T-Digest algorithm, which estimates quantiles of streaming data. For an in-depth exploration of the T-Digest algorithm, refer to Ted Dunning and Otmar Ertl's paper and the G-Research blog post.

Usage

pip install tdigest-rs

The library contains a single TDigest class.

Creating a TDigest object

import numpy as np
from tdigest_rs import TDigest

# Fit a TDigest from a numpy array (float32 or float64)
arr = np.random.randn(1000)
tdigest = TDigest.from_array(arr=arr, delta=100.0)  # delta is optional and defaults to 300.0
print(tdigest.means, tdigest.weights)

# Create directly from means and weights arrays
vals = np.random.randn(1000).astype(np.float32)
weights = np.ones(1000).astype(np.uint32)
tdigest = TDigest.from_means_weights(arr=vals, weights=weights)
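
To illustrate what the means and weights arrays represent, here is a minimal pure-Python sketch that groups sorted values into centroids. It is not the library's implementation: the `compress` function and its `max_centroids` parameter are hypothetical, and the equal-sized grouping is a simplification of the scale function a real T-Digest uses.

```python
from typing import List, Tuple


def compress(values: List[float], max_centroids: int) -> Tuple[List[float], List[int]]:
    """Group sorted values into at most `max_centroids` (mean, weight) pairs.

    Simplification: every centroid gets roughly the same number of points.
    A real t-digest sizes centroids with a scale function, so centroids
    near the tails stay small and those near the median grow larger.
    """
    if max_centroids < 1:
        raise ValueError("max_centroids must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return [], []
    size = -(-n // max_centroids)  # ceiling division: points per centroid
    means: List[float] = []
    weights: List[int] = []
    for start in range(0, n, size):
        chunk = ordered[start:start + size]
        means.append(sum(chunk) / len(chunk))  # centroid mean
        weights.append(len(chunk))             # number of points it summarises
    return means, weights


means, weights = compress([5.0, 1.0, 3.0, 2.0, 4.0, 6.0], max_centroids=3)
print(means, weights)
```

Each mean summarises a run of neighbouring values and each weight records how many values it stands for, which is the same shape of data that `tdigest.means` and `tdigest.weights` expose.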

Computing quantiles

# Compute a quantile
tdigest.quantile(0.1)

# Compute median
tdigest.median()

# Compute trimmed mean
tdigest.trimmed_mean(lower=0.05, upper=0.95)
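
As a rough picture of how such estimates can be derived from centroids, the sketch below computes a quantile and a trimmed mean from plain lists of means and weights. The functions are hypothetical and assume centroids sorted by mean with positive weights; the library's Rust backend may interpolate differently.

```python
from typing import List


def quantile(means: List[float], weights: List[float], q: float) -> float:
    """Estimate the q-quantile from centroids sorted by mean.

    Each centroid is placed at the midpoint of its cumulative weight range,
    and the estimate interpolates linearly between neighbouring centroids.
    """
    if not means or len(means) != len(weights):
        raise ValueError("need equally sized, non-empty means and weights")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    target = q * sum(weights)
    centres = []  # cumulative weight at the centre of each centroid
    seen = 0.0
    for w in weights:
        centres.append(seen + w / 2.0)
        seen += w
    if target <= centres[0]:
        return means[0]
    if target >= centres[-1]:
        return means[-1]
    for i in range(1, len(means)):
        if target <= centres[i]:
            frac = (target - centres[i - 1]) / (centres[i] - centres[i - 1])
            return means[i - 1] + frac * (means[i] - means[i - 1])
    return means[-1]


def trimmed_mean(means: List[float], weights: List[float],
                 lower: float, upper: float) -> float:
    """Weighted mean of the centroid mass between two quantile ranks."""
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("need 0 <= lower < upper <= 1")
    total = sum(weights)
    lo, hi = lower * total, upper * total
    acc = kept = seen = 0.0
    for m, w in zip(means, weights):
        # Portion of this centroid's weight that falls inside [lo, hi].
        overlap = max(0.0, min(seen + w, hi) - max(seen, lo))
        acc += m * overlap
        kept += overlap
        seen += w
    if kept == 0.0:
        raise ValueError("no weight inside the requested range")
    return acc / kept


means, weights = [1.5, 3.5, 5.5], [2.0, 2.0, 2.0]
print(quantile(means, weights, 0.5))
print(trimmed_mean(means, weights, lower=0.25, upper=0.75))
```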

Merging TDigests

arr1 = np.random.randn(1000)
arr2 = np.ones(1000)
digest1 = TDigest.from_array(arr=arr1)
digest2 = TDigest.from_array(arr=arr2)

merged_digest = digest1.merge(digest2, delta=100.0)  # delta again defaults to 300.0
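
Conceptually, merging combines the centroids of both digests and compresses them again. The following is a simplified sketch under stated assumptions: `merge_centroids` and `target_centroids` are hypothetical names, and the greedy weight cap stands in for the delta-controlled compression the library performs.

```python
from typing import List, Tuple

Centroids = Tuple[List[float], List[float]]  # (means, weights)


def merge_centroids(a: Centroids, b: Centroids, target_centroids: int) -> Centroids:
    """Merge two centroid sets and greedily re-compress them.

    Adjacent centroids are folded together while their combined weight
    stays within total_weight / target_centroids. This keeps the result
    below 2 * target_centroids centroids; it is not the t-digest scale
    function, only an illustration of the idea.
    """
    if target_centroids < 1:
        raise ValueError("target_centroids must be at least 1")
    pairs = sorted(zip(a[0] + b[0], a[1] + b[1]))  # order all centroids by mean
    if not pairs:
        return [], []
    limit = sum(w for _, w in pairs) / target_centroids
    means: List[float] = []
    weights: List[float] = []
    for m, w in pairs:
        if weights and weights[-1] + w <= limit:
            # Fold into the previous centroid using a weighted mean.
            new_w = weights[-1] + w
            means[-1] = (means[-1] * weights[-1] + m * w) / new_w
            weights[-1] = new_w
        else:
            means.append(m)
            weights.append(w)
    return means, weights


left = ([1.0, 3.0], [1.0, 1.0])
right = ([2.0, 4.0], [1.0, 1.0])
print(merge_centroids(left, right, target_centroids=2))
```

The total weight and the overall weighted mean are preserved by the merge, which is why digests built on separate chunks of data can be combined without revisiting the raw values.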

Serialising TDigests

A TDigest object can be converted to a dictionary that is JSON-serialisable, and it can also be pickled.

# Convert and load to/from a python dict
d = tdigest.to_dict()
loaded_digest = TDigest.from_dict(d)

# Pickle a digest and restore it
import pickle

data = pickle.dumps(tdigest)
restored_digest = pickle.loads(data)
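
Since the dictionary form is described as JSON-serialisable, one way to persist a digest is to write that dictionary to a file. This is a sketch only: `save_digest_dict` and `load_digest_dict` are hypothetical helpers, and the example dictionary is a stand-in because the actual layout returned by `to_dict()` is not shown here.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_digest_dict(d: Dict[str, Any], path: str) -> None:
    """Write a JSON-compatible digest dictionary to `path`."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(d, fh)


def load_digest_dict(path: str) -> Dict[str, Any]:
    """Read a digest dictionary back from `path`."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical stand-in for the result of tdigest.to_dict();
# the real keys and value types may differ.
example = {"means": [1.5, 3.5, 5.5], "weights": [2, 2, 2]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "digest.json")
    save_digest_dict(example, path)
    restored = load_digest_dict(path)

print(restored == example)
```

With the library installed, the same helpers would be called as `save_digest_dict(tdigest.to_dict(), path)` and `TDigest.from_dict(load_digest_dict(path))`, assuming `to_dict()` returns only JSON-compatible types.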

Development workflow

pip install hatch

cd bindings/python

# Run linters
hatch run dev:lint

# Run tests
hatch run dev:test

# Run benchmark
hatch run dev:benchmark

# Format code
hatch run dev:format

Contributing

Please read our contributing guide and code of conduct if you'd like to contribute to the project.

Community Guidelines

Please read our code of conduct before participating in or contributing to this project.

Security

Please see our security policy for details on reporting security vulnerabilities.

License

TDigest-rs is licensed under the Apache Software License 2.0 (Apache-2.0).

Built Distributions

tdigest_rs-0.1.0-cp37-abi3-win_amd64.whl (175.5 kB): CPython 3.7+, Windows x86-64
tdigest_rs-0.1.0-cp37-abi3-musllinux_1_2_x86_64.whl (1.2 MB): CPython 3.7+, musllinux (musl 1.2+) x86-64
tdigest_rs-0.1.0-cp37-abi3-musllinux_1_2_aarch64.whl (1.2 MB): CPython 3.7+, musllinux (musl 1.2+) ARM64
tdigest_rs-0.1.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB): CPython 3.7+, manylinux (glibc 2.17+) x86-64
tdigest_rs-0.1.0-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.1 MB): CPython 3.7+, manylinux (glibc 2.17+) ARM64
tdigest_rs-0.1.0-cp37-abi3-macosx_11_0_arm64.whl (287.3 kB): CPython 3.7+, macOS 11.0+ ARM64
tdigest_rs-0.1.0-cp37-abi3-macosx_10_12_x86_64.whl (291.9 kB): CPython 3.7+, macOS 10.12+ x86-64
