Distributed quantile sketches
ddsketch
This repo contains the Python implementation of the distributed quantile sketch algorithm DDSketch [1]. DDSketch has relative-error guarantees for any quantile q in [0, 1]. That is, if the true value of the q-quantile is x, then DDSketch returns a value y such that |x - y| / |x| ≤ e, where e is the relative error parameter. (The default here is 0.01.) DDSketch is also fully mergeable, meaning that multiple sketches from distributed systems can be combined in a central node.
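The guarantee and the merge property can be illustrated with a minimal from-scratch sketch. This is a hypothetical simplification, not the `ddsketch` package's code: the class `ToyLogSketch` and the helper `exact_quantile` are invented names, only strictly positive values are accepted, and bins live in an unbounded `Counter`. It assumes the log mapping described in [1]: a value x goes to bucket i = ceil(log_gamma(x)) with gamma = (1 + e) / (1 - e), and each bucket is summarized by 2 * gamma**i / (gamma + 1), which is within relative error e of every value in (gamma**(i-1), gamma**i].

```python
import math
import random
from collections import Counter


class ToyLogSketch:
    """Hypothetical, simplified DDSketch-style sketch for strictly positive values."""

    def __init__(self, relative_accuracy=0.01):
        self.relative_accuracy = relative_accuracy
        # Bucket boundaries grow geometrically by gamma = (1 + e) / (1 - e).
        self.gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
        self.log_gamma = math.log(self.gamma)
        self.bins = Counter()  # bucket index -> number of values in that bucket
        self.count = 0

    def add(self, value):
        if value <= 0:
            raise ValueError("this toy sketch only accepts positive values")
        # x lands in bucket i such that gamma**(i - 1) < x <= gamma**i.
        self.bins[math.ceil(math.log(value) / self.log_gamma)] += 1
        self.count += 1

    def merge(self, other):
        # Merging is bucket-wise addition of counts (same gamma required).
        if other.gamma != self.gamma:
            raise ValueError("sketches must share the same relative accuracy")
        self.bins.update(other.bins)
        self.count += other.count

    def get_quantile_value(self, q):
        if self.count == 0:
            return None
        rank = q * (self.count - 1)
        running = 0
        for index in sorted(self.bins):
            running += self.bins[index]
            if running > rank:
                # Within relative error e of every value in bucket `index`.
                return 2 * self.gamma ** index / (self.gamma + 1)
        return None


def exact_quantile(values, q):
    """Exact quantile using the same rank convention, floor(q * (n - 1))."""
    ordered = sorted(values)
    return ordered[math.floor(q * (len(ordered) - 1))]


random.seed(0)
values = [random.lognormvariate(0, 2) for _ in range(500)]
other_values = [random.lognormvariate(1, 1) for _ in range(500)]

sketch, other = ToyLogSketch(0.01), ToyLogSketch(0.01)
for v in values:
    sketch.add(v)
for v in other_values:
    other.add(v)
sketch.merge(other)  # now summarizes values + other_values

combined = values + other_values
for q in [0.0, 0.5, 0.75, 0.9, 1.0]:
    x = exact_quantile(combined, q)
    y = sketch.get_quantile_value(q)
    print(f"q={q}: exact={x:.4f} sketch={y:.4f} rel_err={abs(x - y) / x:.4f}")
```

Merging only adds bucket counts, so it loses nothing: the merged sketch holds the same bins as one built directly from `values + other_values`, and the relative-error bound carries over unchanged.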
Our default implementation, `DDSketch`, is guaranteed [1] not to grow too large in size for any data that can be described by a distribution whose tails are sub-exponential.
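As an empirical illustration (not the proof in [1]), the snippet below counts how many distinct log buckets exponentially distributed data occupies as the sample grows 100-fold. `bucket_index` is a hypothetical helper assuming the mapping ceil(log_gamma(x)) with gamma = (1 + e) / (1 - e), as in [1]; the choice of exponential data and the seed are arbitrary.

```python
import math
import random


def bucket_index(value, relative_accuracy=0.01):
    """Hypothetical helper: log-bucket index ceil(log_gamma(value))."""
    gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
    return math.ceil(math.log(value) / math.log(gamma))


random.seed(1)
# Exponentially distributed data: its tails decay exponentially.
data = [random.expovariate(1.0) for _ in range(100_000)]

bin_counts = {}
for n in (1_000, 10_000, 100_000):
    # Number of distinct (non-empty) buckets needed for the first n values.
    bin_counts[n] = len({bucket_index(v) for v in data[:n]})
    print(f"n={n:>7}: {bin_counts[n]} non-empty bins")
```

Exact numbers depend on the seed; the point is that the bin count follows the logarithm of the data's range, which widens only slowly for light-tailed data even as the sample size grows by a factor of 100.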
We also provide implementations (`LogCollapsingLowestDenseDDSketch` and `LogCollapsingHighestDenseDDSketch`) where the q-quantile will be accurate up to the specified relative error for q that is not too small (or, respectively, not too large). Concretely, the q-quantile will be accurate up to the specified relative error as long as it belongs to one of the m bins kept by the sketch. If the data is time in seconds, the default of m = 2048 covers 80 microseconds to 1 year.
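The bin-count arithmetic and the effect of collapsing can be sketched as follows. Both helpers are hypothetical: `bins_needed` assumes the log mapping ceil(log_gamma(x)) with gamma = (1 + e) / (1 - e) from [1] and treats a year as 365 days, and `collapse_lowest` is a simplified model of a collapsing-lowest store that keeps the top `max_bins` contiguous indices; the package's actual stores may differ in detail.

```python
import math
from collections import Counter


def bins_needed(min_value, max_value, relative_accuracy=0.01):
    """Number of contiguous log buckets spanning [min_value, max_value]."""
    log_gamma = math.log((1 + relative_accuracy) / (1 - relative_accuracy))
    low = math.ceil(math.log(min_value) / log_gamma)
    high = math.ceil(math.log(max_value) / log_gamma)
    return high - low + 1


def collapse_lowest(bins, max_bins):
    """Fold every bucket below the top `max_bins` indices into the lowest kept one."""
    lowest_kept = max(bins) - max_bins + 1
    collapsed = Counter()
    for index, count in bins.items():
        collapsed[max(index, lowest_kept)] += count
    return collapsed


seconds_per_year = 365 * 24 * 3600
# A little over 1,300 buckets at 1% relative accuracy, below m = 2048.
print(bins_needed(80e-6, seconds_per_year))

# Toy example: buckets 0..9 each hold one value; keep only 4 buckets.
print(sorted(collapse_lowest(Counter({i: 1 for i in range(10)}), 4).items()))
# → [(6, 7), (7, 1), (8, 1), (9, 1)]
```

After collapsing, the seven values from buckets 0 to 6 all share bucket 6's estimate, so low quantiles are biased upward while quantiles that land in buckets 7 to 9 keep the relative-error guarantee. A collapsing-highest store would mirror this at the top of the range.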
Installation
To install this package, run `pip install ddsketch`, or clone the repo and run `python setup.py install`. This package depends on `numpy` and `protobuf`. (The `protobuf` dependency can be removed if it's not applicable.)
Usage
```python
import numpy as np
from ddsketch import DDSketch

sketch = DDSketch()

# Add values to the sketch.
values = np.random.normal(size=500)
for v in values:
    sketch.add(v)

# Find the quantiles of `values` to within the relative error.
quantiles = [sketch.get_quantile_value(q) for q in [0.5, 0.75, 0.9, 1]]

# Merge another `DDSketch` into `sketch`.
another_sketch = DDSketch()
other_values = np.random.normal(size=500)
for v in other_values:
    another_sketch.add(v)
sketch.merge(another_sketch)

# The quantiles of `values` concatenated with `other_values`
# are still accurate to within the relative error.
```
References
[1] Charles Masson, Jee E. Rim, and Homin K. Lee. DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees. PVLDB, 12(12): 2195-2205, 2019.
Download files
File details
Details for the file ddsketch-1.1.2.tar.gz (source distribution).
File metadata
- Download URL: ddsketch-1.1.2.tar.gz
- Upload date:
- Size: 8.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3faa9cd82794e83677cab4f6c6b20620db31ac3dcc906a9ca6574b482dd45aa0 |
| MD5 | e219ddf370c7e501e1272e43f3da3327 |
| BLAKE2b-256 | 322ab1038ed8837f4bf63885e6b3607fb320555f27ad91f4c07201e166f64167 |
File details
Details for the file ddsketch-1.1.2-py3-none-any.whl (built distribution).
File metadata
- Download URL: ddsketch-1.1.2-py3-none-any.whl
- Upload date:
- Size: 22.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 726868d34c9a9952013f8463c363e7e7f069a3ed598dea882911ee4538c8e9af |
| MD5 | 4f97c0e99514c08c0bdcc589bf8ebbae |
| BLAKE2b-256 | 9f4607322a0158bf7182e39020a30e8d80f1c17352979c8fee42dfe0ee9f7cd2 |