Distributed quantile sketches

ddsketch

This repo contains the Python implementation of the distributed quantile sketch algorithm DDSketch [1]. DDSketch has relative-error guarantees for any quantile q in [0, 1]. That is, if the true value of the q-th quantile is x, then DDSketch returns a value y such that |x-y| / x < e, where e is the relative error parameter (0.01 by default in this implementation). DDSketch is also fully mergeable, meaning that multiple sketches from distributed systems can be combined in a central node.
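To make the relative-error guarantee concrete, here is a minimal sketch of the logarithmic bucketing described in [1], written in plain Python rather than with this package's classes. The helper names `bucket_index` and `bucket_value` are hypothetical and exist only for illustration. The sketch handles positive values only and may differ in detail from what the library does internally.

```python
import math

def bucket_index(x, alpha=0.01):
    """Map a positive value to a logarithmic bucket index (illustrative helper)."""
    gamma = (1 + alpha) / (1 - alpha)
    # Bucket i covers the interval (gamma**(i-1), gamma**i].
    return math.ceil(math.log(x) / math.log(gamma))

def bucket_value(index, alpha=0.01):
    """Return the representative value reported for a bucket (illustrative helper)."""
    gamma = (1 + alpha) / (1 - alpha)
    # This point is within a factor (1 +/- alpha) of every value in the bucket.
    return 2 * gamma ** index / (gamma + 1)

alpha = 0.01
for x in (0.00008, 0.5, 1.0, 42.0, 1e6):
    y = bucket_value(bucket_index(x, alpha), alpha)
    rel_err = abs(x - y) / x
    print(f"x={x:g}  y={y:g}  relative error={rel_err:.5f}")
```

Because every value in a bucket is reported as the same representative point, the error depends only on the bucket width, which is set by the relative error parameter and not by the magnitude of the data.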

Our default implementation, DDSketch, is guaranteed [1] not to grow too large for any data that can be described by a distribution whose tails are sub-exponential.

We also provide implementations (LogCollapsingLowestDenseDDSketch and LogCollapsingHighestDenseDDSketch) where the q-quantile will be accurate up to the specified relative error for q that is not too small (or large). Concretely, the q-quantile will be accurate up to the specified relative error as long as it belongs to one of the m bins kept by the sketch. If the data is time in seconds, the default of m = 2048 covers 80 microseconds to 1 year.
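As a rough back-of-the-envelope check of how a bin limit relates to the range of values covered, the following estimates how many logarithmic bins are needed to span a given range at a given relative error. It assumes the logarithmic mapping from [1] with gamma = (1 + e) / (1 - e) and a 365-day year. The function `bins_needed` is a hypothetical helper, not part of this package, and the estimate ignores any extra accounting the library may do.

```python
import math

def bins_needed(min_value, max_value, alpha=0.01):
    """Estimate the number of logarithmic bins spanning [min_value, max_value]."""
    gamma = (1 + alpha) / (1 - alpha)
    return math.ceil(math.log(max_value / min_value) / math.log(gamma))

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a 365-day year

n = bins_needed(80e-6, SECONDS_PER_YEAR)
print(f"bins needed: {n}, fits in 2048: {n <= 2048}")
```

Under these assumptions, the range from 80 microseconds to 1 year fits within the default of m = 2048 bins.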

Installation

To install this package, run pip install ddsketch, or clone the repo and run python setup.py install. This package depends on numpy and protobuf. (The protobuf dependency can be removed if it's not applicable.)

Usage

from ddsketch import DDSketch

sketch = DDSketch()

Add values to the sketch.

import numpy as np

values = np.random.normal(size=500)
for v in values:
  sketch.add(v)

Find the quantiles of values to within the relative error.

quantiles = [sketch.get_quantile_value(q) for q in [0.5, 0.75, 0.9, 1]]

Merge another DDSketch into sketch.

another_sketch = DDSketch()
other_values = np.random.normal(size=500)
for v in other_values:
  another_sketch.add(v)
sketch.merge(another_sketch)

The quantiles of values concatenated with other_values are still accurate to within the relative error.
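To illustrate why merging preserves accuracy, here is a toy model of a sketch as a plain counter of logarithmic bucket indices. It is not this package's implementation. The names `toy_sketch`, `toy_merge`, and `toy_quantile` are hypothetical, and only positive values are handled. Merging sums the per-bucket counts, so the merged result is identical to a sketch built from the concatenated data.

```python
import math
from collections import Counter

ALPHA = 0.01
GAMMA = (1 + ALPHA) / (1 - ALPHA)

def toy_sketch(values):
    """Count positive values per logarithmic bucket (toy model)."""
    return Counter(math.ceil(math.log(v) / math.log(GAMMA)) for v in values)

def toy_merge(a, b):
    """Merge two toy sketches by summing their bucket counts."""
    return a + b

def toy_quantile(sketch, q):
    """Walk buckets in increasing order until rank q * (n - 1) is covered."""
    rank = q * (sum(sketch.values()) - 1)
    seen = 0
    for index in sorted(sketch):
        seen += sketch[index]
        if seen > rank:
            # Representative value of the bucket containing the requested rank.
            return 2 * GAMMA ** index / (GAMMA + 1)

left = [0.5, 1.0, 2.0, 4.0]
right = [8.0, 16.0, 32.0]
merged = toy_merge(toy_sketch(left), toy_sketch(right))

# The merged sketch equals the sketch of the concatenated data.
print(merged == toy_sketch(left + right))
print(toy_quantile(merged, 0.5))
```

Since bucket boundaries depend only on the relative error parameter, two sketches built with the same parameter always share the same buckets, which is what makes this count-wise merge lossless in the toy model.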

References

[1] Charles Masson, Jee E. Rim, and Homin K. Lee. DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees. PVLDB, 12(12): 2195-2205, 2019.
