T-Digest data structure

# tdigest
### Efficient percentile estimation of streaming or distributed data

This is a Python implementation of Ted Dunning's [t-digest](https://github.com/tdunning/t-digest) data structure. The t-digest is designed to compute accurate estimates of percentiles, quantiles, trimmed means, and similar rank-based statistics from streaming or distributed data. Two t-digests can be added together, which makes the structure well suited to map-reduce settings, and a digest can be serialized into much less than 10 kB instead of storing the entire list of data.
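To make the "two digests can be added" idea concrete, here is a minimal, self-contained sketch of a mergeable quantile summary. It is a deliberately simplified toy and not the t-digest algorithm or this package's code: the class `ToySummary`, its `max_centroids` parameter, and its `quantile` method are hypothetical names invented for illustration. It merges the closest adjacent centroids to stay within a size budget, whereas a real t-digest limits centroid sizes according to their quantile.

```python
import random
from bisect import insort


class ToySummary:
    """Toy mergeable quantile summary (hypothetical; NOT the real t-digest)."""

    def __init__(self, max_centroids=100):
        self.max_centroids = max_centroids
        self.centroids = []  # sorted list of (mean, weight) pairs

    def update(self, x, w=1):
        # Insert the new point as its own centroid, then shrink if needed.
        insort(self.centroids, (float(x), w))
        self._compress()

    def _compress(self):
        # Merge the closest adjacent pair until the size budget is met.
        c = self.centroids
        while len(c) > self.max_centroids:
            i = min(range(len(c) - 1), key=lambda k: c[k + 1][0] - c[k][0])
            (m1, w1), (m2, w2) = c[i], c[i + 1]
            c[i:i + 2] = [((m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2)]

    def __add__(self, other):
        # Merging is just pooling the centroids and compressing again.
        merged = ToySummary(max(self.max_centroids, other.max_centroids))
        merged.centroids = sorted(self.centroids + other.centroids)
        merged._compress()
        return merged

    def quantile(self, q):
        # Walk the cumulative weight until it reaches the target rank.
        # Assumes at least one point has been added.
        target = q * sum(w for _, w in self.centroids)
        seen = 0
        for mean, w in self.centroids:
            seen += w
            if seen >= target:
                return mean
        return self.centroids[-1][0]


random.seed(0)
a, b = ToySummary(), ToySummary()
for _ in range(5000):
    a.update(random.random())
    b.update(random.random())

combined = a + b               # summary of all 10000 points
print(len(combined.centroids))  # never more than max_centroids
print(combined.quantile(0.3))   # about 0.3 for uniform(0, 1) data
```

The point of the sketch is the shape of the interface: each worker keeps a small, bounded summary, and summaries combine without revisiting the raw data.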

See a blog post about it here: [Percentile and Quantile Estimation of Big Data: The t-Digest](http://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest)


### Usage

```
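from tdigest import TDigest
from numpy.random import random

# Streaming: feed 5000 uniform(0, 1) samples one at a time.
T1 = TDigest()
for _ in range(5000):
    T1.update(random())

print(T1.percentile(0.15))  # about 0.15

# Batch: add a whole array of samples in one call.
T2 = TDigest()
T2.batch_update(random(5000))
print(T2.percentile(0.15))  # about 0.15

# Merge: adding two digests gives a digest over the combined data.
T = T1 + T2
print(T.percentile(0.3))  # about 0.3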
```

Files for tdigest, version 0.1.0

| Filename, size | File type | Python version |
| --- | --- | --- |
| tdigest-0.1.0.tar.gz (3.2 kB) | Source | None |
