A library to compute histograms on distributed environments, on streaming data

Project description

DistoGram is a library that computes histograms on streaming data in distributed environments. The implementation follows the algorithms described in Ben-Haim’s Streaming Parallel Decision Trees.
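
To give an intuition for the streaming histogram described in that work, here is a minimal pure-Python sketch of the update step: each new value becomes a bin of its own, and when the bin budget is exceeded the two closest adjacent bins are merged into their weighted mean. The function name `update`, the `max_bins` parameter, and the list-of-pairs layout are assumptions made for illustration; this is not distogram's actual implementation.

```python
import bisect


def update(bins, value, max_bins=8):
    """Insert `value` into a sorted list of [centroid, count] bins, in place."""
    centroids = [c for c, _ in bins]
    i = bisect.bisect_left(centroids, value)
    if i < len(bins) and bins[i][0] == value:
        bins[i][1] += 1             # exact match: bump the existing count
        return bins
    bins.insert(i, [value, 1])      # otherwise add a new bin holding one point
    if len(bins) > max_bins:
        # Find the pair of adjacent bins whose centroids are closest ...
        j = min(range(len(bins) - 1), key=lambda k: bins[k + 1][0] - bins[k][0])
        (c1, n1), (c2, n2) = bins[j], bins[j + 1]
        # ... and replace them with one bin at their count-weighted mean.
        bins[j:j + 2] = [[(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]]
    return bins


bins = []
for x in [23, 19, 10, 16, 36, 2, 9, 32, 30, 45]:
    update(bins, x, max_bins=5)
print(bins)
```

The summary never holds more than `max_bins` entries, whatever the length of the stream, and the total of the counts always equals the number of values seen.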

Get Started

First create a compressed representation of a distribution:

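import numpy as np
import distogram

# Sample data: 10000 points drawn from a standard normal distribution
distribution = np.random.normal(size=10000)
# Alternative: a uniform distribution
# distribution = np.random.uniform(size=10000)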

# Create and feed distogram from distribution
h = distogram.Distogram()
for i in distribution:
    distogram.update(h, i)
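
In a distributed setting, each worker can build its own summary and the summaries can be combined afterwards. The sketch below illustrates the idea in pure Python, assuming bins are stored as `[centroid, count]` pairs: the bins of both summaries are pooled, then the closest neighbours are merged until the bin budget is met. The helpers `compress` and `merge` are hypothetical names written for this illustration and are not distogram's implementation.

```python
def compress(bins, max_bins):
    """Merge the closest adjacent [centroid, count] bins until max_bins remain."""
    bins = sorted(bins)
    while len(bins) > max_bins:
        j = min(range(len(bins) - 1), key=lambda k: bins[k + 1][0] - bins[k][0])
        (c1, n1), (c2, n2) = bins[j], bins[j + 1]
        bins[j:j + 2] = [[(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]]
    return bins


def merge(bins_a, bins_b, max_bins):
    """Combine two summaries built independently, e.g. on two workers."""
    # Copy each bin so the inputs are left untouched.
    return compress([list(b) for b in bins_a + bins_b], max_bins)


# Hypothetical summaries produced by two workers.
worker_1 = [[2, 1], [9.5, 2], [17.5, 2], [23, 1], [36, 1]]
worker_2 = [[30, 1], [32, 1], [45, 1]]

merged = merge(worker_1, worker_2, max_bins=5)
print(merged)
```

Because merging only needs the two summaries and not the raw data, workers exchange a handful of bins rather than the full stream.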

Compute statistics on the distribution:

nmin, nmax = distogram.bounds(h)
print("count: {}".format(distogram.count(h)))
print("mean: {}".format(distogram.mean(h)))
print("stddev: {}".format(distogram.stddev(h)))
print("min: {}".format(nmin))
print("5%: {}".format(distogram.quantile(h, 0.05)))
print("25%: {}".format(distogram.quantile(h, 0.25)))
print("50%: {}".format(distogram.quantile(h, 0.50)))
print("75%: {}".format(distogram.quantile(h, 0.75)))
print("95%: {}".format(distogram.quantile(h, 0.95)))
print("max: {}".format(nmax))

This prints values similar to the following (the exact numbers vary because the sample is random):

count: 10000
mean: -0.005082954640481095
stddev: 1.0028524290149186
min: -3.5691130319855047
5%: -1.6597242392338374
25%: -0.6785107421744653
50%: -0.008672960012168916
75%: 0.6720718926935414
95%: 1.6476822301131866
max: 3.8800560034877427
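
As a rough illustration of how such statistics can be read from a binned summary, the sketch below derives the count and the mean from a list of `[centroid, count]` pairs and checks them against the raw data. The bin values are hypothetical and the layout is an assumption for this example; distogram's internal representation may differ.

```python
import math

data = [23, 19, 10, 16, 36, 2, 9, 32, 30, 45]

# A hypothetical 5-bin summary of `data`, where each bin is the
# count-weighted mean of the points it absorbed.
bins = [[2, 1], [9.5, 2], [58 / 3, 3], [98 / 3, 3], [45, 1]]

count = sum(n for _, n in bins)
mean = sum(c * n for c, n in bins) / count

exact_mean = sum(data) / len(data)
print(count, mean, exact_mean)
```

Merging bins into their weighted mean preserves the sum of `centroid * count`, so the count and the mean match the raw data up to floating-point rounding. Quantiles and the standard deviation, by contrast, are approximations, and the minimum and maximum cannot be recovered from merged centroids, so a summary has to track them separately.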

Compute and display the histogram of the distribution:

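import numpy as np
import pandas as pd
import plotly.express as px

# Convert the histogram bins to a DataFrame and plot them as a bar chart
hist = distogram.histogram(h)
df_hist = pd.DataFrame(np.array(hist), columns=["bin", "count"])
fig = px.bar(df_hist, x="bin", y="count", title="distogram")
fig.update_layout(height=300)
fig.show()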

[Figure: histogram of the normal distribution (docs/normal_histogram.png)]

Install

DistoGram is available on PyPI and can be installed with pip:

pip install distogram

Source Distribution

distogram-1.0.0.tar.gz (5.2 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2

File hashes

Algorithm    Hash digest
SHA256       3b0c8637e1bf8ed93d1eed8b463099144f8d77788cbd5ccde49add837fe61801
MD5          9938c77690dbea68f400f541d76d9705
BLAKE2b-256  fdf3e6990869261b34f6cc254613f1b58d19a2cbb10a4ecf18e56891a5cbe61c
