A library to compute histograms on streaming data in distributed environments
Project description
DistoGram is a library for computing histograms on streaming data in distributed environments. The implementation follows the algorithms described in Ben-Haim’s Streaming Parallel Decision Trees.
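To give an intuition for the kind of streaming histogram this family of algorithms builds, here is a minimal standalone sketch. It is not DistoGram's actual code: the `update` function, the `[centroid, count]` list layout, and the `max_bins` parameter are assumptions made for illustration. Each new value becomes its own bin, and when the bin budget is exceeded the two closest adjacent bins are merged into their weighted average.

```python
import bisect
import random


def update(bins, value, max_bins=100):
    """Insert one value into a sorted list of [centroid, count] bins."""
    # Find where the value belongs so the list stays sorted by centroid.
    centroids = [b[0] for b in bins]
    idx = bisect.bisect_left(centroids, value)
    if idx < len(bins) and bins[idx][0] == value:
        bins[idx][1] += 1  # exact match: only the count changes
    else:
        bins.insert(idx, [value, 1])

    # Over budget: merge the two adjacent bins whose centroids are closest.
    if len(bins) > max_bins:
        gaps = [bins[i + 1][0] - bins[i][0] for i in range(len(bins) - 1)]
        i = gaps.index(min(gaps))
        (p1, m1), (p2, m2) = bins[i], bins[i + 1]
        merged = [(p1 * m1 + p2 * m2) / (m1 + m2), m1 + m2]
        bins[i:i + 2] = [merged]
    return bins


random.seed(0)
bins = []
for _ in range(10000):
    update(bins, random.gauss(0.0, 1.0), max_bins=50)

print(len(bins))                         # bounded by max_bins
print(sum(count for _, count in bins))   # total count is preserved
```

Memory stays bounded by `max_bins` no matter how many values are streamed in, which is what makes this representation usable on unbounded data.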
Get Started
First create a compressed representation of a distribution:
import numpy as np
import distogram

distribution = np.random.normal(size=10000)
# distribution = np.random.uniform(size=10000)

# Create a Distogram and feed it the values one at a time
h = distogram.Distogram()
for i in distribution:
    distogram.update(h, i)
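In a distributed setting, each worker can build its own compressed histogram and the results can then be combined. The following standalone sketch illustrates that idea under stated assumptions: the `merge` function below, the `[centroid, count]` bin layout, and the `max_bins` parameter are hypothetical and written for this example, not taken from DistoGram's API. It concatenates the bins of both histograms and repeatedly merges the closest pair until the budget is met.

```python
def merge(bins_a, bins_b, max_bins=100):
    """Combine two lists of [centroid, count] bins into one bounded list."""
    # Copy the bins so the inputs are left untouched, then sort by centroid.
    bins = sorted(
        [list(b) for b in bins_a] + [list(b) for b in bins_b],
        key=lambda b: b[0],
    )
    # Merge the closest adjacent pair until the bin budget is respected.
    while len(bins) > max_bins:
        gaps = [bins[i + 1][0] - bins[i][0] for i in range(len(bins) - 1)]
        i = gaps.index(min(gaps))
        (p1, m1), (p2, m2) = bins[i], bins[i + 1]
        bins[i:i + 2] = [[(p1 * m1 + p2 * m2) / (m1 + m2), m1 + m2]]
    return bins


# Two small histograms, as if computed on two different workers.
worker_1 = [[1.0, 2], [2.0, 3], [10.0, 1]]
worker_2 = [[1.5, 1], [9.0, 2], [20.0, 4]]

combined = merge(worker_1, worker_2, max_bins=4)
print(combined)
```

Because merging only needs the bins themselves, workers never have to exchange the raw data points.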
Compute statistics on the distribution:
nmin, nmax = distogram.bounds(h)
print("count: {}".format(distogram.count(h)))
print("mean: {}".format(distogram.mean(h)))
print("stddev: {}".format(distogram.stddev(h)))
print("min: {}".format(nmin))
print("5%: {}".format(distogram.quantile(h, 0.05)))
print("25%: {}".format(distogram.quantile(h, 0.25)))
print("50%: {}".format(distogram.quantile(h, 0.50)))
print("75%: {}".format(distogram.quantile(h, 0.75)))
print("95%: {}".format(distogram.quantile(h, 0.95)))
print("max: {}".format(nmax))
count: 10000
mean: -0.005082954640481095
stddev: 1.0028524290149186
min: -3.5691130319855047
5%: -1.6597242392338374
25%: -0.6785107421744653
50%: -0.008672960012168916
75%: 0.6720718926935414
95%: 1.6476822301131866
max: 3.8800560034877427
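As a rough illustration of how such statistics can be derived from a compressed histogram, the sketch below computes count, mean, and standard deviation from a list of `(centroid, count)` bins. The `summarize` helper and the bin layout are assumptions for this example; DistoGram may compute these values differently. When bins are merged by weighted average, the mean is preserved exactly, while the standard deviation computed this way ignores the spread inside each bin and is therefore only an approximation.

```python
import math


def summarize(bins):
    """Return (count, mean, stddev) for a list of (centroid, count) bins."""
    total = sum(n for _, n in bins)
    mean = sum(c * n for c, n in bins) / total
    # Population variance of the centroids, weighted by bin counts.
    variance = sum(n * (c - mean) ** 2 for c, n in bins) / total
    return total, mean, math.sqrt(variance)


bins = [(-1.0, 2), (0.0, 6), (1.0, 2)]
count, mean, stddev = summarize(bins)
print("count: {}".format(count))
print("mean: {}".format(mean))
print("stddev: {}".format(stddev))
```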
Compute and display the histogram of the distribution:
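import numpy as np
import pandas as pd
import plotly.express as px

# Convert the histogram to a DataFrame and plot it as a bar chart
hist = distogram.histogram(h)
df_hist = pd.DataFrame(np.array(hist), columns=["bin", "count"])
fig = px.bar(df_hist, x="bin", y="count", title="distogram")
fig.update_layout(height=300)
fig.show()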
Install
DistoGram is available on PyPI and can be installed with pip:
pip install distogram
Download files
Source Distribution
distogram-1.0.0.tar.gz (5.2 kB)
File details
Details for the file distogram-1.0.0.tar.gz.
File metadata
- Download URL: distogram-1.0.0.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3b0c8637e1bf8ed93d1eed8b463099144f8d77788cbd5ccde49add837fe61801
MD5 | 9938c77690dbea68f400f541d76d9705
BLAKE2b-256 | fdf3e6990869261b34f6cc254613f1b58d19a2cbb10a4ecf18e56891a5cbe61c