CluStream, StreamKM++ and metrics utilities: C/C++ bindings for Python
ClusOpt Core
This package is used by ClusOpt for its CPU-intensive tasks, but it can easily be imported into any Python data stream clustering project. It is written mainly in C/C++ with bindings for Python, and features:
- CluStream (based on MOA implementation)
- StreamKM++ (wrapped around the original paper authors' implementation)
- Distance Matrix computation (in-place implementation using boost threads)
- Silhouette score (custom in-place implementation inspired by the BIRCH clustering feature vector)
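As background for the CluStream and BIRCH items above: both algorithms summarise a group of points with additive statistics (count, per-dimension linear sum and squared sum) instead of storing the points themselves. The following is a minimal pure-Python sketch of that idea, not the code of this package. The names `ClusteringFeature` and `fits` are hypothetical, and CluStream's timestamp statistics are left out for brevity.

```python
import math


class ClusteringFeature:
    """Additive summary of a group of points (hypothetical helper, not part of clusopt_core)."""

    def __init__(self, dimension):
        self.n = 0
        self.linear_sum = [0.0] * dimension
        self.squared_sum = [0.0] * dimension

    def insert(self, point):
        # Updating the summary costs O(dimension), independent of how many points were absorbed.
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.squared_sum[i] += x * x

    def center(self):
        return [s / self.n for s in self.linear_sum]

    def rms_deviation(self):
        # Root-mean-square distance of the absorbed points from the center,
        # recovered from the sums alone.
        variance = sum(
            sq / self.n - (ls / self.n) ** 2
            for ls, sq in zip(self.linear_sum, self.squared_sum)
        )
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding noise


def fits(feature, point, radius_factor):
    """True if `point` lies within radius_factor * RMS deviation of the center."""
    distance = math.sqrt(sum((c - x) ** 2 for c, x in zip(feature.center(), point)))
    return distance <= radius_factor * feature.rms_deviation()


cf = ClusteringFeature(dimension=2)
cf.insert((0.0, 0.0))
cf.insert((2.0, 0.0))
near = fits(cf, (2.5, 0.0), radius_factor=2)  # 1.5 away from the center, boundary is 2.0
far = fits(cf, (4.0, 0.0), radius_factor=2)   # 3.0 away from the center
```

In this sketch a point that does not fit would start a new summary; how the real implementations handle singleton summaries and eviction is not covered here.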
Prerequisites
- python >= 3.6
- pip
- boost-thread
- gcc >= 6
boost-thread can be installed on Debian-based systems with:
apt install libboost-thread-dev
Usage
See the examples folder for more.
CluStream online clustering
from clusopt_core.cluster import CluStream
from sklearn.datasets import make_blobs
import numpy as np
import matplotlib.pyplot as plt
k = 32
dataset, _ = make_blobs(n_samples=64000, centers=k, random_state=42, cluster_std=0.1)
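model = CluStream(
    m=k * 10,  # number of microclusters
    h=64000,  # horizon
    t=2,  # radius factor
)
# split the stream into chunks of 4000 points; np.split returns a list of arrays
chunks = np.split(dataset, len(dataset) // 4000)
# initialize with the first chunk, then feed the remaining chunks incrementally
model.init_offline(chunks.pop(0), seed=42)
for chunk in chunks:
    model.partial_fit(chunk)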
clusters, _ = model.get_macro_clusters(k, seed=42)
plt.scatter(*dataset.T, marker=",", label="datapoints")
plt.scatter(*model.get_partial_cluster_centers().T, marker=".", label="microclusters")
plt.scatter(*clusters.T, marker="x", label="macro clusters", color="black")
plt.legend()
plt.show()
output: scatter plot of the datapoints, the microclusters and the macro clusters (figure not included)
Benchmarks
Some functions in clusopt_core are faster than their scikit-learn counterparts; see the benchmark folder for more info.
Silhouette
Each bar is labeled with a tuple of (no_samples, dimension, no_groups). The clusopt_core implementation is faster regardless of these three factors.
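For reference, the quantity being benchmarked is the mean silhouette coefficient. The sketch below is a pure-Python version of the textbook O(n²) definition, useful for checking results on small inputs. It is neither the clusopt_core implementation nor scikit-learn's, and the function names are illustrative only.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (quadratic-time reference)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0.0:
            total += (b - a) / max(a, b)
    return total / len(points)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # well-separated grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # grouping that mixes the two blobs
```

Values close to 1 indicate compact, well-separated groups, while negative values indicate points that sit closer to another group than to their own.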
Distance Matrix
Each bar shows the dataset dimension. The clusopt_core implementation is faster when the dataset dimension is small (below roughly 150), even when scikit-learn uses 4 processes.
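To make explicit what is being computed, here is a minimal pure-Python sketch of a Euclidean distance matrix. It is a single-threaded reference under the assumption of Euclidean distance, not the threaded C/C++ implementation of this package, and the name `distance_matrix` is illustrative.

```python
import math


def distance_matrix(points):
    """Symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Only the upper triangle is computed; the result is mirrored.
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            matrix[i][j] = matrix[j][i] = d
    return matrix


m = distance_matrix([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
```

There are n(n-1)/2 distinct pairs, and the cost of each entry grows linearly with the dataset dimension.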
Installation
You can install it directly from PyPI with
pip install clusopt-core
or you can clone this repo and install from the directory
pip install ./clusopt_core
Acknowledgments
Thanks to: