An E-Stream implementation in Python
E-Stream is an evolution-based technique for stream clustering that supports five behaviors:
Appearance
Disappearance
Self-evolution
Merge
Split
These behaviors are achieved by representing each cluster as a Fading Cluster Structure with Histogram (FCH), utilizing a histogram for each feature of the data.
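The FCH idea can be illustrated with a minimal, hypothetical sketch in plain Python. This is not the estream package's internal class: the class name `FadingClusterHistogram`, the fixed bin range, and the unit weight per point are assumptions made only for illustration. The sketch keeps a faded weight, per-feature weighted linear and squared sums (from which a center and spread can be computed), and one histogram per feature. In the paper, these per-feature histograms are what allow a cluster to be examined for a possible split.

```python
import math
from typing import List


class FadingClusterHistogram:
    """Hypothetical FCH sketch: faded weight, per-feature sums, and histograms.

    Assumptions (not taken from the estream package): every histogram uses
    n_bins fixed-width bins over [low, high), and each point has weight 1.
    """

    def __init__(self, n_features: int, n_bins: int = 10,
                 low: float = 0.0, high: float = 1.0) -> None:
        self.n_bins = n_bins
        self.low = low
        self.width = (high - low) / n_bins
        self.weight = 0.0
        self.linear_sum = [0.0] * n_features   # weighted sum of x per feature
        self.squared_sum = [0.0] * n_features  # weighted sum of x**2 per feature
        self.histograms = [[0.0] * n_bins for _ in range(n_features)]

    def _bin(self, value: float) -> int:
        # Clamp values outside [low, high) into the first or last bin.
        index = int((value - self.low) // self.width)
        return min(max(index, 0), self.n_bins - 1)

    def add(self, point: List[float]) -> None:
        """Absorb one data point with unit weight."""
        self.weight += 1.0
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.squared_sum[i] += x * x
            self.histograms[i][self._bin(x)] += 1.0

    def fade(self, factor: float) -> None:
        """Multiply every statistic by a fading factor in (0, 1]."""
        self.weight *= factor
        self.linear_sum = [s * factor for s in self.linear_sum]
        self.squared_sum = [s * factor for s in self.squared_sum]
        self.histograms = [[c * factor for c in h] for h in self.histograms]

    def center(self) -> List[float]:
        return [s / self.weight for s in self.linear_sum]

    def std(self) -> List[float]:
        # Per-feature standard deviation; max() guards tiny negative rounding.
        return [math.sqrt(max(sq / self.weight - (ls / self.weight) ** 2, 0.0))
                for ls, sq in zip(self.linear_sum, self.squared_sum)]


fch = FadingClusterHistogram(n_features=2)
fch.add([0.25, 0.45])
fch.add([0.35, 0.55])
fch.fade(0.5)  # weight and sums shrink together, so the center is unchanged
print(fch.weight, fch.center(), fch.std())
```

Because fading scales the weight, the sums, and the histogram counts by the same factor, ratios such as the center and standard deviation are preserved while the cluster's overall influence decays.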
Details of the underlying concepts can be found in:
Udommanetanakit, K., Rakthanmanon, T., Waiyamai, K.: E-Stream: Evolution-Based Technique for Stream Clustering. Advanced Data Mining and Applications: Third International Conference, 2007.
How to use E-Stream
The estream package aims to be substitutable for scikit-learn classes, so it can be used interchangeably with other estimators that share a similar API.
from estream import EStream
from sklearn.datasets import make_blobs
estream = EStream()
data, _ = make_blobs()
estream.fit(data)
E-Stream exposes a number of tunable parameters; the main ones are:
max_clusters: The maximum number of clusters that can exist before existing clusters must be merged. The default is 10.
stream_speed/decay_rate: Together, these determine the fading factor applied to the clusters. In this implementation, the fading factor is a constant derived from their default values of 10 and 0.1, respectively.
remove_threshold: The lower bound on a cluster's weight; a cluster whose weight falls below it is removed. The default is 0.1.
merge_threshold: Determines whether two nearby clusters are close enough to be merged together. The default is 1.25.
radius_threshold: Determines how close a new data point must be to an existing cluster to be absorbed into it. The default is 3.0.
active_threshold: The minimum weight a cluster must reach before it is considered active. The default is 5.0.
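How these parameters interact can be sketched in a few lines of Python. This is a hedged illustration, not the package's code: it assumes the exponential fading function f(t) = 2^(-decay_rate * t) used by DenStream-style algorithms, assumes each arriving point advances time by 1 / stream_speed, and uses hypothetical helpers (`fading_factor`, `cluster_state`, `absorbs`) whose rules are our reading of the parameter descriptions, with the defaults listed above.

```python
def fading_factor(decay_rate: float = 0.1, stream_speed: float = 10.0) -> float:
    """Per-point fading factor.

    Assumption: f(t) = 2 ** (-decay_rate * t), and each point advances
    time by 1 / stream_speed, so every arrival multiplies weights by this.
    """
    return 2.0 ** (-decay_rate / stream_speed)


def cluster_state(weight: float,
                  remove_threshold: float = 0.1,
                  active_threshold: float = 5.0) -> str:
    """Hypothetical helper: classify a cluster by its faded weight."""
    if weight < remove_threshold:
        return "removed"
    if weight >= active_threshold:
        return "active"
    return "inactive"


def absorbs(distance: float, radius: float,
            radius_threshold: float = 3.0) -> bool:
    """Hypothetical rule: a point joins a cluster if it lies within
    radius_threshold times the cluster's radius."""
    return distance <= radius_threshold * radius


# Let an active cluster fade with no new points until it would be removed.
factor = fading_factor()
weight = 6.0
steps = 0
while cluster_state(weight) != "removed":
    weight *= factor
    steps += 1
print(f"fading factor per point: {factor:.5f}")
print(f"points until a weight-6 cluster with no new data is removed: {steps}")
```

Under these assumptions the fading factor depends only on the ratio decay_rate / stream_speed, which is consistent with the fading factor being a constant once both values are fixed.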
An example of setting these parameters:
from estream import EStream
from sklearn.datasets import make_blobs
estream = EStream(max_clusters=5,
                  merge_threshold=0.5,
                  radius_threshold=1.5,
                  active_threshold=3.0)
data, _ = make_blobs()
estream.fit(data)
Installation
The package can be installed either from PyPI:
pip install estream
or manually from the GitHub source archive:
wget https://github.com/mickeycj/estream/archive/master.zip
unzip master.zip
rm master.zip
cd estream-master
python setup.py install
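After either installation route, one quick way to confirm that the distribution is visible to Python is to query its installed version with the standard library. This is a generic sketch using `importlib.metadata` (Python 3.8+); it does not rely on anything inside the estream package, and the helper name `installed_version` is our own.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "estream") -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version() or "estream is not installed")
```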
Help & Support
There is currently no dedicated documentation, so please send any questions or issues to the author by email.
Citation
If you use this software in your work, please cite the original paper from Advanced Data Mining and Applications: Third International Conference:
@inproceedings{udommanetanakit2007estream,
author = {Udommanetanakit, Komkrit and Rakthanmanon, Thanawin and Waiyamai, Kitsana},
booktitle = {Advanced Data Mining and Applications: Third International Conference},
year = {2007},
month = {08},
pages = {605--615},
title = {E-Stream: Evolution-Based Technique for Stream Clustering},
volume = {4632},
doi = {10.1007/978-3-540-73871}
}
This implementation is also based on a MOA implementation of E-Stream (and other related algorithms) by David Ratier. The original source code can be found in that project's repository.
License
The estream package is licensed under the GNU General Public License.
Contributing
Contributions are always welcome! Everything from code to notebooks, examples, and documentation is valuable to the growth of this project. To contribute, please fork this project, make your changes, and submit a pull request. I will do my best to fix any issues and merge your code into the main branch.
- Author: Chanon Jenakom
- Version: 0.0.3
- Dedicated to: DAKDL, Kasetsart University
File details
Details for the file estream-0.0.3.tar.gz
File metadata
- Download URL: estream-0.0.3.tar.gz
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | b94243d9130d87f35a4045c5b14ad3e2571d3930947c661a5721e0d7595b5cd9
MD5 | 396ea7c683bef226cbb6c27933b89830
BLAKE2b-256 | 3db437520abf13f564a7106659f2e579335c97122cf35bd799a7b2258c771669
File details
Details for the file estream-0.0.3-py3-none-any.whl
File metadata
- Download URL: estream-0.0.3-py3-none-any.whl
- Size: 20.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 46a43f1a240b235e78efa911f884880efa767f695b8857bcd379998d5f8e50d7
MD5 | 47c2b6198fa018fa77b67c1dd65c45b7
BLAKE2b-256 | 79f07e083694a471b9750d79a3b95d9f39ed12a1184f03d87861a2d70f034fa5