An E-Stream implementation in Python

E-Stream is an evolution-based technique for stream clustering which supports five behaviors:

  1. Appearance

  2. Disappearance

  3. Self-evolution

  4. Merge

  5. Split

These behaviors are achieved by representing each cluster as a Fading Cluster Structure with Histogram (FCH), utilizing a histogram for each feature of the data.
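To make the idea of a fading cluster summary concrete, here is a minimal one-dimensional sketch. It is an illustration only, not the package's internal class: the name `FadingCluster1D`, its fields, and the fixed histogram range (`low`, `high`, `bins`) are all assumptions made for this example. Every stored statistic is multiplied by a fading factor at each step, so older points contribute less over time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FadingCluster1D:
    """Hypothetical 1-D fading cluster summary with a histogram (illustration only)."""

    fading_factor: float  # assumed per-step decay multiplier in (0, 1]
    low: float            # assumed fixed lower bound of the histogram range
    high: float           # assumed fixed upper bound of the histogram range
    bins: int = 4
    weight: float = 0.0       # faded count of points
    linear_sum: float = 0.0   # faded sum of values
    squared_sum: float = 0.0  # faded sum of squared values
    histogram: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.histogram:
            self.histogram = [0.0] * self.bins

    def fade(self) -> None:
        """Decay every statistic by one time step."""
        f = self.fading_factor
        self.weight *= f
        self.linear_sum *= f
        self.squared_sum *= f
        self.histogram = [h * f for h in self.histogram]

    def add(self, x: float) -> None:
        """Absorb a new point with full (unfaded) weight."""
        self.weight += 1.0
        self.linear_sum += x
        self.squared_sum += x * x
        index = int((x - self.low) / (self.high - self.low) * self.bins)
        index = min(self.bins - 1, max(0, index))  # clamp out-of-range values
        self.histogram[index] += 1.0

    @property
    def mean(self) -> float:
        """Faded mean; undefined (division by zero) for an empty cluster."""
        return self.linear_sum / self.weight
```

A cluster that stops receiving points sees its weight shrink toward zero (disappearance), while one that keeps absorbing points gradually shifts its mean (self-evolution). A histogram with two well-separated peaks is the kind of signal a split decision could be based on.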

The details for the underlying concepts can be found here:

Udommanetanakit, K, Rakthanmanon, T, Waiyamai, K, E-Stream: Evolution-Based Technique for Stream Clustering, Advanced Data Mining and Applications: Third International Conference, 2007

How to use E-Stream

The estream package aims to be substitutable with sklearn classes, so it can be used interchangeably with other transformers that share a similar API.

from estream import EStream
from sklearn.datasets import make_blobs

estream = EStream()
data, _ = make_blobs()

estream.fit(data)

E-Stream contains a number of parameters that can be set; the major ones are as follows:

  • max_clusters: This limits the number of clusters the clustering can have before the existing clusters have to be merged. The default is set to 10.

  • stream_speed/decay_rate: These determine the fading factor of the clusters. In this implementation, the fading function is a constant derived from these two values, which default to 10 and 0.1, respectively.

  • remove_threshold: This sets the lower bound on a cluster’s weight; a cluster whose weight falls below it is considered for removal. The default is set to 0.1.

  • merge_threshold: This determines whether two close clusters can be merged together. The default is set to 1.25.

  • radius_threshold: This determines the range from an existing cluster within which a new data point must fall in order to be merged into it. The default is set to 3.0.

  • active_threshold: This sets the minimum weight a cluster must have before it is considered active. The default is set to 5.0.

An example of setting these parameters:

from estream import EStream
from sklearn.datasets import make_blobs

estream = EStream(max_clusters=5,
                  merge_threshold=0.5,
                  radius_threshold=1.5,
                  active_threshold=3.0)
data, _ = make_blobs()

estream.fit(data)
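The weight-related parameters described above can be pictured with a small standalone sketch. It does not call the estream package, and both helper functions are hypothetical. The formula `2 ** (-decay_rate / stream_speed)` is an assumption about how a constant fading factor might be derived from the two values, and the exact comparisons (strict or non-strict) are also assumptions.

```python
def fading_factor(stream_speed: float = 10, decay_rate: float = 0.1) -> float:
    """Per-point decay multiplier, under the ASSUMED form 2 ** (-decay_rate / stream_speed)."""
    return 2.0 ** (-decay_rate / stream_speed)


def cluster_status(weight: float,
                   remove_threshold: float = 0.1,
                   active_threshold: float = 5.0) -> str:
    """Classify a cluster by its faded weight (hypothetical helper, assumed comparisons)."""
    if weight < remove_threshold:
        return "remove"    # too little recent support: candidate for removal
    if weight >= active_threshold:
        return "active"    # enough recent support to count as a real cluster
    return "inactive"      # kept, but not yet (or no longer) active


# Fade a single-point cluster until it drops below the removal bound.
weight, steps = 1.0, 0
while cluster_status(weight) != "remove":
    weight *= fading_factor()
    steps += 1
```

Under these assumptions, a lower stream_speed or a higher decay_rate makes clusters fade faster, so stale clusters reach the removal bound in fewer steps.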

Installation

Currently, the package is only available through either PyPI:

pip install estream

or a manual install:

wget https://github.com/mickeycj/estream/archive/master.zip
unzip master.zip
rm master.zip
cd estream-master
python setup.py install

Help & Support

Currently, there is no dedicated documentation available, so please send any questions or issues to me by email.

Citation

If you make use of this software for your work, please cite the paper from the Advanced Data Mining and Applications: Third International Conference:

@inproceedings{inproceedings,
    author = {Udommanetanakit, Komkrit and Rakthanmanon, Thanawin and Waiyamai, Kitsana},
    year = {2007},
    month = {08},
    pages = {605-615},
    title = {E-Stream: Evolution-Based Technique for Stream Clustering},
    volume = {4632},
    doi = {10.1007/978-3-540-73871}
}

Moreover, this implementation is based on a MOA implementation of E-Stream (and other related algorithms) by David Ratier. The original source code can be found in this repository.

License

The estream package is under the GNU General Public License.

Contributing

Contributions are always welcome! Everything ranging from code to notebooks and examples/documentation will be very valuable to the growth of this project. To contribute, please fork this project, make your changes, and submit a pull request. I will do my best to fix any issues and merge your code into the main branch.

Author:

Chanon Jenakom

Version:

0.0.3

Dedicated:

To DAKDL, Kasetsart University
