
An E-Stream implementation in Python

E-Stream is an evolution-based technique for stream clustering which supports five behaviors:

  1. Appearance

  2. Disappearance

  3. Self-evolution

  4. Merge

  5. Split

These behaviors are achieved by representing each cluster as a Fading Cluster Structure with Histogram (FCH), which keeps a histogram for each feature of the data.
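
To make the FCH idea concrete, here is a minimal, stdlib-only sketch of a fading cluster summary that keeps one fixed-bin histogram per feature. The class name `FadingClusterHistogram`, its methods, the bin layout (`n_bins`, `lower`, `upper`) and the multiplicative fading are illustrative assumptions, not the internal structure of the estream package.

```python
import math


class FadingClusterHistogram:
    """Hypothetical FCH-like summary (a sketch, not the estream internals).

    Keeps a faded weight, per-feature faded sums and sums of squares, and a
    fixed-bin histogram per feature over the assumed range [lower, upper).
    """

    def __init__(self, n_features, n_bins=10, lower=0.0, upper=1.0):
        self.weight = 0.0
        self.sums = [0.0] * n_features
        self.sq_sums = [0.0] * n_features
        self.hist = [[0.0] * n_bins for _ in range(n_features)]
        self.n_bins = n_bins
        self.lower = lower
        self.upper = upper

    def fade(self, factor):
        # Scale every statistic by the same fading factor (0 < factor <= 1).
        self.weight *= factor
        self.sums = [s * factor for s in self.sums]
        self.sq_sums = [s * factor for s in self.sq_sums]
        self.hist = [[count * factor for count in h] for h in self.hist]

    def _bin(self, value):
        # Map a value to a bin index, clamping values outside the range.
        width = (self.upper - self.lower) / self.n_bins
        idx = int((value - self.lower) // width)
        return min(max(idx, 0), self.n_bins - 1)

    def add(self, point):
        # Absorb one data point with unit weight.
        self.weight += 1.0
        for i, x in enumerate(point):
            self.sums[i] += x
            self.sq_sums[i] += x * x
            self.hist[i][self._bin(x)] += 1.0

    def center(self):
        return [s / self.weight for s in self.sums]

    def std(self):
        # Per-feature standard deviation from the faded moments.
        return [math.sqrt(max(sq / self.weight - (s / self.weight) ** 2, 0.0))
                for s, sq in zip(self.sums, self.sq_sums)]
```

Storing sums rather than raw points keeps memory constant per cluster, and because fading scales the sums and the weight by the same factor, the center stays put while the cluster's weight shrinks.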

Details of the underlying concepts can be found in:

Udommanetanakit, K., Rakthanmanon, T., Waiyamai, K.: E-Stream: Evolution-Based Technique for Stream Clustering. In: Advanced Data Mining and Applications: Third International Conference, 2007.

How to use E-Stream

The estream package aims to be substitutable for scikit-learn classes, so it can be used interchangeably with other scikit-learn estimators that share a similar API.

from estream import EStream
from sklearn.datasets import make_blobs  # samples_generator was removed in scikit-learn 0.24

estream = EStream()
data, _ = make_blobs()

estream.fit(data)
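
The interchangeability rests on the scikit-learn estimator convention: hyperparameters are stored in `__init__`, `fit(X)` returns `self`, and learned attributes end with an underscore. The sketch below illustrates that convention with a tiny hypothetical stand-in, `ToyLeaderClusterer` (a one-pass "leader" clusterer), and a hypothetical helper `fit_clusterer`, so it runs without estream or scikit-learn installed. Whether `EStream` exposes attributes such as `labels_` is an assumption to verify against the package.

```python
import math


class ToyLeaderClusterer:
    """Hypothetical stand-in following scikit-learn's estimator conventions.

    Not part of estream: assigns each point to the first existing center
    within `radius`, otherwise starts a new cluster at that point.
    """

    def __init__(self, radius=1.0):
        # Convention: __init__ only stores hyperparameters.
        self.radius = radius

    def fit(self, X, y=None):
        centers, labels = [], []
        for point in X:
            for k, center in enumerate(centers):
                if math.dist(point, center) <= self.radius:
                    labels.append(k)
                    break
            else:  # no center was close enough
                centers.append(list(point))
                labels.append(len(centers) - 1)
        # Convention: learned attributes end with "_" and fit returns self.
        self.cluster_centers_ = centers
        self.labels_ = labels
        return self


def fit_clusterer(model, data):
    # Works with any object exposing fit(X) in the scikit-learn style,
    # e.g. this toy class, sklearn's KMeans, or (assumed) estream's EStream.
    return model.fit(data)
```

For example, `fit_clusterer(ToyLeaderClusterer(radius=0.5), [[0, 0], [0.1, 0], [5, 5]]).labels_` gives `[0, 0, 1]`.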

E-Stream contains a number of parameters that can be set; the major ones are as follows:

  • max_clusters: The maximum number of clusters allowed; once this limit is reached, existing clusters have to be merged. The default is set to 10.

  • stream_speed/decay_rate: These determine the fading factor of the clusters. In this implementation, the fading factor is a constant derived from these two parameters, whose default values are 10 and 0.1, respectively.

  • remove_threshold: This sets the lower bound on a cluster's weight; clusters whose weight falls below it are removed. The default is set to 0.1.

  • merge_threshold: This determines whether two close clusters can be merged together. The default is set to 1.25.

  • radius_threshold: This determines how close a new data point must be to an existing cluster in order to be absorbed into it. The default is set to 3.0.

  • active_threshold: This sets the minimum weight a cluster must reach to be considered active. The default is set to 5.0.
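
The stdlib-only sketch below shows one way these parameters could interact while processing a single point. It is an illustration under stated assumptions, not the estream implementation: the hypothetical helpers `Cluster`, `fading_factor`, `active_clusters` and `step` are invented here; the fading factor is assumed to be 2^(-decay_rate / stream_speed) per point; a point is assumed to be absorbed when its distance to the nearest center is at most radius_threshold times that cluster's (fixed, assumed) radius; and merging is assumed to compare plain center distances with merge_threshold. Default values are the ones listed above.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical, simplified summary; a real FCH also keeps histograms.
    center: list
    weight: float = 1.0
    radius: float = 1.0  # assumed fixed radius, for illustration only


def fading_factor(stream_speed=10, decay_rate=0.1):
    # ASSUMPTION: weight decays as 2 ** (-decay_rate * dt), dt = 1 / stream_speed.
    return 2 ** (-decay_rate / stream_speed)


def active_clusters(clusters, active_threshold=5.0):
    # Clusters at or above active_threshold are treated as active (reported).
    return [c for c in clusters if c.weight >= active_threshold]


def step(clusters, point, *, max_clusters=10, remove_threshold=0.1,
         merge_threshold=1.25, radius_threshold=3.0,
         stream_speed=10, decay_rate=0.1):
    """Process one point. Every rule below is an illustrative assumption."""
    f = fading_factor(stream_speed, decay_rate)
    for c in clusters:
        c.weight *= f  # fade every cluster (mutates the passed-in objects)

    # Disappearance: drop clusters whose weight fell below remove_threshold.
    clusters = [c for c in clusters if c.weight >= remove_threshold]

    # Absorb the point into the nearest cluster if it is close enough.
    nearest = min(clusters, key=lambda c: math.dist(point, c.center), default=None)
    if nearest is not None and (
            math.dist(point, nearest.center) <= radius_threshold * nearest.radius):
        # Self-evolution: move the center toward the point (weighted mean).
        w = nearest.weight
        nearest.center = [(w * a + b) / (w + 1) for a, b in zip(nearest.center, point)]
        nearest.weight = w + 1
    else:
        clusters.append(Cluster(center=list(point)))  # appearance

    # Merge the closest pair while it is within merge_threshold,
    # or while there are more than max_clusters clusters.
    while len(clusters) > 1:
        d, i, j = min((math.dist(p.center, q.center), i, j)
                      for i, p in enumerate(clusters)
                      for j, q in enumerate(clusters) if i < j)
        if d > merge_threshold and len(clusters) <= max_clusters:
            break
        a, b = clusters[i], clusters[j]
        w = a.weight + b.weight
        merged = Cluster(
            center=[(a.weight * x + b.weight * y) / w for x, y in zip(a.center, b.center)],
            weight=w,
            radius=max(a.radius, b.radius),
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

In this sketch inactive clusters still absorb points, so they can grow past active_threshold; only the output of `active_clusters` would be reported as the current clustering.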

An example of setting these parameters:

from estream import EStream
from sklearn.datasets import make_blobs  # samples_generator was removed in scikit-learn 0.24

estream = EStream(max_clusters=5,
                  merge_threshold=0.5,
                  radius_threshold=1.5,
                  active_threshold=3.0)
data, _ = make_blobs()

estream.fit(data)

Installation

Currently, the package can be installed either from PyPI:

pip install estream

or manually from source:

wget https://github.com/mickeycj/estream/archive/master.zip
unzip master.zip
rm master.zip
cd estream-master
python setup.py install
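
After either installation route, one way to confirm which version is installed is to query the package metadata with the standard library (Python 3.8+). This is a convenience sketch; the helper name `installed_version` is hypothetical, and the distribution name `estream` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="estream"):
    # Return the installed version string, or None if the distribution is absent.
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())
```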

Help & Support

Currently, there is no dedicated documentation, so please send any questions or issues to me by email.

Citation

If you make use of this software for your work, please cite the paper from the Advanced Data Mining and Applications: Third International Conference:

@inproceedings{udommanetanakit2007estream,
    author = {Udommanetanakit, Komkrit and Rakthanmanon, Thanawin and Waiyamai, Kitsana},
    title = {E-Stream: Evolution-Based Technique for Stream Clustering},
    booktitle = {Advanced Data Mining and Applications: Third International Conference},
    year = {2007},
    month = {08},
    pages = {605--615},
    volume = {4632},
    doi = {10.1007/978-3-540-73871}
}

This implementation is also based on David Ratier's MOA (Massive Online Analysis) implementation of E-Stream and other related algorithms. The original source code can be found in this repository.

License

The estream package is under the GNU General Public License.

Contributing

Contributions are always welcome! Everything from code to notebooks, examples, and documentation is valuable to the growth of this project. To contribute, please fork this project, make your changes, and submit a pull request. I will do my best to fix any issues and merge your code into the main branch.

Author:

Chanon Jenakom

Version:

0.0.3

Dedicated:

To DAKDL, Kasetsart University

Download files

Source Distribution

estream-0.0.3.tar.gz (8.0 kB)

Uploaded Source

Built Distribution

estream-0.0.3-py3-none-any.whl (20.5 kB)

Uploaded Python 3

File details

Details for the file estream-0.0.3.tar.gz.

File metadata

  • Download URL: estream-0.0.3.tar.gz
  • Upload date:
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for estream-0.0.3.tar.gz
Algorithm Hash digest
SHA256 b94243d9130d87f35a4045c5b14ad3e2571d3930947c661a5721e0d7595b5cd9
MD5 396ea7c683bef226cbb6c27933b89830
BLAKE2b-256 3db437520abf13f564a7106659f2e579335c97122cf35bd799a7b2258c771669

File details

Details for the file estream-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: estream-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 20.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for estream-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 46a43f1a240b235e78efa911f884880efa767f695b8857bcd379998d5f8e50d7
MD5 47c2b6198fa018fa77b67c1dd65c45b7
BLAKE2b-256 79f07e083694a471b9750d79a3b95d9f39ed12a1184f03d87861a2d70f034fa5