Stream clustering algorithms on modern hardware

Sesame

This project aims at building a scalable stream mining library on modern hardware.

  • The repo currently contains several representative real-world stream clustering algorithms and several synthetic algorithms.
  • We welcome your contributions. If you are interested in contributing to the project, please fork the repo and submit a PR. If you have questions, feel free to open an issue.

Build Dependencies

  • GCC 11 (our paper uses gcc-11.2.0)
  • Boost: 1.78.0
  • GFLAGS: 2.2.0

Real-world algorithms

| Algorithm | Window Model | Outlier Detection | Summarizing Data Structure | Offline Refinement |
|---|---|---|---|---|
| BIRCH | LandmarkWM | OutlierD | CFT | |
| CluStream | LandmarkWM | OutlierD-T | MCs | |
| DenStream | DampedWM | OutlierD-BT | MCs | |
| DStream | DampedWM | OutlierD-T | Grids | |
| StreamKM++ | LandmarkWM | NoOutlierD | CoreT | |
| DBStream | DampedWM | OutlierD-T | MCs | |
| EDMStream | DampedWM | OutlierD-BT | DPT | |
| SL-KMeans | SlidingWM | NoOutlierD | AMS | |
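
The Window Model column distinguishes landmark, sliding, and damped windows. As a rough illustration of how these three models typically weight a past point, here is a minimal Python sketch. It is not taken from Sesame's code: the function names and the `landmark`, `size`, and `decay` parameters are hypothetical, and the base-2 exponential decay is the form commonly used in the damped-window literature, assumed here rather than confirmed for this library.

```python
def landmark_weight(t_point, t_now, landmark):
    """Landmark window: points at or after the latest landmark count fully."""
    return 1.0 if landmark <= t_point <= t_now else 0.0


def sliding_weight(t_point, t_now, size):
    """Sliding window: only the `size` most recent time steps count."""
    return 1.0 if t_now - size < t_point <= t_now else 0.0


def damped_weight(t_point, t_now, decay):
    """Damped window: weight halves every 1/decay time steps (assumed form)."""
    return 2.0 ** (-decay * (t_now - t_point))
```

With `t_now = 10`, a point that arrived at `t = 8` has weight 1.0 under a landmark at 5, weight 1.0 under a sliding window of size 3, and weight 0.5 under a damped window with `decay = 0.5`.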

Synthetic algorithms

| Algorithm | Window Model | Outlier Detection | Summarizing Data Structure | Offline Refinement |
|---|---|---|---|---|
| G1 | LandmarkWM | OutlierD | MCs | |
| G2 | LandmarkWM | OutlierD | MCs | |
| G3 | LandmarkWM | OutlierD | CFT | |
| G4 | SlidingWM | OutlierD | MCs | |
| G5 | DampedWM | OutlierD-B | MCs | |
| G6 | LandmarkWM | NoOutlierD | MCs | |
| G8 | LandmarkWM | OutlierD | MCs | |
| G9 | LandmarkWM | OutlierD | Grids | |
| G10 | LandmarkWM | OutlierD | DPT | |
| G11 | LandmarkWM | OutlierD-T | MCs | |
| G12 | LandmarkWM | OutlierD-B | MCs | |
| G13 | LandmarkWM | OutlierD-BT | MCs | |
| G14 | LandmarkWM | OutlierD | AMS | |
| G15 | LandmarkWM | OutlierD | CoreT | |
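
Several of the summarizing data structures listed above (CFT, MCs) are conventionally built on the clustering feature from BIRCH: a triple of point count, per-dimension linear sum, and sum of squared norms. The sketch below shows that textbook structure in Python so the abbreviations are easier to read. It is an assumption-labeled illustration, not Sesame's implementation: the class name `ClusteringFeature` and its methods are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Textbook CF triple (n, ls, ss); names are illustrative only."""
    n: int = 0                                      # number of absorbed points
    ls: List[float] = field(default_factory=list)   # linear sum per dimension
    ss: float = 0.0                                 # sum of squared norms

    def insert(self, point: Sequence[float]) -> None:
        """Absorb one point by updating the three running sums."""
        if not self.ls:
            self.ls = [0.0] * len(point)
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

    def merge(self, other: "ClusteringFeature") -> None:
        """CF additivity: merging two summaries adds them component-wise."""
        if other.n == 0:
            return
        if not self.ls:
            self.ls = [0.0] * len(other.ls)
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self) -> List[float]:
        if self.n == 0:
            raise ValueError("empty clustering feature has no centroid")
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        """Root mean squared distance of the absorbed points to the centroid."""
        c = self.centroid()
        variance = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error
```

Because the three sums are additive, a summary can absorb points one at a time or merge with another summary without revisiting raw data, which is what makes this structure suitable for single-pass stream processing.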

Datasets

| Dataset | Length | Dimension | Cluster Number |
|---|---|---|---|
| CoverType | 581012 | 54 | 7 |
| KDD-99 | 4898431 | 41 | 23 |
| Insects | 905145 | 33 | 24 |
| Sensor | 2219803 | 5 | 55 |
| EDS | 45690, 100270, 150645, 200060, 245270 | 2 | 75, 145, 218, 289, 363 |
| ODS | 94720, 97360, 100000 | 2 | 90, 90, 90 |

You may download the datasets here: https://zenodo.org/records/8210331

How to Cite Sesame

  • [SIGMOD 2023] Xin Wang, Zhengru Wang, Zhenyu Wu, Shuhao Zhang, Xuanhua Shi, and Li Lu. Data Stream Clustering: An In-depth Empirical Study. SIGMOD, 2023.
@inproceedings{wang2023sesame,
	title        = {Data Stream Clustering: An In-depth Empirical Study},
	author       = {Xin Wang and Zhengru Wang and Zhenyu Wu and Shuhao Zhang and Xuanhua Shi and Li Lu},
	year         = 2023,
	booktitle    = {Proceedings of the 2023 International Conference on Management of Data (SIGMOD)},
	location     = {Seattle, WA, USA},
	publisher    = {Association for Computing Machinery},
	address      = {New York, NY, USA},
	series       = {SIGMOD '23},
	abbr         = {SIGMOD},
	bibtex_show  = {true},
	selected     = {true},
	pdf          = {papers/Sesame.pdf},
	code         = {https://github.com/intellistream/Sesame},
	doi          = {10.1145/3589307},
	url          = {https://doi.org/10.1145/3589307}
}

Download files

Source Distribution

  • pysame-0.1.0.tar.gz (146.9 kB)

Built Distributions

  • pysame-0.1.0-cp312-cp312-manylinux_2_28_x86_64.whl (7.4 MB): CPython 3.12, manylinux (glibc 2.28+), x86-64
  • pysame-0.1.0-cp311-cp311-manylinux_2_28_x86_64.whl (7.4 MB): CPython 3.11, manylinux (glibc 2.28+), x86-64
  • pysame-0.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (7.4 MB): CPython 3.10, manylinux (glibc 2.28+), x86-64
  • pysame-0.1.0-cp39-cp39-manylinux_2_28_x86_64.whl (7.4 MB): CPython 3.9, manylinux (glibc 2.28+), x86-64
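
The built distributions cover only CPython 3.9 to 3.12 on x86-64 Linux. To check quickly whether an interpreter falls in that set, a small sketch like the following may help. The helper names are hypothetical, the supported set is copied from the wheel file names above, and the check deliberately ignores the glibc 2.28+ requirement, so a `True` result is necessary but not sufficient.

```python
import platform
import sys

# (major, minor) pairs taken from the cp39..cp312 wheel tags listed above.
SUPPORTED_CPYTHON = {(3, 9), (3, 10), (3, 11), (3, 12)}


def wheel_available(implementation, version, system, machine):
    """Return True if the given interpreter matches one of the listed wheels.

    Does not verify the glibc version required by manylinux_2_28.
    """
    return (
        implementation == "CPython"
        and tuple(version[:2]) in SUPPORTED_CPYTHON
        and system == "Linux"
        and machine == "x86_64"
    )


def current_interpreter_matches():
    """Apply the check to the interpreter running this script."""
    return wheel_available(
        platform.python_implementation(),
        sys.version_info,
        platform.system(),
        platform.machine(),
    )
```

When no wheel matches, an installer would generally have to fall back to the source distribution, which presumably requires the build dependencies listed earlier.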
