Stream clustering algorithms on modern hardware

Project description

Sesame

This project aims at building a scalable stream mining library on modern hardware.

  • The repo currently contains several representative real-world stream clustering algorithms and several synthetic algorithms.
  • We welcome your contributions. If you are interested in contributing to the project, please fork the repo and submit a PR. If you have questions, feel free to open an issue.

Build Dependencies

  • GCC-11 (in our paper, we use gcc-11.2.0)
  • Boost: 1.78.0
  • GFLAGS: 2.2.0

Real-world algorithms

| Algorithm | Window Model | Outlier Detection | Summarizing Data Structure | Offline Refinement |
|---|---|---|---|---|
| BIRCH | LandmarkWM | OutlierD | CFT | |
| CluStream | LandmarkWM | OutlierD-T | MCs | |
| DenStream | DampedWM | OutlierD-BT | MCs | |
| DStream | DampedWM | OutlierD-T | Grids | |
| StreamKM++ | LandmarkWM | NoOutlierD | CoreT | |
| DBStream | DampedWM | OutlierD-T | MCs | |
| EDMStream | DampedWM | OutlierD-BT | DPT | |
| SL-KMeans | SlidingWM | NoOutlierD | AMS | |
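
The Window Model column names three common ways of deciding how much a past point still counts. The sketch below is a generic illustration of those three ideas, not Sesame's implementation; the function names and the parameters `landmark_time`, `window_size`, and `decay_rate` are assumptions made for this example. The damped variant uses the exponential fading function 2^(-λ·age) that is common in the stream clustering literature.

```python
def landmark_weights(arrival_times, landmark_time):
    """Landmark window: every point since the landmark counts fully."""
    return [1.0 if t >= landmark_time else 0.0 for t in arrival_times]


def sliding_weights(arrival_times, now, window_size):
    """Sliding window: only the most recent `window_size` time units count."""
    return [1.0 if now - t < window_size else 0.0 for t in arrival_times]


def damped_weights(arrival_times, now, decay_rate):
    """Damped window: a point's weight halves every 1/decay_rate time units."""
    return [2.0 ** (-decay_rate * (now - t)) for t in arrival_times]
```

A landmark or sliding window gives a hard cut-off, while a damped window never fully forgets a point but lets its influence fade.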

Synthetic algorithms

| Algorithm | Window Model | Outlier Detection | Summarizing Data Structure | Offline Refinement |
|---|---|---|---|---|
| G1 | LandmarkWM | OutlierD | MCs | |
| G2 | LandmarkWM | OutlierD | MCs | |
| G3 | LandmarkWM | OutlierD | CFT | |
| G4 | SlidingWM | OutlierD | MCs | |
| G5 | DampedWM | OutlierD-B | MCs | |
| G6 | LandmarkWM | NoOutlierD | MCs | |
| G8 | LandmarkWM | OutlierD | MCs | |
| G9 | LandmarkWM | OutlierD | Grids | |
| G10 | LandmarkWM | OutlierD | DPT | |
| G11 | LandmarkWM | OutlierD-T | MCs | |
| G12 | LandmarkWM | OutlierD-B | MCs | |
| G13 | LandmarkWM | OutlierD-BT | MCs | |
| G14 | LandmarkWM | OutlierD | AMS | |
| G15 | LandmarkWM | OutlierD | CoreT | |
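
Two of the summarizing data structures listed in the tables, CFT (clustering feature tree) and MCs (micro-clusters), build on the clustering feature idea from BIRCH: a cluster is summarized by its point count, the per-dimension linear sum, and the sum of squared norms, so points can be absorbed and summaries merged without storing raw data. The following is a minimal sketch of that summary only, assuming a scalar squared sum; the class name `ClusteringFeature` and its methods are hypothetical and do not reflect Sesame's API.

```python
import math


class ClusteringFeature:
    """Additive cluster summary (N, LS, SS) in the style of BIRCH."""

    def __init__(self, dim):
        self.n = 0                      # number of absorbed points
        self.linear_sum = [0.0] * dim   # per-dimension sum of coordinates
        self.square_sum = 0.0           # sum of squared Euclidean norms

    def insert(self, point):
        """Absorb one point in O(dim) time without storing it."""
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        self.square_sum += sum(x * x for x in point)

    def merge(self, other):
        """Merge another summary; all three components simply add up."""
        self.n += other.n
        self.linear_sum = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]
        self.square_sum += other.square_sum

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        """Root mean squared distance of the points from the centroid."""
        c = self.centroid()
        variance = self.square_sum / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error
```

Because every component is a plain sum, merging two summaries is as cheap as inserting a point, which is what makes this kind of structure suitable for one-pass stream processing.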

Datasets

| DataSet | Length | Dimension | Cluster Number |
|---|---|---|---|
| CoverType | 581012 | 54 | 7 |
| KDD-99 | 4898431 | 41 | 23 |
| Insects | 905145 | 33 | 24 |
| Sensor | 2219803 | 5 | 55 |
| EDS | 45690, 100270, 150645, 200060, 245270 | 2 | 75, 145, 218, 289, 363 |
| ODS | 94720, 97360, 100000 | 2 | 90, 90, 90 |

You may download the datasets here: https://zenodo.org/records/8210331

How to Cite Sesame

  • [SIGMOD 2023] Xin Wang, Zhengru Wang, Zhenyu Wu, Shuhao Zhang, Xuanhua Shi, and Li Lu. Data Stream Clustering: An In-depth Empirical Study. SIGMOD, 2023.
@inproceedings{wang2023sesame,
	title        = {Data Stream Clustering: An In-depth Empirical Study},
	author       = {Xin Wang and Zhengru Wang and Zhenyu Wu and Shuhao Zhang and Xuanhua Shi and Li Lu},
	year         = 2023,
	booktitle    = {Proceedings of the 2023 International Conference on Management of Data (SIGMOD)},
	location     = {Seattle, WA, USA},
	publisher    = {Association for Computing Machinery},
	address      = {New York, NY, USA},
	series       = {SIGMOD '23},
	abbr         = {SIGMOD},
	bibtex_show  = {true},
	selected     = {true},
	pdf          = {papers/Sesame.pdf},
	code         = {https://github.com/intellistream/Sesame},
	doi          = {10.1145/3589307},
	url          = {https://doi.org/10.1145/3589307}
}

Download files

Source Distribution

pysame-0.1.0.tar.gz (146.9 kB)

Uploaded Source

Built Distributions

pysame-0.1.0-cp312-cp312-manylinux_2_28_x86_64.whl (7.4 MB)

Uploaded CPython 3.12, manylinux: glibc 2.28+ x86-64

pysame-0.1.0-cp311-cp311-manylinux_2_28_x86_64.whl (7.4 MB)

Uploaded CPython 3.11, manylinux: glibc 2.28+ x86-64

pysame-0.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (7.4 MB)

Uploaded CPython 3.10, manylinux: glibc 2.28+ x86-64

pysame-0.1.0-cp39-cp39-manylinux_2_28_x86_64.whl (7.4 MB)

Uploaded CPython 3.9, manylinux: glibc 2.28+ x86-64

File details

Details for the file pysame-0.1.0.tar.gz.

File metadata

  • Download URL: pysame-0.1.0.tar.gz
  • Upload date:
  • Size: 146.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.4

File hashes

Hashes for pysame-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97bf4935dd47ccb9e1fd4efa77064e4d84fce1ca5d95d62dd99ad507f31421b9 |
| MD5 | 44ef2a5024e6e8c3e74e5c8a29faea4e |
| BLAKE2b-256 | 40ac1e80533c514c7ae3e95c099ba1997460beb5a260586859bd1b6f7d1e23dd |
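
A downloaded file can be checked against the digests listed here using only the Python standard library. The sketch below assumes the source distribution has been saved locally; the helper names `sha256_of_file` and `matches_digest` are introduced for this example and are not part of pysame.

```python
import hashlib

# SHA256 digest listed above for pysame-0.1.0.tar.gz
EXPECTED_SHA256 = "97bf4935dd47ccb9e1fd4efa77064e4d84fce1ca5d95d62dd99ad507f31421b9"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.lower()
```

For example, `matches_digest("pysame-0.1.0.tar.gz", EXPECTED_SHA256)` should return `True` for an intact download, assuming the file sits in the current directory under that name.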

File details

Details for the file pysame-0.1.0-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pysame-0.1.0-cp312-cp312-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 57e3bf9b567553c1b7fbe9c2716dc69e44ff38437c0ad39714c3b8091fce8e92 |
| MD5 | 182176fb9fdcba105d298c7ed25a5b41 |
| BLAKE2b-256 | b6d829e1ac0724c6256ebe5c16f080b03ec3627e097db79dc92e5f8717a4bb59 |

File details

Details for the file pysame-0.1.0-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pysame-0.1.0-cp311-cp311-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | e762332ea5a0ae9d10ff80379eeb1ad953cec5bf94df1995d4d06c79c5608a5c |
| MD5 | 73cca8f56df529ca21cb2e7b520115e4 |
| BLAKE2b-256 | 34188981524c33dfa45c86c3a960d486834b6494e0ae99671ebd615292ed21b9 |

File details

Details for the file pysame-0.1.0-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pysame-0.1.0-cp310-cp310-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | fcfad1d7934506a537fa87690bc0808ef5b5b60db14cabeb5b4427337b464bad |
| MD5 | 9e25c700f48fb3d8544f51b9f675aec1 |
| BLAKE2b-256 | cd55eaba171c70ddb2d07105163cd46ef7af4a7bb4f59e8e7e8daa1138ff85ed |

File details

Details for the file pysame-0.1.0-cp39-cp39-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pysame-0.1.0-cp39-cp39-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 39bdc03661a86349e17f3a131245791f43058a20ed200ffec950f000142fb15f |
| MD5 | fc88d58dcf2c403b01218513a99b7c77 |
| BLAKE2b-256 | 424fd8783730fd0ade0988266c152336da6f6de572737339c9a11883e555f68b |
