Stream clustering algorithms on modern hardware
Project description
Sesame
This project aims to build a scalable stream mining library for modern hardware.
- The repo currently contains several representative real-world stream clustering algorithms and several synthetic algorithms.
- We welcome contributions. If you are interested in contributing to the project, please fork the repository and submit a PR. If you have questions, feel free to open an issue.
Build Dependency
Real-world algorithms
| Algorithm | Window Model | Outlier Detection | Summarizing Data Structure | Offline Refinement |
|---|---|---|---|---|
| BIRCH | LandmarkWM | OutlierD | CFT | ❌ |
| CluStream | LandmarkWM | OutlierD-T | MCs | ✅ |
| DenStream | DampedWM | OutlierD-BT | MCs | ✅ |
| DStream | DampedWM | OutlierD-T | Grids | ❌ |
| StreamKM++ | LandmarkWM | NoOutlierD | CoreT | ✅ |
| DBStream | DampedWM | OutlierD-T | MCs | ✅ |
| EDMStream | DampedWM | OutlierD-BT | DPT | ❌ |
| SL-KMeans | SlidingWM | NoOutlierD | AMS | ❌ |
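The Window Model column above distinguishes landmark, sliding, and damped windows. As a rough illustration, assuming the usual definitions from the stream clustering literature, each model can be viewed as a rule that assigns a weight to a point based on its timestamp. The function names and parameters below are hypothetical and are not part of the Sesame API, and the exponential fading function is only one common choice.

```python
def landmark_weight(t_point: float, landmark: float) -> float:
    """Landmark window: every point at or after the landmark counts fully."""
    return 1.0 if t_point >= landmark else 0.0


def sliding_weight(t_now: float, t_point: float, width: float) -> float:
    """Sliding window: only points younger than `width` time units count."""
    return 1.0 if (t_now - t_point) < width else 0.0


def damped_weight(t_now: float, t_point: float, decay: float) -> float:
    """Damped window: weight fades exponentially with age.

    Assumes the fading function 2^(-decay * age); an actual
    implementation may use a different base or decay schedule.
    """
    return 2.0 ** (-decay * (t_now - t_point))


if __name__ == "__main__":
    now = 10.0
    for t in (2.0, 8.0, 10.0):
        print(
            t,
            landmark_weight(t, landmark=5.0),
            sliding_weight(now, t, width=3.0),
            damped_weight(now, t, decay=0.5),
        )
```

Under this view, the landmark and sliding models make a hard keep-or-drop decision, while the damped model keeps every point but lets old points contribute less and less to the summary.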
Synthetic algorithms
| Algorithm | Window Model | Outlier Detection | Summarizing Data Structure | Offline Refinement |
|---|---|---|---|---|
| G1 | LandmarkWM | OutlierD | MCs | ✅ |
| G2 | LandmarkWM | OutlierD | MCs | ✅ |
| G3 | LandmarkWM | OutlierD | CFT | ❌ |
| G4 | SlidingWM | OutlierD | MCs | ❌ |
| G5 | DampedWM | OutlierD-B | MCs | ❌ |
| G6 | LandmarkWM | NoOutlierD | MCs | ❌ |
| G8 | LandmarkWM | OutlierD | MCs | ❌ |
| G9 | LandmarkWM | OutlierD | Grids | ❌ |
| G10 | LandmarkWM | OutlierD | DPT | ❌ |
| G11 | LandmarkWM | OutlierD-T | MCs | ❌ |
| G12 | LandmarkWM | OutlierD-B | MCs | ❌ |
| G13 | LandmarkWM | OutlierD-BT | MCs | ❌ |
| G14 | LandmarkWM | OutlierD | AMS | ❌ |
| G15 | LandmarkWM | OutlierD | CoreT | ❌ |
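Several rows in the tables above list CFT or MCs as the summarizing data structure. Assuming these abbreviations refer to clustering feature trees and micro-clusters, both build on the same idea: a small additive summary (count, linear sum, squared sum) from which a centroid and radius can be derived without storing the points. The class below is a minimal sketch of that idea, not Sesame's implementation; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Additive summary of a set of d-dimensional points."""

    dim: int
    n: int = 0                                    # number of absorbed points
    ls: List[float] = field(default_factory=list)  # per-dimension linear sum
    ss: float = 0.0                               # sum of squared norms

    def __post_init__(self) -> None:
        if not self.ls:
            self.ls = [0.0] * self.dim

    def insert(self, point: Sequence[float]) -> None:
        """Absorb one point in O(dim) time."""
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

    def merge(self, other: "ClusteringFeature") -> None:
        """Summaries are additive, so merging is component-wise addition."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of the points from the centroid."""
        c = self.centroid()
        variance = self.ss / self.n - sum(v * v for v in c)
        return sqrt(max(variance, 0.0))  # clamp tiny negative rounding errors


if __name__ == "__main__":
    cf = ClusteringFeature(dim=2)
    cf.insert([0.0, 0.0])
    cf.insert([2.0, 0.0])
    print(cf.centroid(), cf.radius())
```

Because the summary is additive, two summaries can be merged in constant space, which is what makes this kind of structure suitable for unbounded streams.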
Datasets
| DataSet | Length | Dimension | Cluster Number |
|---|---|---|---|
| CoverType | 581012 | 54 | 7 |
| KDD-99 | 4898431 | 41 | 23 |
| Insects | 905145 | 33 | 24 |
| Sensor | 2219803 | 5 | 55 |
| EDS | 45690, 100270, 150645, 200060, 245270 | 2 | 75, 145, 218, 289, 363 |
| ODS | 94720, 97360, 100000 | 2 | 90, 90, 90 |
You may download the datasets here: https://zenodo.org/records/8210331
How to Cite Sesame
- [SIGMOD 2023] Xin Wang, Zhengru Wang, Zhenyu Wu, Shuhao Zhang, Xuanhua Shi, and Li Lu. Data Stream Clustering: An In-depth Empirical Study. SIGMOD, 2023.
@inproceedings{wang2023sesame,
title = {Data Stream Clustering: An In-depth Empirical Study},
author = {Xin Wang and Zhengru Wang and Zhenyu Wu and Shuhao Zhang and Xuanhua Shi and Li Lu},
year = 2023,
booktitle = {Proceedings of the 2023 International Conference on Management of Data (SIGMOD)},
location = {Seattle, WA, USA},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
series = {SIGMOD '23},
abbr = {SIGMOD},
bibtex_show = {true},
selected = {true},
pdf = {papers/Sesame.pdf},
code = {https://github.com/intellistream/Sesame},
doi = {10.1145/3589307},
url = {https://doi.org/10.1145/3589307}
}
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
File details
Details for the file pysame-0.1.0.tar.gz.
File metadata
- Download URL: pysame-0.1.0.tar.gz
- Upload date:
- Size: 146.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97bf4935dd47ccb9e1fd4efa77064e4d84fce1ca5d95d62dd99ad507f31421b9 |
| MD5 | 44ef2a5024e6e8c3e74e5c8a29faea4e |
| BLAKE2b-256 | 40ac1e80533c514c7ae3e95c099ba1997460beb5a260586859bd1b6f7d1e23dd |
File details
Details for the file pysame-0.1.0-cp312-cp312-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: pysame-0.1.0-cp312-cp312-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 7.4 MB
- Tags: CPython 3.12, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 57e3bf9b567553c1b7fbe9c2716dc69e44ff38437c0ad39714c3b8091fce8e92 |
| MD5 | 182176fb9fdcba105d298c7ed25a5b41 |
| BLAKE2b-256 | b6d829e1ac0724c6256ebe5c16f080b03ec3627e097db79dc92e5f8717a4bb59 |
File details
Details for the file pysame-0.1.0-cp311-cp311-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: pysame-0.1.0-cp311-cp311-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 7.4 MB
- Tags: CPython 3.11, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e762332ea5a0ae9d10ff80379eeb1ad953cec5bf94df1995d4d06c79c5608a5c |
| MD5 | 73cca8f56df529ca21cb2e7b520115e4 |
| BLAKE2b-256 | 34188981524c33dfa45c86c3a960d486834b6494e0ae99671ebd615292ed21b9 |
File details
Details for the file pysame-0.1.0-cp310-cp310-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: pysame-0.1.0-cp310-cp310-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 7.4 MB
- Tags: CPython 3.10, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fcfad1d7934506a537fa87690bc0808ef5b5b60db14cabeb5b4427337b464bad |
| MD5 | 9e25c700f48fb3d8544f51b9f675aec1 |
| BLAKE2b-256 | cd55eaba171c70ddb2d07105163cd46ef7af4a7bb4f59e8e7e8daa1138ff85ed |
File details
Details for the file pysame-0.1.0-cp39-cp39-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: pysame-0.1.0-cp39-cp39-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 7.4 MB
- Tags: CPython 3.9, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 39bdc03661a86349e17f3a131245791f43058a20ed200ffec950f000142fb15f |
| MD5 | fc88d58dcf2c403b01218513a99b7c77 |
| BLAKE2b-256 | 424fd8783730fd0ade0988266c152336da6f6de572737339c9a11883e555f68b |
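The digests listed above can be checked against a downloaded file with Python's standard `hashlib` module. The helper names `file_digests` and `matches` are illustrative only, and the sketch assumes that BLAKE2b-256 means BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digests(path: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute SHA256, MD5, and BLAKE2b-256 hex digests in one pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # Assumption: BLAKE2b-256 is BLAKE2b truncated to 32 bytes.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with Path(path).open("rb") as fh:
        # Read in chunks so large wheels are not loaded into memory at once.
        while chunk := fh.read(chunk_size):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches(path: str, expected: Dict[str, str]) -> bool:
    """Return True if every expected digest equals the computed one."""
    actual = file_digests(path)
    return all(
        actual[name] == digest.strip().lower()
        for name, digest in expected.items()
    )
```

For example, after downloading `pysame-0.1.0.tar.gz`, calling `matches("pysame-0.1.0.tar.gz", {"SHA256": "97bf4935dd47ccb9e1fd4efa77064e4d84fce1ca5d95d62dd99ad507f31421b9"})` should return `True` if the file is intact.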