Density-based clustering for exploratory data analysis based on multi-parameter persistence
Project description
Persistent and stable clustering (Persistable) is a density-based clustering algorithm intended for exploratory data analysis. What distinguishes Persistable from other clustering algorithms is its visualization capabilities: its interactive mode lets you visualize the multi-scale and multi-density cluster structure present in the data, and you use that view to choose the parameters that produce the final clustering.
Usage
Here is a brief outline of the main functionality; see the documentation for details, including the API reference.
To start Persistable's interactive mode from a Jupyter notebook, run the following in a Jupyter cell:
import persistable
from sklearn.datasets import make_blobs
X = make_blobs(2000, centers=4, random_state=1)[0]
p = persistable.Persistable(X)
pi = persistable.PersistableInteractive(p)
pi.start_ui()
The last command returns the localhost port serving the UI, which is 8050 by default. Now go to localhost:8050 in your web browser to access the graphical user interface.
After choosing your parameters using the user interface, you can get your clustering in another Jupyter cell by running:
clustering_labels = pi.cluster()
Note: You may use pi.start_ui(jupyter_mode="inline") to have the graphical user interface display directly in the Jupyter notebook.
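Once pi.cluster() has returned, a quick summary of the labels helps confirm that the chosen parameters produced the clusters you expected. The sketch below is a minimal, hypothetical helper (summarize_labels is not part of Persistable). It assumes the labels are integers with -1 marking noise points, the convention used by hdbscan; check the API reference to confirm this for your version. The example labels are made up and stand in for the real output.

```python
from collections import Counter


def summarize_labels(labels, noise_label=-1):
    """Return (cluster_sizes, n_noise) for a sequence of integer labels.

    Hypothetical helper: assumes `noise_label` marks unclustered points.
    """
    counts = Counter(int(label) for label in labels)
    # Separate the noise count from the per-cluster counts.
    n_noise = counts.pop(noise_label, 0)
    return dict(sorted(counts.items())), n_noise


# Stand-in for the array returned by pi.cluster(); the values are made up.
example_labels = [0, 0, 1, -1, 2, 1, 0, -1]
sizes, n_noise = summarize_labels(example_labels)
print(sizes, n_noise)
```

In a notebook you would pass clustering_labels instead of example_labels.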
Installing
Make sure you are using Python 3.
Persistable depends on the following Python packages, which will be installed automatically when you install with pip: numpy, scipy, scikit-learn, cython, plotly, dash, diskcache, multiprocess, psutil.
To install from PyPI, run the following:
pip install persistable-clustering
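To check that the installation succeeded in the environment your notebook uses, you can query the installed distribution's version from Python. This is a minimal sketch using only the standard library (Python 3.8 or later); installed_version is a hypothetical helper name, and the distribution name is the one passed to pip above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


found = installed_version("persistable-clustering")
print(found if found is not None else "persistable-clustering is not installed")
```

Note that the distribution name (persistable-clustering) differs from the import name (persistable).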
Documentation and support
You can find the documentation at persistable.readthedocs.io. If you have further questions, please open an issue and we will do our best to help you. Please include as much information as possible, including your system's information, warnings, logs, screenshots, and anything else you think may be of use. If you do not wish to open an issue, you are also welcome to contact Luis Scoccola directly. Please be patient if it takes us a bit to get back to you.
Running the tests
To run the tests, run the following commands from the root directory of a clone of this repository. If a test fails, please report a bug and include as much information as possible: your system's information, warnings, logs, screenshots, and anything else you think may be of use.
pip install pytest playwright pytest-playwright
python -m playwright install --with-deps
pip install -r requirements.txt
python -m setup build_ext --inplace
pytest .
Details about theory and implementation
Persistable is based on multi-parameter persistence [4], a method from topological data analysis. The theory behind Persistable is developed in [1], while this implementation uses the high-performance algorithms for density-based clustering developed in [2] and implemented in [3]. Persistable's interactive mode is inspired by RIVET [5] and is implemented in Dash.
Contributing
To contribute, you can fork the project, make your changes, and submit a pull request. You may want to contact Luis Scoccola first, to make sure your work does not overlap with ongoing work.
Authors
Luis Scoccola and Alexander Rolle.
Citing
If you use this package in your work, you may cite the corresponding paper using the following bibtex entry:
@article{Scoccola2023,
doi = {10.21105/joss.05022},
url = {https://doi.org/10.21105/joss.05022},
year = {2023},
publisher = {The Open Journal},
volume = {8},
number = {83},
pages = {5022},
author = {Luis Scoccola and Alexander Rolle},
title = {Persistable: persistent and stable clustering},
journal = {Journal of Open Source Software}
}
References
[1] Stable and consistent density-based clustering. A. Rolle and L. Scoccola. arXiv:2005.09048
[2] Accelerated Hierarchical Density Based Clustering. L. McInnes, J. Healy. 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, pp 33-42. 2017
[3] hdbscan: Hierarchical density based clustering. L. McInnes, J. Healy, S. Astels. Journal of Open Source Software, The Open Journal, volume 2, number 11. 2017
[4] An Introduction to Multiparameter Persistence. M. B. Botnan, M. Lesnick. Proceedings of the 2020 International Conference on Representations of Algebras. 2022
[5] RIVET. The RIVET Developers. [Git] [docs]
License
This software is published under the 3-clause BSD license.