A Python package for common-nearest-neighbours clustering
Common-nearest-neighbour clustering
The commonnn Python package provides a flexible interface to the common-nearest-neighbour (CommonNN) clustering procedure. While the method can be applied to arbitrary data, this implementation was developed with the processing of trajectories from Molecular Dynamics (MD) simulations in mind. In this context, the cluster result can serve as a suitable basis for constructing a core-set Markov-state model (cs-MSM) that captures the essential dynamics of the underlying molecular processes.
The commonnn package
The package provides a main module:
- cluster: User interface to (hierarchical) CommonNN clustering
Further, it contains, among others, the modules:
- plot: Convenience functions to evaluate cluster results
- _types: Direct access to generic types representing needed cluster components
- _fit: Direct access to generic clustering procedures
Features:
- Flexible: The clustering can be done for data sets in different input formats. Internal parts of the procedure can be exchanged. Interfacing with external methods is made easy.
- Convenient: Integrates functionality that may be handy in the context of MD data analysis.
- Fast: Core functionalities have been implemented in Cython.
Please refer to the following papers for the scientific background (and consider citing if you find the method useful):
- B. Keller, X. Daura, W. F. van Gunsteren, J. Chem. Phys., 2010, 132, 074110.
- O. Lemke, B. G. Keller, J. Chem. Phys., 2016, 145, 164104.
- O. Lemke, B. G. Keller, Algorithms, 2018, 11, 19.
Documentation
The package documentation is available online or under docs/index.html.
The sources for the documentation can be found under docsrc/ and can be built using Sphinx.
Install
Refer to the documentation for more details. Install from PyPI
$ pip install commonnn-clustering
or clone the development version and install from a local branch
$ git clone https://github.com/bkellerlab/CommonNNClustering.git
$ cd CommonNNClustering
$ pip install .
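To confirm that either install route worked, a small check like the following can be run. It is only a sketch: the helper name `is_importable` is made up for illustration, and the import name `commonnn` is taken from the Quickstart.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


if is_importable("commonnn"):
    print("commonnn is available")
else:
    print("commonnn not found; try: pip install commonnn-clustering")
```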
Quickstart
>>> from commonnn import cluster
>>> # 2D data points (list of lists, 12 points in 2 dimensions)
>>> data_points = [ # point index
... [0, 0], # 0
... [1, 1], # 1
... [1, 0], # 2
... [0, -1], # 3
... [0.5, -0.5], # 4
... [2, 1.5], # 5
... [2.5, -0.5], # 6
... [4, 2], # 7
... [4.5, 2.5], # 8
... [5, -1], # 9
... [5.5, -0.5], # 10
... [5.5, -1.5], # 11
... ]
>>> clustering = cluster.Clustering(data_points)
>>> clustering.fit(radius_cutoff=1.5, similarity_cutoff=1, v=False)
>>> clustering.labels
array([1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2])
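To make the roles of `radius_cutoff` and `similarity_cutoff` more tangible, here is a naive NumPy sketch of one common formulation of the CommonNN criterion: two points are joined if they lie within the radius of each other and share at least `similarity_cutoff` neighbours. This is not the package's implementation. The function name `commonnn_labels_sketch` is hypothetical, and the sketch assumes that neighbourhoods exclude the point itself, that label 0 marks noise, and that single-point groups count as noise. Clusters are numbered in order of discovery rather than by size.

```python
from collections import deque

import numpy as np


def commonnn_labels_sketch(points, radius_cutoff, similarity_cutoff):
    """Naive O(n^2) CommonNN-style labelling; 0 marks noise (assumption)."""
    points = np.asarray(points, dtype=float)
    n_points = len(points)

    # Pairwise Euclidean distances, shape (n_points, n_points).
    diff = points[:, None, :] - points[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))

    # Neighbourhood of each point: all *other* points within the radius.
    neighbours = [
        {j for j in range(n_points)
         if j != i and distances[i, j] <= radius_cutoff}
        for i in range(n_points)
    ]

    labels = np.zeros(n_points, dtype=int)
    visited = np.zeros(n_points, dtype=bool)
    current = 0
    for root in range(n_points):
        if visited[root]:
            continue
        visited[root] = True
        members = [root]
        queue = deque([root])
        # Breadth-first growth: add neighbours that share enough neighbours.
        while queue:
            i = queue.popleft()
            for j in sorted(neighbours[i]):
                if visited[j]:
                    continue
                if len(neighbours[i] & neighbours[j]) >= similarity_cutoff:
                    visited[j] = True
                    members.append(j)
                    queue.append(j)
        # Groups of a single point stay at label 0 (treated as noise here).
        if len(members) > 1:
            current += 1
            labels[members] = current
    return labels


# Same 12 points as in the Quickstart.
data_points = [
    [0, 0], [1, 1], [1, 0], [0, -1], [0.5, -0.5], [2, 1.5],
    [2.5, -0.5], [4, 2], [4.5, 2.5], [5, -1], [5.5, -0.5], [5.5, -1.5],
]
labels = commonnn_labels_sketch(data_points, radius_cutoff=1.5, similarity_cutoff=1)
print(labels.tolist())  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2]
```

For this data set the sketch yields the same labels as the Quickstart. Points 7 and 8, for example, are within the radius of each other but share no third neighbour, so they remain at label 0.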
Alternative scikit-learn implementation
We provide an alternative approach to CommonNN clustering in the spirit of the scikit-learn project within scikit-learn-extra.
Development history
The present development repository has diverged from the original one under github.com/janjoswig/CommonNNClustering.
A previous implementation of the clustering can be found under github.com/bettinakeller/CNNClustering.
Source Distribution
File details
Details for the file commonnn-clustering-0.0.3.tar.gz.
File metadata
- Download URL: commonnn-clustering-0.0.3.tar.gz
- Upload date:
- Size: 47.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0607d837288973779fc978dfdf0e70e696b3e7180a993b1face27ba37ac67b82
MD5 | c3e7ce22f7ea7302430ace02e275b45a
BLAKE2b-256 | f23df187a03f07f5469e11051f7683aa1adf2488ab0f51e9c620bde2998633fc