A Python package for common-nearest-neighbours clustering
Common-nearest-neighbours clustering
NOTE
This project is currently under development. The implementation may change in the future. Check the examples and the documentation for up-to-date information.
cnnclustering
The cnnclustering Python package provides a flexible interface to the common-nearest-neighbours clustering algorithm. While the method can be applied to arbitrary data, this implementation was developed with the processing of trajectories from Molecular Dynamics simulations in mind. In this context, the clustering result can serve as a basis for constructing a core-set Markov-state model (cs-MSM) that captures the essential dynamics of the underlying molecular processes. For a tool for cs-MSM estimation, refer to this separate project.
The package provides a main module:
- cluster: (Hierarchical) common-nearest-neighbours clustering and analysis
Features:
- Flexible: Clustering can be done for data sets in different input formats. Easy interfacing with external methods.
- Convenient: Integrates functionality that is handy in the context of Molecular Dynamics.
- Fast: Core functionalities implemented in Cython.
Please refer to the following papers for the scientific background (and consider citing if you find the method useful):
- B. Keller, X. Daura, W. F. van Gunsteren, J. Chem. Phys., 2010, 132, 074110.
- O. Lemke, B. G. Keller, J. Chem. Phys., 2016, 145, 164104.
- O. Lemke, B. G. Keller, Algorithms, 2018, 11, 19.
Documentation
The package documentation (under development) is available here.
Install
Refer to the documentation for more details. Install from PyPI:
$ pip install cnnclustering
or clone the development version and install from a local branch
$ git clone https://github.com/janjoswig/CommonNNClustering.git
$ cd CommonNNClustering
$ pip install .
Quickstart
>>> from cnnclustering.cluster import prepare_clustering
>>> # 2D data points (list of lists, 12 points in 2 dimensions)
>>> data_points = [ # point index
... [0, 0], # 0
... [1, 1], # 1
... [1, 0], # 2
... [0, -1], # 3
... [0.5, -0.5], # 4
... [2, 1.5], # 5
... [2.5, -0.5], # 6
... [4, 2], # 7
... [4.5, 2.5], # 8
... [5, -1], # 9
... [5.5, -0.5], # 10
... [5.5, -1.5], # 11
... ]
>>> clustering = prepare_clustering(data_points)
>>> clustering.fit(radius_cutoff=1.5, cnn_cutoff=1, v=False)
>>> clustering.labels
Labels([1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2])
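To give a feel for what radius_cutoff and cnn_cutoff control, the following pure-Python sketch applies a common-nearest-neighbours criterion to the same data. It is an illustration only, not the package's Cython implementation. The function name cnn_labels_sketch is hypothetical, and the sketch assumes that two points are joined when they lie within radius_cutoff of each other (inclusive) and share at least cnn_cutoff other neighbours, that unconnected points are labelled 0 (noise), and that clusters are numbered from 1 in order of decreasing size.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def cnn_labels_sketch(points, radius_cutoff, cnn_cutoff):
    """Illustrative common-nearest-neighbours labelling (hypothetical helper)."""
    n = len(points)
    # Neighbourhood of each point: indices of all other points within the radius.
    neighbours = [
        {j for j in range(n)
         if j != i and dist(points[i], points[j]) <= radius_cutoff}
        for i in range(n)
    ]

    visited = [False] * n
    clusters = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        members = [start]
        stack = [start]
        # Grow the cluster through pairs that satisfy the density criterion.
        while stack:
            i = stack.pop()
            for j in neighbours[i]:
                shared = len(neighbours[i] & neighbours[j])
                if not visited[j] and shared >= cnn_cutoff:
                    visited[j] = True
                    members.append(j)
                    stack.append(j)
        if len(members) > 1:  # a point connected to nothing stays noise
            clusters.append(members)

    # Assumed convention: label 0 is noise, label 1 is the largest cluster.
    clusters.sort(key=len, reverse=True)
    labels = [0] * n
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


data_points = [
    [0, 0], [1, 1], [1, 0], [0, -1], [0.5, -0.5], [2, 1.5],
    [2.5, -0.5], [4, 2], [4.5, 2.5], [5, -1], [5.5, -0.5], [5.5, -1.5],
]
labels = cnn_labels_sketch(data_points, radius_cutoff=1.5, cnn_cutoff=1)
print(labels)  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2]
```

Under these assumptions, points 7 and 8 are within the radius of each other but share no further neighbour, so they remain noise, whereas points 9, 10 and 11 each share the third point as a common neighbour and form a cluster.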
Alternative scikit-learn implementation
We provide an alternative approach to common-nearest-neighbours clustering in the spirit of the scikit-learn project within scikit-learn-extra.