An Implementation of Component-wise Peak Finding Clustering Method
CPFcluster
An implementation of the Component-wise Peak-Finding (CPF) clustering method, presented in 'Scalable and Adaptable Density-Based Clustering using Level Set and Mode-Seeking Methods'.
Dependencies
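CPFcluster supports Python 3 and requires numpy, scipy and scikit-learn. It also uses itertools and multiprocessing, which are part of the Python standard library and need no separate installation. numpy and scipy should be linked with a BLAS implementation (e.g., OpenBLAS, ATLAS, Intel MKL).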
Installation
CPFcluster is available on PyPI, the Python Package Index.
$ pip install CPFcluster
How To Use
To use CPFcluster, first import the CPFcluster module.
from CPFcluster import CPFcluster
Clustering a Dataset
A CPFcluster object is constructed with the parameters below. Its fit method is then called on a dataset to compute a clustering.
CPF = CPFcluster(k, rho, alpha, n_jobs, remove_duplicates, cutoff)
CPF.fit(X)
CPFcluster takes 6 arguments:
- k Number of nearest-neighbors used to create connected components from the dataset and compute the density.
- rho (Defaults to 0.4) Parameter used in threshold for center selection.
- alpha (Defaults to 1) Optional parameter used in threshold of edge weights for center selection, not discussed in paper.
- n_jobs (Defaults to 1) Number of cores for program to execute on.
- remove_duplicates (Defaults to False) Option to remove duplicate rows from data in advance of clustering.
- cutoff (Defaults to 1) Threshold for removing instances as outliers. Instances with fewer edges than the cutoff value are removed.
The CPFcluster object is then fit to a dataset:
- X An n-by-d numpy.ndarray with training data. The rows correspond to n observations, and the columns correspond to d dimensions.
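As a minimal sketch of the two steps above, the following builds a small synthetic dataset and fits CPFcluster to it. The helper names `make_two_blobs` and `cluster_with_cpf` and the choice of k = 10 are illustrative assumptions, not part of the package. The constructor arguments are passed positionally in the order documented above.

```python
import numpy as np


def make_two_blobs(n_per_blob=100, seed=0):
    """Return an n-by-d array: two Gaussian blobs in 2 dimensions."""
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(n_per_blob, 2))
    b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(n_per_blob, 2))
    return np.vstack([a, b])


def cluster_with_cpf(X, k=10):
    """Fit CPFcluster to X and return the fitted object."""
    # Imported here so the data helper also works without the package.
    from CPFcluster import CPFcluster

    # Positional order as documented:
    # k, rho, alpha, n_jobs, remove_duplicates, cutoff
    model = CPFcluster(k, 0.4, 1, 1, False, 1)
    model.fit(X)
    return model
```

With the package installed, `model = cluster_with_cpf(make_two_blobs())` fits the object, and the results can then be read from its attributes.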
The result object further contains:
- CCmat An n-by-n sparse matrix representation of the k-NN graph.
- components A vector containing the index of the component to which each instance belongs. If the instance is an outlying point, the value will be NaN.
- ps A list of tuples, one per component, containing the number of instances in the component and the proportion of those instances that have no point of higher density among their nearest neighbours.
- peaks A vector containing the index of the peaks selected as cluster centers.
- memberships The final cluster labelings.
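To make the roles of k, cutoff, CCmat and components more concrete, here is a rough illustration written with numpy only. It is not the package's code: it assumes a mutual k-NN graph (an edge is kept only when both points list each other as neighbours), uses a dense boolean matrix rather than a sparse one, and the function name `knn_components` is hypothetical.

```python
from collections import deque

import numpy as np


def knn_components(X, k, cutoff=1):
    """Label each row of X with a component index; outliers get NaN."""
    n = X.shape[0]
    # Pairwise Euclidean distances (fine for small n; O(n^2) memory).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    # Mark the k nearest neighbours of each point (assumes k < n).
    nn = np.argsort(dist, axis=1)[:, :k]
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.repeat(np.arange(n), k), nn.ravel()] = True

    # Assumption: keep an edge only when both points list each other.
    adj = is_nn & is_nn.T

    # Points with fewer edges than `cutoff` are treated as outliers.
    keep = adj.sum(axis=1) >= cutoff

    labels = np.full(n, np.nan)
    current = 0
    for start in range(n):
        if not keep[start] or not np.isnan(labels[start]):
            continue
        # Breadth-first search over the non-outlier points.
        labels[start] = current
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in np.flatnonzero(adj[i] & keep):
                if np.isnan(labels[j]):
                    labels[j] = current
                    queue.append(j)
        current += 1
    return adj, labels
```

The returned `adj` plays the role of CCmat and `labels` the role of components in this sketch. A point with no mutual neighbours has zero edges, so with the default cutoff of 1 it is marked NaN.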
CPFmatch for Multi-Image Matching
CPFmatch is a modified version of CPF for the multi-image matching problem. To use CPFmatch, first import the CPFmatch module.
from CPFcluster import CPFmatch
Clustering a Dataset
A CPFmatch object is constructed with the parameters below. Its fit method is then called on a dataset and its image labels to compute a clustering.
match = CPFmatch(k, rho, alpha, n_jobs, remove_duplicates, cutoff)
match.fit(X, img_label)
CPFmatch takes the same 6 arguments as CPFcluster:
- k Number of nearest-neighbors used to create connected components from the dataset and compute the density.
- rho (Defaults to 0.4) Parameter used in threshold for center selection.
- alpha (Defaults to 1) Optional parameter used in threshold of edge weights for center selection, not discussed in paper.
- n_jobs (Defaults to 1) Number of cores for program to execute on.
- remove_duplicates (Defaults to False) Option to remove duplicate rows from data in advance of clustering.
- cutoff (Defaults to 1) Threshold for removing instances as outliers. Instances with fewer edges than the cutoff value are removed.
The CPFmatch object is then fit to a dataset together with the image label of each feature:
- X An n-by-d numpy.ndarray with training data. The rows correspond to n observations, and the columns correspond to d dimensions.
- img_label An n-by-1 numpy.ndarray with the image label for each feature. The rows correspond to n keypoints, and no two keypoints from the same image will be clustered together.
As before, the result object further contains:
- CCmat An n-by-n sparse matrix representation of the k-NN graph.
- components A vector containing the index of the component to which each instance belongs. If the instance is an outlying point, the value will be NaN.
- ps A list of tuples, one per component, containing the number of instances in the component and the proportion of those instances that have no point of higher density among their nearest neighbours.
- peaks A vector containing the index of the peaks selected as cluster centers.
- memberships The final cluster labelings.
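The constraint that no two keypoints from the same image share a cluster can be checked after fitting. The helper below is a hypothetical sketch, not part of the package. It assumes memberships holds numeric, integer-valued labels and that any unassigned points are marked NaN.

```python
import numpy as np


def same_image_conflicts(memberships, img_label):
    """Return (cluster, image, count) for each violated constraint."""
    labels = np.asarray(memberships, dtype=float).ravel()
    images = np.asarray(img_label).ravel()  # accepts n-by-1 or length-n input
    if labels.shape != images.shape:
        raise ValueError("memberships and img_label must have equal length")

    conflicts = []
    # Ignore NaN labels (assumed to mark unassigned points).
    for cluster in np.unique(labels[~np.isnan(labels)]):
        imgs, counts = np.unique(images[labels == cluster], return_counts=True)
        for img, count in zip(imgs, counts):
            if count > 1:  # two or more keypoints from one image
                conflicts.append((int(cluster), img.item(), int(count)))
    return conflicts
```

An empty list means the constraint holds. With a fitted object, the call would be `same_image_conflicts(match.memberships, img_label)`.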
Tests
CPFcluster
CPFcluster has an MIT License.
See LICENSE.