An Implementation of Component-wise Peak Finding Clustering Method

## Project description

# CPFcluster

An implementation of the Component-wise Peak-Finding (CPF) clustering method, presented in 'Scalable and Adaptable Density-Based Clustering using Level Set and Mode-Seeking Methods'.

## Dependencies

*CPFcluster* supports Python 3, with numpy, scipy, itertools, multiprocessing and scikit-learn. These should be linked with a BLAS implementation
(e.g., OpenBLAS, ATLAS, Intel MKL).

## Installation

CPFcluster is available on PyPI, the Python Package Index.

```
$ pip install CPFcluster
```

## How To Use

To use CPFcluster, first import the *CPFcluster* module.

```
from CPFcluster import CPFcluster
```

### Clustering a Dataset

A CPFcluster object is constructed using the *fit* method, which returns a clustering of a dataset.

```
CPF = CPFcluster(k, rho, alpha, n_jobs, remove_duplicates, cutoff)
CPF.fit(X)
```

CPFcluster takes 6 arguments:

**k**Number of nearest-neighbors used to create connected components from the dataset and compute the density.**rho**(Defaults to 0.4) Parameter used in threshold for center selection.**alpha**(Defaults to 1) Optional parameter used in threshold of edge weights for center selection, not discussed in paper.**n_jobs**(Defaults to 1) Number of cores for program to execute on.**remove_duplicates**(Defaults to False) Option to remove duplicate rows from data in advance of clustering.**cutoff**(Defaults to 1) Threshold for removing instances as outliers. Instances with fewer edges than the cutoff value are removed.

The CPFcluster object is then fit to a dataset:

**X**An*n-by-d*numpy.ndarray with training data. The rows correspond to*n*observations, and the columns correspond to*d*dimensions.

The result object further contains:

**CCmat**An*n-by-n*sparse matrix representation of the*k*-NN graph.**components**A vector containing the index of the component to which each instance belongs. If the instance is an outlying point, the value will be NaN.**ps**A list of tuples containing the number of instances and the proportion of instances for which a point of higher density was not present in the nearest neighbours for each component.**peaks**A vector containing the index of the peaks selected as cluster centers.**memberships**The final cluster labelings.

## CPFmatch for Multi-Image Matching

CPFmatch is the modified version of CPF applicable for the multi-image matching problem. To use CPFmatch, first import the *CPFmatch* module.

```
from CPFcluster import CPFmatch
```

### Clustering a Dataset

A CPFmatch object is constructed using the *fit* method, which returns a clustering of a dataset.

```
match = CPFmatch(k, rho, alpha, n_jobs, remove_duplicates, cutoff)
match.fit(X, img_label)
```

CPFmatch takes the same 6 arguments as CPFcluster:

**k**Number of nearest-neighbors used to create connected components from the dataset and compute the density.**rho**(Defaults to 0.4) Parameter used in threshold for center selection.**alpha**(Defaults to 1) Optional parameter used in threshold of edge weights for center selection, not discussed in paper.**n_jobs**(Defaults to 1) Number of cores for program to execute on.**remove_duplicates**(Defaults to False) Option to remove duplicate rows from data in advance of clustering.**cutoff**(Defaults to 1) Threshold for removing instances as outliers. Instances with fewer edges than the cutoff value are removed.

The CPFmatch object is then fit to a dataset with the label of the images included also:

**X**An*n-by-d*numpy.ndarray with training data. The rows correspond to*n*observations, and the columns correspond to*d*dimensions.**img_label**An*n-by-1*numpy.ndarray with the image label for each feature. The rows correspond to*n*keypoints, and no two keypoints from the same image will be clustered together.

The result object further contains as before:

**CCmat**An*n-by-n*sparse matrix representation of the*k*-NN graph.**components**A vector containing the index of the component to which each instance belongs. If the instance is an outlying point, the value will be NaN.**ps**A list of tuples containing the number of instances and the proportion of instances for which a point of higher density was not present in the nearest neighbours for each component.**peaks**A vector containing the index of the peaks selected as cluster centers.**memberships**The final cluster labelings.

## Tests

## CPFcluster

*CPFcluster* has an MIT License.

See LICENSE.

## Project details

## Release history Release notifications | RSS feed

## Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.