# nimble raccoon

A slim and fast reimplementation of [RACCOON](https://github.com/shlienlab/raccoon) (Resolution-Adaptive Coarse-to-fine Clusters OptimizatiON), an iterative clustering library for Python 3.

It relies on [faiss](https://github.com/facebookresearch/faiss) for fast nearest-neighbor searches and better memory management, but offers fewer features than the original: it supports only cosine and euclidean distances, Louvain as the clustering algorithm, and grid search for parameter optimization.

## Installation

nimble-raccoon can be installed with pip:

    pip install nimble-raccoon

## Usage

To identify clusters and build the hierarchy, initialize a Raccoon object with the chosen parameters and then call it on the input data, a pandas dataframe (input_table in the example below).

    import numpy as np
    from functools import partial

    from nimbloon import Raccoon

You can define custom functions to dynamically adapt the search range to the size of the dataset.

    def half_sqrt_range(x, num_elements=5):
        sq = np.sqrt(x)
        return np.linspace(sq/2, sq, num_elements, dtype=int)

    rc_args = {'metric': 'cosine',
               'scale': False,
               'cumulative_variance': [.75, .8, .9, .95, .99],
               'clustering_parameter': np.logspace(-2, 1.5, 10),
               'n_neighbors': partial(half_sqrt_range, num_elements=3),
               'target_dimensions': 12,
               'min_cluster_size': 25,
               'max_neighbors': 100,
               'silhouette_threshold': 0.,
               'max_depth': 5}
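To see what such a dynamic range looks like in practice, the sketch below evaluates half_sqrt_range on a few dataset sizes. The element-wise cap with np.minimum only illustrates what max_neighbors is meant to achieve; how nimble raccoon applies the cap internally is an assumption here.

```python
from functools import partial

import numpy as np


def half_sqrt_range(x, num_elements=5):
    """Return num_elements integers evenly spaced between sqrt(x)/2 and sqrt(x)."""
    sq = np.sqrt(x)
    return np.linspace(sq / 2, sq, num_elements, dtype=int)


n_neighbors_fn = partial(half_sqrt_range, num_elements=3)
max_neighbors = 100  # same value as in rc_args

for n_samples in (100, 10_000, 1_000_000):
    candidates = n_neighbors_fn(n_samples)
    # Assumption: the cap is applied element-wise; the library may do this differently.
    capped = np.minimum(candidates, max_neighbors)
    print(n_samples, candidates.tolist(), capped.tolist())
```

For 100 samples the candidates are 5, 7 and 10 neighbors, while for a million samples every candidate exceeds 100 and would be capped.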

Once everything is set up you can instantiate the Raccoon object.

    rc = Raccoon(out_path='./rc_output', **rc_args)

Then call it on the input dataset to build the cluster hierarchy.

    rc_labels, rc_tree = rc(input_table)

Available parameters are:

  • metric: the distance metric; currently only cosine and euclidean are supported.

  • scale: whether to scale features at every iteration.

  • cumulative_variance: the cumulative variance threshold for removing low-information features with tSVD. Can be a single float or a list of floats.

  • clustering_parameter: the range of resolutions for the Louvain clustering algorithm. Can be a single float or a list of floats.

  • n_neighbors: the number of nearest neighbors to use across the search. Can be a single int or a list of ints. Can also be a function, in which case it is applied to the size of the input set, adapting this parameter at each iteration.

  • target_dimensions: the dimensionality of the target space after applying UMAP.

  • min_cluster_size: the minimum cluster size; a cluster smaller than this threshold will not be split further.

  • max_neighbors: the maximum number of neighbors; useful when n_neighbors is dynamic and population-dependent, to avoid excessively costly operations on large datasets.

  • silhouette_threshold: the minimum silhouette score a partition must reach to be accepted.

  • max_depth: the maximum depth of the cluster hierarchy.
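As a rough guide to the cost of the search, the sketch below counts the parameter combinations implied by the rc_args example above. It assumes the grid search evaluates the full Cartesian product of the three ranges at each iteration, which may not match how the library actually orders or prunes the search. It uses only the standard library, so the log-spaced resolutions are rebuilt by hand rather than with np.logspace.

```python
from itertools import product

cumulative_variance = [0.75, 0.8, 0.9, 0.95, 0.99]
# Equivalent to np.logspace(-2, 1.5, 10): ten exponents evenly spaced in [-2, 1.5].
clustering_parameter = [10 ** (-2 + i * 3.5 / 9) for i in range(10)]
# Values half_sqrt_range gives with num_elements=3 on a 10,000-sample dataset.
n_neighbors = [50, 75, 100]

# Assumption: every combination of the three ranges is evaluated.
grid = list(product(cumulative_variance, clustering_parameter, n_neighbors))
print(len(grid))  # 5 * 10 * 3 = 150 combinations
```

Trimming any one of the lists shrinks the grid proportionally, which is the simplest lever if a run is too slow.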

The output will be a one-hot-encoded dataframe with samples as rows and cluster labels as columns, and an anytree object with information on the hierarchical relationship among clusters. This information will also be automatically saved to disk in the out_path folder.
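A minimal sketch of how such a one-hot table can be inspected with pandas. The toy table and its column names ('0', '0_0', '0_1') are hypothetical stand-ins for rc_labels; check the actual cluster naming scheme against your own output.

```python
import pandas as pd

# Hypothetical stand-in for rc_labels: rows are samples, columns are clusters,
# and a 1 marks membership. Assumption: a sample is flagged in its cluster
# and in every ancestor of that cluster.
toy_labels = pd.DataFrame(
    [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]],
    index=["s1", "s2", "s3", "s4"],
    columns=["0", "0_0", "0_1"],
)

# Number of samples assigned to each cluster.
cluster_sizes = toy_labels.sum(axis=0)

# Clusters each sample belongs to, in the column order of the table.
memberships = {
    sample: [c for c in toy_labels.columns if row[c] == 1]
    for sample, row in toy_labels.iterrows()
}

print(cluster_sizes.to_dict())
print(memberships)
```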

The tSVD and UMAP steps can each be skipped, by setting cumulative_variance to 1 and target_dimensions to None respectively.
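For example, a configuration that disables both reduction steps could be derived from an existing parameter dictionary as sketched here. The base dictionary is trimmed to three keys to keep the snippet self-contained, and rc_args_no_reduction is a name chosen for illustration.

```python
# Trimmed version of the rc_args dictionary from the usage example.
rc_args = {
    "metric": "cosine",
    "cumulative_variance": [0.75, 0.8, 0.9, 0.95, 0.99],
    "target_dimensions": 12,
}

# Override only the two settings that control tSVD and UMAP.
rc_args_no_reduction = {
    **rc_args,
    "cumulative_variance": 1,   # skip tSVD feature removal
    "target_dimensions": None,  # skip UMAP
}
```

Building a new dictionary leaves the original rc_args untouched, so both configurations can be run and compared.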

## Citation

If you are using this library for your work, please cite the original RACCOON publication.

> Comitani, F., Nash, J.O., Cohen-Gogo, S. et al. Diagnostic classification of childhood cancer using multiscale transcriptomics. Nat Med 29, 656–666 (2023). https://doi.org/10.1038/s41591-023-02221-x

## Contributions

Contributions are always welcome. If you would like to help us improve this library, please fork the main branch and make sure the pytest suite passes after your changes.
