Python implementation of ARBOL scRNAseq iterative tiered clustering
https://github.com/jo-m-lab/ARBOL

ARBOLpy

Python implementation of the R package ARBOL for iterative tiered clustering of scRNAseq data

Iteratively cluster single cell datasets using a scanpy anndata object as input. Identifies and uses optimum clustering parameters at each tier of clustering. Current build includes SCtransform normalization. Outputs QC and visualization plots for each clustering event.

Install

From GitHub:

pip install git+https://github.com/jo-m-lab/ARBOLpy.git

From PyPI:

pip install arbolpy

import ARBOL

Or clone the repository and import the functions directly from the source:

git clone https://github.com/jo-m-lab/ARBOLpy.git

import sys; sys.path.insert(0, "path/to/cloned/git/repo/ARBOLpy")
import ARBOL

A Docker image with ARBOL and its dependencies preinstalled is also available: https://hub.docker.com/r/kkimler/arbolpy

Recommended Usage

ARBOL was developed and used in the paper "A treatment-naïve cellular atlas of pediatric Crohn’s disease predicts disease severity and therapeutic response". Currently, a tutorial is available only for the R version, where the FGID atlas figure is reproduced: https://jo-m-lab.github.io/ARBOL/ARBOLtutorial.html

ARBOLpy is a stripped-down version of ARBOL meant to perform iterative clustering with little overhead. It does not currently include the two stop conditions that the R version uses to heuristically join similar clusters, so the Python version tends to overcluster data. Methods for merging the end clusters of the tree are available on the develop branch of the R version of ARBOL.

This package is meant as a starting point that reflects how we approached clustering, and it is meant to be edited and customized through feedback from community users such as yourself!

The main function of ARBOLpy is ARBOL(). Here is an example call:

import scanpy as sc
import ARBOL

adata = sc.datasets.pbmc3k()

tree = ARBOL.ARBOL(adata)

ARBOL.write_ARBOL_output(tree,output_csv='endclusts.csv')

The helper function write_ARBOL_output writes the end clusters of the anytree object to a CSV file.

Note: This script can take a long time to run. Running on 20k cells could take >30 minutes. Running on 100k+ cells could take >3 hours.

Note: It has been tested on up to 200k cells. Beyond 10k cells, resource usage scales linearly with the number of cells.

Python ARBOL resource usage:
Pearson residuals normalization:
- 1.2 GB RAM per 1000 cells
- 2 minutes per 1000 cells
TPM normalization:
- 1.2 GB RAM per 1000 cells
- 1:55 min per 1000 cells

R ARBOL resource usage:
Pearson residuals normalization (SCTransform):
- 1.2 GB RAM per 1000 cells
- 4 minutes per 1000 cells
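
The per-1000-cell figures above can be turned into a rough planning estimate. The sketch below assumes the linear relationship holds over the whole range of interest (it is reported only beyond 10k cells and tested up to 200k), and the names USAGE_PER_1000_CELLS and estimate_resources are illustrative, not part of ARBOLpy.

```python
# Rough resource estimate from the per-1000-cell figures listed above.
# The names below are illustrative only; they are not part of ARBOLpy.

USAGE_PER_1000_CELLS = {
    # (implementation, normalization): (GB RAM, minutes)
    ("python", "Pearson"): (1.2, 2.0),
    ("python", "TPM"): (1.2, 115 / 60),  # 1:55 min
    ("r", "Pearson"): (1.2, 4.0),
}


def estimate_resources(n_cells, implementation="python", normalize_method="Pearson"):
    """Return (ram_gb, minutes), assuming usage scales linearly with cell count."""
    ram_per_k, min_per_k = USAGE_PER_1000_CELLS[(implementation, normalize_method)]
    thousands = n_cells / 1000
    return ram_per_k * thousands, min_per_k * thousands


ram_gb, minutes = estimate_resources(50_000)
print(f"50k cells: ~{ram_gb:.0f} GB RAM, ~{minutes:.0f} min")
```

Treat the result as an order-of-magnitude guide; actual usage depends on the dataset and hardware.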

The current RAM/time bottleneck is the silhouette analysis, which runs 30 rounds of clustering at different resolutions.
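
To make the idea of a silhouette-guided resolution scan concrete, here is a toy sketch using only the standard library. It is not ARBOLpy's code: gap_cluster is a stand-in for Leiden clustering on 1-D data, the evenly spaced resolution grid is an assumption, and all function names are hypothetical. It also shows why the scan is costly, since every round re-clusters the data and recomputes pairwise distances.

```python
# Toy silhouette-guided resolution scan (standard library only).
# ASSUMPTIONS: gap_cluster stands in for Leiden clustering, and the evenly
# spaced resolution grid is a guess; neither is taken from the ARBOLpy source.

def gap_cluster(values, resolution):
    """Cluster sorted 1-D values: start a new cluster when a gap exceeds 1/resolution."""
    labels, current = [0], 0
    for prev, cur in zip(values, values[1:]):
        if cur - prev > 1.0 / resolution:
            current += 1
        labels.append(current)
    return labels


def silhouette_score(values, labels):
    """Mean silhouette coefficient for 1-D points with absolute-difference distance."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2 or len(clusters) == len(values):
        return float("-inf")  # undefined for a single cluster or all singletons
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        a = sum(abs(v - o) for o in own) / (len(own) - 1)
        b = min(
            sum(abs(v - o) for o in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def scan_resolutions(values, min_res, max_res, n_rounds=30):
    """Try n_rounds (>= 2) evenly spaced resolutions; return (best_resolution, best_score)."""
    values = sorted(values)
    step = (max_res - min_res) / (n_rounds - 1)
    best = (None, float("-inf"))
    for i in range(n_rounds):
        res = min_res + i * step
        score = silhouette_score(values, gap_cluster(values, res))
        if score > best[1]:
            best = (res, score)
    return best


data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
best_res, best_score = scan_resolutions(data, min_res=0.05, max_res=3.0)
print(best_res, best_score)
```

On this toy data the lowest resolutions lump everything into one cluster and are rejected, while higher resolutions recover the three well-separated groups.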

ARBOL() Parameters

  • adata: scanpy anndata object
  • normalize_method: normalization method. Defaults to "Pearson", scanpy's experimental implementation of SCTransform. Also available: "TPM", as implemented in scanpy normalize_total()
  • tier: starting tier, defaults to 0
  • cluster: starting cluster, defaults to 0
  • min_cluster_size: minimum number of cells to allow further clustering
  • tree: anytree object to attach ARBOL to. Shouldn't be changed unless building onto a pre-existing tree
  • parent: parent node of the current clustering event, defaults to None. As with tree, shouldn't be changed unless building onto a pre-existing anytree object
  • max_tiers: maximum number of tiers to allow further clustering
  • min_silhouette_res: lower bound of the Leiden clustering resolution parameter scan used in silhouette analysis
  • max_silhouette_res: upper bound of the same resolution scan
  • silhouette_subsampling_n: number of cells to subsample from the anndata object for silhouette analysis (cluster resolution choice)
  • h5dir: where to save h5 objects for each tier and cluster. If None, nothing is saved
  • figdir: where to save QC and visualization figures for each tier and cluster. If None, nothing is saved
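
As a sketch of how these parameters might be passed, the snippet below collects a set of keyword arguments and forwards them to ARBOL(). The parameter names come from the list above, but every value is an assumption chosen for illustration, not a documented default, and run_arbol is a hypothetical helper.

```python
# Illustrative keyword arguments for ARBOL(). Parameter names come from the
# list above; every value is an assumption, not a documented default.
arbol_kwargs = {
    "normalize_method": "TPM",        # or "Pearson"
    "min_cluster_size": 100,
    "max_tiers": 4,
    "min_silhouette_res": 0.05,
    "max_silhouette_res": 2.0,
    "silhouette_subsampling_n": 5000,
    "h5dir": None,                    # None: do not save h5 objects
    "figdir": "arbol_figs",           # hypothetical output directory
}


def run_arbol(adata, **overrides):
    """Hypothetical helper: call ARBOL.ARBOL with the settings above, optionally overridden."""
    import ARBOL  # imported lazily so the settings can be inspected without ARBOLpy installed

    return ARBOL.ARBOL(adata, **{**arbol_kwargs, **overrides})
```

With ARBOLpy and scanpy installed, a call such as run_arbol(sc.datasets.pbmc3k(), max_tiers=2) would then return the tree while capping the depth for a quicker trial run.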

Returns

  • anytree object based on iterative tiered clustering
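
The following toy sketch illustrates how tier, cluster, min_cluster_size, and max_tiers can interact in a recursive, tiered scheme, and how end clusters correspond to the leaves of the resulting tree. It is an illustration under stated assumptions, not ARBOLpy's implementation: Node is a minimal stand-in for an anytree node, split_in_two replaces real clustering, and the node naming is arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for an anytree node (hypothetical, for illustration)."""
    name: str
    cells: list
    children: list = field(default_factory=list)


def split_in_two(cells):
    """Placeholder for a real clustering step: halve the sorted cells."""
    cells = sorted(cells)
    mid = len(cells) // 2
    return [cells[:mid], cells[mid:]]


def tiered_clustering(cells, tier=0, cluster=0, min_cluster_size=4, max_tiers=3, prefix=""):
    """Recursively subcluster until a cluster is too small or max_tiers is reached."""
    name = f"{prefix}T{tier}C{cluster}"
    node = Node(name=name, cells=list(cells))
    if len(node.cells) < min_cluster_size or tier >= max_tiers:
        return node  # stop: this node is an end cluster
    for i, sub in enumerate(split_in_two(node.cells)):
        child = tiered_clustering(sub, tier + 1, i, min_cluster_size, max_tiers, name + "_")
        node.children.append(child)
    return node


def end_clusters(node):
    """Collect the leaves of the tree, i.e. the end clusters."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in end_clusters(child)]


tree = tiered_clustering(range(16))
for leaf in end_clusters(tree):
    print(leaf.name, leaf.cells)
```

Raising min_cluster_size or lowering max_tiers makes the recursion stop earlier, which yields fewer and larger end clusters.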
