
Informative Dimensionality Reduction via Shannon Component Analysis

SCA (Shannon Components Analysis)

Shannon Components Analysis (SCA, formerly called scalpel) is a dimensionality reduction technique for single-cell transcriptomic data. It uses information theory to identify biologically informative axes of variation, enabling recovery of rare and common cell types at superior resolution. It is written in Python. The pre-print can be found here.

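Required packages: scanpy, sklearn, numpy, scipy, fbpca. The package also uses itertools, which is part of the Python standard library and needs no separate installation.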

Installation

SCA is available via pip:

pip install shannonca

Dependencies

SCA requires the following packages:

  • sklearn
  • scipy
  • numpy
  • matplotlib
  • pandas
  • seaborn
  • scanpy
  • fbpca
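
Before running SCA, it can be useful to confirm that the packages listed above are importable. The snippet below is a small sketch that uses only the standard library; the helper name missing_packages is hypothetical and not part of shannonca. Note that sklearn is the import name of the package distributed on PyPI as scikit-learn.

```python
from importlib.util import find_spec

# Import names of the packages listed in the Dependencies section.
SCA_DEPENDENCIES = [
    "sklearn", "scipy", "numpy", "matplotlib",
    "pandas", "seaborn", "scanpy", "fbpca",
]


def missing_packages(names=SCA_DEPENDENCIES):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

find_spec only locates a package without importing it, so the check stays fast even for heavy dependencies such as scanpy.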

Usage

Dimensionality Reduction

SCA generates information score matrices, which are used to generate linear combinations of genes (metagenes) that are biologically informative. The package includes workflows both with and without Scanpy under shannonca.dimred.
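
To make the idea of metagenes concrete, here is a minimal sketch of one way a score matrix could be turned into linear combinations of genes: take the top right singular vectors of the scores as loadings and project the data onto them. This illustrates the concept only. Whether SCA computes its loadings exactly this way is an assumption, the function name metagenes_from_scores is hypothetical, and the score matrix below is random placeholder data rather than real SCA scores. Both matrices are assumed to have cells as rows and genes as columns.

```python
import numpy as np


def metagenes_from_scores(X, scores, n_comps):
    """Project X onto the top right singular vectors of a score matrix.

    X and scores are (num cells) x (num genes) arrays. Returns the
    reduction, (num cells) x n_comps, and the loadings, (num genes) x n_comps.
    """
    # Rows of vt are orthonormal directions in gene space, ordered by singular value.
    _, _, vt = np.linalg.svd(scores, full_matrices=False)
    loadings = vt[:n_comps].T  # each column is one metagene
    return X @ loadings, loadings


rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(100, 20)).astype(float)  # toy counts, not real data
scores = rng.normal(size=(100, 20))                 # placeholder, not SCA's scores
reduction, loadings = metagenes_from_scores(X, scores, n_comps=5)
print(reduction.shape, loadings.shape)
```

Each column of loadings assigns one weight per gene, so each column of the reduction is a weighted sum of gene values, which is what "linear combination of genes" means here.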

Without Scanpy

The reduce function accepts a (num cells) x (num genes) matrix X and returns a dimensionality-reduced version with fewer features per cell. The input may be normalized or otherwise processed, but a zero in the input matrix must still indicate zero recorded transcripts.
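
The zero-preservation requirement above can be checked before calling reduce. The sketch below compares a processed matrix against the raw counts; the helper zeros_preserved is hypothetical (it is not part of shannonca) and handles dense arrays only. It shows that a log(1 + x) transform keeps zeros in place, while mean-centering does not.

```python
import numpy as np


def zeros_preserved(raw, processed):
    """True if every zero in the raw counts is still zero after processing."""
    raw = np.asarray(raw)
    processed = np.asarray(processed)
    return bool(np.all(processed[raw == 0] == 0))


counts = np.array([[0, 3, 1],
                   [2, 0, 0],
                   [0, 5, 4]], dtype=float)  # toy counts, not real data

log_counts = np.log1p(counts)            # log(1 + x) maps 0 to 0
centered = counts - counts.mean(axis=0)  # centering shifts zeros away from 0

print(zeros_preserved(counts, log_counts))  # True
print(zeros_preserved(counts, centered))    # False
```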

from scipy.io import mmread
from shannonca.dimred import reduce

X = mmread('mydata.mtx').transpose() # read some dataset

reduction = reduce(X, n_comps=50, n_pcs=50, iters=1, nbhd_size=15, metric='euclidean', model='wilcoxon', chunk_size=1000, n_tests='auto')

reduction is a (num cells) x (n_comps) matrix. The function can also return SCA's score matrix (keep_scores=True), the metagene loadings (keep_loadings=True), or intermediate results (iters>1 and keep_all_iters=True). If at least one of these is requested, the return value is a dictionary with keys for 'reduction', 'scores', and 'loadings'. If keep_all_iters=True, the reduction after each iteration is keyed by 'reduction_i' for each iteration number i.
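
Because the return type depends on the keep_* flags, calling code may want to handle both cases uniformly. The helper below is a small sketch of that, based only on the description above; unpack_reduction is a hypothetical name, and the stand-in values in the usage lines are plain lists rather than real SCA output.

```python
def unpack_reduction(result):
    """Split the output of reduce into (reduction, extras).

    `result` is either the reduction itself or a dictionary that holds it
    under 'reduction' alongside optional 'scores', 'loadings', and
    per-iteration 'reduction_i' entries.
    """
    if isinstance(result, dict):
        extras = {key: value for key, value in result.items() if key != "reduction"}
        # .get is used in case a dictionary lacks the plain 'reduction' key.
        return result.get("reduction"), extras
    return result, {}


# Stand-in values for illustration only.
plain = [[0.1, 0.2], [0.3, 0.4]]
with_extras = {"reduction": plain, "scores": "S", "loadings": "L"}

reduction, extras = unpack_reduction(with_extras)
print(sorted(extras))
```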

Starting neighborhoods are computed by default using Euclidean distance (controlled by metric) in n_comps-dimensional PCA space. See the docstring for more detailed and comprehensive parameter descriptions.
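
The idea of Euclidean neighborhoods in PCA space can be sketched with NumPy alone. The code below is an illustration under stated assumptions, not SCA's implementation: the function name pca_neighborhoods and its n_dims parameter are hypothetical, distances are computed by brute force, and excluding each cell from its own neighborhood is a choice made for this sketch.

```python
import numpy as np


def pca_neighborhoods(X, n_dims=2, nbhd_size=3):
    """Indices of each cell's nbhd_size nearest neighbors in PCA space.

    X is (num cells) x (num genes). Returns a (num cells) x nbhd_size
    integer array; a cell is never listed among its own neighbors.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # PCA via SVD: project the centered data onto the top right singular vectors.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pcs = centered @ vt[:n_dims].T
    # Pairwise Euclidean distances (brute force, suitable for small inputs only).
    diffs = pcs[:, None, :] - pcs[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # exclude each cell from its own neighborhood
    return np.argsort(dists, axis=1)[:, :nbhd_size]


rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(30, 10))  # toy data, not a real dataset
nbhds = pca_neighborhoods(X, n_dims=5, nbhd_size=4)
print(nbhds.shape)
```

For real datasets, a dedicated nearest-neighbor library would replace the brute-force distance matrix, which grows quadratically with the number of cells.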

With Scanpy

Scanpy (https://github.com/theislab/scanpy) is a commonly-used single-cell workflow. To compute a reduction in place on a scanpy AnnData object, use reduce_scanpy:

import scanpy as sc
from shannonca.dimred import reduce_scanpy

adata = sc.AnnData(X)
reduce_scanpy(adata, keep_scores=True, keep_loadings=True, keep_all_iters=True, layer=None, key_added='sca', iters=1, n_comps=50)

This function shares all parameters with reduce, but instead of returning the reduction, it updates the input AnnData object. Dimensionality reductions are stored in adata.obsm[key_added], or, if keep_all_iters=True, in adata.obsm under key_added followed by _i for each iteration number i. If keep_scores=True, the information scores of each gene in each cell are stored in adata.layers under key_added followed by _score. If layer=None, the algorithm is run on adata.X; otherwise, it is run on adata.layers[layer].
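
Once the reduction is stored in adata.obsm, it can feed Scanpy's standard downstream steps. The sketch below builds a neighbor graph on the stored reduction and computes a UMAP from it, using sc.pp.neighbors with use_rep and sc.tl.umap. The wrapper embed_from_sca is a hypothetical helper, not part of shannonca, and the default key 'sca' assumes the key_added value from the call above.

```python
def embed_from_sca(adata, key_added="sca", n_neighbors=15):
    """Build a neighbor graph and UMAP from the reduction in adata.obsm[key_added]."""
    if key_added not in adata.obsm:
        raise KeyError(f"adata.obsm has no entry {key_added!r}; run reduce_scanpy first")

    # Imported here so the helper can be defined without scanpy installed.
    import scanpy as sc

    # Use the stored reduction, rather than PCA, as the representation for the graph.
    sc.pp.neighbors(adata, use_rep=key_added, n_neighbors=n_neighbors)
    sc.tl.umap(adata)  # writes the embedding to adata.obsm['X_umap']
    return adata
```

A call such as embed_from_sca(adata) after reduce_scanpy would leave the UMAP coordinates in adata.obsm['X_umap'], ready for sc.pl.umap.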
