Informative Dimensionality Reduction via Shannon Component Analysis

SCA (Surprisal Component Analysis)

Surprisal Component Analysis (SCA) is a dimensionality reduction technique for single-cell transcriptomic data. It uses information theory to identify biologically informative axes of variation, enabling recovery of both rare and common cell types at high resolution. It is written in Python. The pre-print can be found here.

For full documentation of shannonca's API, see our Readthedocs page.

Installation

SCA is available via pip:

pip install shannonca

Dependencies

SCA requires the following packages:

scikit-learn
scipy
numpy
matplotlib
pandas
seaborn
scanpy
fbpca
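
A quick way to see which of these packages are missing from the current environment is sketched below. The helper name missing_modules is hypothetical and not part of shannonca; the only assumption is the usual import name for each package (scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# Import names for the packages listed above; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["sklearn", "scipy", "numpy", "matplotlib",
                    "pandas", "seaborn", "scanpy", "fbpca"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the import names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]
```

Calling missing_modules() with no arguments returns an empty list when every dependency is importable.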

Usage

Dimensionality Reduction

SCA generates information score matrices, which are used to construct linear combinations of genes (metagenes) that are biologically informative. The package includes workflows both with and without Scanpy under shannonca.dimred.

Without Scanpy

The reduce function accepts a (num cells) x (num genes) matrix X, and outputs a dimensionality-reduced version with fewer features. The input matrix may be normalized or otherwise processed, but a zero in the input matrix must indicate zero recorded transcripts.
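
The zero-preservation requirement can be checked before calling reduce. The sketch below is illustrative only: lognorm and same_zero_pattern are hypothetical helpers, not part of shannonca, and the toy matrix has one row per cell and one column per gene. It shows that library-size scaling followed by log1p keeps zeros where they were, while mean-centering does not.

```python
import numpy as np
from scipy import sparse

# Toy count matrix: 3 cells (rows) x 4 genes (columns).
counts = sparse.csr_matrix(np.array([[0, 2, 0, 5],
                                     [1, 0, 0, 3],
                                     [0, 0, 4, 0]], dtype=float))


def lognorm(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log1p."""
    totals = np.asarray(counts.sum(axis=1)).ravel()
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    scaled = sparse.diags(target_sum / totals) @ counts
    return scaled.log1p().tocsr()


def same_zero_pattern(a, b):
    """True if a and b have nonzero entries in exactly the same positions."""
    a = a.toarray() if sparse.issparse(a) else np.asarray(a)
    b = b.toarray() if sparse.issparse(b) else np.asarray(b)
    return np.array_equal(a != 0, b != 0)


normalized = lognorm(counts)                                  # zeros stay zero
centered = counts.toarray() - counts.toarray().mean(axis=0)   # zeros become nonzero
```

Here same_zero_pattern(counts, normalized) holds, whereas the centered matrix fails the check, so the centered matrix would not be a suitable input under the rule above.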

from scipy.io import mmread
from shannonca.dimred import reduce

# Read some dataset, transpose it so each row is a cell, and convert to CSR
X = mmread('mydata.mtx').transpose().tocsr()

reduction = reduce(X, n_comps=50, n_pcs=50, iters=1, nbhd_size=15, metric='euclidean', model='wilcoxon', chunk_size=1000, n_tests='auto')

reduction is a (num cells) x (n_comps) matrix. The function optionally also returns SCA's score matrix (if keep_scores=True), metagene loadings (if keep_loadings=True), or intermediate results (if iters>1 and keep_all_iters=True). If at least one of these is requested, the return value is a dictionary with keys 'reduction', 'scores', and 'loadings'. If keep_all_iters=True, the reduction after each iteration is keyed by 'reduction_i' for each iteration number i.
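
Because the return type depends on the keep_* flags, downstream code may want to handle both cases uniformly. A minimal sketch, assuming only the behavior described above (unpack_reduction is a hypothetical helper, and the arrays are stand-ins rather than real SCA output):

```python
import numpy as np


def unpack_reduction(result):
    """Return (reduction, extras) whether result is a matrix or a dict."""
    if isinstance(result, dict):
        extras = {k: v for k, v in result.items() if k != "reduction"}
        return result["reduction"], extras
    return result, {}


# Stand-ins for the two return shapes described above (not real SCA output).
plain = np.zeros((5, 2))
bundled = {"reduction": np.ones((5, 2)), "scores": np.ones((5, 7))}

emb_a, extras_a = unpack_reduction(plain)
emb_b, extras_b = unpack_reduction(bundled)
```

With this wrapper, the embedding is always the first element, and any scores, loadings, or per-iteration reductions are collected in the second.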

Starting neighborhoods are computed by default using Euclidean distance (controlled by metric) in n_comps-dimensional PCA space. See the docstring for more detailed and comprehensive parameter descriptions.
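
To make the idea of starting neighborhoods concrete, the sketch below computes, for each cell, its nearest cells by Euclidean distance in PCA space using plain NumPy. This is not shannonca's implementation. The function name pca_neighborhoods is hypothetical, it works on dense input only, and excluding each cell from its own neighborhood is an assumption made for illustration.

```python
import numpy as np


def pca_neighborhoods(X, n_pcs=50, nbhd_size=15):
    """Indices of each cell's nbhd_size nearest cells in PCA space."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of U * S are the cells' principal-component coordinates.
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(n_pcs, S.shape[0])
    coords = U[:, :k] * S[:k]
    # Pairwise squared Euclidean distances between cells.
    sq = (coords ** 2).sum(axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * coords @ coords.T
    # Assumption: a cell is not counted as its own neighbor.
    np.fill_diagonal(dists, np.inf)
    return np.argsort(dists, axis=1)[:, :nbhd_size]


# Two well-separated groups of 10 cells each, 10 genes.
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0, 1, (10, 10)), rng.normal(10, 1, (10, 10))])
nbrs = pca_neighborhoods(X_toy, n_pcs=5, nbhd_size=3)
```

In the toy data, every cell's neighbors come from its own group, which is the kind of locality the starting neighborhoods are meant to capture.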

With Scanpy

Scanpy (https://github.com/theislab/scanpy) is a commonly used single-cell analysis toolkit. To compute a reduction in place on a Scanpy AnnData object, use reduce_scanpy:

import scanpy as sc
from shannonca.dimred import reduce_scanpy

adata = sc.AnnData(X)
reduce_scanpy(adata, keep_scores=True, keep_loadings=True, keep_all_iters=True, layer=None, key_added='sca', iters=1, n_comps=50)

This function shares all parameters with reduce, but instead of returning the reduction, it updates the input AnnData object in place. Dimensionality reductions are stored in adata.obsm[key_added], or, if keep_all_iters=True, in adata.obsm['key_added_i'] for each iteration number i. If keep_scores=True is passed to reduce_scanpy, the information scores of each gene in each cell are stored in adata.layers[key_added_score]. If layer=None, the algorithm is run on adata.X; otherwise, it is run on adata.layers[layer].
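
After running reduce_scanpy, it can be useful to list what was actually written to the AnnData object rather than guessing exact key names. The helper below is a hypothetical sketch, not part of shannonca. It only relies on obsm, layers, and varm behaving like dictionaries; varm is included on the assumption that per-gene loadings, if kept, might be stored there, which the text above does not state. The stand-in object and its keys are made up for demonstration.

```python
from types import SimpleNamespace


def find_outputs(adata, key_added="sca"):
    """List keys in obsm, layers, and varm that contain key_added."""
    found = {}
    for slot in ("obsm", "layers", "varm"):
        mapping = getattr(adata, slot, {})
        found[slot] = sorted(k for k in mapping.keys() if key_added in k)
    return found


# Stand-in object with made-up keys, used only to show the helper's output.
fake = SimpleNamespace(obsm={"X_pca": 0, "sca": 1, "sca_1": 2},
                       layers={"sca_score": 3},
                       varm={})
outputs = find_outputs(fake)
```

On a real AnnData object, find_outputs(adata, key_added='sca') would report whichever keys the installed version of the package created.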

Troubleshooting

If you are having trouble running SCA, try the following:

Pull from the GitHub repository to ensure that your version of SCA is up to date.
Ensure that the Python version is at least 3.0, and that the installations of scanpy, numpy, scipy, and scikit-learn are up to date.
When running the reduce function, ensure that the input is either a CSR sparse matrix (scipy.sparse.csr_matrix) or a dense numpy array, with one row per cell and one column per gene. Other scipy sparse formats can be converted with their .tocsr() method, and dense arrays with scipy.sparse.csr_matrix(X).
When running reduce_scanpy, ensure that the input is an AnnData object.
Ensure that the data type of the input is either an integer or float.
Double-check that the code follows the docstring for the relevant function: reduce or reduce_scanpy.
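
Several of the input checks above can be automated before calling reduce. The following is a minimal sketch under the constraints listed in this section. check_input is a hypothetical helper, not part of shannonca, and it does not verify that rows are cells, which cannot be inferred from the matrix alone.

```python
import numpy as np
from scipy import sparse


def check_input(X):
    """Return X as a CSR matrix or dense ndarray after basic checks."""
    if sparse.issparse(X):
        X = X.tocsr()  # e.g. the COO matrix returned by scipy.io.mmread
    elif not isinstance(X, np.ndarray):
        raise TypeError("expected a scipy sparse matrix or a numpy array")
    if X.ndim != 2:
        raise ValueError("expected a 2-D matrix with one row per cell")
    is_int = np.issubdtype(X.dtype, np.integer)
    is_float = np.issubdtype(X.dtype, np.floating)
    if not (is_int or is_float):
        raise TypeError("expected an integer or float dtype, got %s" % X.dtype)
    return X


# A COO matrix is converted to CSR; a dense float array passes through unchanged.
checked_sparse = check_input(sparse.coo_matrix(np.eye(3)))
checked_dense = check_input(np.ones((2, 4)))
```

Passing the result of check_input to reduce rules out the most common format and dtype problems in one step.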
