quasilinear representation methods for single-cell omics data
Project description
An analytical framework for interpretable and generalizable single-cell data analysis
Quasildr is a Python library for quasilinear data representation methods. It implements two methods: GraphDR, a data representation and visualization method, and StructDR, a generalized trajectory extraction and inference method based on nonparametric ridge estimation. The Quasildr package is developed for single-cell omics data analysis, but it supports other data types as well. The manuscript is available here.
Install
You can install with conda:
conda install -c main -c conda-forge -c bioconda quasildr
or with pip:
pip install quasildr
You can also clone the repository and install from source:
git clone https://github.com/jzthree/quasildr; cd quasildr; python setup.py install
Quick Start
To learn about the package, we recommend checking out the tutorials. We provide them both as Jupyter notebooks (you may use nteract https://nteract.io/ to open them) and as HTML files rendered from the notebooks. The visualizations are large, so GitHub does not allow previewing them; you need to download the files first. For various manuscript examples, check out the Jupyter notebooks in the Manuscript directory.
As the quickest possible introduction, a minimal example Python snippet that runs these methods is below:
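# GraphDR
from quasildr.graphdr import graphdr
# X_pca: PCA-transformed input data, computed beforehand
Z = graphdr(X_pca, regularization=500)
# StructDR
from quasildr.structdr import Scms
# Rescale so that the first dimension has unit standard deviation
Z = Z / Z[:, 0].std()
s = Scms(Z, bw=0.1, min_radius=10)
T = s.scms(Z)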
If you are analyzing single-cell data, you may consider using Trenti, our graphical interface for single-cell omics data analysis.
Documentation
See the full API documentation here. A high-level introduction to the two main methods in quasildr, GraphDR and StructDR (DR stands for Data Representation), follows the update log.
Update log
v0.2.2 (10/05/2021): Update the Trenti graphical interface app to use Dash 2.0. Bug fixes for Trenti and speed improvement from Dash 2.0.0. Please update to Dash 2.0 if you will use Trenti.
GraphDR - visualization and general-purpose representation:
GraphDR is a nonlinear representation method that preserves the interpretation of a corresponding linear space while representing cell identities as well as nonlinear methods do. Unlike popular nonlinear methods, GraphDR allows direct comparison across datasets by applying a common linear transform. GraphDR also supports incorporating complex experimental designs through graph construction (see the example from the manuscript and the ./Manuscript directory). GraphDR is also very fast: it can process a 1.5 million-cell dataset in 5 min (CPU) or 1.5 min (GPU) and can easily scale to even larger datasets.
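As a rough illustration of how a neighbor graph can regularize a representation while leaving the coordinate axes untouched, the sketch below smooths a small 2D point set by solving (I + lam * L) Z = X, where L is the Laplacian of a symmetrized k-nearest-neighbor graph. This is an assumption about the general flavor of graph-regularized methods, not quasildr's implementation. The function names (knn_edges, graph_smooth, edge_roughness), the Jacobi solver, and the toy data are all hypothetical and use only the Python standard library.

```python
import math
import random

def knn_edges(points, k):
    """Undirected edges of a symmetrized k-nearest-neighbor graph."""
    edges = set()
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in nearest[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def graph_smooth(points, edges, lam, n_iter=500):
    """Solve (I + lam * L) Z = X by Jacobi iteration (L = graph Laplacian)."""
    n = len(points)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    Z = [list(p) for p in points]
    for _ in range(n_iter):
        # Row i of the system: (1 + lam * deg_i) * z_i - lam * sum_j z_j = x_i
        Z = [
            [(x + lam * sum(Z[j][d] for j in neighbors[i])) / (1 + lam * len(neighbors[i]))
             for d, x in enumerate(points[i])]
            for i in range(n)
        ]
    return Z

def edge_roughness(Z, edges):
    """Sum of squared distances across graph edges."""
    return sum(math.dist(Z[i], Z[j]) ** 2 for i, j in edges)

# Toy data (hypothetical): a noisy sine curve in 2D
rng = random.Random(0)
X = [(t / 10, math.sin(t / 10) + rng.gauss(0, 0.2)) for t in range(60)]
edges = knn_edges(X, k=3)
Z = graph_smooth(X, edges, lam=2.0)
print(edge_roughness(X, edges), edge_roughness(Z, edges))
```

Larger values of lam pull neighboring points closer together. Each output column is a smoothed version of the same input column, so the axes keep their original meaning.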
StructDR - flexible structure extraction and inference of confidence sets:
StructDR is based on nonparametric density ridge estimation (NRE). StructDR is a flexible framework for structure extraction from single-cell data that unifies cluster, trajectory, and surface estimation by casting these problems as identifying 0-, 1-, and 2-dimensional density ridges. StructDR also supports adaptively deciding ridge dimensionality based on the data. When used with a linear representation such as PCA, StructDR allows inference of confidence sets for density ridge positions. This allows, for example, estimating the uncertainty of extracted developmental trajectories.
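The 0-dimensional case (density modes, which play the role of cluster centers) can be illustrated with plain mean shift on a Gaussian kernel density estimate. The sketch below is not StructDR's implementation and does not handle 1- or 2-dimensional ridges; subspace-constrained variants of mean shift additionally restrict each step using the Hessian of the density. The function name mean_shift_point, the bandwidth, and the toy data are assumptions made for illustration.

```python
import math
import random

def mean_shift_point(x, data, bw, n_iter=200, tol=1e-9):
    """Move x uphill on a Gaussian kernel density estimate until it stops moving."""
    for _ in range(n_iter):
        # Gaussian kernel weight of every data point relative to the current position
        weights = [math.exp(-math.dist(x, p) ** 2 / (2 * bw ** 2)) for p in data]
        total = sum(weights)
        # The mean-shift update is the kernel-weighted average of the data
        new_x = tuple(
            sum(w * p[d] for w, p in zip(weights, data)) / total
            for d in range(len(x))
        )
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x

# Toy data (hypothetical): two well-separated 2D blobs
rng = random.Random(0)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(40)]
blob_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(40)]
data = blob_a + blob_b

# Each point is moved to the density mode it climbs toward
modes = [mean_shift_point(p, data, bw=1.0) for p in data]
print(modes[0], modes[-1])
```

Points from the same blob end up at nearly the same location, so grouping the converged positions gives a cluster assignment. The bandwidth bw controls how fine-grained the recovered structure is.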
Command-line tools
We also provide command-line tools to run these methods without writing any code. Basic single-cell data preprocessing options are provided in run_graphdr.py, although we generally recommend preprocessing single-cell data with a dedicated package such as scanpy or Seurat to select highly variable genes and normalize the data before providing it to GraphDR. You can add the -h option to access help information for each tool.
- run_graphdr.py
python run_graphdr.py ./example/Dentate_Gyrus.spliced_data.gz --pca --plot --log --transpose --scale --max_dim 50 --refine_iter 4 --reg 500 --no_rotation --anno_file ./example/Dentate_Gyrus.anno.gz --anno_column ClusterName
- run_structdr.py
python run_structdr.py --bw 0.1 --automatic_bw 0 --input ./example/Dentate_Gyrus.spliced_data.gz.dim50_k10_reg500_n4t12_pca_no_rotation_log_scale_transpose.graphdr.small.gz --anno_file ./example/Dentate_Gyrus.anno.small.gz --anno_column ClusterName --output ./example/Dentate_Gyrus.spliced_data.gz.dim50_k10_reg500_n4t12_pca_no_rotation_log_scale_transpose.graphdr.small.gz
Graphical Interface - Trenti
We developed a web-based GUI, Trenti (Trajectory exploration interface), for single-cell data visualization and exploratory analysis. It supports GraphDR, StructDR, and common dimensionality reduction and clustering methods, and provides a 3D interface for visualization and a gene expression exploration interface.
To use Trenti, you need to install additional dependencies:
pip install umap-learn dash==2.0.0 dash-colorscales networkx
See ./trenti/README.md for details. For a quick-start example, run
python ./trenti/app.py -i ./example/Dentate_Gyrus.data_pca.gz -f ./example/Dentate_Gyrus.spliced_data.gz -a ./example/Dentate_Gyrus.anno.gz --samplelimit=5000 --log --mode graphdr
then visit localhost:8050 in your browser.