An end-to-end single-cell multimodal analysis model with deep parameter inference.

Project description

Modeling and analyzing single-cell multimodal data with deep parametric inference

The proliferation of single-cell multimodal sequencing technologies makes it possible to study cellular heterogeneity from multiple views, providing novel and actionable biological insights into disease-driving mechanisms. Here, we propose a comprehensive end-to-end framework for single-cell multimodal data analysis named Deep Parametric Inference (DPI). The Python package, datasets, and user manuals for DPI are freely available at https://github.com/studentiz/dpi.

The DPI framework works with scanpy and supports the following single-cell multimodal analyses:

  • Multimodal data integration
  • Multimodal data noise reduction
  • Cell clustering and visualization
  • Cell type annotation by reference and query
  • Cell state vector field visualization

Pip install

pip install dpi-sc
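
After installation, it can be useful to confirm which version of the distribution is present. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and is not part of dpi.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# "dpi-sc" is the distribution name used in the pip command above.
version = installed_version("dpi-sc")
print(version if version is not None else "dpi-sc is not installed")
```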

Datasets

The datasets used in "Single-cell multimodal modeling with deep parametric inference" can be downloaded from the DPI data warehouse.

Tutorial

We use a Peripheral Blood Mononuclear Cell (PBMC) dataset to demonstrate how DPI analyzes single-cell multimodal data. We recommend running the following code on a computer with more than 64 GB of memory.
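
To get a rough feel for how memory use grows with dataset size, the sketch below estimates the footprint of a single dense matrix. It is only an illustration: the helper `dense_matrix_gib` and the example dimensions are hypothetical, and the estimate ignores model weights, intermediate copies, and sparse storage, so it does not by itself explain the memory recommendation.

```python
def dense_matrix_gib(n_cells: int, n_features: int, bytes_per_value: int = 4) -> float:
    """Size in GiB of a dense n_cells x n_features matrix."""
    return n_cells * n_features * bytes_per_value / 2**30


# Hypothetical example: 100,000 cells x 3,000 features stored as 4-byte floats.
print(f"{dense_matrix_gib(100_000, 3_000):.2f} GiB")
```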

Import dependencies

import scanpy as sc
import dpi

Retina image output (optional)

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

Load dataset

# The dataset can be downloaded from [Datasets] above.
sc_data = sc.read_h5ad("PBMC_COVID19_Healthy_Annotated.h5ad")

Set marker collection

rna_markers = ["CCR7", "CD19", "CD3E", "CD4"]
protein_markers = ["AB_CCR7", "AB_CD19", "AB_CD3", "AB_CD4"]
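
The two marker lists are written in parallel order (each gene sits at the same position as its protein counterpart), so they can be paired for later side-by-side inspection. A minimal sketch, where `marker_pairs` is a hypothetical name introduced only for illustration:

```python
rna_markers = ["CCR7", "CD19", "CD3E", "CD4"]
protein_markers = ["AB_CCR7", "AB_CD19", "AB_CD3", "AB_CD4"]

# Guard against the two lists drifting out of step when markers are edited.
if len(rna_markers) != len(protein_markers):
    raise ValueError("rna_markers and protein_markers must have the same length")

# Map each gene to the protein feature listed at the same position.
marker_pairs = dict(zip(rna_markers, protein_markers))

for gene, protein in marker_pairs.items():
    print(f"{gene} <-> {protein}")
```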

Preprocessing

dpi.preprocessing(sc_data)
dpi.normalize(sc_data, protein_expression_obsm_key="protein_expression")
sc_data.var_names_make_unique()
sc.pp.highly_variable_genes(
    sc_data,
    n_top_genes=3000,
    flavor="seurat_v3",
    subset=False
)
dpi.add_genes(sc_data, rna_markers)
# Copy the subset so later steps operate on a real AnnData object rather than a view.
sc_data = sc_data[:, sc_data.var["highly_variable"]].copy()
dpi.scale(sc_data)

Prepare and run DPI model

Configure DPI model parameters

dpi.build_mix_model(sc_data, net_dim_rna_list=[512, 128], net_dim_pro_list=[128], net_dim_rna_mean=128, net_dim_pro_mean=128, net_dim_mix=128, lr=0.0001)
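
For repeated experiments it may be convenient to keep the model settings in one dictionary and sanity-check them before building the model. The sketch below reuses the keyword names and values from the call above; the helper `check_model_config` is hypothetical, and treating the `net_dim_*` values as layer widths is an assumption based on their names.

```python
# Same values as the build_mix_model call above, collected in one place.
model_config = {
    "net_dim_rna_list": [512, 128],
    "net_dim_pro_list": [128],
    "net_dim_rna_mean": 128,
    "net_dim_pro_mean": 128,
    "net_dim_mix": 128,
    "lr": 0.0001,
}


def check_model_config(config: dict) -> None:
    """Raise ValueError if a width is not a positive int or lr is not positive."""
    for key, value in config.items():
        if key == "lr":
            if not (isinstance(value, (int, float)) and value > 0):
                raise ValueError(f"{key} must be a positive number, got {value!r}")
            continue
        # Assumption: every other entry is a layer width or a list of widths.
        widths = value if isinstance(value, list) else [value]
        if not widths or not all(isinstance(w, int) and w > 0 for w in widths):
            raise ValueError(f"{key} must contain positive integers, got {value!r}")


check_model_config(model_config)
```

With dpi installed, the checked dictionary could then be unpacked into the call as `dpi.build_mix_model(sc_data, **model_config)`.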

Run DPI model

dpi.fit(sc_data, batch_size=256)
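
The batch size determines how many optimization steps make up one pass over the data. The arithmetic can be sketched as follows, assuming the final partial batch is kept; how `dpi.fit` actually iterates over the data is not documented here, and `batches_per_epoch` is a hypothetical helper.

```python
import math


def batches_per_epoch(n_cells: int, batch_size: int = 256) -> int:
    """Number of mini-batches needed to visit every cell once."""
    if n_cells < 0 or batch_size <= 0:
        raise ValueError("n_cells must be >= 0 and batch_size must be > 0")
    # Round up so that a final, smaller batch is still counted.
    return math.ceil(n_cells / batch_size)


print(batches_per_epoch(1000))
```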

Visualize the loss

dpi.loss_plot(sc_data)

Save DPI model (optional)

dpi.saveobj2file(sc_data, "COVID19PBMC_healthy.dpi")
#sc_data = dpi.loadobj("COVID19PBMC_healthy.dpi")
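
Because training can be slow, a common pattern is to reload a saved object when one exists and train only otherwise. The sketch below shows that pattern in a library-independent form; `load_or_build` and its two callables are hypothetical names.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(path: str, load: Callable[[str], T], build: Callable[[], T]) -> T:
    """Return load(path) if the path exists, otherwise the result of build()."""
    if os.path.exists(path):
        return load(path)
    return build()


# Toy callables stand in for the real loading and training steps.
result = load_or_build("no-such-file.dpi", lambda p: "loaded", lambda: "built")
print(result)
```

In this tutorial, `load` could be `dpi.loadobj` and `build` a function that runs the steps above and then calls `dpi.saveobj2file`.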

Visualize the latent space

Extract latent spaces

dpi.get_spaces(sc_data)

Visualize the spaces

dpi.space_plot(sc_data, "mm_parameter_space", color="green", kde=True, bins=30)
dpi.space_plot(sc_data, "rna_latent_space", color="orange", kde=True, bins=30)
dpi.space_plot(sc_data, "pro_latent_space", color="blue", kde=True, bins=30)

Preparation for downstream analysis

Extract features

dpi.get_features(sc_data)

Get denoised data

dpi.get_denoised_rna(sc_data)
dpi.get_denoised_pro(sc_data)

Cell clustering and visualization

Cell clustering

sc.pp.neighbors(sc_data, use_rep="mix_features")
dpi.umap_run(sc_data, min_dist=0.4)
sc.tl.leiden(sc_data)
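
Once clusters are assigned, a quick tally of cluster sizes helps spot very small or dominant clusters. A minimal sketch with toy labels; `cluster_sizes` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def cluster_sizes(labels: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (label, count) pairs sorted from largest to smallest cluster."""
    return Counter(labels).most_common()


# Toy labels standing in for real cluster assignments.
example_labels = ["0", "1", "0", "2", "0", "1"]
print(cluster_sizes(example_labels))
```

With scanpy's default settings the Leiden labels are stored in `sc_data.obs["leiden"]`, so the same helper could be called as `cluster_sizes(sc_data.obs["leiden"])`.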

Cell cluster visualization

sc.pl.umap(sc_data, color="leiden")

Observe multimodal data markers

RNA markers

dpi.umap_plot(sc_data, featuretype="rna", color=rna_markers, ncols=2)
dpi.umap_plot(sc_data, featuretype="rna", color=rna_markers, ncols=2, layer="rna_denoised")

Protein markers

dpi.umap_plot(sc_data, featuretype="protein", color=protein_markers, ncols=2)
dpi.umap_plot(sc_data, featuretype="protein", color=protein_markers, ncols=2, layer="pro_denoised")

Reference and query

The reference object must already carry cell labels.

sc.pl.umap(sc_data, color="initial_clustering", frameon=False, title="PBMC COVID19 Healthy labels")

Demonstrate reference and query capabilities with unannotated asymptomatic COVID-19 PBMCs.

# The dataset can be downloaded from [Datasets] above.
filepath = "COVID19_Asymptomatic.h5ad"
sc_data_COVID19_Asymptomatic = sc.read_h5ad(filepath)

Unannotated data also needs to be normalized.

dpi.normalize(sc_data_COVID19_Asymptomatic, protein_expression_obsm_key="protein_expression")

The reference and query objects must share the same features, so subset the query to the features of the reference.

sc_data_COVID19_Asymptomatic = sc_data_COVID19_Asymptomatic[:,sc_data.var.index]
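
Subsetting by name fails if the query lacks any feature of the reference, so it can help to check for missing names before running the subsetting step above. A minimal sketch on plain lists; `missing_features` is a hypothetical helper and the gene lists are toy data.

```python
from typing import Iterable, List


def missing_features(reference_names: Iterable[str], query_names: Iterable[str]) -> List[str]:
    """Return reference features absent from the query, in reference order."""
    available = set(query_names)
    return [name for name in reference_names if name not in available]


# Toy feature names standing in for the var indices of the two objects.
reference = ["CCR7", "CD19", "CD3E", "CD4"]
query = ["CD4", "CCR7", "MS4A1"]
print(missing_features(reference, query))
```

With the objects from this tutorial, the call would be `missing_features(sc_data.var_names, sc_data_COVID19_Asymptomatic.var_names)`; an empty list means the subsetting can proceed.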

Run the automated annotation function.

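# All arguments are passed positionally, in the original order, because Python
# does not allow a positional argument after a keyword argument.
dpi.annotate(sc_data, "initial_clustering", sc_data_COVID19_Asymptomatic)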

Visualize the annotated object.

sc.pl.umap(sc_data_COVID19_Asymptomatic, color="labels", frameon=False, title="PBMC COVID19 Asymptomatic Annotated")

Cell state vector field

Simulate and visualize the cellular state when the CCR7 protein is amplified 2-fold.

dpi.cell_state_vector_field(sc_data, feature="AB_CCR7", amplitude=2, obs="initial_clustering", featuretype="protein")
