
A lightweight residual autoencoder and mutual nearest neighbor pairing guided adversarial network for scRNA-seq batch correction.

Project description

ResPAN

This repository contains the code and data information for the paper “ResPAN: a powerful batch correction model for scRNA-seq data through residual adversarial networks”. Source code for ResPAN is in the ResPAN folder, scripts for reproducing the benchmarking results are in the scripts folder, and information about the data is in the data folder.

ResPAN is a lightweight Residual autoencoder and mutual nearest neighbor Pairing guided Adversarial Network for scRNA-seq batch correction. The ResPAN workflow has three key steps: generating training data, adversarially training the neural network, and generating batch-corrected data. The workflow is summarized in the figure below.

[Figure: summary of the ResPAN workflow]
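The training-data step is guided by mutual nearest neighbor (MNN) pairs between batches. As a minimal sketch of what an MNN pair is, the NumPy function below returns index pairs (i, j) where cell i of batch `a` is among the k nearest neighbors of cell j of batch `b`, and vice versa. The function name `mutual_nearest_neighbors` and the use of Euclidean distance are illustrative assumptions: this is plain MNN, not the random-walk MNN procedure that ResPAN adapts from iMAP, and it is not ResPAN's own code.

```python
import numpy as np


def mutual_nearest_neighbors(a, b, k=5):
    """Hypothetical helper: return (i, j) pairs such that a[i] and b[j]
    are mutual k-nearest neighbors across two batches (Euclidean distance)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # pairwise squared Euclidean distances, shape (len(a), len(b))
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    k_ab = min(k, b.shape[0])
    k_ba = min(k, a.shape[0])
    # for every cell in a: indices of its k nearest cells in b
    nn_ab = np.argsort(d, axis=1)[:, :k_ab]
    # for every cell in b: indices of its k nearest cells in a
    nn_ba = np.argsort(d, axis=0)[:k_ba, :].T
    pairs = []
    for i in range(a.shape[0]):
        for j in nn_ab[i]:
            # keep the pair only if the neighbor relation holds both ways
            if i in nn_ba[j]:
                pairs.append((i, int(j)))
    return pairs
```

The full distance matrix is held in memory, so this sketch only suits small inputs; practical implementations use tree-based or approximate neighbor search.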

More details about ResPAN can be found in our manuscript.

Package requirements

ResPAN is implemented in Python using the PyTorch framework. The following packages must be installed before ResPAN; the versions used in our manuscript are listed below.

Package Version
numpy 1.18.1
pandas 1.3.5
scipy 1.8.0
scanpy 1.8.2
pytorch 1.10.2+cu113
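To compare your environment against the table above, a small standard-library check such as the following can help. It is a hypothetical helper, not part of ResPAN, and it assumes that PyTorch is installed under its pip distribution name `torch`.

```python
from importlib.metadata import PackageNotFoundError, version

# versions from the table above; "torch" is assumed to be PyTorch's distribution name
EXPECTED = {
    "numpy": "1.18.1",
    "pandas": "1.3.5",
    "scipy": "1.8.0",
    "scanpy": "1.8.2",
    "torch": "1.10.2+cu113",
}


def compare_versions(expected=EXPECTED):
    """Map each package name to (installed version or None, manuscript version)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed
        report[name] = (installed, wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted) in compare_versions().items():
        flag = "ok" if installed == wanted else "differs"
        print(f"{name:8s} installed={installed} manuscript={wanted} [{flag}]")
```

Newer versions may work as well; a mismatch only means your environment differs from the one used for the manuscript.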

Download

To download the ResPAN source code, copy and paste the following line into your terminal:

git clone https://github.com/AprilYuge/ResPAN.git

Brief tutorial

A brief tutorial on using ResPAN is given below and in the tutorials folder.

To run ResPAN, first import the necessary packages:

import numpy as np
import pandas as pd
import scanpy as sc
import scipy.sparse  # needed for the sparse-matrix check during preprocessing
from ResPAN import run_respan

Next, load the scRNA-seq data with batch information and preprocess it before running ResPAN:

# data loading
adata = sc.read_loom('CL_raw.loom', sparse=False) 
# pre-processing
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.normalize_per_cell(adata, counts_per_cell_after=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key='batch')
adata = adata[:, adata.var['highly_variable']].copy()  # copy so later steps work on a real object, not a view
# convert the expression matrix to a dense array if it is stored in sparse format
if scipy.sparse.issparse(adata.X):
    adata_new = sc.AnnData(adata.X.toarray())
    adata_new.obs = adata.obs.copy()
    adata_new.obs_names = adata.obs_names
    adata_new.var_names = adata.var_names
    adata_new.obs_names.name = 'CellID'
    adata_new.var_names.name = 'Gene'
    del adata
    adata = adata_new
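For reference, the normalization and log transform in the preprocessing above amount to the per-cell computation sketched below in NumPy: each cell is rescaled to 10,000 total counts and then transformed with the natural log of 1 + x. The helper `normalize_log1p` is hypothetical and assumes a dense cells × genes count matrix with no all-zero cells, which `filter_cells(min_genes=200)` rules out.

```python
import numpy as np


def normalize_log1p(counts, target_sum=1e4):
    """Hypothetical helper: scale each cell (row) to `target_sum` total counts,
    then apply the natural-log transform log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # library size of each cell
    scaled = counts / totals * target_sum       # every row now sums to target_sum
    return np.log1p(scaled)
```

This is only meant to make the arithmetic explicit; the Scanpy calls above remain the way to preprocess an AnnData object.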

Now run ResPAN on the preprocessed data for batch correction. The result is returned as an AnnData object:

adata_new = run_respan(adata, batch_key='batch', epoch=300, batch=1024, reduction='pca', subsample=3000, seed=999)

As described in our manuscript, we use PCA for dimensionality reduction (reduction='pca'). Kernel PCA (reduction='kpca') and CCA (reduction='cca') are also implemented, but they did not perform as well as PCA. In addition, we subsample 3,000 cells from each batch (subsample=3000) before finding random-walk MNN pairs [1].
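The per-batch subsampling and PCA steps described above can be sketched in NumPy as follows. This is an illustrative assumption about what such preprocessing looks like, not ResPAN's implementation: the helper names, the random generator, and the default of 50 principal components are hypothetical, and in this sketch batches with fewer than `n_per_batch` cells are kept whole.

```python
import numpy as np


def subsample_per_batch(batches, n_per_batch=3000, seed=999):
    """Hypothetical helper: return sorted cell indices, keeping at most
    `n_per_batch` randomly chosen cells from each batch."""
    rng = np.random.default_rng(seed)
    batches = np.asarray(batches)
    keep = []
    for b in np.unique(batches):
        idx = np.flatnonzero(batches == b)  # cells belonging to batch b
        if idx.size > n_per_batch:
            idx = rng.choice(idx, size=n_per_batch, replace=False)
        keep.append(idx)
    return np.sort(np.concatenate(keep))


def pca_project(x, n_components=50):
    """Hypothetical helper: project centered data onto its top principal
    components, computed with a thin SVD."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

Subsampling bounds the cost of the neighbor search on large batches, and searching in a low-dimensional PCA space is cheaper and less noisy than using all 2,000 highly variable genes.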

To visualize the results, use the following commands:

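adata_new.raw = adata_new                       # keep the corrected values before scaling
sc.pp.scale(adata_new, max_value=10)            # scale each gene, clipping values at 10
sc.tl.pca(adata_new, n_comps=20, svd_solver='arpack')
sc.pp.neighbors(adata_new)                      # kNN graph built on the 20 PCs
sc.tl.umap(adata_new)
sc.set_figure_params(figsize=(5, 5), fontsize=12)
# 'batch' and 'celltype' must be columns of adata_new.obs
sc.pl.umap(adata_new, color=['batch', 'celltype'], frameon=False, show=False)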
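UMAP plots are a qualitative check. As an optional numeric complement, the hypothetical helper below (not part of ResPAN and not one of the benchmarking metrics used in our manuscript) computes the mean fraction of each cell's k nearest neighbors that come from a different batch. Higher values indicate more batch mixing, but the maximum attainable value depends on the batch proportions, and the score says nothing about whether cell types stay separated, which the 'celltype' coloring above helps judge.

```python
import numpy as np


def neighbor_batch_mixing(embedding, batches, k=15):
    """Hypothetical helper: mean fraction of each cell's k nearest neighbors
    (Euclidean, in `embedding`) whose batch label differs from the cell's own."""
    x = np.asarray(embedding, dtype=float)
    batches = np.asarray(batches)
    k = min(k, x.shape[0] - 1)
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)          # a cell is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest cells
    other = batches[nn] != batches[:, None]
    return float(other.mean())
```

For example, `neighbor_batch_mixing(adata_new.obsm['X_pca'], adata_new.obs['batch'].values)`. The full distance matrix is held in memory, so subsample cells first on large datasets.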

Code references

For the implementation of ResPAN, we followed WGAN-GP for the structure of the generative adversarial network and iMAP for the random-walk mutual nearest neighbor method. Many thanks to their authors for sharing their code openly.
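For reference, the standard WGAN-GP formulation penalizes the critic's gradient norm at points interpolated between real and generated samples. With critic D, real samples x, generated samples x̃, and penalty weight λ, the critic minimizes the loss below, while the generator minimizes −E[D(x̃)]. How ResPAN defines its real and generated samples and which λ it uses are not specified here; see the manuscript for those details.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad \hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```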

Paper references

[1] Wang, Dongfang, et al. "iMAP: integration of multiple single-cell datasets by adversarial paired transfer networks." Genome Biology 22.1 (2021): 1-24.



Download files


Source Distribution

ResPAN-0.1.0.tar.gz (8.4 kB)

Uploaded Source

Built Distribution


ResPAN-0.1.0-py3-none-any.whl (8.4 kB)

Uploaded Python 3

File details

Details for the file ResPAN-0.1.0.tar.gz.

File metadata

  • Download URL: ResPAN-0.1.0.tar.gz
  • Upload date:
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.6

File hashes

Hashes for ResPAN-0.1.0.tar.gz
Algorithm Hash digest
SHA256 775b4de52f3f9c0ad9fc90160580cecabc468ae3c8e89b570222adbee32dff5d
MD5 71b7adc4d5843a828b5dcd09fae8d154
BLAKE2b-256 79995de7247d68ce86e53ff8d4748c76f5653f995a5aaf6a5f5eb6fb13834be2


File details

Details for the file ResPAN-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ResPAN-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.6

File hashes

Hashes for ResPAN-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3977c20f2482e5a3aa3ced484c35275b29b9f69838782eacadb747d64e2fdc18
MD5 d2979541850b7825eb07b552a8494367
BLAKE2b-256 d7b923d3c650de6abba5692d1d1d9490d31e085fe3cbfff5497a3dcb19e62ef0

