vireoSNP - donor deconvolution for multiplexed scRNA-seq data
vireo: donor deconvolution for pooled single-cell data
Vireo: Variational Inference for Reconstructing Ensemble Origin by expressed SNPs in multiplexed scRNA-seq data.
The name vireo follows the theme from cardelino (for clone deconvolution), while the Python package name is vireoSNP to avoid a name conflict on PyPI.
Installation
Vireo is available through PyPI. To install it, run the following command; add -U to upgrade an existing installation:
pip install vireoSNP
Alternatively, you can download or clone this repository and run python setup.py install. In either case, add --user if you don't have root permission or write access to your Python environment.
For more instructions, see the installation manual.
Quick Usage
The following two subsections are a quick usage guide. For more details, see the full manual or type vireo -h for all arguments. We also provide demo.sh for running the test data sets in this repo.
Genotyping for each cell (pre-step)
This step may take some bioinformatics effort, but several existing tools can handle it. It usually involves two steps:
identify candidate SNPs: known common SNPs / freebayes / cellSNP
genotype candidate SNPs in each cell: cellSNP / vartrix / bcftools mpileup
For more background, see the genotyping section.
Demultiplexing from allelic expression
The vireoSNP Python package offers a set of utility functions and an executable command line tool, vireo, for donor deconvolution in any of these four situations:
Mode 1: without any genotype:
vireo -c $CELL_DATA -N $n_donor -o $OUT_DIR
Mode 2: with genotype for all samples (specify tag -t: GT, GP, or PL)
vireo -c $CELL_DATA -d $DONOR_GT_FILE -o $OUT_DIR
Mode 3: with genotype for part of the samples (N is different from the sample number in $DONOR_GT_FILE)
vireo -c $CELL_DATA -d $DONOR_GT_FILE -o $OUT_DIR -N $n_donor
Mode 4: with genotype but not confident
vireo -c $CELL_DATA -d $DONOR_GT_FILE -o $OUT_DIR --forceLearnGT
In modes 3 and 4, the algorithm first runs mode 1 to estimate the genotypes of N donors and then matches them to the given donor genotypes (even if these are partial). For the matched samples and SNPs, the input genotypes replace the estimated values as a prior in the second run.
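To make the matching step more concrete, here is a toy sketch in Python. It is not vireo's implementation: it assumes hard genotype calls (0, 1 or 2 alternative alleles per SNP) rather than genotype probabilities, and it simply tries every assignment of known samples to estimated donors and keeps the one with the fewest mismatches. The function match_donors and all sample names and genotypes are hypothetical.

```python
from itertools import permutations


def match_donors(estimated, known):
    """Map each known sample to one estimated donor index.

    Hypothetical helper for illustration only: brute-force search for the
    assignment with the fewest genotype mismatches across shared SNPs.
    """
    names = list(known)
    best_cost, best_perm = None, None
    # Each permutation picks a distinct estimated donor for every known sample.
    for perm in permutations(range(len(estimated)), len(names)):
        cost = sum(
            sum(a != b for a, b in zip(estimated[idx], known[name]))
            for idx, name in zip(perm, names)
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return dict(zip(names, best_perm))


# Made-up genotypes for N = 3 estimated donors at 5 SNPs.
estimated = [
    [0, 1, 2, 0, 1],  # estimated donor 0
    [2, 2, 0, 1, 0],  # estimated donor 1
    [1, 0, 0, 2, 2],  # estimated donor 2
]
# Made-up genotypes for the 2 samples with a known genotype (the partial case).
known = {
    "sampleA": [2, 2, 0, 1, 1],
    "sampleB": [0, 1, 2, 0, 1],
}

assignment = match_donors(estimated, known)
# Estimated donors left without a known genotype keep their estimated values.
unmatched = sorted(set(range(len(estimated))) - set(assignment.values()))
print(assignment, unmatched)
```

In this toy setting the matched donors would take the supplied genotypes as their prior in the second run, while the unmatched donor would keep its estimated genotypes.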
Note that the cell data ($CELL_DATA) passed via -c can be in either of the following two formats:
standard VCF file (compressed or uncompressed) with variants by cells
a cellSNP output folder containing a VCF with variant information and the sparse matrices AD and DP
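The four modes above differ only in which options accompany -c and -o. The Python sketch below assembles the matching argument list for each mode, for example to hand to subprocess.run. It is a minimal illustration, not part of vireoSNP: the helper build_vireo_command and the file names are hypothetical, and only the options shown in this section (-c, -d, -t, -o, -N, --forceLearnGT) are used.

```python
import shlex


def build_vireo_command(cell_data, out_dir, donor_gt=None, n_donor=None,
                        gt_tag=None, force_learn_gt=False):
    """Return the vireo argument list for one of the four modes.

    Hypothetical helper for illustration; it only arranges the options
    listed in this section and does not run anything.
    """
    if donor_gt is None and n_donor is None:
        raise ValueError("mode 1 needs the number of donors (-N)")
    if donor_gt is None and (gt_tag is not None or force_learn_gt):
        raise ValueError("-t and --forceLearnGT need donor genotypes (-d)")
    if gt_tag is not None and gt_tag not in ("GT", "GP", "PL"):
        raise ValueError("genotype tag must be GT, GP or PL")

    cmd = ["vireo", "-c", cell_data]
    if donor_gt is not None:
        cmd += ["-d", donor_gt]          # modes 2, 3 and 4
        if gt_tag is not None:
            cmd += ["-t", gt_tag]
    cmd += ["-o", out_dir]
    if n_donor is not None:
        cmd += ["-N", str(n_donor)]      # modes 1 and 3
    if force_learn_gt:
        cmd.append("--forceLearnGT")     # mode 4
    return cmd


# Placeholder paths; replace them with your own data.
mode1 = build_vireo_command("cells.vcf.gz", "out", n_donor=4)
mode2 = build_vireo_command("cells.vcf.gz", "out", donor_gt="donors.vcf.gz", gt_tag="GT")
mode3 = build_vireo_command("cells.vcf.gz", "out", donor_gt="donors.vcf.gz", n_donor=4)
mode4 = build_vireo_command("cells.vcf.gz", "out", donor_gt="donors.vcf.gz",
                            force_learn_gt=True)

for cmd in (mode1, mode2, mode3, mode4):
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.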
Reference
Yuanhua Huang, Davis J. McCarthy, and Oliver Stegle. Vireo: Bayesian demultiplexing of pooled single-cell RNA-seq data without genotype reference. bioRxiv (2019): 598748.