Interspersed Repeats single-cell quantifier
IRescue - Interspersed Repeats single-cell quantifier
IRescue is a tool for quantifying the expression of transposable element (TE) subfamilies in single-cell RNA sequencing (scRNA-seq) data. Its core feature is that it considers all multiple alignments (i.e. non-primary alignments) of reads/UMIs that map to multiple TEs in a BAM file, in order to accurately infer the TE subfamily of origin. IRescue implements a UMI error-correction, deduplication and quantification strategy that includes such alignment events. IRescue's output is compatible with most scRNA-seq analysis toolkits, such as Seurat or Scanpy.
Installation
Using conda (recommended)
We recommend using conda, as it will install all the required packages along with IRescue.
conda create -n irescue -c conda-forge -c bioconda irescue
Using pip
If for any reason it is not possible or desirable to use conda, IRescue can be installed with pip. In that case, the following requirements must be installed manually: python>=3.7, samtools>=1.12 and bedtools>=2.30.0.
pip install irescue
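After a pip installation, it can be useful to confirm that the external tools are reachable. The following is a minimal sketch using only the Python standard library; the function name find_missing_tools is hypothetical and not part of IRescue, and the check only tests presence on PATH, not the minimum versions listed above.

```python
import shutil


def find_missing_tools(tools=("samtools", "bedtools")):
    """Return the names of the given command-line tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("samtools and bedtools were found on PATH")
```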
Usage
Quick start
The only required input is a BAM file annotated with cell barcode and UMI sequences as tags (by default, the CB tag for the cell barcode and the UR tag for the UMI; override with --CBtag and --UMItag). You can obtain such a file by aligning your reads with STARsolo.
A RepeatMasker annotation will be downloaded automatically for the chosen genome assembly (e.g. -g hg38); alternatively, provide your own annotation in BED format (e.g. -r TE.bed).
irescue -b genome_alignments.bam -g hg38
If you have already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), we advise providing the list of whitelisted cell barcodes as a text file, e.g. -w barcodes.tsv. This will significantly improve performance.
IRescue performs best with at least 4 threads, e.g. -p 8.
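The options described in this section can be combined in a single call. The sketch below assembles such a command line from Python, for example when scripting runs over several samples. The helper build_irescue_command is hypothetical (it is not part of IRescue) and uses only the flags mentioned above (-b, -g, -r, -w, -p); the file names are the placeholders from the examples.

```python
import shlex


def build_irescue_command(bam, genome=None, te_bed=None, whitelist=None, threads=None):
    """Assemble an irescue command line as a list of arguments."""
    cmd = ["irescue", "-b", str(bam)]
    if genome is not None:
        cmd += ["-g", str(genome)]      # genome assembly, e.g. hg38
    if te_bed is not None:
        cmd += ["-r", str(te_bed)]      # custom TE annotation in BED format
    if whitelist is not None:
        cmd += ["-w", str(whitelist)]   # whitelisted cell barcodes
    if threads is not None:
        cmd += ["-p", str(int(threads))]
    return cmd


if __name__ == "__main__":
    cmd = build_irescue_command(
        "genome_alignments.bam", genome="hg38", whitelist="barcodes.tsv", threads=8
    )
    # Print the command; to execute it, pass the list to subprocess.run(cmd, check=True).
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list rather than a single string avoids quoting problems when file paths contain spaces.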
Output files
IRescue generates TE counts in a sparse matrix format, readable by Seurat or Scanpy:
IRescue_out/
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
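For a quick look at the output without Seurat or Scanpy, the three files can be read with the Python standard library alone. This is a minimal sketch, not IRescue's own loader. It assumes that matrix.mtx.gz is in Matrix Market coordinate format with features as rows and barcodes as columns (the layout that 10x-style readers expect), and that the feature name is the first tab-separated column of features.tsv.gz. The function names are hypothetical.

```python
import gzip
from pathlib import Path


def read_first_column(path):
    """Return the first tab-separated field of every non-empty line of a gzipped file."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n").split("\t")[0] for line in handle if line.strip()]


def read_irescue_counts(out_dir):
    """Load barcodes, feature names and non-zero counts from an output directory."""
    out_dir = Path(out_dir)
    barcodes = read_first_column(out_dir / "barcodes.tsv.gz")
    features = read_first_column(out_dir / "features.tsv.gz")
    counts = {}  # (feature, barcode) -> count
    with gzip.open(out_dir / "matrix.mtx.gz", "rt") as handle:
        # Skip the Matrix Market banner and comment lines, which start with '%'.
        data = (line for line in handle if line.strip() and not line.startswith("%"))
        n_rows, n_cols, n_entries = (int(x) for x in next(data).split())
        if (n_rows, n_cols) != (len(features), len(barcodes)):
            raise ValueError("matrix dimensions do not match features/barcodes")
        for line in data:
            row, col, value = line.split()
            # Matrix Market indices are 1-based.
            counts[(features[int(row) - 1], barcodes[int(col) - 1])] = float(value)
    if len(counts) != n_entries:
        raise ValueError("number of entries does not match the matrix header")
    return barcodes, features, counts


if __name__ == "__main__":
    barcodes, features, counts = read_irescue_counts("IRescue_out")
    print(len(features), "features x", len(barcodes), "cells,", len(counts), "non-zero entries")
```

A dictionary of non-zero entries is enough for inspection; for real analyses, a sparse matrix library or the Seurat/Scanpy readers are the better choice.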
Load IRescue data with Seurat
To integrate TE counts into an existing Seurat object containing gene expression data, they can be added as an additional assay:
# import TE counts from IRescue output directory
te.data <- Seurat::Read10X('./IRescue_out/', gene.column = 1, cell.column = 1)
# create Seurat assay from TE counts
te.assay <- Seurat::CreateAssayObject(te.data)
# subset the assay by the cells already present in the Seurat object (in case it has been filtered)
te.assay <- subset(te.assay, colnames(te.assay)[which(colnames(te.assay) %in% colnames(seurat_object))])
# add the assay in the Seurat object
seurat_object[['TE']] <- te.assay
The result will be something like this:
An object of class Seurat
32276 features across 42513 samples within 2 assays
Active assay: RNA (31078 features, 0 variable features)
1 other assay present: TE
Cite
Polimeni B, Marasca F, Ranzani V, Bodega B. IRescue: single cell uncertainty-aware quantification of transposable elements expression. bioRxiv 2022.09.16.508229; doi: https://doi.org/10.1101/2022.09.16.508229