Interspersed Repeats single-cell quantifier
IRescue - <ins>I</ins>nterspersed <ins>Re</ins>peats <ins>s</ins>ingle-<ins>c</ins>ell q<ins>u</ins>antifi<ins>e</ins>r
Installation
Using conda (recommended)
We recommend using conda, as it will install all the required packages along with IRescue.
conda create -n irescue -c conda-forge -c bioconda irescue
Using pip
If for any reason it is not possible or desirable to use conda, IRescue can be installed with pip. In that case, the following requirements must be installed manually: python>=3.7, samtools>=1.12 and bedtools>=2.30.0.
pip install irescue
Container (Docker/Singularity)
Docker and Singularity containers are available for each conda release of IRescue. Choose the TAG corresponding to the desired IRescue version from the Biocontainers repository and pull or execute the container with Docker or Singularity:
# Get latest biocontainers tag (with curl and python3, otherwise check the above link for the desired version/tag)
TAG=$(curl -s -X GET https://quay.io/api/v1/repository/biocontainers/irescue/tag/ | python3 -c 'import json,sys;obj=json.load(sys.stdin);print(obj["tags"][0]["name"])')
# Run with Docker
docker run quay.io/biocontainers/irescue:$TAG irescue --help
# Run with Singularity
singularity exec https://depot.galaxyproject.org/singularity/irescue:$TAG irescue --help
Usage
Quick start
The only required input is a BAM file annotated with cell barcode and UMI sequences as tags (by default, the CB tag for the cell barcode and the UR tag for the UMI; override with --CBtag and --UMItag). You can obtain it by aligning your reads using STARsolo.
A RepeatMasker annotation will be downloaded automatically for the chosen genome assembly (e.g. -g hg38). Alternatively, provide your own annotation in BED format (e.g. -r TE.bed).
irescue -b genome_alignments.bam -g hg38
If you have already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), it is advised to provide the list of whitelisted cell barcodes as a text file, e.g.: -w barcodes.tsv. This will significantly improve performance.
IRescue performs best using at least 4 threads, e.g.: -p 8.
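As an illustration of how the options above combine, the following is a minimal Python sketch that assembles an irescue command line from the flags mentioned in this section (-b, -g, -r, -w, -p, --CBtag, --UMItag). The helper name build_irescue_command and its parameter names are hypothetical and not part of IRescue; the sketch only builds the argument list and does not validate the inputs.

```python
import shlex


def build_irescue_command(bam, genome=None, te_bed=None, whitelist=None,
                          threads=None, cb_tag=None, umi_tag=None):
    """Assemble an irescue argument list (hypothetical helper, not part of IRescue)."""
    cmd = ["irescue", "-b", bam]
    if genome is not None:
        cmd += ["-g", genome]          # genome assembly, e.g. hg38
    if te_bed is not None:
        cmd += ["-r", te_bed]          # custom TE annotation in BED format
    if whitelist is not None:
        cmd += ["-w", whitelist]       # whitelisted cell barcodes
    if threads is not None:
        cmd += ["-p", str(threads)]    # number of threads
    if cb_tag is not None:
        cmd += ["--CBtag", cb_tag]     # BAM tag holding the cell barcode
    if umi_tag is not None:
        cmd += ["--UMItag", umi_tag]   # BAM tag holding the UMI
    return cmd


cmd = build_irescue_command("genome_alignments.bam", genome="hg38",
                            whitelist="barcodes.tsv", threads=8)
print(" ".join(shlex.quote(arg) for arg in cmd))
# prints: irescue -b genome_alignments.bam -g hg38 -w barcodes.tsv -p 8
```

The resulting list can be passed to subprocess.run(cmd, check=True) once IRescue is installed and the input files exist.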
Output files
IRescue generates TE counts in a sparse matrix format, readable by Seurat or Scanpy:
IRescue_out/
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
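For a quick look at the output without Seurat or Scanpy, the three files can also be parsed with the Python standard library alone. The sketch below assumes the usual 10x-style layout (a Matrix Market coordinate file with features as rows and cell barcodes as columns, and the feature name in the first column of features.tsv.gz); the function name read_irescue_counts is hypothetical. Check these assumptions against your own output before relying on it.

```python
import gzip
import os


def read_lines(path):
    """Return the non-empty lines of a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def read_irescue_counts(out_dir):
    """Load barcodes, features and non-zero counts from an output directory.

    Returns (barcodes, features, counts), where counts maps
    (feature_name, barcode) to the value of each non-zero matrix entry.
    Assumes features are rows and barcodes are columns (10x-style layout).
    """
    # Keep only the first tab-separated column of each file.
    barcodes = [l.split("\t")[0] for l in read_lines(os.path.join(out_dir, "barcodes.tsv.gz"))]
    features = [l.split("\t")[0] for l in read_lines(os.path.join(out_dir, "features.tsv.gz"))]

    counts = {}
    with gzip.open(os.path.join(out_dir, "matrix.mtx.gz"), "rt") as handle:
        # Skip blank lines and the banner/comment lines, which start with '%'.
        data_lines = (l for l in handle if l.strip() and not l.startswith("%"))
        # The first data line holds: number of rows, columns, non-zero entries.
        n_rows, n_cols, _n_entries = (int(x) for x in next(data_lines).split())
        if (n_rows, n_cols) != (len(features), len(barcodes)):
            raise ValueError("matrix dimensions do not match features/barcodes")
        for line in data_lines:
            row, col, value = line.split()
            # Matrix Market indices are 1-based.
            counts[(features[int(row) - 1], barcodes[int(col) - 1])] = float(value)
    return barcodes, features, counts
```

Calling read_irescue_counts("IRescue_out") returns the barcode list, the feature list and a dictionary of non-zero counts. A dictionary is used here only to keep the sketch dependency-free; for real analyses a sparse matrix library is the better choice.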
Load IRescue data with Seurat
To integrate TE counts into an existing Seurat object containing gene expression data, they can be added as an additional assay:
# import TE counts from IRescue output directory
te.data <- Seurat::Read10X('./IRescue_out/', gene.column = 1, cell.column = 1)
# create Seurat assay from TE counts
te.assay <- Seurat::CreateAssayObject(te.data)
# subset the assay by the cells already present in the Seurat object (in case it has been filtered)
te.assay <- subset(te.assay, colnames(te.assay)[which(colnames(te.assay) %in% colnames(seurat_object))])
# add the assay in the Seurat object
seurat_object[['TE']] <- te.assay
The result will be something like this:
An object of class Seurat
32276 features across 42513 samples within 2 assays
Active assay: RNA (31078 features, 0 variable features)
1 other assay present: TE
Cite
Polimeni B, Marasca F, Ranzani V, Bodega B. IRescue: single cell uncertainty-aware quantification of transposable elements expression. bioRxiv 2022.09.16.508229; doi: https://doi.org/10.1101/2022.09.16.508229