Interspersed Repeats single-cell quantifier

IRescue - Interspersed Repeats single-cell quantifier

IRescue is a tool for quantifying the expression of transposable element (TE) subfamilies in single-cell RNA sequencing (scRNA-seq) data. Its core feature is that it considers all alignments, including non-primary ones, of reads/UMIs that map to multiple TEs in a BAM file, so that the TE subfamily of origin can be inferred accurately. IRescue implements a UMI error-correction, deduplication and quantification strategy that accounts for such multi-mapping events. IRescue's output is compatible with most scRNA-seq analysis toolkits, such as Seurat or Scanpy.

Installation

Using conda (recommended)

We recommend using conda, as it will install all the required packages along with IRescue.

conda create -n irescue -c conda-forge -c bioconda irescue

Using pip

If for any reason it is not possible or desirable to use conda, IRescue can be installed with pip. In that case, the following requirements must be installed manually: python>=3.7, samtools>=1.12 and bedtools>=2.30.0.

pip install irescue
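Since pip does not install samtools or bedtools, it can be useful to confirm that both are on the PATH before running IRescue. The following is a minimal sketch; `missing_tools` is a hypothetical helper name, not something shipped with IRescue, and it only checks that the tools exist, not that they meet the minimum versions listed above.

```shell
# Hypothetical helper (not part of IRescue): report which of the given
# command-line tools cannot be found on the PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Nothing missing: print nothing and succeed
    [ -z "$missing" ] && return 0
    # Otherwise print the missing tools, space-separated, and fail
    echo "${missing# }"
    return 1
}

# Usage:
# missing_tools samtools bedtools || echo "install the tools listed above"
```

Versions can then be checked by hand with `samtools --version` and `bedtools --version`.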

Usage

Quick start

The only required input is a BAM file annotated with cell barcode and UMI sequences as tags (by default, CB tag for cell barcode and UR tag for UMI; override with --CBtag and --UMItag). You can obtain it by aligning your reads using STARsolo.
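To check that a BAM file actually carries the expected tags, one option is to count tagged records in a sample of alignments. This is a sketch under assumptions: `count_tag` is a hypothetical helper, it reads SAM text on standard input (for example from `samtools view`), and the BAM file name in the usage lines is a placeholder.

```shell
# Hypothetical helper (not part of IRescue): read SAM records on stdin and
# print how many of them carry the given two-letter tag (e.g. CB or UR).
count_tag() {
    awk -F '\t' -v tag="$1" '
        /^@/ { next }    # skip SAM header lines, if present
        {
            # optional fields start at column 12 and look like TAG:TYPE:VALUE
            for (i = 12; i <= NF; i++) {
                if (substr($i, 1, 3) == (tag ":")) { n++; break }
            }
        }
        END { print n + 0 }
    '
}

# Usage (assumes samtools is installed):
# samtools view genome_alignments.bam | head -n 10000 | count_tag CB
# samtools view genome_alignments.bam | head -n 10000 | count_tag UR
```

If either count is zero, the tags are probably stored under different names, which is the case the --CBtag and --UMItag options are meant for.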

The RepeatMasker annotation is downloaded automatically for the chosen genome assembly (e.g. -g hg38). Alternatively, provide your own annotation in BED format (e.g. -r TE.bed).

irescue -b genome_alignments.bam -g hg38

If you have already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), we advise providing the list of whitelisted cell barcodes as a text file, e.g.: -w barcodes.tsv. This will significantly improve performance.

IRescue performs best using at least 4 threads, e.g.: -p 8.
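The options described above can be combined in a single call. The wrapper below is only a sketch: `run_irescue` is a hypothetical function name, the file names in the usage line are placeholders, and it uses only the -b, -g, -w and -p options mentioned in this section.

```shell
# Hypothetical wrapper (not part of IRescue) combining the options described
# above: BAM input, genome assembly, barcode whitelist and thread count.
run_irescue() {
    bam="$1"           # BAM with cell barcode and UMI tags, e.g. from STARsolo
    genome="$2"        # genome assembly name, e.g. hg38
    whitelist="$3"     # text file listing the whitelisted cell barcodes
    threads="${4:-4}"  # falls back to 4 threads when not given
    irescue -b "$bam" -g "$genome" -w "$whitelist" -p "$threads"
}

# Usage:
# run_irescue genome_alignments.bam hg38 barcodes.tsv 8
```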

Output files

IRescue generates TE counts in a sparse matrix format, readable by Seurat or Scanpy:

IRescue_out/
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
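A quick sanity check of the output is to read the size line of the matrix file. The sketch below assumes matrix.mtx.gz is a gzipped Matrix Market file, in which the first line that is not a comment gives the number of rows, columns and stored entries; `mtx_dims` is a hypothetical helper name. In the 10x-style layout that Seurat's Read10X reads, rows are expected to correspond to features (TE subfamilies) and columns to cell barcodes.

```shell
# Hypothetical helper (not part of IRescue): print "rows columns entries"
# from the size line of a gzipped Matrix Market file.
mtx_dims() {
    # Comment lines start with '%'; the first other line holds the dimensions
    gzip -dc "$1" | awk '!/^%/ && !seen { print $1, $2, $3; seen = 1 }'
}

# Usage:
# mtx_dims IRescue_out/matrix.mtx.gz
# gzip -dc IRescue_out/features.tsv.gz | head -n 5
```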

Load IRescue data with Seurat

To integrate TE counts into an existing Seurat object containing gene expression data, they can be added as an additional assay:

# import TE counts from IRescue output directory
te.data <- Seurat::Read10X('./IRescue_out/', gene.column = 1, cell.column = 1)

# create Seurat assay from TE counts
te.assay <- Seurat::CreateAssayObject(te.data)

# subset the assay by the cells already present in the Seurat object (in case it has been filtered)
te.assay <- subset(te.assay, colnames(te.assay)[which(colnames(te.assay) %in% colnames(seurat_object))])

# add the assay in the Seurat object
seurat_object[['TE']] <- te.assay

The result will be something like this:

An object of class Seurat 
32276 features across 42513 samples within 2 assays 
Active assay: RNA (31078 features, 0 variable features)
 1 other assay present: TE

Cite

Polimeni B, Marasca F, Ranzani V, Bodega B. IRescue: single cell uncertainty-aware quantification of transposable elements expression. bioRxiv 2022.09.16.508229; doi: https://doi.org/10.1101/2022.09.16.508229
