Uncertainty-aware quantification of transposable elements expression in scRNA-seq

IRescue - Interspersed Repeats single-cell quantifier

IRescue quantifies the expression of transposable element (TE) subfamilies in single-cell RNA sequencing (scRNA-seq) data. It performs UMI deduplication with sequencing error correction (for 10X-like libraries) or read quantification (for UMI-less libraries, e.g. SMART-seq), followed by probabilistic assignment of multi-mapping reads through an Expectation-Maximization (EM) procedure. The output is written as a sparse matrix compatible with Seurat, Scanpy and other toolkits.

Installation

Using conda (recommended)

Use conda (or mamba or micromamba) to install IRescue with all its dependencies.

conda create -n irescue -c conda-forge -c bioconda irescue

Using pip

If installing with pip, the following requirements must be installed manually: python>=3.9, samtools>=1.12 and bedtools>=2.30.0. Fairly recent versions of the GNU utilities are also required (tested with gawk>=5.0.1, coreutils>=8.30 and gzip>=1.10).

pip install irescue

Install a pre-release version

You can install or upgrade to a pre-release using pip:

pip install -U --pre irescue

Build from source

Building the package directly from source lets you try features and bug fixes planned for the next release. As above, you need to install the requirements manually. Be aware that builds from development branches may be unstable.

git clone https://github.com/bodegalab/irescue
cd irescue
pip install .

Container (Docker/Singularity)

Docker and Singularity containers are available for each conda release of IRescue. Choose the TAG corresponding to the desired IRescue version from the Biocontainers repository and pull or execute the container with Docker or Singularity:

# Get latest biocontainers tag (with curl and python3, otherwise check the above link for the desired version/tag)
TAG=$(curl -s -X GET https://quay.io/api/v1/repository/biocontainers/irescue/tag/ | python3 -c 'import json,sys;obj=json.load(sys.stdin);print(obj["tags"][0]["name"])')

# Run with Docker
docker run quay.io/biocontainers/irescue:$TAG irescue --help

# Run with Singularity
singularity exec https://depot.galaxyproject.org/singularity/irescue:$TAG irescue --help

Usage

Inspect all parameters:

irescue --help

Quick start:

irescue -b genome_alignments.bam -g hg38

Required inputs

BAM file sorted by coordinate, indexed and annotated with cell barcode and, optionally, UMI sequences as tags (e.g. CB and UR tags, configurable with --cb-tag and --umi-tag)

It can be obtained by aligning reads with STARsolo. It is highly recommended to keep secondary alignments in the BAM file, since they are used in the EM procedure to redistribute multi-mapping reads (set at least --outFilterMultimapNmax 100 --winAnchorMultimapNmax 100). Remember to output all the needed SAM attributes (e.g. --outSAMattributes NH HI AS nM NM MD jM jI XS MC ch cN CR CY UR UY GX GN CB UB sM sS sQ).
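As a rough illustration of the alignment settings above, the following Python sketch assembles a STARsolo command line. The multimapper limits and SAM attributes come from the text; the function name, file names and remaining options (--soloType CB_UMI_Simple, --soloCBwhitelist, --readFilesCommand zcat, sorted BAM output) are assumptions for a typical 10X-like library and will likely need adjusting, for example for barcode and UMI lengths.

```python
import shlex

# SAM attributes recommended in the text above.
SAM_ATTRIBUTES = (
    "NH HI AS nM NM MD jM jI XS MC ch cN CR CY UR UY GX GN CB UB sM sS sQ".split()
)


def build_starsolo_command(genome_dir, cdna_fastq, barcode_fastq, whitelist, threads=8):
    """Assemble a STARsolo argument list (hypothetical helper, not part of IRescue)."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # STARsolo expects the cDNA read first, then the barcode/UMI read.
        "--readFilesIn", cdna_fastq, barcode_fastq,
        "--readFilesCommand", "zcat",
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        # Coordinate-sorted BAM, as required by IRescue.
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # Keep multi-mapping alignments for the EM procedure.
        "--outFilterMultimapNmax", "100",
        "--winAnchorMultimapNmax", "100",
        "--outSAMattributes", *SAM_ATTRIBUTES,
    ]


# Placeholder paths, for illustration only.
star_cmd = build_starsolo_command(
    "star_index/", "sample_R2.fastq.gz", "sample_R1.fastq.gz", "whitelist.txt"
)
print(shlex.join(star_cmd))
```

The printed command can be pasted into a shell or passed to subprocess.run(star_cmd, check=True). The resulting BAM file still has to be indexed (e.g. with samtools index) before running IRescue.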

Custom annotation

A custom repeat annotation can be provided in BED format (e.g. -r TE.bed) with at least four columns, the fourth being the TE feature name (e.g. subfamily name).
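Before passing a custom annotation to IRescue, it can help to check that every line has the four required columns. The sketch below is a minimal stand-alone check, not part of IRescue: the function name and the toy intervals are made up for illustration, and the interval sanity check is an extra assumption beyond the four-column requirement.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def summarize_te_bed(path):
    """Count intervals per TE feature name (fourth BED column)."""
    counts = Counter()
    with open(path, newline="") as handle:
        for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
            # Skip blank lines and optional BED header lines.
            if not row or row[0].startswith(("#", "track", "browser")):
                continue
            if len(row) < 4:
                raise ValueError(
                    f"line {line_no}: expected at least 4 columns, found {len(row)}"
                )
            start, end = int(row[1]), int(row[2])
            if start < 0 or end <= start:
                raise ValueError(f"line {line_no}: invalid interval {start}-{end}")
            counts[row[3]] += 1
    return counts


# Toy annotation written to a temporary file, only to demonstrate the check.
with tempfile.TemporaryDirectory() as tmp:
    bed = Path(tmp) / "TE.bed"
    bed.write_text(
        "chr1\t100\t400\tL1HS\n"
        "chr1\t900\t1200\tAluY\n"
        "chr2\t50\t350\tL1HS\n"
    )
    te_counts = summarize_te_bed(bed)

print(te_counts.most_common())
```

The returned counter gives the number of annotated intervals per feature name, which is a quick way to spot misspelled or unexpectedly rare subfamily names.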

UMI-less libraries (e.g. SMART-seq)

Only in pre-release version 1.2.0b2 or later.

You can ignore the UMI sequence (thus skipping UMI-deduplication entirely) with --no-umi:

irescue -b genome_alignments.bam -g hg38 --no-umi

NB: the BAM tag for cell barcodes is sometimes RG instead of CB. In that case, add the parameter --cb-tag RG.

Best practices

  • If you have already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), provide the list of whitelisted cell barcodes as a text file (-w barcodes.tsv). This significantly improves performance by processing only viable cells.

  • For optimal run time, use at least 4-8 CPUs, e.g. -p 8.
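The recommendations above can be combined into a single call. The following sketch builds the command line in Python using only options mentioned in this document (-b, -g, -w, -p, --no-umi, --cb-tag). The helper function and its defaults are hypothetical, and the file names are placeholders.

```python
import shlex


def build_irescue_command(bam, genome="hg38", whitelist=None, threads=8,
                          no_umi=False, cb_tag=None):
    """Assemble an irescue argument list (hypothetical helper for illustration)."""
    cmd = ["irescue", "-b", bam, "-g", genome, "-p", str(threads)]
    if whitelist:
        # Restrict processing to the whitelisted (viable) cell barcodes.
        cmd += ["-w", whitelist]
    if no_umi:
        # UMI-less libraries such as SMART-seq.
        cmd.append("--no-umi")
    if cb_tag:
        # Use a non-default BAM tag for cell barcodes, e.g. RG.
        cmd += ["--cb-tag", cb_tag]
    return cmd


cmd = build_irescue_command("genome_alignments.bam", whitelist="barcodes.tsv")
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would launch the run. Keeping the arguments as a list avoids shell quoting issues with unusual file names.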

Output files

IRescue writes TE counts as a sparse matrix, readable by Seurat or Scanpy, into a counts/ subdirectory. Optional outputs include a description of equivalence classes with UMI deduplication stats (ec_dump.tsv.gz) and a subdirectory of temporary files (tmp/) for debugging purposes, kept only with the --keeptmp parameter. Highly detailed logging can be enabled with -vv (printed to the terminal's stderr).

irescue_out/
├── counts/
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── ec_dump.tsv.gz
└── tmp/
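For a quick look at the counts without loading Seurat or Scanpy, the matrix can be read with the Python standard library alone. This is a minimal sketch that assumes the usual 10X-style layout (Matrix Market coordinate format, rows are features, columns are barcodes, names in the first column of each TSV file). The function names and toy values are made up for illustration.

```python
import gzip
import tempfile
from collections import defaultdict
from pathlib import Path


def read_first_column(path):
    """Return the first tab-separated field of every non-empty line."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n").split("\t")[0] for line in handle if line.strip()]


def total_counts_per_feature(counts_dir):
    """Sum the counts of each TE feature across all cells."""
    counts_dir = Path(counts_dir)
    features = read_first_column(counts_dir / "features.tsv.gz")
    barcodes = read_first_column(counts_dir / "barcodes.tsv.gz")
    totals = defaultdict(float)
    with gzip.open(counts_dir / "matrix.mtx.gz", "rt") as handle:
        # Skip the Matrix Market banner and comment lines (they start with '%').
        data = (ln for ln in handle if ln.strip() and not ln.startswith("%"))
        n_rows, n_cols, _ = (int(x) for x in next(data).split())
        if (n_rows, n_cols) != (len(features), len(barcodes)):
            raise ValueError("matrix shape does not match features/barcodes")
        for line in data:
            row, _col, value = line.split()
            # Matrix Market indices are 1-based.
            totals[features[int(row) - 1]] += float(value)
    return dict(totals)


# Toy output directory, written only to demonstrate the reader.
with tempfile.TemporaryDirectory() as tmp:
    counts = Path(tmp) / "counts"
    counts.mkdir()
    toy = {
        "features.tsv.gz": "L1HS\nAluY\n",
        "barcodes.tsv.gz": "AAAC\nTTTG\n",
        "matrix.mtx.gz": "%%MatrixMarket matrix coordinate real general\n"
                         "2 2 3\n1 1 1.5\n1 2 2.0\n2 1 0.5\n",
    }
    for name, text in toy.items():
        with gzip.open(counts / name, "wt") as handle:
            handle.write(text)
    totals = total_counts_per_feature(counts)

print(totals)
```

Values are parsed as floats because the EM redistribution of multi-mapping reads can produce non-integer counts. For real analyses, a sparse-matrix reader from an established toolkit is the better choice.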

Load IRescue data with Seurat

Multiple assays can be exploited to integrate TE counts into an existing Seurat object containing gene expression data:

# import TE counts from IRescue output directory
te.data <- Seurat::Read10X('./irescue_out/counts/', gene.column = 1, cell.column = 1)

# create Seurat assay from TE counts
te.assay <- Seurat::CreateAssayObject(te.data)

# subset the assay by the cells already present in the Seurat object (in case it has been filtered)
te.assay <- subset(te.assay, colnames(te.assay)[which(colnames(te.assay) %in% colnames(seurat_object))])

# add the assay in the Seurat object
seurat_object[['TE']] <- te.assay

The result will be something like this:

An object of class Seurat 
32276 features across 42513 samples within 2 assays 
Active assay: RNA (31078 features, 0 variable features)
 1 other assay present: TE

From here, TE expression can be normalized. To normalize according to gene counts or TE+gene counts, normalize manually or merge the assays. Reductions can be made using TE, gene or TE+gene expression.

Cite

Benedetto Polimeni, Federica Marasca, Valeria Ranzani, Beatrice Bodega, IRescue: uncertainty-aware quantification of transposable elements expression at single cell level, Nucleic Acids Research, 2024; https://doi.org/10.1093/nar/gkae793
