Interspersed Repeats single-cell quantifier

IRescue - Interspersed Repeats single-cell quantifier

IRescue quantifies the expression of transposable element (TE) subfamilies in single-cell RNA sequencing (scRNA-seq) data. It performs UMI deduplication with sequencing error correction and assigns multi-mapping reads probabilistically by Expectation-Maximization (EM). The output is written as a sparse matrix compatible with Seurat, Scanpy and other toolkits.
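
To make the EM idea concrete, here is a minimal sketch of how counts shared by several TE subfamilies could be split in proportion to iteratively re-estimated abundances. It is a generic textbook EM loop under simplifying assumptions, not IRescue's actual implementation; the function name `em_counts` and the input layout (a list of equivalence classes with their UMI counts) are hypothetical.

```python
def em_counts(equivalence_classes, max_iter=1000, tol=1e-9):
    """Distribute counts across features with a basic EM loop.

    equivalence_classes: list of (features, count) pairs, where `features`
    is a tuple of TE subfamily names compatible with `count` UMIs.
    Returns a dict mapping each feature to its expected count.
    """
    features = sorted({f for feats, _ in equivalence_classes for f in feats})
    if not features:
        return {}
    # Start from uniform relative abundances.
    abundance = {f: 1.0 / len(features) for f in features}
    counts = {f: 0.0 for f in features}

    for _ in range(max_iter):
        # E-step: split each class's count in proportion to current abundances.
        counts = {f: 0.0 for f in features}
        for feats, count in equivalence_classes:
            denom = sum(abundance[f] for f in feats)
            if denom == 0:
                continue
            for f in feats:
                counts[f] += count * abundance[f] / denom
        total = sum(counts.values())
        if total == 0:
            break
        # M-step: re-estimate abundances from the expected counts.
        updated = {f: counts[f] / total for f in features}
        delta = max(abs(updated[f] - abundance[f]) for f in features)
        abundance = updated
        if delta < tol:
            break
    return counts


if __name__ == "__main__":
    # 30 UMIs map only to L1, 10 only to Alu, 20 are ambiguous between the two.
    classes = [(("L1",), 30), (("Alu",), 10), (("L1", "Alu"), 20)]
    print(em_counts(classes))
```

In this toy input the ambiguous UMIs end up split roughly 3:1 in favour of L1, because the uniquely assigned UMIs already support L1 three times more than Alu.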

Installation

Using conda (recommended)

We recommend using conda, as it will install all the required packages along with IRescue.

conda create -n irescue -c conda-forge -c bioconda irescue

Using pip

If using conda is not possible or desirable, IRescue can be installed with pip. In that case the following requirements must be installed manually: python>=3.8, samtools>=1.12, bedtools>=2.30.0, and fairly recent versions of the GNU utilities, specifically gawk>=5.0.1, coreutils>=8.30 and gzip>=1.10 (older versions are untested).
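
Before a pip-based installation, the minimum versions listed above can be checked with a short script. The sketch below is not part of IRescue; the helper names are hypothetical, and it assumes each tool prints its version in response to `--version`. coreutils is left out because it is a collection of programs rather than a single executable.

```python
import re
import shutil
import subprocess

# Minimum versions taken from the requirement list above.
MINIMUM_VERSIONS = {
    "samtools": "1.12",
    "bedtools": "2.30.0",
    "gawk": "5.0.1",
    "gzip": "1.10",
}


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, required):
    """Compare versions numerically, so that 1.9 is older than 1.12."""
    a, b = parse_version(found), parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that 2.30 and 2.30.0 compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_tools(minimums=MINIMUM_VERSIONS):
    """Map each tool to True/False, or None if it is missing or unparseable."""
    report = {}
    for tool, required in minimums.items():
        if shutil.which(tool) is None:
            report[tool] = None
            continue
        result = subprocess.run([tool, "--version"], capture_output=True, text=True)
        output = result.stdout or result.stderr
        try:
            report[tool] = meets_minimum(output, required)
        except ValueError:
            report[tool] = None
    return report


if __name__ == "__main__":
    for tool, ok in check_tools().items():
        print(tool, "OK" if ok else "missing, too old or not recognized")
```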

pip install irescue

Build from source

Building the package directly from source lets you try out features and bug fixes planned for the next release. As above, you need to install the requirements manually. Be aware that builds from development branches may be unstable.

git clone https://github.com/bodegalab/irescue
cd irescue
pip install .

Container (Docker/Singularity)

Docker and Singularity containers are available for each conda release of IRescue. Choose the TAG corresponding to the desired IRescue version from the Biocontainers repository and pull or execute the container with Docker or Singularity:

# Get the latest biocontainers tag (requires curl and python3; otherwise look up the desired version/tag in the Biocontainers repository)
TAG=$(curl -s -X GET https://quay.io/api/v1/repository/biocontainers/irescue/tag/ | python3 -c 'import json,sys;obj=json.load(sys.stdin);print(obj["tags"][0]["name"])')

# Run with Docker
docker run quay.io/biocontainers/irescue:$TAG irescue --help

# Run with Singularity
singularity exec https://depot.galaxyproject.org/singularity/irescue:$TAG irescue --help
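
The tag lookup can also be done without curl. The sketch below mirrors the shell one-liner above by taking the name of the first entry in the `tags` list returned by the same URL; like the one-liner, it assumes the first entry is the most recent tag. The function names are hypothetical.

```python
import json
import urllib.request

TAGS_URL = "https://quay.io/api/v1/repository/biocontainers/irescue/tag/"


def first_tag_name(payload):
    """Return the name of the first tag in a parsed API response."""
    tags = payload.get("tags", [])
    if not tags:
        raise ValueError("the response contains no tags")
    return tags[0]["name"]


def fetch_first_tag(url=TAGS_URL, timeout=30):
    """Download the tag list and return the name of its first entry."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return first_tag_name(json.load(response))


def image_references(tag):
    """Build the Docker and Singularity references used in the commands above."""
    return {
        "docker": f"quay.io/biocontainers/irescue:{tag}",
        "singularity": f"https://depot.galaxyproject.org/singularity/irescue:{tag}",
    }


if __name__ == "__main__":
    # Requires network access.
    print(image_references(fetch_first_tag()))
```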

Usage

irescue --help

The only required input is a BAM file annotated with cell barcode and UMI sequences as tags (by default, CB tag for cell barcode and UR tag for UMI; override with --cb-tag and --umi-tag).

You can obtain it by aligning your reads with STARsolo. It is advised to keep secondary alignments in the BAM file, since they are used in the EM procedure to assign multi-mapping reads (e.g. --outFilterMultimapNmax 100 --winAnchorMultimapNmax 100 or more). Remember to output all the needed SAM attributes (e.g. --outSAMattributes NH HI AS nM NM MD jM jI XS MC ch cN CR CY UR UY GX GN CB UB sM sS sQ).
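
Whether alignment records carry the expected tags can be checked on a few lines of `samtools view` output. The sketch below parses one SAM record in plain Python and looks for the cell barcode and UMI tags; the helper names and the example record are made up for illustration.

```python
def sam_tags(record):
    """Return the optional fields of one SAM record as {tag: value}."""
    fields = record.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns of a SAM record are mandatory;
    # the remaining ones have the form TAG:TYPE:VALUE.
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def has_barcode_and_umi(record, cb_tag="CB", umi_tag="UR"):
    """True if the record carries both the cell barcode and the UMI tag."""
    tags = sam_tags(record)
    return cb_tag in tags and umi_tag in tags


if __name__ == "__main__":
    # Hypothetical record with the default CB and UR tags.
    record = "\t".join([
        "read1", "0", "chr1", "100", "255", "4M", "*", "0", "0",
        "ACGT", "FFFF", "NH:i:1", "CB:Z:AAACCTGAGAAACCAT", "UR:Z:ACGTACGTAC",
    ])
    print(has_barcode_and_umi(record))
```

If the BAM file uses different tag names, pass them as `cb_tag` and `umi_tag` here, and as --cb-tag and --umi-tag to IRescue.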

The RepeatMasker annotation is downloaded automatically for the chosen genome assembly (e.g. -g hg38). Alternatively, you can provide your own annotation in BED format (e.g. -r TE.bed).

irescue -b genome_alignments.bam -g hg38

If you already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), it is advised to provide the whitelisted cell barcodes list as a text file (-w barcodes.tsv). This will significantly improve performance by processing viable cells only.

For optimal run time, use multiple threads, e.g. -p 8.
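
When IRescue is launched from a Python pipeline, the options described above can be assembled into an argument list. This is a sketch of a hypothetical wrapper, not an API provided by IRescue; it uses only the flags mentioned in this section and treats -p as the number of threads.

```python
import shlex


def build_irescue_command(bam, genome=None, regions=None, whitelist=None,
                          threads=None, cb_tag=None, umi_tag=None, verbose=False):
    """Assemble an irescue command line as a list of arguments."""
    cmd = ["irescue", "-b", str(bam)]
    if genome is not None:
        cmd += ["-g", str(genome)]        # genome assembly, e.g. hg38
    if regions is not None:
        cmd += ["-r", str(regions)]       # custom TE annotation in BED format
    if whitelist is not None:
        cmd += ["-w", str(whitelist)]     # whitelisted cell barcodes
    if threads is not None:
        cmd += ["-p", str(threads)]       # assumed here to be the thread count
    if cb_tag is not None:
        cmd += ["--cb-tag", cb_tag]
    if umi_tag is not None:
        cmd += ["--umi-tag", umi_tag]
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    cmd = build_irescue_command(
        "genome_alignments.bam", genome="hg38", whitelist="barcodes.tsv", threads=8
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```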

Output files

IRescue writes TE counts as a sparse matrix, readable by Seurat or Scanpy, into a counts/ subdirectory. Optional outputs include ec_dump.tsv.gz, a description of equivalence classes with UMI deduplication statistics, and tmp/, a subdirectory of temporary files kept for debugging purposes. Detailed logging is enabled by --verbose and written to standard error.

irescue_out/
├── counts/
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── ec_dump.tsv.gz
└── tmp/
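
To inspect the counts without Seurat or Scanpy, the three files in counts/ can be read with the Python standard library. The sketch below assumes the usual 10x-style layout: a Matrix Market coordinate file whose rows index features.tsv.gz and whose columns index barcodes.tsv.gz, with 1-based indices. The function names are hypothetical.

```python
import gzip
from pathlib import Path


def read_lines(path):
    """Read a gzip-compressed text file into a list of lines."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n") for line in handle]


def read_counts(counts_dir):
    """Return (features, barcodes, entries) from a counts/ directory.

    `entries` maps (feature, barcode) to the stored count.
    """
    counts_dir = Path(counts_dir)
    barcodes = read_lines(counts_dir / "barcodes.tsv.gz")
    # Keep only the first column of each features row.
    features = [line.split("\t")[0]
                for line in read_lines(counts_dir / "features.tsv.gz")]
    entries = {}
    with gzip.open(counts_dir / "matrix.mtx.gz", "rt") as handle:
        # Skip the banner and comment lines, which start with '%'.
        body = (line for line in handle if not line.startswith("%"))
        next(body)  # size line: number of rows, columns and stored entries
        for line in body:
            if not line.strip():
                continue
            row, col, value = line.split()
            # Matrix Market indices are 1-based.
            key = (features[int(row) - 1], barcodes[int(col) - 1])
            entries[key] = float(value)
    return features, barcodes, entries


if __name__ == "__main__":
    features, barcodes, entries = read_counts("irescue_out/counts")
    print(len(features), "features,", len(barcodes), "barcodes,",
          len(entries), "non-zero entries")
```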

Load IRescue data with Seurat

To integrate TE counts into an existing Seurat object containing gene expression data, they can be added as an additional assay:

# import TE counts from IRescue output directory
te.data <- Seurat::Read10X('./irescue_out/counts/', gene.column = 1, cell.column = 1)

# create Seurat assay from TE counts
te.assay <- Seurat::CreateAssayObject(te.data)

# subset the assay by the cells already present in the Seurat object (in case it has been filtered)
te.assay <- subset(te.assay, cells = intersect(colnames(te.assay), colnames(seurat_object)))

# add the assay in the Seurat object
seurat_object[['TE']] <- te.assay

The result will be something like this:

An object of class Seurat 
32276 features across 42513 samples within 2 assays 
Active assay: RNA (31078 features, 0 variable features)
 1 other assay present: TE

From here, TE expression can be normalized. To normalize according to gene counts or TE+gene counts, normalize manually or merge the assays. Reductions can be made using TE, gene or TE+gene expression.
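
As an illustration of manual normalization, the sketch below log-normalizes the TE counts of a single cell using the combined TE+gene total as the library size. It applies the common log1p(count / total * scale factor) formula with a scale factor of 10,000; the function name and the example numbers are hypothetical, and this is only one of several reasonable normalization choices.

```python
import math


def log_normalize(te_counts, library_size, scale_factor=10_000):
    """Log-normalize one cell's TE counts against a chosen library size.

    te_counts: {feature: raw count} for a single cell.
    library_size: total count used as denominator, e.g. genes + TEs.
    """
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {
        feature: math.log1p(count / library_size * scale_factor)
        for feature, count in te_counts.items()
    }


if __name__ == "__main__":
    te_counts = {"L1": 5, "Alu": 15}   # made-up TE counts for one cell
    gene_total = 980                   # made-up total of gene counts for that cell
    library_size = gene_total + sum(te_counts.values())
    print(log_normalize(te_counts, library_size))
```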

Cite

Polimeni B, Marasca F, Ranzani V, Bodega B. IRescue: uncertainty-aware quantification of transposable elements expression at single cell level. bioRxiv 2022.09.16.508229; doi: https://doi.org/10.1101/2022.09.16.508229

Release history

1.2.0b2 (pre-release), Mar 27, 2025
1.2.0b1 (pre-release), Sep 19, 2024
1.1.2 (this version), Sep 12, 2024
1.1.1, Aug 29, 2024
1.1.0, Aug 23, 2024
1.1.0b2 (pre-release), Jul 11, 2024
1.1.0b1 (pre-release), Mar 13, 2023
1.0.3, Feb 22, 2023
1.0.2, Oct 11, 2022
1.0.1, Sep 15, 2022
0.0.0, Jun 24, 2022

Download files

Source Distribution

irescue-1.1.2.tar.gz (19.3 kB)

Upload date: Sep 12, 2024
Size: 19.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.9.20

Hashes for irescue-1.1.2.tar.gz
Algorithm	Hash digest
SHA256	`49a47a8859b3435cc94b171625c8c8c3b7bd7026498a0fd616dd48adb511e38c`
MD5	`6d436909165fc65c8e154c324e77f2d1`
BLAKE2b-256	`c11d2d2684145a59b7686cd932dd5c0dcfa2b10e19f208768f38398f0b29c162`

Built Distribution

IRescue-1.1.2-py3-none-any.whl (22.0 kB)

Upload date: Sep 12, 2024
Size: 22.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.9.20

Hashes for IRescue-1.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ed9e591fd9c022a3ba016c2bbc6f239c6361e976cc72f88d6f1985936b42280f`
MD5	`3ac784b63a4308e07cf80352d17b4a99`
BLAKE2b-256	`5c4e523399c1f0332a1f910e4aee836342be1d2f8f6e8d22e65bbde3992aa2de`
