Quantifying transposable element (TEs) expression from single-cell sequencing data

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

scTE: Quantifying transposable element (TEs) expression from single-cell sequencing data

scTE takes as input:

Aligned sequence reads (BAM/SAM format)
The genomic location of TEs (BED format)
The genomic location of genes (GTF format)

scTE workflow

Note

This repository is a fork from https://github.com/JiekaiLab/scTE

Installation

From PyPI

$ pip install scte-quant

From conda

Installing with conda is recommended, since it improves reproducibility and makes dependencies easier to manage.

$ conda create -n scte --channel-priority 0 --override-channels -c bioconda -c conda-forge -c billsfriend scte

From source

$ git clone https://gitee.com/billsfriend/scTE
$ cd scTE
$ pip install .

Usage

Building genome indices

scTE builds genome indices for the fast assignment of aligned reads to genes and TEs. Indices for the preset genomes can be generated automatically with the following commands:

$ scTE_build -g mm10 # Mouse
$ scTE_build -g hg38 # Human
$ scTE_build -g panTro6 # Chimpanzee
$ scTE_build -g macFas5 # Macaca fascicularis
$ scTE_build -g dm6 # Drosophila melanogaster
$ scTE_build -g danRer11 # Zebrafish
$ scTE_build -g xenTro9 # Xenopus tropicalis

These commands automatically download the genome annotations. For mouse:

- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M21/gencode.vM21.annotation.gtf.gz
- http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/rmsk.txt.gz

For human:

- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/gencode.v30.annotation.gtf.gz
- http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz

For chimpanzee:

- http://ftp.ensembl.org/pub/release-103/gtf/pan_troglodytes/Pan_troglodytes.Pan_tro_3.0.103.gtf.gz
- https://hgdownload.soe.ucsc.edu/goldenPath/panTro6/database/rmsk.txt.gz

For Macaca fascicularis:

- http://ftp.ensembl.org/pub/release-102/gtf/macaca_fascicularis/Macaca_fascicularis.Macaca_fascicularis_5.0.102.gtf.gz
- http://hgdownload.soe.ucsc.edu/goldenPath/macFas5/database/rmsk.txt.gz

For Drosophila melanogaster:

- http://ftp.ensembl.org/pub/release-103/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.32.103.gtf.gz
- http://hgdownload.soe.ucsc.edu/goldenPath/dm6/database/rmsk.txt.gz

For zebrafish:

- http://ftp.ensembl.org/pub/release-103/gtf/danio_rerio/Danio_rerio.GRCz11.103.gtf.gz
- https://hgdownload.soe.ucsc.edu/goldenPath/danRer11/database/rmsk.txt.gz

For Xenopus tropicalis:

- http://ftp.ensembl.org/pub/release-103/gtf/xenopus_tropicalis/Xenopus_tropicalis.Xenopus_tropicalis_v9.1.103.gtf.gz
- https://hgdownload.soe.ucsc.edu/goldenPath/xenTro9/database/rmsk.txt.gz

mm10, hg38, panTro6, macFas5, dm6, danRer11 and xenTro9 are genome assembly names. To build indices from a custom reference, use the -te and -gene options instead:

$ scTE_build -te TEs.bed -gene Genes.gtf -o custom

-te
    BED file of transposable element annotations, with at least 4 columns: chr, start, end and TE name. Gzipped (.gz) files are supported.
-gene
    GTF file of gene annotations. Gzipped (.gz) files are supported.
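
Before passing a custom file to -te, it can help to check that every row really has the four required columns. The following is a minimal Python sketch that is not part of scTE; the helper names open_maybe_gz and validate_te_bed are hypothetical, and it assumes a tab-separated BED file that may be gzipped.

```python
import gzip
from typing import Iterable, List, TextIO, Tuple


def open_maybe_gz(path: str) -> TextIO:
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def validate_te_bed(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for rows that break the 4-column layout.

    Expected tab-separated columns: chr, start, end, TE name.
    Comment, 'track' and 'browser' lines are skipped.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            problems.append((lineno, "fewer than 4 columns"))
            continue
        _chrom, start, end, name = fields[:4]
        if not (start.isdigit() and end.isdigit()):
            problems.append((lineno, "start/end are not non-negative integers"))
        elif int(start) >= int(end):
            problems.append((lineno, "start is not less than end"))
        if not name:
            problems.append((lineno, "empty TE name"))
    return problems
```

For example, opening TEs.bed.gz with open_maybe_gz and passing the handle to validate_te_bed returns an empty list when every row looks well-formed.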

For other assemblies and species, a TEs.bed derived from rmsk.txt.gz on UCSC goldenPath and a Genes.gtf (<species>.gtf.gz) from Ensembl are well tested and recommended.

Note that rmsk.txt.gz downloaded from UCSC goldenPath must be converted to 4-column BED format before it is passed to -te. A simple conversion is:

$ zcat rmsk.txt.gz | cut -f6-8,11 > rmsk.TE.bed
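
The cut command keeps fields 6-8 and 11 of rmsk.txt, which in the UCSC rmsk table are genoName, genoStart, genoEnd and repName. A minimal Python equivalent is sketched below for cases where you want to post-process rows in the same pass; rmsk_to_bed4 and convert_rmsk are hypothetical helper names, not scTE functions.

```python
import gzip
from typing import Iterable, Iterator


def rmsk_to_bed4(rmsk_lines: Iterable[str]) -> Iterator[str]:
    """Yield 4-column BED lines (chr, start, end, repName) from UCSC rmsk.txt rows.

    For well-formed rows this matches `cut -f6-8,11`: 1-based fields 6, 7, 8
    and 11 are genoName, genoStart, genoEnd and repName. UCSC tables already
    store 0-based starts, so coordinates are copied unchanged.
    """
    for line in rmsk_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip empty or truncated rows
        yield "\t".join((fields[5], fields[6], fields[7], fields[10])) + "\n"


def convert_rmsk(rmsk_gz_path: str, bed_path: str) -> None:
    """Stream a gzipped rmsk.txt into an uncompressed 4-column BED file."""
    with gzip.open(rmsk_gz_path, "rt") as src, open(bed_path, "w") as dst:
        dst.writelines(rmsk_to_bed4(src))
```

Calling convert_rmsk("rmsk.txt.gz", "rmsk.TE.bed") would produce the same file as the shell command above.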

For the preset genomes available through -g, TEs in rmsk.txt.gz are filtered to include only the LINE, SINE, LTR, Retroposon, Satellite and DNA (DNA transposon) classes. Note that satellite DNA is not conventionally classified as a TE. To customize the TEs in your genome indices, filter TEs.bed as needed before running scTE_build.
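
Because repClass (field 12 of rmsk.txt) is dropped by the 4-column conversion, class-level filtering is easiest to apply to rmsk.txt itself. The sketch below keeps the classes listed above by default. It matches repClass exactly, so uncertain classes such as "LINE?" are dropped; that is an assumption and may not be exactly what scTE_build does for the preset genomes. filter_rmsk and PRESET_CLASSES are hypothetical names.

```python
from typing import AbstractSet, Iterable, Iterator

# Classes named in the text for the preset genomes (exact repClass values).
PRESET_CLASSES = frozenset({"LINE", "SINE", "LTR", "Retroposon", "Satellite", "DNA"})


def filter_rmsk(rmsk_lines: Iterable[str],
                keep_classes: AbstractSet[str] = PRESET_CLASSES) -> Iterator[str]:
    """Yield 4-column BED lines for rmsk.txt rows whose repClass is in keep_classes."""
    for line in rmsk_lines:
        f = line.rstrip("\n").split("\t")
        # 0-based indices: 5-7 = genoName/genoStart/genoEnd, 10 = repName, 11 = repClass
        if len(f) >= 12 and f[11] in keep_classes:
            yield "\t".join((f[5], f[6], f[7], f[10])) + "\n"
```

To exclude satellites, which are not TEs by convention, pass PRESET_CLASSES - {"Satellite"} as keep_classes.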

For more information about the BED and GTF formats, see the UCSC documentation. These annotations are processed and converted into genome indices.

By default, scTE allocates reads first to gene exons and then to TEs. Hence TEs inside the exon/UTR regions of genes annotated in GENCODE contribute only to the gene, not to the TE score. This behaviour can be changed with --mode/-m inclusive, which instructs scTE to assign a read to both the TE and the gene when the read comes from a TE inside an exon/UTR region. To remove TEs located inside gene introns, set --mode/-m nointron.
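
The three --mode settings can be summarized with a toy model of how a single read is counted. This is a simplified illustration of the behaviour described above, not scTE's implementation; the Overlaps record and the assign function are hypothetical, and the sketch assumes that nointron keeps the default exon-first rule while discarding TEs that lie inside introns.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Overlaps:
    """Features one read overlaps (hypothetical, simplified)."""
    genes: Tuple[str, ...] = ()   # genes whose exon/UTR regions the read hits
    tes: Tuple[str, ...] = ()     # TEs the read hits
    te_in_intron: bool = False    # True if the hit TE lies inside a gene intron


def assign(ov: Overlaps, mode: str = "exclusive") -> Tuple[str, ...]:
    """Return the features this read is counted towards under a given mode."""
    if mode not in {"exclusive", "inclusive", "nointron"}:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "inclusive":
        return ov.genes + ov.tes      # exonic TE reads count for gene and TE
    if ov.genes:
        return ov.genes               # exons/UTRs take priority over TEs
    if mode == "nointron" and ov.te_in_intron:
        return ()                     # intronic TEs are removed
    return ov.tes
```

In this model a read on a TE inside a gene exon counts only for the gene by default, for both gene and TE under inclusive, and a read on an intronic TE is dropped only under nointron.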

Analysis of 10x style scRNA-seq data

scTE takes a BAM/SAM file as input. Using the unfiltered alignment file is highly recommended.

For BAM files generated by STARsolo and similar tools, the cell barcode and UMI must be stored in the 'CR:Z' and 'UR:Z' tags of each read respectively, as shown below:

$ scTE -i inp.bam -o out -x mm10.exclusive.idx --hdf5 True -CB CR -UMI UR

$ samtools view test.bam
A00269:12:H7YF2DMXX:2   0   chr10   55902580    255 50M *   0   0   GTTCTCTCCGTATGTGAGCATGGGAGATACATCCCAGAAAGGCAGAAGGG  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  NH:i:1  HI:i:1  AS:i:49 nM:i:0  CR:Z:CTAGAGTGTTTCGCTC   CY:Z:FFFFFFFFFFFFFFFF   UR:Z:TACATGACGC UY:Z:FFFFFFFFFF
A00269:13:H7YF2DMXX:2   0   chr10   55902784    255 50M *   0   0   ATAATCTTTGAGATCTCTGGTGAAAATAAGTAGCATAAAGGACAGAATCA  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  NH:i:1  HI:i:1  AS:i:49 nM:i:0  CR:Z:CTAGAGTGTTTCGCTC   CY:Z:FFFFFFFFFFFFFFFF   UR:Z:TACATGACGC UY:Z:FFFFFFFFFF
A00269:14:H7YF2DMXX:2   0   chr13   67837311    255 50M *   0   0   CTGTTCATTATTTGAGGAAATCAGGACAGGAAATCAAACATGGCAGAATC  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  NH:i:1  HI:i:1  AS:i:49 nM:i:0  CR:Z:ATCGAGTGTTTCGCTC   CY:Z:FFFFFFFFFFFFFFFF   UR:Z:TACATGACGC UY:Z:FFFFFFFFFF
A00269:15:H7YF2DMXX:2   0   chr14   114380523   255 50M *   0   0   GATCCAGATTAATTGAGACTGTTGATCCTCCTACAGGGTCGCCCTTCTCC  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  NH:i:1  HI:i:1  AS:i:49 nM:i:0  CR:Z:CTAGAGTGTTTCGCTC   CY:Z:FFFFFFFFFFFFFFFF   UR:Z:TACATGACGC UY:Z:FFFFFFFFFF
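
To confirm that a BAM file carries the expected tags before running scTE, you can inspect a few lines of samtools view output. The sketch below parses the optional TAG:TYPE:VALUE fields of one tab-separated SAM line; sam_tags and has_barcode_and_umi are hypothetical helpers, and all tag values are kept as strings.

```python
from typing import Dict


def sam_tags(sam_line: str) -> Dict[str, str]:
    """Map optional tag names (e.g. 'CR', 'UR') to their values for one SAM record.

    The first 11 tab-separated columns are mandatory SAM fields; optional tags
    follow. Values may contain ':' themselves (e.g. RG), so split at most twice.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def has_barcode_and_umi(sam_line: str, cb_tag: str = "CR", umi_tag: str = "UR") -> bool:
    """True if the record carries both the barcode tag and the UMI tag."""
    tags = sam_tags(sam_line)
    return cb_tag in tags and umi_tag in tags
```

Lines from samtools view test.bam | head can be fed to has_barcode_and_umi; the default CR/UR tags correspond to the -CB CR -UMI UR command above.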

For BAM files generated by Cell Ranger and similar tools, the cell barcode and UMI must be stored in the 'CB:Z' and 'UB:Z' tags of each read respectively, as shown below:

$ scTE -i inp.bam -o out -x mm10.exclusive.idx --hdf5 True -CB CB -UMI UB

$ samtools view test.bam
A00519:758:HTCCHDSXY:3:2535:21296:19774 16  chr1    14021   0   90M *   0   0   TGGATTTCTATCTCCCTGGCTTGGTGCCAGTTCCTCCAAGTCGATGGCACCTCCCTCCCTCTCAACCACTTGAGCAAACTCCAAGACATC  ,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFFFF:FFFFF  NH:i:5  HI:i:1  AS:i:88 nM:i:0  RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:3  RE:A:I  xf:i:0  CR:Z:CTCCCTCCACTGCGAC   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:CTCCCTCCACTGCGAC-1 UR:Z:AAGGCGTAGTAG   UY:Z:FFFFFFFFFFFF   UB:Z:AAGGCGTAGTAG
A00519:758:HTCCHDSXY:1:1355:17237:31720 0   chr1    14260   0   90M *   0   0   CTCCCTCTCATCCCAGAGAAACAGGTCAGCTGGGAGCTTCTGCCCCCACTGCCTAGGGACCAACAGGGGCAGGAGGCAGTCACTGACCCC  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  NH:i:5  HI:i:1  AS:i:88 nM:i:0  RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:1  RE:A:I  xf:i:0  CR:Z:TCGTCCACAGTATGAA   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:TCGTCCACAGTATGAA-1 UR:Z:GACTTATTTTTT   UY:Z:FFFFFFFFFFFF   UB:Z:GACTTATTTTTT
A00519:758:HTCCHDSXY:3:2227:16703:32080 16  chr1    14411   1   90M *   0   0   TCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAG  FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  NH:i:3  HI:i:1  AS:i:88 nM:i:0  RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:3  RE:A:I  xf:i:0  CR:Z:TTGAGTGGTTGTGGCC   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:TTGAGTGGTTGTGGCC-1 UR:Z:TATAATGCTCAG   UY:Z:FFFFFFFFFFFF   UB:Z:TATAATGCTCAG
A00519:758:HTCCHDSXY:3:2563:23665:33802 16  chr1    14411   1   90M *   0   0   TCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAG  FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  NH:i:3  HI:i:1  AS:i:88 nM:i:0  RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:3  RE:A:I  xf:i:0  CR:Z:TGTTGAGAGGCAATGC   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:TGTTGAGAGGCAATGC-1 UR:Z:ACGGGTGTGGAG   UY:Z:FFFFFFFFFFFF   UB:Z:ACGGGTGTGGAG
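
The Cell Ranger records above carry both the raw (CR/UR) and the corrected (CB/UB) barcode and UMI tags, while the STARsolo example carries only CR/UR. The sketch below suggests -CB/-UMI values from the tags seen on reads, preferring the CB/UB pair used in the Cell Ranger command and falling back to False when a tag is absent. The preference order and the helper names tag_names and suggest_scte_flags are assumptions, not scTE behaviour.

```python
from typing import Iterable, List, Set


def tag_names(sam_line: str) -> Set[str]:
    """Names of the optional tags on one tab-separated SAM record."""
    return {field.split(":", 1)[0] for field in sam_line.rstrip("\n").split("\t")[11:]}


def suggest_scte_flags(tags: Iterable[str]) -> List[str]:
    """Suggest -CB/-UMI arguments, preferring corrected CB/UB over raw CR/UR."""
    tags = set(tags)
    cb = "CB" if "CB" in tags else "CR" if "CR" in tags else "False"
    umi = "UB" if "UB" in tags else "UR" if "UR" in tags else "False"
    return ["-CB", cb, "-UMI", umi]
```

Cell Ranger omits CB/UB on reads whose barcode or UMI could not be corrected, so it is safer to take the union of tag_names over many reads than to inspect a single one.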

-i
    Input file: BAM/SAM file from Cell Ranger or STARsolo
-o
    Output file prefix
-x
    Filename of the index for the reference genome annotation, generated by scTE_build
-p
    Number of threads to use. Default: 1. scTE uses ~10 GB of memory per thread for the human and mouse genomes.
--hdf5
    Save the output as an .h5ad file instead of a CSV file. Default: False
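
Since each thread needs roughly 10 GB for the human and mouse genomes, a usable -p value is bounded by available memory as well as by CPU count. A small sketch of that arithmetic follows; max_threads is a hypothetical helper, and the 10 GB default is the approximate figure quoted above rather than a measured limit.

```python
import os
from typing import Optional


def max_threads(available_gb: float, gb_per_thread: float = 10.0,
                cpus: Optional[int] = None) -> int:
    """Largest -p value that fits in both memory and CPU count.

    Returns at least 1, since scTE needs one thread; with less than
    ~gb_per_thread of memory even a single-thread run may run out of memory.
    """
    if gb_per_thread <= 0:
        raise ValueError("gb_per_thread must be positive")
    if cpus is None:
        cpus = os.cpu_count() or 1
    by_memory = int(available_gb // gb_per_thread)
    return max(1, min(cpus, by_memory))
```

On a 64 GB machine with 16 cores, max_threads(64, cpus=16) gives 6, so memory rather than core count is the limit.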

scTE is best tuned to STARsolo and Cell Ranger outputs, and accepts BAM files produced by either program. For other aligners, store the cell barcode in the CR:Z or CB:Z tag and the UMI in the UR:Z or UB:Z tag of the BAM file.
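
If another pipeline has stored the barcode and UMI in the read name instead, they can be moved into CR:Z/UR:Z tags before running scTE. The sketch below assumes, hypothetically, read names of the form NAME_BARCODE_UMI (a layout some preprocessing tools, such as UMI-tools extract, produce); adjust the parsing to your own naming scheme. tag_sam_line and tag_lines are hypothetical helpers that work on SAM text, so they can sit between two samtools commands.

```python
from typing import Iterable, Iterator


def tag_sam_line(line: str, sep: str = "_") -> str:
    """Move BARCODE and UMI from a NAME_BARCODE_UMI read name into CR:Z/UR:Z tags.

    Header lines ('@...') and records whose name does not split into three
    parts are returned unchanged.
    """
    if line.startswith("@"):
        return line
    fields = line.rstrip("\n").split("\t")
    parts = fields[0].rsplit(sep, 2)
    if len(parts) != 3:
        return line
    name, barcode, umi = parts
    fields[0] = name  # keep only the original read name
    fields += [f"CR:Z:{barcode}", f"UR:Z:{umi}"]
    return "\t".join(fields) + "\n"


def tag_lines(lines: Iterable[str]) -> Iterator[str]:
    """Apply tag_sam_line to a stream of SAM lines (e.g. sys.stdin)."""
    for line in lines:
        yield tag_sam_line(line)
```

Saved as a script (hypothetically tag_sam.py) that calls sys.stdout.writelines(tag_lines(sys.stdin)), it could be used as samtools view -h in.bam | python tag_sam.py | samtools view -b -o tagged.bam -, and the result passed to scTE with -CB CR -UMI UR.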

Analysis of C1 style scRNA-seq data

If the UMI is missing or not used by the scRNA-seq technology (for example on the Fluidigm C1 platform), it can be disabled with the -UMI False switch (the default is True). If the barcode is missing, it can be disabled with -CB False (the default is True); the cell barcodes are then taken from the names of the BAM files.

$ scTE -i inp.bam -o out -x mm10.exclusive.idx -CB False -UMI False

Multiple BAM files can be provided to scTE with the -i option, either as a shell glob or as a comma-separated list:

$ scTE -i *.bam -o out -x mm10.exclusive.idx -CB False -UMI False

$ scTE -i input1.bam,input2.bam,... -o out -x mm10.exclusive.idx -CB False -UMI False
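
When the per-cell BAM files are collected from Python rather than the shell, the comma-separated -i value can be assembled directly. The sketch assumes each cell's name is its BAM file name without the .bam suffix, which approximates but may not exactly match how scTE names cells; c1_inputs and scte_c1_command are hypothetical helpers, and only flags shown above are used.

```python
from pathlib import PurePath
from typing import Iterable, List, Tuple


def c1_inputs(paths: Iterable[str]) -> Tuple[str, List[str]]:
    """Return the comma-separated -i value and the expected cell names."""
    bams = sorted(p for p in paths if p.endswith(".bam"))
    if not bams:
        raise ValueError("no .bam files given")
    return ",".join(bams), [PurePath(p).stem for p in bams]


def scte_c1_command(inputs: str, out_prefix: str, index: str) -> List[str]:
    """Build an scTE argument list for data without cell barcodes or UMIs."""
    return ["scTE", "-i", inputs, "-o", out_prefix, "-x", index,
            "-CB", "False", "-UMI", "False"]
```

For example, inputs, cells = c1_inputs(glob.glob("bams/*.bam")) followed by subprocess.run(scte_c1_command(inputs, "out", "mm10.exclusive.idx"), check=True) runs the same command as above.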

Analysis of scATAC-seq data

The genome indices were prebuilt using:

$ wget -c http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/rmsk.txt.gz -O mm10.te.txt.gz
$ zcat mm10.te.txt.gz | grep -E 'LINE|SINE|LTR|Retroposon' | cut -f6-8,11 >mm10.te.bed
$ scTEATAC_build -g mm10.te.bed -o mm10.te.atac

The BAM file can then be processed with the scTEATAC command:

$ scTEATAC -i input.bam -x mm10.te.atac.idx

Citation

If scTE is useful for your research, please consider citing the scTE paper published in Nature Communications (2021).

Release history

1.3.9

Jun 13, 2026

1.3.8

Jun 10, 2026

1.3.7

Jun 10, 2026

1.3.6

Jun 10, 2026

1.3.5

Jun 3, 2026

1.3.4 (this version)

Jun 2, 2026

1.3.3

Jun 2, 2026

Download files

Source Distribution

scte_quant-1.3.4.tar.gz (203.2 kB)

Uploaded Jun 2, 2026 Source

Built Distribution

scte_quant-1.3.4-py3-none-any.whl (57.0 kB)

Uploaded Jun 2, 2026 Python 3

File details

Details for the file scte_quant-1.3.4.tar.gz.

File metadata

Download URL: scte_quant-1.3.4.tar.gz
Upload date: Jun 2, 2026
Size: 203.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.18 (publish, Ubuntu 22.04)

File hashes

Hashes for scte_quant-1.3.4.tar.gz
Algorithm	Hash digest
SHA256	`e3b537d5cbe6ef4691274b2fb343bb8ae036e1ef7e0f12d689a8283eb827e86c`
MD5	`d7e7f2512657772d141a8014cba63a66`
BLAKE2b-256	`e38c156687aec1fa986a4bf6a6d7d8efc00d35063344d5744609a1114755341b`

File details

Details for the file scte_quant-1.3.4-py3-none-any.whl.

File metadata

Download URL: scte_quant-1.3.4-py3-none-any.whl
Upload date: Jun 2, 2026
Size: 57.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.18 (publish, Ubuntu 22.04)

File hashes

Hashes for scte_quant-1.3.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b8643eceacb2a608a0428f60a22bd9ecc3a2523527f809939557b4ce54ece982`
MD5	`c0cd2f066efc7e69e7c3625549242233`
BLAKE2b-256	`0ded35222fb4829e6391d1b472559ab93a55fec79bdf60d21018f2f74d2e33a2`
