
An advanced version of TelomereHunter for Python 3. Single-cell and bulk telomere content estimation from NGS data with improved accuracy and new features.


TelomereHunter2


TelomereHunter2 is a Python-based tool for estimating telomere content and analyzing telomeric variant repeats (TVRs) from genome sequencing data. It supports BAM/CRAM files, flexible telomere repeat and reference genome inputs, and provides outputs for bulk and single-cell genome sequencing data.


Release Notes

See RELEASE_NOTES.md for the latest changes and version history.


New Features

  • Fast, container-friendly Python 3 implementation
  • Parallelization and algorithmic optimizations for a substantial speedup
  • Supports BAM/CRAM, custom telomeric repeats, and now also non-human genomes
  • Static and interactive HTML reports (Plotly)
  • Docker and Apptainer/Singularity containers
  • Single-cell sequencing support (e.g. scATAC-seq; barcode splitting and per-cell analysis)
  • Robust input handling and exception management
  • Fast mode for quick overview of unmapped reads

Installation

Classic setup:

pip install telomerehunter2

From source:

# From repository:
git clone https://github.com/ferdinand-popp/telomerehunter2.git
cd telomerehunter2
python -m venv venv
source venv/bin/activate
pip install -e . --no-cache-dir

# With uv:
git clone https://github.com/ferdinand-popp/telomerehunter2.git
cd telomerehunter2
uv pip install -e . --no-cache-dir

Container usage:
See Container Usage for Docker/Apptainer instructions.

Operating systems:
Currently tested on Linux and macOS. Windows support via WSL2 and Docker is not yet fully tested (work in progress; check GitHub Issues).

Usage for Bulk and Single-Cell Analysis

Example runs on external data are described under Tutorial.

Bulk Analysis

telomerehunter2 -ibt TUMOR_FILE -ibc CONTROL_FILE -o OUTPUT_DIRECTORY -p ID_OF_SAMPLE -b BANDING_FILE [options]
  • Single sample:
    telomerehunter2 -ibt sample.bam -o results/ -p SampleID -b telomerehunter2/cytoband_files/hg19_cytoBand.txt
  • Tumor vs Control:
    telomerehunter2 -ibt sample.bam -ibc control.bam -o results/ -p PairID -b telomerehunter2/cytoband_files/hg19_cytoBand.txt
  • Custom repeats/species:
    telomerehunter2 ... --repeats TTTAGGG TTAAGGG --repeatsContext TTAAGGG
  • Fast mode (quick overview based on unmapped reads, producing an overview summary):
    telomerehunter2 -ibt sample.bam -o results/ -p SampleID --fast_mode
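When many samples need to be processed, the invocations above can be assembled programmatically. The following is a minimal sketch and not part of TelomereHunter2 itself: `build_th2_command` and `run_th2` are hypothetical helpers, and only the flags shown above (`-ibt`, `-ibc`, `-o`, `-p`, `-b`, `--fast_mode`) are used.

```python
import subprocess
from typing import List, Optional


def build_th2_command(
    tumor: str,
    out_dir: str,
    sample_id: str,
    control: Optional[str] = None,
    banding_file: Optional[str] = None,
    fast_mode: bool = False,
) -> List[str]:
    """Assemble a telomerehunter2 argument list (hypothetical helper)."""
    cmd = ["telomerehunter2", "-ibt", tumor, "-o", out_dir, "-p", sample_id]
    if control is not None:
        # Tumor vs. control run
        cmd += ["-ibc", control]
    if banding_file is not None:
        cmd += ["-b", banding_file]
    if fast_mode:
        cmd.append("--fast_mode")
    return cmd


def run_th2(cmd: List[str]) -> None:
    """Run the assembled command; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_th2_command(
        "sample.bam",
        "results/",
        "PairID",
        control="control.bam",
        banding_file="telomerehunter2/cytoband_files/hg19_cytoBand.txt",
    )
    # Print instead of running, so the sketch works without the tool installed.
    print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file paths that contain spaces.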

Single-Cell Sequencing Analysis

TelomereHunter2 now supports direct single-cell BAM analysis (with CB barcode tag). Simply run:

telomerehunter2_sc -ibt sample.bam -o results/ -p SampleID -b telomerehunter2/cytoband_files/cytoband.txt --min-reads-per-barcode 10000

This performs barcode-aware telomere analysis and writes per-cell results to a summary file. Set the minimum number of reads per barcode with --min-reads-per-barcode. To rerun postprocessing with a different --min-reads-per-barcode threshold, run the command again with --noFiltering, which skips the expensive step of filtering all reads down to telomeric reads. If the reads carry a barcode tag other than CB, set the correct one with --barcodeTag. For more information on correcting for chromatin state in scATAC-seq data, see Engel et al., 2024.

See tests/test_telomerehunter2_sc.py for example usage and validation.
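To illustrate what a reads-per-barcode threshold does, here is a minimal sketch that is not the TelomereHunter2 implementation. The function names are hypothetical, the barcode values are toy data, and it assumes the threshold is inclusive (a barcode with exactly the minimum number of reads is kept). In practice the tag values would come from a BAM reader.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def count_reads_per_barcode(barcodes: Iterable[Optional[str]]) -> Counter:
    """Count reads per barcode, skipping reads without a barcode tag."""
    return Counter(bc for bc in barcodes if bc is not None)


def filter_barcodes(counts: Dict[str, int], min_reads: int) -> Dict[str, int]:
    """Keep only barcodes with at least `min_reads` reads (assumed inclusive)."""
    return {bc: n for bc, n in counts.items() if n >= min_reads}


if __name__ == "__main__":
    # Toy CB tag values, one per read; None marks a read without a barcode.
    tags = ["AAAC-1", "AAAC-1", "GGTT-1", None, "AAAC-1", "GGTT-1", "CCAT-1"]
    counts = count_reads_per_barcode(tags)
    print(filter_barcodes(counts, min_reads=2))
```

Because the counting step is separate from the thresholding step, the threshold can be changed without recounting, which mirrors the idea behind rerunning with --noFiltering.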

Usage: Full List of Options

telomerehunter2 --help

Input & Output

Input:

  • BAM/CRAM files (aligned reads, <-ibt> for tumor, <-ibc> for control)
  • Cytoband file (tab-delimited, e.g. telomerehunter2/cytoband_files/hg19_cytoBand.txt, <-b>)
  • Identifier for sample/pair (<-p>)
  • Optional: custom telomeric repeats

Output:

  • summary.tsv, TVR_top_contexts.tsv, singletons.tsv
  • Plots (plots/ directory, PNG/HTML)
  • Logs (run status/errors)
  • For sc-seq: in addition to the complete bulk run, per-cell results in sc_summary.tsv and read counts per barcode in barcode_counts.tsv

Explanation of summary.tsv

| Column | Example | Description |
| --- | --- | --- |
| PID | PATIENT1 | Sample name |
| sample | tumor | Sample classification (tumor (single), control, log2(t/c)) |
| tel_content | 1.8 | Intratelomeric reads / reads in GC correction range * 1e6 |
| total_reads | 120 | Number of reads in the input file |
| read_lengths | 25,36,42,54 | Unique lengths of reads |
| repeat_threshold_set | 6 per 100 bp | Telomeric repeat threshold set |
| repeat_threshold_used | 4 | Repeats threshold applied based on avg. read length |
| intratelomeric_reads | 4 | Filtered Tel reads in unmapped reads |
| junctionspanning_reads | 0 | Filtered Tel reads spanning junctions into first/last band |
| subtelomeric_reads | 6 | Filtered Tel reads in subtelomeric regions (first/last band) |
| intrachromosomal_reads | 0 | Filtered Tel reads in intrachromosomal regions |
| tel_read_count | 10 | Total telomeric reads identified |
| gc_bins_for_correction | 48-52 | GC content range used for normalization of reads |
| total_reads_with_tel_gc | 8 | Total reads within GC bin for normalization |
| TCAGGG_arbitrary_context_norm_by_intratel_reads | 1.5 | Telomeric variant repeat count normalized by intratelomeric reads |
| ... | ... | ... |
| TCAGGG_singletons_norm_by_all_reads | 0.0 | Singleton (TVR flanked by canonicals) count normalized by all reads |
| ... | ... | ... |
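The tel_content formula and the log2(t/c) classification can be reproduced from the read counts. The sketch below is an illustration under assumptions, not the tool's own code: the helper names are hypothetical, the inline TSV uses toy values, and it contains only the columns needed here (named as in the table above).

```python
import csv
import io
import math


def tel_content(intratelomeric_reads: int, total_reads_with_tel_gc: int) -> float:
    """Intratelomeric reads per million reads in the GC correction range."""
    return 1e6 * intratelomeric_reads / total_reads_with_tel_gc


def log2_ratio(tumor: float, control: float) -> float:
    """log2 of the tumor/control telomere content ratio."""
    return math.log2(tumor / control)


# Toy excerpt of a summary.tsv with made-up counts.
SAMPLE_TSV = (
    "PID\tsample\tintratelomeric_reads\ttotal_reads_with_tel_gc\n"
    "P1\ttumor\t40\t100000\n"
    "P1\tcontrol\t20\t100000\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
content = {
    row["sample"]: tel_content(
        int(row["intratelomeric_reads"]), int(row["total_reads_with_tel_gc"])
    )
    for row in rows
}

if __name__ == "__main__":
    print(content["tumor"])    # 400.0
    print(content["control"])  # 200.0
    print(log2_ratio(content["tumor"], content["control"]))  # 1.0
```

A log2 ratio of 1.0 means the tumor sample has twice the telomere content of the control; negative values indicate lower telomere content in the tumor.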

Dependencies

  • Python >=3.6
  • pysam, numpy, pandas, plotly, PyPDF2
  • For static image export: kaleido (requires Chrome/Chromium)
  • Docker/Apptainer (optional)

Install all dependencies:

pip install -r requirements.txt

Container Usage

Docker (recommended):

Build locally:

docker build -t telomerehunter2 .
docker run --rm -it -v /data:/data telomerehunter2 telomerehunter2 -ibt /data/sample.bam -o /data/results -p SampleID -b /data/hg19_cytoBand.txt

Pull from Docker Hub:

docker pull fpopp22/telomerehunter2

Run from Docker Hub:

docker run --rm -it -v /data:/data fpopp22/telomerehunter2 telomerehunter2 -ibt /data/sample.bam -o /data/results -p SampleID -b /data/hg19_cytoBand.txt

Apptainer/Singularity:

Build locally:

apptainer build telomerehunter2.sif Apptainer_TH2.def
# Bind-mount your data directory if it is not available inside the container (e.g. --bind /data:/data)
apptainer run telomerehunter2.sif telomerehunter2 -ibt /data/sample.bam -o /data/results -p SampleID -b /data/hg19_cytoBand.txt

Pull from Docker Hub (as Apptainer image):

apptainer pull docker://fpopp22/telomerehunter2:latest
apptainer run telomerehunter2_latest.sif telomerehunter2 ...

Troubleshooting

  • Memory errors: Use more RAM or limit cores used with -c flag.
  • Missing dependencies: Check requirements.txt.
  • Banding file missing: Provide a reference genome banding file with -b; otherwise the analysis runs without assigning reads to subtelomeres.
  • Plotting: Try disabling plots with --plotNone, or use plotting-only mode (see telomerehunter2 --help for the corresponding flag).
  • Minor changes relative to TH1: TVR normalization per 100 bp is skipped, detection of GXXGGG TVRs is improved, read lengths are estimated from the first 1000 reads, and TRPM was added.
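Estimating read lengths from only the first reads of a file, as mentioned in the last point above, avoids a full pass over the input. A minimal sketch of the idea, assuming reads are available as an iterable of sequence strings (`estimate_read_lengths` is a hypothetical name, not a TelomereHunter2 function):

```python
from itertools import islice
from typing import Iterable, List, Tuple


def estimate_read_lengths(
    sequences: Iterable[str], n_reads: int = 1000
) -> Tuple[List[int], float]:
    """Return the unique read lengths and the mean length of the first n_reads."""
    # islice stops after n_reads items, so the rest of the input is never read.
    lengths = [len(seq) for seq in islice(sequences, n_reads)]
    if not lengths:
        raise ValueError("no reads available")
    return sorted(set(lengths)), sum(lengths) / len(lengths)


if __name__ == "__main__":
    toy_reads = ["A" * 25, "A" * 36, "A" * 36, "A" * 54]
    unique_lengths, mean_length = estimate_read_lengths(toy_reads)
    print(unique_lengths, mean_length)
```

The trade-off is that the estimate can be off if the first reads are not representative, for example in files with trimmed reads of widely varying length.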

For help: GitHub Issues or our FAQ.

Documentation & Resources

Tutorial

  • Simple run with 1000 Genomes WGS data:
wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam

wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.unmapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam

telomerehunter2 -p test20vsunmapped -ibt HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam -ibc HG00096.unmapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam -o /results/
  • scATAC run with 10x Genomics test data:
wget https://cf.10xgenomics.com/samples/cell-atac/2.1.0/10k_pbmc_ATACv2_nextgem_Chromium_Controller/10k_pbmc_ATACv2_nextgem_Chromium_Controller_possorted_bam.bam

wget https://cf.10xgenomics.com/samples/cell-atac/2.1.0/10k_pbmc_ATACv2_nextgem_Chromium_Controller/10k_pbmc_ATACv2_nextgem_Chromium_Controller_possorted_bam.bam.bai

telomerehunter2_sc -p scATAC_test -ibt 10k_pbmc_ATACv2_nextgem_Chromium_Controller_possorted_bam.bam -o /results/ -b telomerehunter2/cytoband_files/hg19_cytoBand.txt --min-reads-per-barcode 30000

# To generate the cell-type resolution plot, create an annotation following the example script in [single-cell_annotations.md](docs/single-cell_annotations.md).

Citation

If you use TelomereHunter2, please cite:

  • Feuerbach, L., et al. "TelomereHunter – in silico estimation of telomere content and composition from cancer genomes." BMC Bioinformatics 20, 272 (2019). https://doi.org/10.1186/s12859-019-2851-0
  • Application Note for TH2 (in preparation).

Contributing

Fork, branch, and submit pull requests. Please add tests and follow code style. For major changes, open an issue first.

License

GNU General Public License v3.0. See LICENSE.

Acknowledgements

Developed by Ferdinand Popp, Lina Sieverling, Philip Ginsbach, and Lars Feuerbach. Supported by the German Cancer Research Center (DKFZ), Division of Applied Bioinformatics.


Copyright 2025 Ferdinand Popp, Lina Sieverling, Philip Ginsbach, Lars Feuerbach
