An advanced version of TelomereHunter for Python 3. Single-cell and bulk telomere content estimation from NGS data with improved accuracy and new features.
TelomereHunter2
TelomereHunter2 is a Python-based tool for estimating telomere content and analyzing telomeric variant repeats (TVRs) from genome sequencing data. It supports BAM/CRAM files, flexible telomere repeat and reference genome inputs, and provides outputs for bulk and single-cell genome sequencing data.
Release Notes
See RELEASE_NOTES.md for the latest changes and version history.
New Features
- Fast, container-friendly Python 3 implementation
- Parallelization and algorithmic optimizations for a substantial speedup
- Supports BAM/CRAM, custom telomeric repeats, and now also non-human genomes
- Static and interactive HTML reports (Plotly)
- Docker and Apptainer/Singularity containers
- Single cell sequencing support (e.g. scATAC-seq; barcode splitting and per-cell analysis)
- Robust input handling and exception management
- Fast mode for quick overview of unmapped reads
Installation
Classic setup:
pip install telomerehunter2
From source:
# From repository:
git clone https://github.com/ferdinand-popp/telomerehunter2.git
cd telomerehunter2
python -m venv venv
source venv/bin/activate
pip install -e . --no-cache-dir
# With uv:
git clone https://github.com/ferdinand-popp/telomerehunter2.git
cd telomerehunter2
uv pip install -e . --no-cache-dir
Container usage:
See Container Usage for Docker/Apptainer instructions.
Operating systems:
Currently tested on Linux and macOS. Windows support via WSL2 and Docker is not yet fully tested (work in progress; check GitHub Issues).
Usage for Bulk and Single-Cell Analysis
Example runs on external data are listed under Tutorial.
Bulk Analysis
telomerehunter2 -ibt TUMOR_FILE -ibc CONTROL_FILE -o OUTPUT_DIRECTORY -p ID_OF_SAMPLE -b BANDING_FILE [options]
- Single sample:
telomerehunter2 -ibt sample.bam -o results/ -p SampleID -b telomerehunter2/cytoband_files/hg19_cytoBand.txt
- Tumor vs Control:
telomerehunter2 -ibt sample.bam -ibc control.bam -o results/ -p PairID -b telomerehunter2/cytoband_files/hg19_cytoBand.txt
- Custom repeats/species:
telomerehunter2 ... --repeats TTTAGGG TTAAGGG --repeatsContext TTAAGGG
- Fast mode (quick overview of unmapped reads; generates a summary with the overview):
telomerehunter2 -ibt sample.bam -o results/ -p SampleID --fast_mode
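When many samples are processed from a script, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of TelomereHunter2 itself: the helper name `build_bulk_command` is hypothetical, and it uses only the flags shown above (`-ibt`, `-ibc`, `-o`, `-p`, `-b`, `--fast_mode`).

```python
import shlex


def build_bulk_command(tumor_bam, out_dir, sample_id,
                       control_bam=None, banding_file=None, fast_mode=False):
    """Assemble a telomerehunter2 bulk command as an argument list.

    Hypothetical helper for illustration; it only arranges flags that
    appear in the usage examples of this README.
    """
    cmd = ["telomerehunter2", "-ibt", str(tumor_bam)]
    if control_bam is not None:
        cmd += ["-ibc", str(control_bam)]      # optional matched control
    cmd += ["-o", str(out_dir), "-p", str(sample_id)]
    if banding_file is not None:
        cmd += ["-b", str(banding_file)]       # cytoband file
    if fast_mode:
        cmd.append("--fast_mode")
    return cmd


cmd = build_bulk_command("sample.bam", "results/", "PairID",
                         control_bam="control.bam",
                         banding_file="hg19_cytoBand.txt")
print(shlex.join(cmd))  # shell-ready string for logging
# To execute the run: subprocess.run(cmd, check=True)
```

Passing the list form to `subprocess.run` avoids shell quoting problems with paths that contain spaces.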
Single cell sequencing Analysis
TelomereHunter2 now supports direct single-cell BAM analysis (with CB barcode tag). Simply run:
telomerehunter2_sc -ibt sample.bam -o results/ -p SampleID -b telomerehunter2/cytoband_files/cytoband.txt --min-reads-per-barcode 10000
This will perform barcode-aware telomere analysis and output per-cell results in a summary file. The minimum reads per barcode threshold can be set with --min-reads-per-barcode. To rerun postprocessing with an adjusted --min-reads-per-barcode threshold, run the command again with --noFiltering to skip the expensive step of filtering all reads down to telomeric reads.
If the reads have a different barcode tag than CB, use --barcodeTag to set the correct one.
More information on correcting for chromatin state in scATAC data is given in Engel et al., 2024.
See tests/test_telomerehunter2_sc.py for example usage and validation.
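To illustrate what a minimum reads per barcode threshold does, the sketch below counts reads per barcode and keeps only barcodes at or above the cutoff. It is a simplified illustration, not the tool's implementation: the function name `barcodes_passing` is hypothetical, and treating the threshold as inclusive (`>=`) is an assumption.

```python
from collections import Counter


def barcodes_passing(barcodes, min_reads):
    """Return {barcode: read_count} for barcodes with at least min_reads reads.

    `barcodes` is any iterable with one entry per read; None marks a read
    without a barcode tag. The inclusive comparison is an assumption.
    """
    counts = Counter(bc for bc in barcodes if bc is not None)
    return {bc: n for bc, n in counts.items() if n >= min_reads}


# Toy input: one barcode value per read. With real data, the values could be
# taken from the CB tag of each alignment, e.g. with pysam's
# read.get_tag("CB") after checking read.has_tag("CB").
reads = ["AAAC-1"] * 5 + ["GGGT-1"] * 2 + [None]
print(barcodes_passing(reads, min_reads=3))  # {'AAAC-1': 5}
```

Lowering the threshold keeps more cells but includes barcodes with too few reads for a stable per-cell estimate.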
Full list of options
telomerehunter2 --help
Input & Output
Input:
- BAM/CRAM files (aligned reads, <-ibt> for tumor, <-ibc> for control)
- Cytoband file (tab-delimited, e.g. telomerehunter2/cytoband_files/hg19_cytoBand.txt, <-b>)
- Identifier for sample/pair (<-p>)
- Optional: custom telomeric repeats
Output:
- summary.tsv, TVR_top_contexts.tsv, singletons.tsv
- Plots (plots/ directory, PNG/HTML)
- Logs (run status/errors)
- For sc-seq: in addition to the complete bulk run, you get per-cell results in sc_summary.tsv and barcode_counts.tsv with read counts per barcode
Explanation of summary.tsv
| Column | Example | Description |
|---|---|---|
| PID | PATIENT1 | Sample name |
| sample | tumor | Sample classification (tumor (single), control, log2(t/c)) |
| tel_content | 1.8 | Intratelomeric reads / reads in GC correction range * 1e6 |
| total_reads | 120 | Number of reads in the input file |
| read_lengths | 25,36,42,54 | Unique lengths of reads |
| repeat_threshold_set | 6 per 100 bp | Telomeric repeat threshold set |
| repeat_threshold_used | 4 | Repeats threshold applied based on avg. read length |
| intratelomeric_reads | 4 | Filtered Tel reads in unmapped reads |
| junctionspanning_reads | 0 | Filtered Tel reads spanning junctions into first/last band |
| subtelomeric_reads | 6 | Filtered Tel reads in subtelomeric regions (first/last band) |
| intrachromosomal_reads | 0 | Filtered Tel reads in intrachromosomal regions |
| tel_read_count | 10 | Total telomeric reads identified |
| gc_bins_for_correction | 48-52 | GC content range used for normalization of reads |
| total_reads_with_tel_gc | 8 | Total reads within GC bin for normalization |
| TCAGGG_arbitrary_context_norm_by_intratel_reads | 1.5 | Telomeric variant repeat count normalized by intratelomeric reads |
| ... | ... | ... |
| TCAGGG_singletons_norm_by_all_reads | 0.0 | Singleton (TVR flanked by canonicals) count normalized by all reads |
| ... | ... | ... |
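The `tel_content` formula in the table can be checked by hand from two other columns. The following is a minimal sketch that reads a tab-separated summary and recomputes the value as intratelomeric reads divided by reads in the GC correction range, scaled by 1e6. The column names are taken from the table above; the function names, the demo values, and the choice to skip the `log2(t/c)` row are assumptions made for illustration.

```python
import csv
import io
import math


def tel_content(intratelomeric_reads, reads_in_gc_range):
    """Intratelomeric reads per million reads in the GC correction range."""
    if reads_in_gc_range == 0:
        return math.nan  # undefined without reads in the GC range
    return 1e6 * intratelomeric_reads / reads_in_gc_range


def recompute_tel_content(handle):
    """Map (PID, sample) to a recomputed tel_content for tumor/control rows."""
    result = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["sample"] not in ("tumor", "control"):
            continue  # assumption: ratio rows such as log2(t/c) are skipped
        result[(row["PID"], row["sample"])] = tel_content(
            float(row["intratelomeric_reads"]),
            float(row["total_reads_with_tel_gc"]),
        )
    return result


# Made-up demo values, not real results.
demo = (
    "PID\tsample\tintratelomeric_reads\ttotal_reads_with_tel_gc\n"
    "PATIENT1\ttumor\t4\t2000000\n"
)
print(recompute_tel_content(io.StringIO(demo)))  # {('PATIENT1', 'tumor'): 2.0}
```

For a real run, replace the `io.StringIO` demo with `open("summary.tsv", newline="")` pointing at the file in the output directory.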
Dependencies
- Python >=3.6
- pysam, numpy, pandas, plotly, PyPDF2
- For static image export: kaleido (requires chrome/chromium)
- Docker/Apptainer (optional)
Install all dependencies:
pip install -r requirements.txt
Container Usage
Docker (recommended):
Build locally:
docker build -t telomerehunter2 .
docker run --rm -it -v /data:/data telomerehunter2 telomerehunter2 -ibt /data/sample.bam -o /data/results -p SampleID -b /data/hg19_cytoBand.txt
Pull from Docker Hub:
docker pull fpopp22/telomerehunter2
Run from Docker Hub:
docker run --rm -it -v /data:/data fpopp22/telomerehunter2 telomerehunter2 -ibt /data/sample.bam -o /data/results -p SampleID -b /data/hg19_cytoBand.txt
Apptainer/Singularity:
Build locally:
apptainer build telomerehunter2.sif Apptainer_TH2.def
# mount data needed
apptainer run telomerehunter2.sif telomerehunter2 -ibt /data/sample.bam -o /data/results -p SampleID -b /data/hg19_cytoBand.txt
Pull from Docker Hub (as Apptainer image):
apptainer pull docker://fpopp22/telomerehunter2:latest
apptainer run telomerehunter2_latest.sif telomerehunter2 ...
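Inside a container, the tool only sees paths under the mounted directory, so host paths have to be translated to their in-container equivalents. The sketch below shows one way to do that for the Docker examples above. The helper names `to_container_path` and `docker_command` are hypothetical; the image name and the `/data` mount point are taken from the commands above. The `-it` flag is left out because it is not needed for non-interactive scripted runs.

```python
from pathlib import PurePosixPath


def to_container_path(host_path, host_dir, mount_point="/data"):
    """Translate a host path under host_dir to its path inside the container.

    Raises ValueError if host_path is not located under host_dir.
    """
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_dir))
    return str(PurePosixPath(mount_point) / relative)


def docker_command(host_dir, bam, out_dir, sample_id, banding_file,
                   image="fpopp22/telomerehunter2", mount_point="/data"):
    """Build a `docker run` argument list with host_dir mounted at mount_point."""
    tool_args = [
        "-ibt", to_container_path(bam, host_dir, mount_point),
        "-o", to_container_path(out_dir, host_dir, mount_point),
        "-p", sample_id,
        "-b", to_container_path(banding_file, host_dir, mount_point),
    ]
    return ["docker", "run", "--rm", "-v", f"{host_dir}:{mount_point}",
            image, "telomerehunter2", *tool_args]


print(" ".join(docker_command(
    "/mnt/project", "/mnt/project/sample.bam", "/mnt/project/results",
    "SampleID", "/mnt/project/hg19_cytoBand.txt")))
```

The `ValueError` from `relative_to` catches the common mistake of passing an input file that lies outside the mounted directory.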
Troubleshooting
- Memory errors: Use more RAM or limit the number of cores with the -c flag.
- Missing dependencies: Check requirements.txt.
- Banding file missing: Provide a reference genome banding file with -b; otherwise the analysis runs without reads mapped to subtelomeres.
- Plotting: Try disabling plots with --plotNone, or use the plotting-only mode (see telomerehunter2 --help for the flag).
- Minor changes relative to TH1: TVR normalization per 100 bp is skipped, detection of GXXGGG TVRs is improved, read lengths are estimated from the first 1000 reads, and TRPM was added.
For help: GitHub Issues or our FAQ.
Documentation & Resources
- Docs (MkDocs / GitHub Pages): https://ferdinand-popp.github.io/telomerehunter2/
- Telomerehunter Website
- Original TelomereHunter Paper
- For subsequent telomere insertion analysis please check out TelomereRepeatLoci
Tutorial
- Simple run with 1000genomes WGS data:
wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.unmapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
telomerehunter2 -p test20vsunmapped -ibt HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam -ibc HG00096.unmapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam -o /results/
- scATAC run with 10x genomics test data:
wget https://cf.10xgenomics.com/samples/cell-atac/2.1.0/10k_pbmc_ATACv2_nextgem_Chromium_Controller/10k_pbmc_ATACv2_nextgem_Chromium_Controller_possorted_bam.bam
wget https://cf.10xgenomics.com/samples/cell-atac/2.1.0/10k_pbmc_ATACv2_nextgem_Chromium_Controller/10k_pbmc_ATACv2_nextgem_Chromium_Controller_possorted_bam.bam.bai
telomerehunter2_sc -p scATAC_test -ibt 10k_pbmc_ATACv2_nextgem_Chromium_Controller_possorted_bam.bam -o /results/ -b telomerehunter2/cytoband_files/hg19_cytoBand.txt --min-reads-per-barcode 30000
# To generate the cell-type resolution plot, create an annotation following the example script at [single-cell_annotations.md](docs/single-cell_annotations.md).
Citation
If you use TelomereHunter2, please cite:
- Feuerbach, L., et al. "TelomereHunter – in silico estimation of telomere content and composition from cancer genomes." BMC Bioinformatics 20, 272 (2019). https://doi.org/10.1186/s12859-019-2851-0
- Application Note for TH2 (in preparation).
Contributing
Fork, branch, and submit pull requests. Please add tests and follow code style. For major changes, open an issue first.
License
GNU General Public License v3.0. See LICENSE.
Contact
- Ferdinand Popp (f.popp@dkfz.de)
- Lars Feuerbach (l.feuerbach@dkfz.de)
Acknowledgements
Developed by Ferdinand Popp, Lina Sieverling, Philip Ginsbach, and Lars Feuerbach. Supported by the German Cancer Research Center (DKFZ), Division of Applied Bioinformatics.
Copyright 2025 Ferdinand Popp, Lina Sieverling, Philip Ginsbach, Lars Feuerbach