An advanced version of TelomereHunter for Python 3, with new features
TelomereHunter2
TelomereHunter2 is a Python-based tool for estimating telomere content and analyzing telomeric variant repeats (TVRs) from genome sequencing data. It supports BAM/CRAM files, flexible telomere repeat and reference genome inputs, and provides outputs for bulk and single-cell genome sequencing data.
Release Notes
See RELEASE_NOTES.md for the latest changes and version history.
New Features
- Fast, container-friendly Python 3 implementation
- Parallelization and algorithmic improvements for a substantial speedup
- Supports BAM/CRAM, custom telomeric repeats, and now also non-human genomes
- Static and interactive HTML reports (Plotly)
- Docker and Apptainer/Singularity containers
- Single-cell sequencing support (e.g. scATAC-seq; barcode splitting and per-cell analysis)
- Robust input handling and exception management
- Fast mode for quick overview of unmapped reads
Installation
Classic setup:
pip install telomerehunter2
From source:
# With pip:
git clone https://github.com/ferdinand-popp/telomerehunter2.git
cd telomerehunter2
python -m venv venv
source venv/bin/activate
pip install -e . --no-cache-dir
# With uv:
git clone https://github.com/ferdinand-popp/telomerehunter2.git
cd telomerehunter2
uv pip install -e . --no-cache-dir
Container usage:
See Container Usage for Docker/Apptainer instructions.
Operating systems:
Currently tested on Linux and macOS. Windows support via WSL2 and Docker is not yet fully tested (work in progress; check GitHub Issues).
Usage: Bulk vs. Single-Cell Analysis
Bulk Analysis
telomerehunter2 -ibt TUMOR_FILE -ibc CONTROL_FILE -o OUTPUT_DIRECTORY -p ID_OF_SAMPLE -b BANDING_FILE [options]
- Single sample:
telomerehunter2 -ibt sample.bam -o results/ -p SampleID -b telomerehunter2/cytoband_files/hg19_cytoBand.txt
- Tumor vs Control:
telomerehunter2 -ibt sample.bam -ibc control.bam -o results/ -p PairID -b telomerehunter2/cytoband_files/hg19_cytoBand.txt
- Custom repeats/species:
telomerehunter2 ... --repeats TTTAGGG TTAAGGG --repeatsContext TTAAGGG
- Fast mode (quick overview of unmapped reads, generating a summary):
telomerehunter2 -ibt sample.bam -o results/ -p SampleID --fast_mode
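When many samples need the same bulk invocation, the command line can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this section (-ibt, -ibc, -o, -p, -b); the function name build_bulk_command and the sample list are hypothetical and not part of TelomereHunter2.

```python
import shlex
from typing import List, Optional


def build_bulk_command(
    tumor: str,
    out_dir: str,
    sample_id: str,
    banding_file: str,
    control: Optional[str] = None,
) -> List[str]:
    """Assemble a telomerehunter2 argument list from the documented flags."""
    cmd = ["telomerehunter2", "-ibt", tumor]
    if control is not None:
        # -ibc is only added for tumor vs control runs.
        cmd += ["-ibc", control]
    cmd += ["-o", out_dir, "-p", sample_id, "-b", banding_file]
    return cmd


if __name__ == "__main__":
    banding = "telomerehunter2/cytoband_files/hg19_cytoBand.txt"
    # Hypothetical sample sheet: (sample ID, tumor BAM, optional control BAM).
    samples = [
        ("SampleID", "sample.bam", None),
        ("PairID", "sample.bam", "control.bam"),
    ]
    for sample_id, tumor, control in samples:
        cmd = build_bulk_command(tumor, "results/", sample_id, banding, control)
        # Printed only; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.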
Single-Cell Sequencing Analysis
TelomereHunter2 now supports direct single-cell BAM analysis (with CB barcode tag). Simply run:
telomerehunter2_sc -ibt sample.bam -o results/ -p SampleID -b telomerehunter2/cytoband_files/cytoband.txt --min-reads-per-barcode 10000
This performs barcode-aware telomere analysis and writes per-cell results to a summary file. The minimum number of reads per barcode is set with --min-reads-per-barcode. To rerun postprocessing with a different --min-reads-per-barcode threshold, run the command again with --noFiltering, which skips the expensive step of filtering all reads down to telomeric reads.
If the reads use a barcode tag other than CB, set the correct one with --barcodeTag.
For more information on correcting for chromatin state in scATAC data, see Engel et al. (2024).
See tests/test_telomerehunter2_sc.py for example usage and validation.
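To illustrate what a minimum-reads-per-barcode threshold does conceptually, here is a small sketch that counts reads per barcode and keeps only barcodes that reach the threshold. It is not TelomereHunter2's implementation: the function names are hypothetical, the barcodes are toy values standing in for the CB tag, and treating the threshold as inclusive (>=) is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable


def count_reads_per_barcode(barcodes: Iterable[str]) -> Counter:
    """Count how many reads carry each barcode (one entry per read)."""
    return Counter(barcodes)


def barcodes_passing(counts: Dict[str, int], min_reads: int) -> Dict[str, int]:
    """Keep barcodes with at least min_reads reads (inclusive: an assumption)."""
    return {bc: n for bc, n in counts.items() if n >= min_reads}


if __name__ == "__main__":
    # Toy barcode values, one per read, standing in for the CB tag.
    reads = ["AAAC"] * 5 + ["CCGT"] * 2 + ["GGTA"] * 1
    counts = count_reads_per_barcode(reads)
    print(barcodes_passing(counts, min_reads=2))
```

Because the per-barcode counts do not depend on the threshold, changing the threshold only requires repeating the cheap selection step, which matches the motivation given above for rerunning with --noFiltering.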
Usage: Full List of Options
telomerehunter2 --help
Input & Output
Input:
- BAM/CRAM files (aligned reads, <-ibt> for tumor, <-ibc> for control)
- Cytoband file (tab-delimited, e.g. telomerehunter2/cytoband_files/hg19_cytoBand.txt, <-b>)
- Identifier for sample/pair (<-p>)
- Optional: custom telomeric repeats
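The cytoband input can be inspected before a run. The sketch below assumes the file follows the UCSC cytoBand layout (chromosome, start, end, band name, stain, separated by tabs), which the file name hg19_cytoBand.txt suggests but this README does not state explicitly. It reports the first and last band of each chromosome, the regions that the summary table describes as subtelomeric. The function names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple

Band = Tuple[int, int, str]  # (start, end, band name)


def read_cytobands(handle) -> Dict[str, List[Band]]:
    """Group bands by chromosome from a UCSC-style, tab-delimited table."""
    bands: Dict[str, List[Band]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        chrom, start, end, name = row[0], int(row[1]), int(row[2]), row[3]
        bands.setdefault(chrom, []).append((start, end, name))
    for chrom_bands in bands.values():
        chrom_bands.sort()  # order bands by start coordinate
    return bands


def terminal_bands(bands: Dict[str, List[Band]]) -> Dict[str, Tuple[Band, Band]]:
    """Return the (first, last) band for every chromosome."""
    return {chrom: (b[0], b[-1]) for chrom, b in bands.items()}


if __name__ == "__main__":
    # Three illustrative rows in UCSC cytoBand layout.
    example = io.StringIO(
        "chr1\t0\t2300000\tp36.33\tgneg\n"
        "chr1\t2300000\t5400000\tp36.32\tgpos25\n"
        "chr1\t247000000\t249250621\tq44\tgneg\n"
    )
    for chrom, (first, last) in terminal_bands(read_cytobands(example)).items():
        print(chrom, first, last)
```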
Output:
- summary.tsv, TVR_top_contexts.tsv, singletons.tsv
- Plots (plots/ directory, PNG/HTML)
- Logs (run status/errors)
- For sc-seq: in addition to the complete bulk run, per-cell results in sc_summary.tsv, and barcode_counts.tsv with read counts per barcode
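The summary file can be read with Python's standard library for downstream checks. This is a minimal sketch that assumes summary.tsv is tab-delimited with a header row containing the sample and tel_content columns documented in this README; the function name load_tel_content and the file location in the usage line are hypothetical.

```python
import csv
from typing import Dict


def load_tel_content(path: str) -> Dict[str, str]:
    """Map each 'sample' label to its raw 'tel_content' string."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Values are kept as strings; convert with float() where appropriate.
        return {row["sample"]: row["tel_content"] for row in reader}
```

For example, load_tel_content("results/summary.tsv") would return a dictionary keyed by sample classification, assuming the file sits at that path.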
Explanation of summary.tsv
| Column | Example | Description |
|---|---|---|
| PID | PATIENT1 | Sample name |
| sample | tumor | Sample classification (tumor (single), control, log2(t/c)) |
| tel_content | 1.8 | Intratelomeric reads / reads in GC correction range * 1e6 |
| total_reads | 120 | Number of reads in the input file |
| read_lengths | 25,36,42,54 | Unique lengths of reads |
| repeat_threshold_set | 6 per 100 bp | Telomeric repeat threshold set |
| repeat_threshold_used | 4 | Repeats threshold applied based on avg. read length |
| intratelomeric_reads | 4 | Filtered Tel reads in unmapped reads |
| junctionspanning_reads | 0 | Filtered Tel reads spanning junctions into first/last band |
| subtelomeric_reads | 6 | Filtered Tel reads in subtelomeric regions (first/last band) |
| intrachromosomal_reads | 0 | Filtered Tel reads in intrachromosomal regions |
| tel_read_count | 10 | Total telomeric reads identified |
| gc_bins_for_correction | 48-52 | GC content range used for normalization of reads |
| total_reads_with_tel_gc | 8 | Total reads within GC bin for normalization |
| TCAGGG_arbitrary_context_norm_by_intratel_reads | 1.5 | Telomeric variant repeat count normalized by intratelomeric reads |
| ... | ... | ... |
| TCAGGG_singletons_norm_by_all_reads | 0.0 | Singleton (TVR flanked by canonicals) count normalized by all reads |
| ... | ... | ... |
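The tel_content definition in the table can be written out as a short calculation. The sketch below follows the stated formula (intratelomeric reads divided by reads in the GC correction range, times 1e6). Applying log2(t/c) to the tumor and control tel_content values is an assumption about how the log2 row is derived, and the input numbers are illustrative, not taken from a real run.

```python
import math


def tel_content(intratelomeric_reads: int, total_reads_with_tel_gc: int) -> float:
    """Intratelomeric reads per million reads in the GC correction range."""
    if total_reads_with_tel_gc <= 0:
        raise ValueError("no reads in the GC correction range")
    return intratelomeric_reads / total_reads_with_tel_gc * 1e6


def log2_ratio(tumor: float, control: float) -> float:
    """log2(tumor / control); both values must be positive."""
    return math.log2(tumor / control)


if __name__ == "__main__":
    # Illustrative counts only.
    tumor = tel_content(180, 100_000_000)
    control = tel_content(90, 100_000_000)
    print(round(tumor, 3), round(control, 3), round(log2_ratio(tumor, control), 3))
```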
Dependencies
- Python >=3.6
- pysam, numpy, pandas, plotly, PyPDF2
- For static image export: kaleido (requires chrome/chromium)
- Docker/Apptainer (optional)
Install all dependencies:
pip install -r requirements.txt
Container Usage
Docker (recommended):
Build locally:
docker build -t telomerehunter2 .
docker run --rm -it -v /data:/data telomerehunter2 telomerehunter2 -ibt /data/sample.bam -o /data/results -p SampleID -b /data/hg19_cytoBand.txt
Pull from Docker Hub:
docker pull fpopp22/telomerehunter2
Run from Docker Hub:
docker run --rm -it -v /data:/data fpopp22/telomerehunter2 telomerehunter2 -ibt /data/sample.bam -o /data/results -p SampleID -b /data/hg19_cytoBand.txt
Apptainer/Singularity:
Build locally:
apptainer build telomerehunter2.sif Apptainer_TH2.def
# mount data needed
apptainer run telomerehunter2.sif telomerehunter2 -ibt /data/sample.bam -o /data/results -p SampleID -b /data/hg19_cytoBand.txt
Pull from Docker Hub (as Apptainer image):
apptainer pull docker://fpopp22/telomerehunter2:latest
apptainer run telomerehunter2_latest.sif telomerehunter2 ...
Troubleshooting
- Memory errors: Use more RAM or limit the number of cores with the -c flag.
- Missing dependencies: Check requirements.txt.
- Banding file missing: Supply a reference genome banding file with -b; otherwise the analysis runs without reads mapped to subtelomeres.
- Plotting: Try disabling with --plotNone or use plotting only mode with --plotNone.
- Minor changes compared to TH1: TVR normalization per 100 bp is skipped, detection of GXXGGG TVRs is improved, read lengths are estimated from the first 1000 reads, and TRPM was added.
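As an illustration of estimating read lengths from the first 1000 reads, the following sketch collects the unique sequence lengths seen in the first N items of any iterable of read sequences. It is a conceptual example only, not TelomereHunter2's code; the function name and the use of plain strings in place of alignment records are assumptions.

```python
from itertools import islice
from typing import Iterable, List


def estimate_read_lengths(reads: Iterable[str], sample_size: int = 1000) -> List[int]:
    """Return sorted unique sequence lengths among the first sample_size reads."""
    # islice stops after sample_size items, so later reads are never examined.
    return sorted({len(seq) for seq in islice(reads, sample_size)})
```

Sampling only the start of the file keeps the estimate cheap, at the cost of missing read lengths that first appear later in the input.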
For help: GitHub Issues or our FAQ.
Documentation & Resources
- Docs (MkDocs / GitHub Pages): https://ferdinand-popp.github.io/telomerehunter2/
- GitHub Wiki (optional): https://github.com/ferdinand-popp/telomerehunter2/wiki
- Example Data: https://github.com/ferdinand-popp/telomerehunter2/tree/main/tests
- Telomerehunter Website
- Original TelomereHunter Paper
Citation
If you use TelomereHunter2, please cite:
- Feuerbach, L., et al. "TelomereHunter – in silico estimation of telomere content and composition from cancer genomes." BMC Bioinformatics 20, 272 (2019). https://doi.org/10.1186/s12859-019-2851-0
- Application Note for TH2 (in preparation).
Contributing
Fork, branch, and submit pull requests. Please add tests and follow code style. For major changes, open an issue first.
Before submitting, please install the tox package and run the following checks:
- Run Unit Tests and Style Checks:
tox
License
GNU General Public License v3.0. See LICENSE.
Contact
- Ferdinand Popp (f.popp@dkfz.de)
- Lars Feuerbach (l.feuerbach@dkfz.de)
Acknowledgements
Developed by Ferdinand Popp, Lina Sieverling, Philip Ginsbach, Lars Feuerbach. Supported by German Cancer Research Center (DKFZ) - Division Applied Bioinformatics.
Copyright 2025 Ferdinand Popp, Lina Sieverling, Philip Ginsbach, Lars Feuerbach
File details
Details for the file telomerehunter2-1.0.6.tar.gz.
File metadata
- Download URL: telomerehunter2-1.0.6.tar.gz
- Upload date:
- Size: 127.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a626c3e8e3f933b03bfc30e95ece04385dd449b6f9b375334f8f79b3f9bdb96d |
| MD5 | 61cbc64045da58f9bed4f1def4c03560 |
| BLAKE2b-256 | 9cdeb6936f45d3ab3386492ad10d8f369b2ad5ab403bae6e07c8b3878b722686 |
File details
Details for the file telomerehunter2-1.0.6-py3-none-any.whl.
File metadata
- Download URL: telomerehunter2-1.0.6-py3-none-any.whl
- Upload date:
- Size: 118.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ef85eb86b08000b24918e8b3e42d5caad6ed78dbc339a581680d96f54394a63 |
| MD5 | e93f493e259c8915badd10cf901a116f |
| BLAKE2b-256 | 539865c0ade4fb7624db0e10e597bbd938ebc91e6370afcf7cde5624ef0d461a |