Transcription Run-On Grants Detection Of Regulatory elements (TROGDOR)
https://www.youtube.com/watch?v=90X5NJleYJQ
TROGDOR identifies transcription initiation regions (TIRs) from stranded nascent RNA sequencing data (GRO-seq, PRO-seq, ChRO-seq, mNET-seq, etc.). It uses a 1D U-Net model and a tiled image-segmentation approach to achieve state-of-the-art performance at predicting TIRs while remaining computationally efficient.
Installation
We recommend installing inside an isolated Python environment. With uv (fastest):
uv tool install trogdor
Or with pip inside a conda/venv environment:
pip install trogdor
Or install the latest development version directly from GitHub:
pip install git+https://github.com/adamyhe/TROGDOR.git
# or with uv:
uv tool install git+https://github.com/adamyhe/TROGDOR.git
Usage
Quick start
Run the full pipeline with a single command:
trogdor pipeline -p plus.bw -m minus.bw -o mysample.peaks.bed.gz
This writes one output file:
| File | Description |
|---|---|
| mysample.peaks.bed.gz | Called TIR peak regions (bgzipped BED) |
The intermediate probability bigWig is written to a temporary file and deleted automatically.
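For downstream work, the peak file can be read with the Python standard library, since bgzip output is also valid gzip. The sketch below is illustrative only: `parse_bed_lines`, `read_peaks`, and `summarize` are hypothetical helper names, and it assumes only that the first three columns are the standard BED chrom, start, and end.

```python
import gzip


def parse_bed_lines(lines):
    """Yield (chrom, start, end) from an iterable of BED text lines."""
    for line in lines:
        # Skip blank lines and common BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        # Assumption: columns 1-3 are chrom, start, end (standard BED).
        yield fields[0], int(fields[1]), int(fields[2])


def read_peaks(path):
    """Read a gzip- or bgzip-compressed BED file into a list of intervals."""
    with gzip.open(path, "rt") as handle:
        return list(parse_bed_lines(handle))


def summarize(peaks):
    """Return (number of peaks, total bases covered by peaks)."""
    peaks = list(peaks)
    return len(peaks), sum(end - start for _, start, end in peaks)
```

For example, `summarize(read_peaks("mysample.peaks.bed.gz"))` gives a quick sanity check on how many regions were called and how much of the genome they cover.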
Inputs: plus- and minus-strand bigWig files from a nascent RNA sequencing experiment. TROGDOR was trained on PRO/GRO-seq data and has been vetted on data from GRO/PRO/ChRO/mNET-seq experiments. These files should be coverage tracks of the 3' ends of reads/fragments (that is, the most recent nucleotide added by the polymerase), ideally in raw counts. Minus-strand values may be stored as either positive or negative numbers.
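To make the "3' end" requirement concrete, here is a minimal sketch of how single-base 3'-end counts could be derived from aligned intervals. It is not part of TROGDOR: the function names are hypothetical, and it assumes each interval is 0-based, half-open, and already assigned to the strand of the nascent RNA (whether reads need to be strand-flipped first depends on the library protocol).

```python
from collections import Counter


def three_prime_position(start, end, strand):
    """Return the 0-based position of the RNA 3' end for interval [start, end)."""
    if strand == "+":
        return end - 1  # plus-strand RNA grows left to right
    if strand == "-":
        return start    # minus-strand RNA grows right to left
    raise ValueError(f"unknown strand: {strand!r}")


def three_prime_coverage(reads):
    """Count 3' ends per (chrom, position), separately for each strand.

    `reads` is an iterable of (chrom, start, end, strand) tuples.
    """
    plus, minus = Counter(), Counter()
    for chrom, start, end, strand in reads:
        pos = three_prime_position(start, end, strand)
        target = plus if strand == "+" else minus
        target[(chrom, pos)] += 1
    return plus, minus
```

The two counters correspond to the raw-count plus and minus tracks; writing them out as bigWig files is left to whichever tool you already use for that.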
GPU: scoring uses a 1D U-Net model implemented in plain PyTorch and thus can be greatly accelerated by running on a CUDA-capable GPU (particularly Ampere or newer architectures that support bf16). Apple Silicon MPS (-d mps) should also work but has not been tested. Pass -d cpu to run on CPU (much slower). If CUDA is unavailable, the tool automatically falls back to MPS (if detected) or CPU. Inference uses a streaming pipeline: bigWig IO for the next chromosome runs in a background thread while the GPU processes the current one, and chunks are fed to the GPU via a DataLoader with pin_memory for async CPU→GPU transfer.
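The overlap between IO and compute described above can be illustrated with a small prefetching loop. This is a simplified sketch of the general pattern, not TROGDOR's implementation: `stream_chromosomes`, `load`, and `process` are hypothetical names, and the real pipeline additionally feeds chunks through a PyTorch DataLoader, which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def stream_chromosomes(chroms, load, process):
    """Process chromosomes in order while prefetching the next one.

    `load(chrom)` stands in for bigWig IO and `process(data)` for model
    inference; both are placeholders supplied by the caller.
    """
    results = {}
    if not chroms:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load, chroms[0])
        for i, chrom in enumerate(chroms):
            data = pending.result()  # wait for this chromosome's IO
            if i + 1 < len(chroms):
                # Start loading the next chromosome in the background.
                pending = pool.submit(load, chroms[i + 1])
            # Runs while the next load is in flight.
            results[chrom] = process(data)
    return results


if __name__ == "__main__":
    fake_load = lambda chrom: [len(chrom)] * 3  # pretend signal array
    print(stream_chromosomes(["chr1", "chr22"], fake_load, sum))
```

A single background worker is enough here because only one chromosome is ever loaded ahead of the one being processed.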
Pretrained model: downloaded automatically from HuggingFace Hub on first run and cached locally. To use a custom model, pass -M /path/to/model.torch.
Key options
| Flag | Default | Description |
|---|---|---|
| -d / --device | cuda | PyTorch device (cuda, cpu, cuda:1, …) |
| -s / --min_score | 0.95 | Minimum score threshold; bins below this are not reported |
| -b / --save_bigwig | off | Save the intermediate probability bigWig to this path (pipeline only) |
| --chroms | all | Score only specific chromosomes (e.g. --chroms chr1 chr2) |
| --num_workers | 0 | DataLoader workers for chunk preprocessing (set to 1–4 on Linux/CUDA) |
| -v / --verbose | off | Print progress messages |
Running steps separately
The pipeline can also be run as two separate steps, which is useful if you want to call peaks at multiple thresholds without re-scoring:
# Step 1: score (GPU recommended)
trogdor score -p plus.bw -m minus.bw -o mysample.prob0.9.bw -s 0.9
# Step 2: call peaks at different thresholds (CPU, fast)
trogdor peaks -i mysample.prob0.9.bw -o mysample.peaks0.9.bed.gz -s 0.9
trogdor peaks -i mysample.prob0.9.bw -o mysample.peaks0.95.bed.gz -s 0.95
trogdor peaks -i mysample.prob0.9.bw -o mysample.peaks0.99.bed.gz -s 0.99
The default peaks command preserves the original threshold-and-merge caller.
An experimental refined caller is available for benchmarking post-processing
choices without changing the model or default behavior:
trogdor peaks -i mysample.prob.bw -o mysample.refined.bed \
--mode refined --min_score 0.95 --max_gap 32 --min_width 32
Refined output includes BED columns for the merged peak and the max-score
summit bin: chrom, start, end, score, summit_start, summit_end,
summit_score. --min_support_signal can optionally require raw plus/minus
coverage support when --support_plus_bigwig and --support_minus_bigwig are
provided.
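As a rough illustration of what a threshold-and-merge caller with summit reporting does, the sketch below works on a plain list of per-bin scores. It is not TROGDOR's code: `call_peaks` and `bin_size` are hypothetical, and the exact meaning assumed for each parameter (bins kept when score >= `min_score`, runs merged when the gap in bp is <= `max_gap`, peaks kept when width in bp is >= `min_width`) is an assumption for illustration.

```python
def call_peaks(scores, bin_size, min_score=0.95, max_gap=32, min_width=32):
    """Threshold per-bin scores, merge nearby bins, and report summits.

    Returns a list of (start, end, summit_start, summit_end, summit_score)
    tuples for one chromosome, in bp coordinates.
    """
    merged = []
    current = None  # [start, end, summit_start, summit_score]
    for i, score in enumerate(scores):
        if score < min_score:
            continue  # bin does not pass the threshold
        start, end = i * bin_size, (i + 1) * bin_size
        if current is not None and start - current[1] <= max_gap:
            current[1] = end  # extend the open peak across the gap
            if score > current[3]:
                current[2], current[3] = start, score  # new summit bin
        else:
            if current is not None:
                merged.append(current)
            current = [start, end, start, score]
    if current is not None:
        merged.append(current)
    # Drop peaks narrower than min_width and expand the summit to a full bin.
    return [
        (s, e, ss, ss + bin_size, sc)
        for s, e, ss, sc in merged
        if e - s >= min_width
    ]
```

With 16 bp bins, `call_peaks([0.1, 0.96, 0.99, 0.2, 0.97], bin_size=16)` merges the three passing bins into one peak spanning 16 to 80 with its summit at the 0.99 bin.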
Empirical FDR estimation and min_score calibration
The fdr subcommand estimates the score threshold corresponding to a target empirical FDR from a probability bigWig and a ground truth peak set (e.g. ENCODE PLS/ELS or PRO-cap peaks for your cell type of interest). This can be useful for deciding what min_score threshold you should use (although the default 0.95 has worked well for me).
# Step 1: generate a dense score bigWig (report ALL values)
trogdor score -p plus.bw -m minus.bw -o mysample.prob.bw --min_score 0
# Step 2: calculate empirical FDR against a candidate set of "ground truth" peaks
trogdor fdr -b mysample.prob.bw -t candidate_peaks.bed.gz --fdr_target 0.05
Strategy: each candidate peak is summarised by its max (or mean) bigWig score. A null distribution is built by shuffling peak positions uniformly within chromosome bounds (preserving widths). FDR at threshold t is estimated as min(1, N_null(t) / N_real(t)), averaged over --n_shuffle independent shuffles. The score threshold at the target FDR is printed to stdout.
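The strategy above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the fdr subcommand itself: all function names are hypothetical, per-peak scores are taken as already computed, the estimate is averaged per shuffle, an empty real set is treated as FDR 1, and the reported threshold is the smallest observed score that meets the target.

```python
import random


def shuffle_peaks(peaks, chrom_sizes, rng):
    """Move each peak to a uniform random position on its chromosome.

    Widths are preserved and shuffled peaks stay within chromosome bounds.
    """
    shuffled = []
    for chrom, start, end in peaks:
        width = end - start
        new_start = rng.randint(0, chrom_sizes[chrom] - width)
        shuffled.append((chrom, new_start, new_start + width))
    return shuffled


def empirical_fdr(real_scores, null_score_sets, threshold):
    """Estimate min(1, N_null(t) / N_real(t)), averaged over shuffles."""
    n_real = sum(s >= threshold for s in real_scores)
    if n_real == 0:
        return 1.0  # assumption: no real peaks pass, so report FDR of 1
    estimates = [
        min(1.0, sum(s >= threshold for s in null) / n_real)
        for null in null_score_sets
    ]
    return sum(estimates) / len(estimates)


def threshold_at_fdr(real_scores, null_score_sets, target=0.05):
    """Return the smallest observed score whose estimated FDR meets the target."""
    for t in sorted(set(real_scores)):
        if empirical_fdr(real_scores, null_score_sets, t) <= target:
            return t
    return None  # no threshold reaches the target FDR
```

Here `real_scores` holds one summary score (max or mean) per candidate peak, and each entry of `null_score_sets` holds the scores of one shuffled copy of the peak set produced with `shuffle_peaks`.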
| Flag | Default | Description |
|---|---|---|
| -b / --bigwig | none | Probability bigWig (required) |
| -t / --peaks | none | Candidate peak BED (required) |
| --stat | max | Summary statistic per peak (max or mean) |
| --n_shuffle | 1 | Independent genome shuffles to average the null over |
| --fdr_target | 0.05 | Target FDR for reporting the score threshold |
| --output | off | Write TSV table of threshold/FDR/N_real/N_null to path |
| --figure | off | Save FDR-vs-threshold plot to path |
| --chroms | all | Restrict to specific chromosomes |
Development/Model retraining
Install needed UCSC tools:
mamba create -n trogdor
mamba activate trogdor
mamba install -c bioconda ucsc-liftover
Install TROGDOR with dev dependencies:
git clone git@github.com:adamyhe/TROGDOR.git
cd TROGDOR
pip install -e ".[dev]"
Training
Most users do not need to retrain: a pre-trained model is bundled with the package and used automatically by the CLI. See scripts/README.md for data download, training, and benchmarking instructions for the original TROGDOR model. I haven't included general scripts for retraining on custom datasets, but these should be a useful starting point.