
Transcription Run-On Grants Detection Of Regulatory elements (TROGDOR)


TROGDOR



https://www.youtube.com/watch?v=90X5NJleYJQ

TROGDOR identifies transcription initiation regions (TIRs) from stranded nascent RNA sequencing data (GRO-seq, PRO-seq, ChRO-seq, mNET-seq, etc.). It uses a 1D U-Net model and a tiled image segmentation approach to achieve state-of-the-art performance at predicting TIRs while remaining computationally efficient.

Installation

We recommend installing inside an isolated Python environment. With uv (fastest):

uv tool install trogdor

Or with pip inside a conda/venv environment:

pip install trogdor

Or install the latest development version directly from GitHub:

pip install git+https://github.com/adamyhe/TROGDOR.git
# or with uv:
uv tool install git+https://github.com/adamyhe/TROGDOR.git

Usage

Quick start

Run the full pipeline with a single command:

trogdor pipeline -p plus.bw -m minus.bw -o mysample.peaks.bed.gz

This writes one output file:

| File | Description |
| --- | --- |
| mysample.peaks.bed.gz | Called TIR peak regions (bgzipped BED) |

The intermediate probability bigWig is written to a temporary file and deleted automatically.
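Because bgzip output is a valid gzip stream, the peak file can be read with Python's standard library alone. The sketch below is a minimal example that assumes only that the first three columns follow the standard BED layout (chrom, start, end); `read_peaks` is a hypothetical helper and is not part of TROGDOR.

```python
import gzip


def read_peaks(path):
    """Yield (chrom, start, end) from a gzip/bgzip-compressed BED file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and optional BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            # Assumption: the first three columns are chrom, start, end.
            chrom, start, end = line.rstrip("\n").split("\t")[:3]
            yield chrom, int(start), int(end)
```

For example, `sum(end - start for _, start, end in read_peaks("mysample.peaks.bed.gz"))` would give the total number of bases covered by called peaks.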

Inputs: plus- and minus-strand bigWig files from a nascent RNA sequencing experiment. TROGDOR was trained on PRO-seq and GRO-seq data and has been vetted on GRO-seq, PRO-seq, ChRO-seq, and mNET-seq data. The files should be coverage tracks of the 3' ends of reads/fragments (that is, the most recent nucleotide added by the polymerase), ideally in raw counts. Minus-strand values may be stored as either positive or negative numbers.
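To illustrate what "coverage of 3' ends" means, the sketch below counts only the last transcribed base of each read instead of piling up whole reads. It assumes reads are given as 0-based, half-open BED-style intervals whose strand already matches the RNA strand (some library preps sequence the reverse complement and need a strand flip first). `three_prime_counts` is a hypothetical helper, not part of TROGDOR, and real data would normally go through a dedicated alignment-to-bigWig tool.

```python
from collections import Counter


def three_prime_counts(reads):
    """Count 3' ends per (strand, position) from (start, end, strand) reads.

    Coordinates are 0-based, half-open. On the plus strand the 3' end is the
    last base of the interval (end - 1); on the minus strand it is the first
    base (start).
    """
    counts = Counter()
    for start, end, strand in reads:
        if strand == "+":
            counts[("+", end - 1)] += 1
        elif strand == "-":
            counts[("-", start)] += 1
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return counts


# Two plus-strand reads ending at the same base, one minus-strand read.
reads = [(100, 150, "+"), (120, 150, "+"), (300, 340, "-")]
counts = three_prime_counts(reads)
```

Each read contributes exactly one count, so the resulting track is much sparser than whole-read coverage.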

GPU: scoring uses a 1D U-Net implemented in plain PyTorch, so it runs much faster on a CUDA-capable GPU (particularly Ampere or newer architectures, which support bf16). Apple Silicon MPS (-d mps) should also work but has not been tested. Pass -d cpu to run on CPU (much slower). If CUDA is unavailable, the tool automatically falls back to MPS (if detected) or CPU. Inference uses a streaming pipeline: bigWig IO for the next chromosome runs in a background thread while the GPU processes the current one, and chunks are fed to the GPU through a DataLoader with pin_memory enabled for asynchronous CPU→GPU transfer.
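The background-IO idea can be sketched with the standard library alone. The example below is only an illustration of the general prefetch pattern (load the next item in a thread while the caller works on the current one), not TROGDOR's actual implementation; `prefetch` and `load_chromosome` are hypothetical names, and the loader is a stand-in for real bigWig reads.

```python
import queue
import threading


def prefetch(items, load, depth=1):
    """Yield (item, load(item)) pairs, loading ahead in a background thread."""
    buffer = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for item in items:
                buffer.put((item, load(item), None))
        except Exception as exc:  # forward loader errors to the consumer
            buffer.put((None, None, exc))
        finally:
            buffer.put(done)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        entry = buffer.get()
        if entry is done:
            return
        item, data, error = entry
        if error is not None:
            raise error
        yield item, data


def load_chromosome(name):
    # Stand-in for reading plus/minus bigWig signal for one chromosome.
    return [0.0] * 4


# While the loop body handles one chromosome, the worker loads the next.
results = [
    (name, len(signal))
    for name, signal in prefetch(["chr1", "chr2", "chr3"], load_chromosome)
]
```

The bounded queue keeps memory use predictable: the worker blocks once `depth` loaded chromosomes are waiting.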

Pretrained model: downloaded automatically from the Hugging Face Hub on first run and cached locally. To use a custom model, pass -M /path/to/model.torch.

Key options

| Flag | Default | Description |
| --- | --- | --- |
| -d / --device | cuda | PyTorch device (cuda, cpu, cuda:1, …) |
| -s / --min_score | 0.95 | Minimum score threshold; bins below this are not reported |
| -b / --save_bigwig | off | Save the intermediate probability bigWig to this path (pipeline only) |
| --chroms | all | Score only specific chromosomes (e.g. --chroms chr1 chr2) |
| --num_workers | 0 | DataLoader workers for chunk preprocessing (set to 1–4 on Linux/CUDA) |
| -v / --verbose | off | Print progress messages |

Running steps separately

The pipeline can also be run as two separate steps, which is useful if you want to call peaks at multiple thresholds without re-scoring:

# Step 1: score (GPU recommended)
trogdor score -p plus.bw -m minus.bw -o mysample.prob0.9.bw -s 0.9

# Step 2: call peaks at different thresholds (CPU, fast)
trogdor peaks -i mysample.prob0.9.bw -o mysample.peaks0.9.bed.gz -s 0.9
trogdor peaks -i mysample.prob0.9.bw -o mysample.peaks0.95.bed.gz -s 0.95
trogdor peaks -i mysample.prob0.9.bw -o mysample.peaks0.99.bed.gz -s 0.99

The default peaks command preserves the original threshold-and-merge caller. An experimental refined caller is available for benchmarking post-processing choices without changing the model or default behavior:

trogdor peaks -i mysample.prob.bw -o mysample.refined.bed \
  --mode refined --min_score 0.95 --max_gap 32 --min_width 32

Refined output includes BED columns for the merged peak and the max-score summit bin: chrom, start, end, score, summit_start, summit_end, summit_score. --min_support_signal can optionally require raw plus/minus coverage support when --support_plus_bigwig and --support_minus_bigwig are provided.
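The general threshold-and-merge idea, with a gap tolerance, a minimum width, and a max-score summit, can be sketched as follows. This is an illustrative reimplementation under assumed semantics (fixed-width bins, gaps and widths measured in base pairs, peak score taken as the summit score), not TROGDOR's actual code; `call_peaks` and `bin_size` are hypothetical names.

```python
def call_peaks(scores, bin_size, min_score=0.95, max_gap=32, min_width=32):
    """Return (start, end, score, summit_start, summit_end, summit_score) tuples.

    scores[i] is the score of the bin covering
    [i * bin_size, (i + 1) * bin_size) on a single chromosome.
    """
    merged = []
    current = None  # [start, end, summit_index] of the peak being built
    for i, score in enumerate(scores):
        if score < min_score:
            continue
        start, end = i * bin_size, (i + 1) * bin_size
        if current is not None and start - current[1] <= max_gap:
            current[1] = end  # bridge a gap of at most max_gap bp
            if score > scores[current[2]]:
                current[2] = i  # new best-scoring bin becomes the summit
        else:
            if current is not None:
                merged.append(current)
            current = [start, end, i]
    if current is not None:
        merged.append(current)

    out = []
    for start, end, summit in merged:
        if end - start < min_width:
            continue  # drop peaks narrower than min_width bp
        best = scores[summit]
        # Assumption: the peak-level score equals the summit score.
        out.append((start, end, best, summit * bin_size, (summit + 1) * bin_size, best))
    return out


scores = [0.1, 0.97, 0.99, 0.2, 0.96, 0.1, 0.1, 0.1, 0.98]
peaks = call_peaks(scores, bin_size=16)
# → [(16, 80, 0.99, 32, 48, 0.99)]
```

In this toy input, the bins at 16–48 and 64–80 are merged across a 16 bp gap, while the isolated 16 bp bin at 128–144 is discarded by the width filter.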

Empirical FDR estimation and min_score calibration

The fdr subcommand estimates the score threshold corresponding to a target empirical FDR from a probability bigWig and a ground truth peak set (e.g. ENCODE PLS/ELS or PRO-cap peaks for your cell type of interest). This can be useful for deciding what min_score threshold you should use (although the default 0.95 has worked well for me).

# Step 1: generate a dense score bigWig (report ALL values)
trogdor score -p plus.bw -m minus.bw -o mysample.prob.bw --min_score 0
# Step 2: calculate empirical FDR against a candidate set of "ground truth" peaks
trogdor fdr -b mysample.prob.bw -t candidate_peaks.bed.gz --fdr_target 0.05

Strategy: each candidate peak is summarised by its max (or mean) bigWig score. A null distribution is built by shuffling peak positions uniformly within chromosome bounds (preserving widths). FDR at threshold t is estimated as min(1, N_null(t) / N_real(t)), averaged over --n_shuffle independent shuffles. The score threshold at the target FDR is printed to stdout.

| Flag | Default | Description |
| --- | --- | --- |
| -b / --bigwig | | Probability bigWig (required) |
| -t / --peaks | | Candidate peak BED (required) |
| --stat | max | Summary statistic per peak (max or mean) |
| --n_shuffle | 1 | Independent genome shuffles to average the null over |
| --fdr_target | 0.05 | Target FDR for reporting the score threshold |
| --output | off | Write TSV table of threshold/FDR/N_real/N_null to path |
| --figure | off | Save FDR-vs-threshold plot to path |
| --chroms | all | Restrict to specific chromosomes |
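The shuffle-based strategy described in this section can be sketched in plain Python for a single chromosome. This is a rough illustration rather than the fdr subcommand's actual code. Several details are assumptions: N(t) counts peaks whose statistic is at least t, the ratio is averaged per shuffle, the FDR is reported as 0 when no real peak passes, and the reported threshold is the lowest one whose FDR is at or below the target. `empirical_fdr`, `peak_stat`, and `threshold_at` are hypothetical names.

```python
import random


def peak_stat(signal, start, end):
    """Summarise a peak by the max score over [start, end)."""
    return max(signal[start:end])


def empirical_fdr(signal, peaks, thresholds, n_shuffle=1, seed=0):
    """Return {threshold: FDR estimate} for peaks on a single chromosome."""
    rng = random.Random(seed)
    real = [peak_stat(signal, s, e) for s, e in peaks]
    totals = {t: 0.0 for t in thresholds}
    for _ in range(n_shuffle):
        null = []
        for s, e in peaks:
            width = e - s
            # Place a same-width interval uniformly within the chromosome.
            new_start = rng.randint(0, len(signal) - width)
            null.append(peak_stat(signal, new_start, new_start + width))
        for t in thresholds:
            n_real = sum(x >= t for x in real)
            n_null = sum(x >= t for x in null)
            # Assumption: with no real peaks passing, report an FDR of 0.
            totals[t] += min(1.0, n_null / n_real) if n_real else 0.0
    return {t: total / n_shuffle for t, total in totals.items()}


def threshold_at(fdr_by_threshold, target=0.05):
    """Lowest threshold whose estimated FDR is at or below target (assumed rule)."""
    passing = [t for t, fdr in fdr_by_threshold.items() if fdr <= target]
    return min(passing) if passing else None


# Toy chromosome: two high-scoring 20 bp regions on a zero background.
signal = [0.0] * 1000
for pos in list(range(100, 120)) + list(range(500, 520)):
    signal[pos] = 0.99
fdr = empirical_fdr(signal, [(100, 120), (500, 520)], [0.5, 0.95], n_shuffle=20)
print(fdr, threshold_at(fdr, target=0.05))
```

Increasing `n_shuffle` smooths the estimate at the cost of more passes over the signal, which mirrors the role of --n_shuffle.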

Development/Model retraining

Install needed UCSC tools:

mamba create -n trogdor
mamba activate trogdor
mamba install -c bioconda ucsc-liftover

Install TROGDOR with dev dependencies:

git clone git@github.com:adamyhe/TROGDOR.git
cd TROGDOR
pip install -e ".[dev]"

Training

Most users do not need to retrain: a pretrained model is fetched and used automatically by the CLI (see "Pretrained model" under Usage). See scripts/README.md for data download, training, and benchmarking instructions for the original TROGDOR model. I haven't included general scripts for retraining on custom datasets, but these should be a useful starting point.
