Analyzer for TCR-pMHC binding predictor outputs

tcr-pmhc-analyzer

License: MIT | Python 3.10+

Analyzer for TCR-pMHC binding predictor outputs. Merges predictions from multiple models into a unified table, detects data leakage against bundled training sets, identifies seen/unseen peptides, and benchmarks model performance via ROC curves.

Installation

pip install tcr-pmhc-analyzer

For development:

git clone https://github.com/qbic-pipelines/tcr-pmhc-analyzer.git
cd tcr-pmhc-analyzer
pip install -e ".[dev]"

Input format

Both commands accept a TSV configuration file with two required columns:

| Column | Description |
| --- | --- |
| model | Name of the prediction model |
| file | Path to the model's prediction output |

Example input.tsv:

model	file
ergo2	results/ergo2_predictions.csv
mixtcrpred	results/mixtcrpred_predictions.csv
t2pmhc-gcn	results/t2pmhc_gcn_predictions.csv
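
As an illustration of the format, a config file like the one above can be parsed with the standard library. This is a minimal sketch, not the package's own loader; the function name `read_config` is hypothetical.

```python
import csv
import io

# The two required columns named in the table above.
REQUIRED_CONFIG_COLUMNS = {"model", "file"}


def read_config(handle):
    """Parse a tab-separated config and return a list of (model, file) pairs.

    Hypothetical helper for illustration; not part of tcr-pmhc-analyzer.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED_CONFIG_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"config is missing columns: {sorted(missing)}")
    return [(row["model"], row["file"]) for row in reader]


# In-memory stand-in for input.tsv so the sketch runs without files on disk.
example = io.StringIO(
    "model\tfile\n"
    "ergo2\tresults/ergo2_predictions.csv\n"
    "mixtcrpred\tresults/mixtcrpred_predictions.csv\n"
)
pairs = read_config(example)
print(pairs)
```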

Each prediction file must contain the following columns:

| Column | Description |
| --- | --- |
| identifier | Unique sample identifier (used for merging) |
| binding_score | Model's predicted binding score |
| binder | Ground truth label (0/1), required for benchmarking |
| peptide | Peptide sequence |
| cdr3a | CDR3 alpha chain sequence |
| cdr3b | CDR3 beta chain sequence |
| va, vb | V gene alpha/beta |
| ja, jb | J gene alpha/beta |
| mhc | MHC allele |
| organism | Source organism |
| mhc_class | MHC class |
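
Before running either command, it can help to check a prediction file's header against this list. The sketch below makes two assumptions that the text above does not confirm: that prediction files are comma-separated, and that binder may be absent when only the analyzer table is created. The helper `missing_columns` is hypothetical.

```python
import csv
import io

# Column names taken from the table above; binder is handled separately
# because it is described as required for benchmarking.
CORE_COLUMNS = [
    "identifier", "binding_score", "peptide", "cdr3a", "cdr3b",
    "va", "vb", "ja", "jb", "mhc", "organism", "mhc_class",
]


def missing_columns(handle, benchmarking=False, delimiter=","):
    """Return the required column names that are absent from the header row.

    Hypothetical helper; the comma delimiter is an assumption.
    """
    header = next(csv.reader(handle, delimiter=delimiter), [])
    required = CORE_COLUMNS + (["binder"] if benchmarking else [])
    return [name for name in required if name not in header]


# Header-only stand-in for a prediction file without ground truth labels.
sample = io.StringIO(",".join(CORE_COLUMNS) + "\n")
print(missing_columns(sample))                     # table creation
sample.seek(0)
print(missing_columns(sample, benchmarking=True))  # benchmarking also needs binder
```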

Commands

create-analyzer-table

Merges predictions from multiple models into a single table with rank-normalized scores, data leakage annotations, and seen-peptide flags.

tcr-pmhc-analyzer create-analyzer-table [OPTIONS]

| Option | Short | Required | Description |
| --- | --- | --- | --- |
| --input PATH | -i | Yes | Path to TSV config file with model and file columns |
| --output PATH | -o | Yes | Output file path (.csv or .tsv) |
| --ergo-version | | If ergo2 | ERGO training data version: vdjdb or mcpas |

Example:

tcr-pmhc-analyzer create-analyzer-table \
  -i input.tsv \
  -o analyzer_table.csv \
  --ergo-version vdjdb

Output columns added:

  • binding_score_{model}: raw binding score per model
  • rank_score_{model}: rank-normalized score in [0, 1] (1 = highest)
  • sample_in_train_{model}: True if the sample appears in the model's training data (data leakage)
  • seen_in_{model}: True if the peptide was seen in the model's training data
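
To make the rank_score_{model} column concrete, here is a minimal sketch of rank normalization with average tie-breaking that leaves NaN untouched. It assumes the highest raw score maps to 1.0 and that ranks are divided by the number of non-NaN scores; the package's exact formula may differ, and `rank_normalize` is a hypothetical name.

```python
import math


def rank_normalize(scores):
    """Map scores to (0, 1] by average rank; NaN inputs stay NaN.

    Assumption: the highest score receives 1.0 and tied scores share
    the mean of the ranks they occupy.
    """
    valid = [i for i, s in enumerate(scores) if not math.isnan(s)]
    order = sorted(valid, key=lambda i: scores[i])  # ascending by score
    result = [math.nan] * len(scores)
    n = len(order)
    pos = 0
    while pos < n:
        end = pos
        # Extend the window over all samples tied with the current score.
        while end + 1 < n and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank / n
        pos = end + 1
    return result


print(rank_normalize([0.2, 0.9, 0.2, math.nan]))
```

Because ranks rather than raw values are compared, scores from models with different output scales become comparable within the same table.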

benchmark

Generates ROC curve plots comparing model performance, split by seen vs unseen peptides. Samples flagged as data leakage are removed automatically before analysis.

tcr-pmhc-analyzer benchmark [OPTIONS]

| Option | Short | Required | Description |
| --- | --- | --- | --- |
| --input PATH | -i | * | Path to TSV config file with model and file columns |
| --table PATH | | * | Path to a pre-created analyzer table (alternative to --input) |
| --output PATH | -o | Yes | Output directory for ROC curve plots |
| --ergo-version | | If ergo2 | ERGO training data version: vdjdb or mcpas |
| --models | -m | No | Space-separated list of models to benchmark (default: all available) |

* Either --input or --table must be provided.

Examples:

# Benchmark from raw predictions
tcr-pmhc-analyzer benchmark -i input.tsv -o results/

# Benchmark from a pre-created analyzer table
tcr-pmhc-analyzer benchmark --table analyzer_table.csv -o results/

# Benchmark specific models only
tcr-pmhc-analyzer benchmark -i input.tsv -o results/ -m "ergo2 mixtcrpred tabr-bert"

Output files:

  • roc_curve_unseen.png — ROC curves for peptides unseen by all selected models
  • roc_curve_seen.png — ROC curves for peptides seen by all selected models
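
The seen/unseen split and the ROC computation behind these plots can be sketched in plain Python. The column names follow the analyzer table described above. Two things are assumptions: a sample is dropped if any selected model flags it as leaked, and the function names (`split_subsets`, `roc_points`, `auc`) are hypothetical. The sketch computes curve points and the area under them; it does not draw plots.

```python
def split_subsets(rows, models):
    """Drop leaked samples, then split by peptide exposure across all models.

    Assumption: a sample counts as leaked if any selected model saw it.
    """
    clean = [r for r in rows if not any(r[f"sample_in_train_{m}"] for m in models)]
    seen = [r for r in clean if all(r[f"seen_in_{m}"] for m in models)]
    unseen = [r for r in clean if not any(r[f"seen_in_{m}"] for m in models)]
    return seen, unseen


def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs both binders and non-binders")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])  # highest score first
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        tp += label
        fp += 1 - label
        # Emit a point only after the last sample of a tie group.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under a list of ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy data: binder labels and one model's scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auc(roc_points(labels, scores)))
```

Peptides seen by some but not all selected models fall into neither subset under this reading, which matches the "by all selected models" wording of the two output files.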

Supported models

| Model | Training data |
| --- | --- |
| ergo2 | mcpas or vdjdb (specify with --ergo-version) |
| mixtcrpred | 146 pMHC training set |
| t2pmhc-gcn | t2pmhc core training set |
| t2pmhc-gat | t2pmhc core training set |
| tabr-bert | TCR-pMHC training set |
| tulip-tcr | TULIP training set |
| atm-tcr | ATM-TCR training set |

How it works

  1. Merge: Prediction outputs from multiple models are merged on the identifier column into a single DataFrame.
  2. Rank normalization: Each model's binding_score is rank-normalized to [0, 1] using descending order with average tie-breaking. NaN values are preserved.
  3. Data leakage detection: Each sample is checked against bundled training data to flag samples that appear in a model's training set.
  4. Seen peptide detection: Each peptide is checked against training data to identify whether it was seen during model training.
  5. Benchmarking: ROC curves are generated after removing leaked samples, separately for seen and unseen peptides.
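
Steps 1, 3, and 4 can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the merge is treated as an outer join with NaN for missing scores, and a sample is considered to be in the training data when its (peptide, cdr3b) pair matches a training entry. The real matching key is not documented here, and the names `merge_predictions` and `annotate` are hypothetical.

```python
import math


def merge_predictions(per_model):
    """Outer-join {model: {identifier: score}} on the identifier.

    Assumption: identifiers missing from a model get NaN for that model.
    """
    identifiers = sorted({i for scores in per_model.values() for i in scores})
    return {
        ident: {
            f"binding_score_{model}": scores.get(ident, math.nan)
            for model, scores in per_model.items()
        }
        for ident in identifiers
    }


def annotate(sample, training_pairs):
    """Flag sample-level leakage and peptide-level overlap with a training set.

    Assumption: training_pairs is a set of (peptide, cdr3b) tuples.
    """
    key = (sample["peptide"], sample["cdr3b"])
    train_peptides = {peptide for peptide, _ in training_pairs}
    return {
        "sample_in_train": key in training_pairs,
        "seen": sample["peptide"] in train_peptides,
    }


merged = merge_predictions({
    "ergo2": {"s1": 0.8, "s2": 0.1},
    "mixtcrpred": {"s1": 0.4},
})
print(merged)

# Illustrative sequences only; not taken from any bundled training set.
training = {("GILGFVFTL", "CASSIRSSYEQYF")}
print(annotate({"peptide": "GILGFVFTL", "cdr3b": "CASSLGQAYEQYF"}, training))
```

The second call shows why the two flags are kept separate: a sample can be new to the model while its peptide was still seen during training.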

Citations

If you use tcr-pmhc-analyzer in your research, please cite the underlying prediction models:

ATM-TCR

Cai, M. et al. (2022). ATM-TCR: TCR-Epitope Binding Affinity Prediction Using a Multi-Head Self-Attention Model. Frontiers in Immunology, 13, 893247. https://doi.org/10.3389/fimmu.2022.893247

ERGO-II

Springer, I. et al. (2021). Contribution of T Cell Receptor Alpha and Beta CDR3, MHC Typing, V and J Genes to Peptide Binding Prediction. Frontiers in Immunology, 12, 664514. https://doi.org/10.3389/fimmu.2021.664514

MIXTCRpred

Croce, G. et al. (2024). Deep learning predictions of TCR-epitope interactions reveal epitope-specific chains in dual alpha T cells. Nature Communications, 15, 3211. https://doi.org/10.1038/s41467-024-47461-8

t2pmhc

Polster, M. et al. (2026). t2pmhc: A Structure-Informed Graph Neural Network to Predict TCR-pMHC Binding. bioRxiv. https://doi.org/10.64898/2026.02.27.708137

TABR-BERT

Zhang, J. et al. (2024). Accurate TCR-pMHC interaction prediction using a BERT-based transfer learning method. Briefings in Bioinformatics, 25(1), bbad436. https://doi.org/10.1093/bib/bbad436

TULIP

Meynard-Piganeau, B. et al. (2024). TULIP — a Transformer-based Unsupervised Language model for Interacting Peptides and T-cell receptors that generalizes to unseen epitopes. Proceedings of the National Academy of Sciences, 121(13). https://doi.org/10.1073/pnas.2316401121

License

MIT
