
tcr-pmhc-analyzer

License: MIT | Python 3.10+

Analyzer for TCR-pMHC binding predictor outputs. Merges predictions from multiple models into a unified table, detects data leakage against bundled training sets, identifies seen/unseen peptides, and benchmarks model performance via ROC curves.

Installation

pip install tcr-pmhc-analyzer

For development:

git clone https://github.com/qbic-pipelines/tcr-pmhc-analyzer.git
cd tcr-pmhc-analyzer
pip install -e ".[dev]"

Input format

Both commands accept a tab-separated (TSV) configuration file, passed with --input, that has two required columns:

Column  Description
model   Name of the prediction model
file    Path to the model's prediction output

Example input.tsv:

model	file
ergo2	results/ergo2_predictions.csv
mixtcrpred	results/mixtcrpred_predictions.csv
t2pmhc-gcn	results/t2pmhc_gcn_predictions.csv
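To make the expected layout concrete, here is a minimal sketch of reading such a config with Python's standard library. It assumes only the two documented columns; the function name `read_config` and the rule that a model may appear only once are illustrative assumptions, not the package's API.

```python
import csv
import io

# The two required columns of the config file described above.
REQUIRED_CONFIG_COLUMNS = ("model", "file")


def read_config(handle):
    """Parse a tab-separated model/file config into an ordered {model: path} dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_CONFIG_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"config is missing required column(s): {', '.join(missing)}")
    config = {}
    for row in reader:
        model = row["model"].strip()
        # Assumption (hypothetical rule): each model may appear only once.
        if model in config:
            raise ValueError(f"model listed more than once: {model}")
        config[model] = row["file"].strip()
    return config


example = (
    "model\tfile\n"
    "ergo2\tresults/ergo2_predictions.csv\n"
    "mixtcrpred\tresults/mixtcrpred_predictions.csv\n"
    "t2pmhc-gcn\tresults/t2pmhc_gcn_predictions.csv\n"
)
config = read_config(io.StringIO(example))
print(config["t2pmhc-gcn"])  # results/t2pmhc_gcn_predictions.csv
```

Keeping the models in file order (a plain dict preserves insertion order) makes any downstream per-model columns come out in the order the user listed them.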

Each prediction file must contain the following columns:

Column         Description
identifier     Unique sample identifier (used for merging)
binding_score  Model's predicted binding score
binder         Ground-truth label (0/1); required for benchmarking
peptide        Peptide sequence
cdr3a          CDR3 alpha chain sequence
cdr3b          CDR3 beta chain sequence
va, vb         V gene (alpha, beta)
ja, jb         J gene (alpha, beta)
mhc            MHC allele
organism       Source organism
mhc_class      MHC class
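The sketch below checks a prediction file's header against the columns in the table above. It assumes prediction files are comma-separated (as the `.csv` names in the example config suggest) and treats `binder` as needed only when benchmarking; `missing_prediction_columns` is a hypothetical helper, not part of tcr-pmhc-analyzer.

```python
import csv
import io

# Columns listed in the table above. Treating `binder` as needed only for
# benchmarking is an assumption based on its description.
PREDICTION_COLUMNS = {
    "identifier", "binding_score", "peptide", "cdr3a", "cdr3b",
    "va", "vb", "ja", "jb", "mhc", "organism", "mhc_class",
}


def missing_prediction_columns(handle, for_benchmark=False):
    """Return the documented columns absent from a comma-separated file's header, sorted."""
    header = {name.strip() for name in next(csv.reader(handle), [])}
    required = PREDICTION_COLUMNS | ({"binder"} if for_benchmark else set())
    return sorted(required - header)


header_line = "identifier,binding_score,peptide,cdr3a,cdr3b,va,vb,ja,jb,mhc,organism,mhc_class\n"
print(missing_prediction_columns(io.StringIO(header_line)))                      # []
print(missing_prediction_columns(io.StringIO(header_line), for_benchmark=True))  # ['binder']
```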

Commands

create-analyzer-table

Merges predictions from multiple models into a single table with rank-normalized scores, data leakage annotations, and seen-peptide flags.

tcr-pmhc-analyzer create-analyzer-table [OPTIONS]
Option          Short   Required  Description
--input PATH    -i      Yes       Path to TSV config file with model and file columns
--output PATH   -o      Yes       Output file path (.csv or .tsv)
--ergo-version  (none)  If ergo2  ERGO training data version: vdjdb or mcpas

Example:

tcr-pmhc-analyzer create-analyzer-table \
  -i input.tsv \
  -o analyzer_table.csv \
  --ergo-version vdjdb

Output columns added:

  • binding_score_{model}: raw binding score per model
  • rank_score_{model}: rank-normalized score in [0, 1] (1 = highest)
  • sample_in_train_{model}: True if the sample appears in the model's training data (data leakage)
  • seen_in_{model}: True if the peptide was seen in the model's training data
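The README does not specify how a sample is matched against a model's training data. The following sketch assumes a sample counts as leaked when its (peptide, cdr3a, cdr3b) triple occurs in the training rows, and as seen when its peptide does; the helper names and the toy CDR3 strings are illustrative only.

```python
# Hypothetical matching rule: the README does not say which fields define a
# "sample" in the training data; (peptide, cdr3a, cdr3b) is assumed here.
def sample_key(row):
    return (row["peptide"], row["cdr3a"], row["cdr3b"])


def annotate_training_overlap(rows, model, training_rows):
    """Add sample_in_train_{model} and seen_in_{model} flags to each row in place."""
    train_samples = {sample_key(r) for r in training_rows}
    train_peptides = {r["peptide"] for r in training_rows}
    for row in rows:
        row[f"sample_in_train_{model}"] = sample_key(row) in train_samples
        row[f"seen_in_{model}"] = row["peptide"] in train_peptides
    return rows


# Toy data (made-up CDR3 strings) for a single model.
training = [{"peptide": "GILGFVFTL", "cdr3a": "CAVA", "cdr3b": "CASSA"}]
rows = [
    {"identifier": "s1", "peptide": "GILGFVFTL", "cdr3a": "CAVA", "cdr3b": "CASSA"},  # exact match
    {"identifier": "s2", "peptide": "GILGFVFTL", "cdr3a": "CAVB", "cdr3b": "CASSB"},  # known peptide, new TCR
    {"identifier": "s3", "peptide": "NLVPMVATV", "cdr3a": "CAVC", "cdr3b": "CASSC"},  # unseen peptide
]
for row in annotate_training_overlap(rows, "ergo2", training):
    print(row["identifier"], row["sample_in_train_ergo2"], row["seen_in_ergo2"])
# s1 True True
# s2 False True
# s3 False False
```

Using sets keeps each lookup constant-time, which matters when the bundled training sets are large.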

benchmark

Generates ROC curve plots comparing model performance, split into seen and unseen peptides. Samples flagged as data leakage are removed automatically before analysis.
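A sketch of the filtering and splitting step, working on rows that already carry the flag columns produced by create-analyzer-table. Two choices here are assumptions rather than documented behavior: a sample is dropped if any selected model saw it in training, and peptides seen by only some of the selected models are left out of both subsets (following the "seen by all" and "unseen by all" wording of the output files). The `flags` helper only builds toy rows.

```python
def split_for_benchmark(rows, models):
    """Drop leaked samples, then return (seen, unseen) subsets for the selected models.

    Assumptions: a sample is dropped if it is in the training data of any
    selected model; "seen" means seen by every selected model and "unseen"
    means seen by none, so peptides seen by only some models fall in neither.
    """
    clean = [r for r in rows if not any(r[f"sample_in_train_{m}"] for m in models)]
    seen = [r for r in clean if all(r[f"seen_in_{m}"] for m in models)]
    unseen = [r for r in clean if not any(r[f"seen_in_{m}"] for m in models)]
    return seen, unseen


def flags(identifier, leaked, seen):
    """Hypothetical toy-row builder for ergo2 and mixtcrpred (flag pairs in that order)."""
    return {
        "identifier": identifier,
        "sample_in_train_ergo2": leaked[0], "sample_in_train_mixtcrpred": leaked[1],
        "seen_in_ergo2": seen[0], "seen_in_mixtcrpred": seen[1],
    }


rows = [
    flags("s1", (True, False), (True, True)),     # leaked for ergo2, dropped
    flags("s2", (False, False), (True, True)),    # seen by both models
    flags("s3", (False, False), (False, False)),  # seen by neither model
    flags("s4", (False, False), (True, False)),   # seen by one model only
]
seen, unseen = split_for_benchmark(rows, ["ergo2", "mixtcrpred"])
print([r["identifier"] for r in seen], [r["identifier"] for r in unseen])  # ['s2'] ['s3']
```

Because the split depends on which models are selected, narrowing the model list can move samples between subsets.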

tcr-pmhc-analyzer benchmark [OPTIONS]
Option          Short   Required  Description
--input PATH    -i      *         Path to TSV config file with model and file columns
--table PATH    (none)  *         Path to a pre-created analyzer table (alternative to --input)
--output PATH   -o      Yes       Output directory for ROC curve plots
--ergo-version  (none)  If ergo2  ERGO training data version: vdjdb or mcpas
--models        -m      No        Space-separated list of models to benchmark (default: all available)

* Either --input or --table must be provided.
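The option rules above (one of --input or --table, --ergo-version when ergo2 is involved, a quoted space-separated -m list) can be expressed with argparse as follows. This is a hypothetical re-creation for illustration; the real CLI may be built with a different library and may enforce these rules differently, and `resolve_models` is an invented helper.

```python
import argparse


def build_benchmark_parser():
    """Hypothetical argparse version of the documented benchmark options."""
    parser = argparse.ArgumentParser(prog="tcr-pmhc-analyzer benchmark")
    # Assumption: --input and --table are mutually exclusive; exactly one is required.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-i", "--input", help="TSV config file with model and file columns")
    source.add_argument("--table", help="pre-created analyzer table")
    parser.add_argument("-o", "--output", required=True, help="output directory for ROC plots")
    parser.add_argument("--ergo-version", choices=["vdjdb", "mcpas"])
    parser.add_argument("-m", "--models", help='space-separated model names, e.g. "ergo2 mixtcrpred"')
    return parser


def resolve_models(parser, args, available):
    """Return the models to benchmark and enforce the ergo2 -> --ergo-version rule."""
    models = args.models.split() if args.models else list(available)
    unknown = [m for m in models if m not in available]
    if unknown:
        parser.error(f"unknown model(s): {', '.join(unknown)}")
    # Assumption: the version is only needed when building from raw predictions.
    if args.input and "ergo2" in models and args.ergo_version is None:
        parser.error("--ergo-version is required when ergo2 is benchmarked")
    return models


parser = build_benchmark_parser()
args = parser.parse_args(["-i", "input.tsv", "-o", "results/", "--ergo-version", "vdjdb",
                          "-m", "ergo2 mixtcrpred"])
print(resolve_models(parser, args, ["ergo2", "mixtcrpred", "t2pmhc-gcn"]))  # ['ergo2', 'mixtcrpred']
```

`parser.error` prints a usage message and exits with status 2, so invalid combinations fail before any files are read.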

Examples:

# Benchmark from raw predictions
tcr-pmhc-analyzer benchmark -i input.tsv -o results/ --ergo-version vdjdb

# Benchmark from a pre-created analyzer table
tcr-pmhc-analyzer benchmark --table analyzer_table.csv -o results/

# Benchmark specific models only
tcr-pmhc-analyzer benchmark -i input.tsv -o results/ --ergo-version vdjdb -m "ergo2 mixtcrpred t2pmhc-gcn"

Output files:

  • roc_curve_unseen.png — ROC curves for peptides unseen by all selected models
  • roc_curve_seen.png — ROC curves for peptides seen by all selected models
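For reference, an ROC curve is the set of (false positive rate, true positive rate) points obtained by lowering the decision threshold over the scores, and the AUC summarizes it in one number. The standard-library sketch below computes both from `binder` labels and one model's scores. It is not the package's plotting code, and dropping samples without a score is an assumption about how missing predictions would be handled.

```python
def roc_curve_points(labels, scores):
    """Return (fpr, tpr) lists, lowering the threshold one distinct score at a time."""
    # Drop samples without a score (None or NaN; NaN != NaN), e.g. after an outer merge.
    pairs = [(s, y) for s, y in zip(scores, labels) if s is not None and s == s]
    pairs.sort(key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("an ROC curve needs both binders and non-binders")
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2 for k in range(1, len(fpr)))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
fpr, tpr = roc_curve_points(labels, scores)
print(fpr, tpr)                 # [0.0, 0.0, 0.5, 0.5, 1.0] [0.0, 0.5, 0.5, 1.0, 1.0]
print(trapezoid_auc(fpr, tpr))  # 0.75
```

The resulting points could be passed to a plotting library to draw one curve per model; an AUC of 0.75 here means a random binder outscores a random non-binder in three of four pairings.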

Supported models

Model       Training data
ergo2       mcpas or vdjdb (specify with --ergo-version)
mixtcrpred  146 pMHC training set
t2pmhc-gcn  t2pmhc core training set
t2pmhc-gat  t2pmhc core training set
tabr-bert   TCR-pMHC training set
tulip-tcr   TULIP training set
atm-tcr     ATM-TCR training set

How it works

  1. Merge: Prediction outputs from multiple models are merged on the identifier column into a single DataFrame.
  2. Rank normalization: Each model's binding_score is rank-normalized to [0, 1] by ranking in descending order, with tied scores assigned their average rank. NaN values are preserved.
  3. Data leakage detection: Each sample is checked against bundled training data to flag samples that appear in a model's training set.
  4. Seen peptide detection: Each peptide is checked against training data to identify whether it was seen during model training.
  5. Benchmarking: ROC curves are generated after removing leaked samples, separately for seen and unseen peptides.
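Steps 1 and 2 can be sketched with the standard library as below. It assumes an outer join (a sample missing from a model keeps a None score, in line with "NaN values are preserved") and one particular mapping of average descending ranks onto [0, 1], with the top score at 1.0 and the bottom at 0.0. The package's exact formula may differ, for example if it uses percentile ranks, and all function names here are hypothetical.

```python
import math


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def merge_on_identifier(predictions):
    """Outer-join per-model rows on `identifier` into {identifier: {column: value}}.

    `predictions` maps model name -> list of dicts with 'identifier' and
    'binding_score'. A model lacking a sample gets None for that sample.
    """
    merged = {}
    for model, rows in predictions.items():
        for row in rows:
            merged.setdefault(row["identifier"], {})[f"binding_score_{model}"] = row["binding_score"]
    for record in merged.values():
        for model in predictions:
            record.setdefault(f"binding_score_{model}", None)
    return merged


def rank_normalize(scores):
    """Rank in descending order (ties share their average rank) and map ranks to [0, 1].

    The highest score maps to 1.0 and the lowest to 0.0; missing values are kept as-is.
    """
    present = sorted((s for s in scores if not is_missing(s)), reverse=True)
    n = len(present)
    avg_rank = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and present[j + 1] == present[i]:
            j += 1
        avg_rank[present[i]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    out = []
    for s in scores:
        if is_missing(s):
            out.append(s)
        elif n == 1:
            out.append(1.0)
        else:
            out.append(1 - (avg_rank[s] - 1) / (n - 1))
    return out


predictions = {
    "ergo2": [{"identifier": "s1", "binding_score": 0.9}, {"identifier": "s2", "binding_score": 0.2}],
    "mixtcrpred": [{"identifier": "s2", "binding_score": 0.7}],
}
table = merge_on_identifier(predictions)
ids = list(table)
for model in predictions:
    column = [table[i][f"binding_score_{model}"] for i in ids]
    for i, rank in zip(ids, rank_normalize(column)):
        table[i][f"rank_score_{model}"] = rank
print(rank_normalize([0.9, 0.5, 0.5, None, 0.1]))  # [1.0, 0.5, 0.5, None, 0.0]
print(table["s1"])
# {'binding_score_ergo2': 0.9, 'binding_score_mixtcrpred': None, 'rank_score_ergo2': 1.0, 'rank_score_mixtcrpred': None}
```

Rank normalization puts models whose raw scores live on different scales (probabilities, logits, distances) onto a common [0, 1] axis, which is what makes a single merged table comparable across models.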

Citations

If you use tcr-pmhc-analyzer in your research, please cite the underlying prediction models:

ATM-TCR

Cai, M. et al. (2022). ATM-TCR: TCR-Epitope Binding Affinity Prediction Using a Multi-Head Self-Attention Model. Frontiers in Immunology, 13, 893247. https://doi.org/10.3389/fimmu.2022.893247

ERGO-II

Springer, I. et al. (2021). Contribution of T Cell Receptor Alpha and Beta CDR3, MHC Typing, V and J Genes to Peptide Binding Prediction. Frontiers in Immunology, 12, 664514. https://doi.org/10.3389/fimmu.2021.664514

MIXTCRpred

Croce, G. et al. (2024). Deep learning predictions of TCR-epitope interactions reveal epitope-specific chains in dual alpha T cells. Nature Communications, 15, 3211. https://doi.org/10.1038/s41467-024-47461-8

t2pMHC

Polster, M. et al. (2026). t2pmhc: A Structure-Informed Graph Neural Network to Predict TCR-pMHC Binding. bioRxiv. https://doi.org/10.64898/2026.02.27.708137

TABR-BERT

Zhang, J. et al. (2024). Accurate TCR-pMHC interaction prediction using a BERT-based transfer learning method. Briefings in Bioinformatics, 25(1), bbad436. https://doi.org/10.1093/bib/bbad436

TULIP

Meynard-Piganeau, B. et al. (2024). TULIP — a Transformer-based Unsupervised Language model for Interacting Peptides and T-cell receptors that generalizes to unseen epitopes. Proceedings of the National Academy of Sciences, 121(13). https://doi.org/10.1073/pnas.2316401121

License

MIT

Download files

Source Distribution

tcr_pmhc_analyzer-0.1.0.tar.gz (12.8 MB)

Built Distribution

tcr_pmhc_analyzer-0.1.0-py3-none-any.whl (13.1 MB)

File details

Details for the file tcr_pmhc_analyzer-0.1.0.tar.gz.

File metadata

  • Download URL: tcr_pmhc_analyzer-0.1.0.tar.gz
  • Size: 12.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for tcr_pmhc_analyzer-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fd4df252347f00c7bd6b4a864e2dc8d9649b1d2560a654842255c5044e1a8aa2
MD5 1d78611a1d6d23b2a9b1c2842fd14e28
BLAKE2b-256 57b8c90dc8c9081d1cf600ce624304e78a8ce7c26a0312279345f2c650cce4bb

File details

Details for the file tcr_pmhc_analyzer-0.1.0-py3-none-any.whl.

File hashes

Hashes for tcr_pmhc_analyzer-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 91d2dad8afdc3cac4a9469323c41126b4f29e964c8e461a1b07fa483a95a8469
MD5 9f6d852eed4abe6544a36e207af16c07
BLAKE2b-256 e0f5de81d86dfa0a1264b94359c4da0ce69a47d38e51c940ad91b30ddbbad502
