Analyzer for TCR-pMHC binding predictor outputs
tcr-pmhc-analyzer
Analyzer for TCR-pMHC binding predictor outputs. Merges predictions from multiple models into a unified table, detects data leakage against bundled training sets, identifies seen/unseen peptides, and benchmarks model performance via ROC curves.
Installation
pip install tcr-pmhc-analyzer
For development:
git clone https://github.com/qbic-pipelines/tcr-pmhc-analyzer.git
cd tcr-pmhc-analyzer
pip install -e ".[dev]"
Input format
Both commands accept a TSV configuration file with two required columns:
| Column | Description |
|---|---|
| `model` | Name of the prediction model |
| `file` | Path to the model's prediction output |
Example input.tsv:
model file
ergo2 results/ergo2_predictions.csv
mixtcrpred results/mixtcrpred_predictions.csv
t2pmhc-gcn results/t2pmhc_gcn_predictions.csv
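As a rough illustration of the format above, the following sketch parses such a configuration with the standard library and checks for the two required columns. The `read_config` helper is hypothetical and not part of tcr-pmhc-analyzer; it assumes the columns are separated by tabs.

```python
import csv
import io

REQUIRED_CONFIG_COLUMNS = ("model", "file")


def read_config(handle):
    """Parse a tab-separated config and return a list of (model, file) pairs.

    Hypothetical helper for illustration; not the package's own loader.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_CONFIG_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"config is missing required column(s): {missing}")
    return [(row["model"], row["file"]) for row in reader]


# In-memory stand-in for input.tsv, using the rows from the example above.
example = io.StringIO(
    "model\tfile\n"
    "ergo2\tresults/ergo2_predictions.csv\n"
    "mixtcrpred\tresults/mixtcrpred_predictions.csv\n"
    "t2pmhc-gcn\tresults/t2pmhc_gcn_predictions.csv\n"
)
config = read_config(example)
for model, path in config:
    print(model, path)
```

Running a check like this before invoking the CLI can catch a misspelled header early.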
Each prediction file must contain the following columns:
| Column | Description |
|---|---|
| `identifier` | Unique sample identifier (used for merging) |
| `binding_score` | Model's predicted binding score |
| `binder` | Ground truth label (0/1), required for benchmarking |
| `peptide` | Peptide sequence |
| `cdr3a` | CDR3 alpha chain sequence |
| `cdr3b` | CDR3 beta chain sequence |
| `va`, `vb` | V gene alpha/beta |
| `ja`, `jb` | J gene alpha/beta |
| `mhc` | MHC allele |
| `organism` | Source organism |
| `mhc_class` | MHC class |
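A prediction file can be checked against this column list before it is passed to the tool. The sketch below is a hypothetical pre-flight check, not something the package provides. It assumes comma-separated files, since the example paths end in `.csv`, and it looks only at the header row.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "identifier", "binding_score", "binder", "peptide", "cdr3a", "cdr3b",
    "va", "vb", "ja", "jb", "mhc", "organism", "mhc_class",
]


def missing_columns(handle, delimiter=","):
    """Return the required columns that are absent from the header row."""
    header = next(csv.reader(handle, delimiter=delimiter), [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Toy headers standing in for real prediction files.
complete = io.StringIO(",".join(REQUIRED_COLUMNS) + "\n")
incomplete = io.StringIO("identifier,binding_score,peptide\n")

print(missing_columns(complete))
print(missing_columns(incomplete))
```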
Commands
create-analyzer-table
Merges predictions from multiple models into a single table with rank-normalized scores, data leakage annotations, and seen-peptide flags.
tcr-pmhc-analyzer create-analyzer-table [OPTIONS]
| Option | Short | Required | Description |
|---|---|---|---|
| `--input PATH` | `-i` | Yes | Path to TSV config file with `model` and `file` columns |
| `--output PATH` | `-o` | Yes | Output file path (`.csv` or `.tsv`) |
| `--ergo-version` | | If `ergo2` | ERGO training data version: `vdjdb` or `mcpas` |
Example:
tcr-pmhc-analyzer create-analyzer-table \
-i input.tsv \
-o analyzer_table.csv \
--ergo-version vdjdb
Output columns added:
- `binding_score_{model}`: raw binding score per model
- `rank_score_{model}`: rank-normalized score in [0, 1] (1 = highest)
- `sample_in_train_{model}`: `True` if the sample appears in the model's training data (data leakage)
- `seen_in_{model}`: `True` if the peptide was seen in the model's training data
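To make the `rank_score_{model}` column more concrete, here is a minimal sketch of rank normalization with average tie-breaking that leaves NaN values in place. The exact formula used by the package is not stated here. This sketch assumes the average rank is divided by the number of non-missing scores and that the largest score maps to 1.0, in line with "1 = highest". The function name `rank_normalize` is hypothetical.

```python
import math


def rank_normalize(scores):
    """Map scores to (0, 1] by average rank; NaN entries stay NaN.

    Assumption: normalized value = average rank / number of valid scores,
    with the largest score receiving 1.0.
    """
    valid = sorted((s, i) for i, s in enumerate(scores) if not math.isnan(s))
    n = len(valid)
    out = [math.nan] * len(scores)
    start = 0
    while start < n:
        end = start
        # Extend the group while the next score ties with the current one.
        while end + 1 < n and valid[end + 1][0] == valid[start][0]:
            end += 1
        # 1-based positions start+1 .. end+1 share their average rank.
        avg_rank = (start + 1 + end + 1) / 2
        for _, idx in valid[start:end + 1]:
            out[idx] = avg_rank / n
        start = end + 1
    return out


print(rank_normalize([0.9, 0.1, 0.5, 0.5, float("nan")]))
```

Because only ranks are used, scores from models with different output scales become comparable within the same table.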
benchmark
Generates ROC curve plots comparing model performance, split by seen vs unseen peptides. Data leakage samples are automatically removed before analysis.
tcr-pmhc-analyzer benchmark [OPTIONS]
| Option | Short | Required | Description |
|---|---|---|---|
| `--input PATH` | `-i` | * | Path to TSV config file with `model` and `file` columns |
| `--table PATH` | | * | Path to a pre-created analyzer table (alternative to `--input`) |
| `--output PATH` | `-o` | Yes | Output directory for ROC curve plots |
| `--ergo-version` | | If `ergo2` | ERGO training data version: `vdjdb` or `mcpas` |
| `--models` | `-m` | No | Space-separated list of models to benchmark (default: all available) |

\* Either `--input` or `--table` must be provided.
Examples:
# Benchmark from raw predictions
tcr-pmhc-analyzer benchmark -i input.tsv -o results/
# Benchmark from a pre-created analyzer table
tcr-pmhc-analyzer benchmark --table analyzer_table.csv -o results/
# Benchmark specific models only
tcr-pmhc-analyzer benchmark -i input.tsv -o results/ -m "ergo2 mixtcrpred tabr-bert"
Output files:
- `roc_curve_unseen.png`: ROC curves for peptides unseen by all selected models
- `roc_curve_seen.png`: ROC curves for peptides seen by all selected models
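The seen/unseen split behind these two plots can be approximated as follows. This is a sketch under stated assumptions, not the package's code. It assumes a row is dropped if any selected model flags it as leaked. It also computes only the area under the ROC curve (as the fraction of binder/non-binder pairs ranked correctly) rather than drawing the curve. The model name `demo` and all values are made up.

```python
def split_for_benchmark(rows, models):
    """Drop leaked rows, then split into seen-by-all and unseen-by-all."""
    clean = [r for r in rows
             if not any(r[f"sample_in_train_{m}"] for m in models)]
    seen = [r for r in clean if all(r[f"seen_in_{m}"] for m in models)]
    unseen = [r for r in clean if not any(r[f"seen_in_{m}"] for m in models)]
    return seen, unseen


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def row(binder, score, leaked, seen):
    """Build one toy analyzer-table row for the hypothetical model 'demo'."""
    return {"binder": binder, "binding_score_demo": score,
            "sample_in_train_demo": leaked, "seen_in_demo": seen}


rows = [
    row(1, 0.9, False, False), row(0, 0.2, False, False),
    row(1, 0.4, False, False), row(0, 0.6, False, False),
    row(1, 0.8, True, True),   # leaked sample, removed before scoring
    row(1, 0.7, False, True), row(0, 0.3, False, True),
]

seen, unseen = split_for_benchmark(rows, ["demo"])
for name, subset in (("seen", seen), ("unseen", unseen)):
    auc = roc_auc([r["binder"] for r in subset],
                  [r["binding_score_demo"] for r in subset])
    print(name, len(subset), auc)
```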
Supported models
| Model | Training data |
|---|---|
| `ergo2` | mcpas or vdjdb (specify with `--ergo-version`) |
| `mixtcrpred` | 146 pMHC training set |
| `t2pmhc-gcn` | t2pmhc core training set |
| `t2pmhc-gat` | t2pmhc core training set |
| `tabr-bert` | TCR-pMHC training set |
| `tulip-tcr` | TULIP training set |
| `atm-tcr` | ATM-TCR training set |
How it works
- Merge: Prediction outputs from multiple models are merged on the `identifier` column into a single DataFrame.
- Rank normalization: Each model's `binding_score` is rank-normalized to [0, 1] using descending order with average tie-breaking. NaN values are preserved.
- Data leakage detection: Each sample is checked against bundled training data to flag samples that appear in a model's training set.
- Seen peptide detection: Each peptide is checked against training data to identify whether it was seen during model training.
- Benchmarking: ROC curves are generated after removing leaked samples, separately for seen and unseen peptides.
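The merge and annotation steps above can be sketched in plain Python. Several details are assumptions, not statements about the package:

- The merge is treated as an outer join.
- A "sample" is identified by the pair `(peptide, cdr3b)`. The package may match on more fields.
- The model names `model_a` and `model_b`, the helper names, and all sequences are made-up toy values.

```python
def merge_on_identifier(tables):
    """Outer-join per-model rows on `identifier`.

    tables: {model_name: [row_dict, ...]}; each row carries identifier,
    peptide, cdr3b and binding_score.
    """
    merged = {}
    for model, rows in tables.items():
        for row in rows:
            entry = merged.setdefault(row["identifier"], {
                "identifier": row["identifier"],
                "peptide": row["peptide"],
                "cdr3b": row["cdr3b"],
            })
            entry[f"binding_score_{model}"] = row["binding_score"]
    return list(merged.values())


def annotate(rows, training_sets):
    """Add leakage and seen-peptide flags.

    training_sets: {model_name: set of (peptide, cdr3b) pairs} (assumed key).
    """
    peptides = {m: {pep for pep, _ in pairs}
                for m, pairs in training_sets.items()}
    for row in rows:
        for model, pairs in training_sets.items():
            sample = (row["peptide"], row["cdr3b"])
            row[f"sample_in_train_{model}"] = sample in pairs
            row[f"seen_in_{model}"] = row["peptide"] in peptides[model]
    return rows


tables = {
    "model_a": [
        {"identifier": "s1", "peptide": "PEPA", "cdr3b": "CASSAAAF", "binding_score": 0.8},
        {"identifier": "s2", "peptide": "PEPB", "cdr3b": "CASSBBBF", "binding_score": 0.3},
    ],
    "model_b": [
        {"identifier": "s1", "peptide": "PEPA", "cdr3b": "CASSAAAF", "binding_score": 0.6},
    ],
}
training_sets = {
    "model_a": {("PEPA", "CASSAAAF")},   # exact sample present: leakage
    "model_b": {("PEPA", "CASSCCCF")},   # same peptide, different receptor
}

table = annotate(merge_on_identifier(tables), training_sets)
for entry in table:
    print(entry)
```

In this toy data, sample `s1` is flagged as leaked for `model_a` only, while its peptide counts as seen by both models.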
Citations
If you use tcr-pmhc-analyzer in your research, please cite the underlying prediction models:
ATM-TCR
Cai, M. et al. (2022). ATM-TCR: TCR-Epitope Binding Affinity Prediction Using a Multi-Head Self-Attention Model. Frontiers in Immunology, 13, 893247. https://doi.org/10.3389/fimmu.2022.893247
ERGO-II
Springer, I. et al. (2021). Contribution of T Cell Receptor Alpha and Beta CDR3, MHC Typing, V and J Genes to Peptide Binding Prediction. Frontiers in Immunology, 12, 664514. https://doi.org/10.3389/fimmu.2021.664514
MIXTCRpred
Croce, G. et al. (2024). Deep learning predictions of TCR-epitope interactions reveal epitope-specific chains in dual alpha T cells. Nature Communications, 15, 3211. https://doi.org/10.1038/s41467-024-47461-8
t2pmhc
Polster, M. et al. (2026). t2pmhc: A Structure-Informed Graph Neural Network to Predict TCR-pMHC Binding. bioRxiv. https://doi.org/10.64898/2026.02.27.708137
TABR-BERT
Zhang, J. et al. (2024). Accurate TCR-pMHC interaction prediction using a BERT-based transfer learning method. Briefings in Bioinformatics, 25(1), bbad436. https://doi.org/10.1093/bib/bbad436
TULIP
Meynard-Piganeau, B. et al. (2024). TULIP — a Transformer-based Unsupervised Language model for Interacting Peptides and T-cell receptors that generalizes to unseen epitopes. Proceedings of the National Academy of Sciences, 121(13). https://doi.org/10.1073/pnas.2316401121
License