A Deep Learning Framework for TCR-Peptide Recognition Prediction
PepTCR-Net: Deep Learning for TCR-Peptide Recognition Prediction
PepTCR-Net predicts T-cell receptor (TCR) recognition of peptide antigens using deep learning with uncertainty quantification.
Quick Start
pip install -U peptcrnet
peptcrnet-download-models
peptcrnet-demo
For notebooks: pip install "peptcrnet[notebooks]" (the quotes keep shells such as zsh from treating the brackets as a glob pattern)
Requirements: Python 3.8–3.12 (not 3.13).
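To confirm the interpreter before installing, a minimal standard-library sketch follows. The helper name `is_supported_python` is illustrative and not part of peptcrnet; the version bounds are taken from the requirement above.

```python
import sys

# Supported range taken from the requirement above: Python 3.8 through 3.12.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 12)


def is_supported_python(version=None):
    """Return True if (major, minor) falls within the supported range.

    Hypothetical helper for illustration; not part of the peptcrnet API.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = "%d.%d" % tuple(sys.version_info[:2])
    status = "supported" if is_supported_python() else "not supported"
    print("Python", current, "is", status)
```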
Installation
From PyPI (recommended)
pip install -U peptcrnet
peptcrnet-download-models # downloads ~283 MB from Zenodo to ~/.peptcrnet/
From source
git clone https://github.com/mlizhangx/Pep-TCRNet.git
cd Pep-TCRNet
pip install -e ".[notebooks]"
peptcrnet-download-models
Pretrained model checkpoints (required for prediction)
Checkpoints are not included in the pip package. Download them once from Zenodo:
- Checkpoints (v2): https://doi.org/10.5281/zenodo.14194846
- Training CSV data (v1): https://doi.org/10.5281/zenodo.14194728
- Paper: https://doi.org/10.1093/bib/bbaf351
Automatic (recommended):
peptcrnet-download-models
Manual:
curl -L -o peptcrnet-pretrained-checkpoints-v1.zip "https://zenodo.org/records/14194846/files/peptcrnet-pretrained-checkpoints-v1.zip?download=1"
unzip peptcrnet-pretrained-checkpoints-v1.zip -d ~/.peptcrnet/
Files are cached under ~/.peptcrnet/checkpoints/ and ~/.peptcrnet/datasets/atchley.txt.
If a previous download failed, delete the bad file first:
rm -f ~/.peptcrnet/peptcrnet-pretrained-checkpoints-v1.zip
rm -rf ~/.peptcrnet/checkpoints ~/.peptcrnet/datasets
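To tell whether a download completed, one option is to check that the cache locations named above exist and are not empty. The sketch below uses only the standard library. `missing_cache_entries` is a hypothetical helper, not a peptcrnet function, and it assumes that an empty checkpoints directory indicates an interrupted download.

```python
from pathlib import Path


def missing_cache_entries(root=None):
    """List expected cache paths that are absent or empty under `root`.

    Hypothetical helper; the two paths are the cache locations named above.
    """
    root = Path(root) if root is not None else Path.home() / ".peptcrnet"
    expected = [
        root / "checkpoints",
        root / "datasets" / "atchley.txt",
    ]
    missing = []
    for path in expected:
        if not path.exists():
            missing.append(path)
        elif path.is_dir() and not any(path.iterdir()):
            # Assumption: an empty directory suggests an interrupted download.
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in missing_cache_entries():
        print("missing or empty:", path)
```

If anything is reported, removing the partial files as shown above and re-running peptcrnet-download-models is the documented recovery path.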
Basic Usage
How prediction works (important)
The pretrained model is a fixed multi-class classifier: it assigns each TCR to the single best-matching peptide from a small set of known peptides (the most frequent in-distribution peptides). It does not score arbitrary peptides. To predict peptides outside this set, you must retrain (see Train on your own peptides).
The default vocabulary is top-5:
YVLDHLIVV, NLVPMVATV, GILGFVFTL, GLCTLVAML, KLGGALQAK
Use top_k=10 (or 15/20) for larger vocabularies (requires the matching
top-10_case-16.h5 checkpoint).
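As an illustration of what a "top-k" vocabulary means, the sketch below ranks peptide labels by frequency and keeps the k most common. It is not the package's own selection code. The function name `top_k_vocabulary` and the toy label counts are assumptions made for this example.

```python
from collections import Counter


def top_k_vocabulary(peptide_labels, k=5):
    """Return the k most frequent peptides, most frequent first."""
    counts = Counter(peptide_labels)
    return [peptide for peptide, _ in counts.most_common(k)]


# Toy labels for illustration only; these counts are not real data.
labels = (
    ["GILGFVFTL"] * 4
    + ["NLVPMVATV"] * 3
    + ["GLCTLVAML"] * 2
    + ["KLGGALQAK"] * 1
)
print(top_k_vocabulary(labels, k=2))  # ['GILGFVFTL', 'NLVPMVATV']
```

A larger k widens the set of peptides the classifier can output, which is why each vocabulary size needs its own checkpoint.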
One-line prediction
from peptcrnet import quick_predict
results = quick_predict(
    tcr_sequences=["CASSLAPGATNEKLFF", "CASSLKPSYNEQFF"],
    mhc_alleles=["HLA-A*02:01", "HLA-A*02:01"],
    v_genes=["TRBV19", "TRBV7-9"],
    j_genes=["TRBJ1-4", "TRBJ2-3"],
    scenario=16,
)
print(results)  # the predicted_peptide column holds peptide sequences, e.g. NLVPMVATV
Predict against your own list of peptides
Restrict the prediction to peptides of interest (must be within the model's vocabulary). Each TCR is assigned its best match from your list:
results = quick_predict(
    tcr_sequences=["CASSLAPGATNEKLFF", "CASSLKPSYNEQFF"],
    mhc_alleles=["HLA-A*02:01", "HLA-A*02:01"],
    v_genes=["TRBV19", "TRBV7-9"],
    j_genes=["TRBJ1-4", "TRBJ2-3"],
    scenario=16,
    candidate_peptides=["GILGFVFTL", "NLVPMVATV", "KLGGALQAK"],
)
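Conceptually, restricting a multi-class classifier to a candidate list can be done by keeping only the candidates' class probabilities, renormalizing them, and taking the largest. The sketch below shows that idea in plain Python. It assumes the model yields one probability per vocabulary peptide and does not claim to mirror how `quick_predict` handles `candidate_peptides` internally. `restrict_to_candidates` and the probability values are made up for illustration.

```python
def restrict_to_candidates(vocabulary, probabilities, candidates):
    """Pick the best candidate peptide and renormalize over the candidates.

    Illustrative only: assumes one probability per vocabulary peptide.
    """
    unknown = [p for p in candidates if p not in vocabulary]
    if unknown:
        raise ValueError("peptides outside the vocabulary: %s" % unknown)

    # Keep only the probabilities of the requested peptides.
    kept = {
        p: prob for p, prob in zip(vocabulary, probabilities) if p in candidates
    }
    total = sum(kept.values())
    if total == 0:
        raise ValueError("candidate peptides have zero total probability")

    # Rescale so the kept probabilities sum to 1, then take the largest.
    renormalized = {p: prob / total for p, prob in kept.items()}
    best = max(renormalized, key=renormalized.get)
    return best, renormalized


vocabulary = ["YVLDHLIVV", "NLVPMVATV", "GILGFVFTL", "GLCTLVAML", "KLGGALQAK"]
probabilities = [0.5, 0.1, 0.3, 0.05, 0.05]  # made-up values, not model output
best, scores = restrict_to_candidates(
    vocabulary, probabilities, ["GILGFVFTL", "NLVPMVATV", "KLGGALQAK"]
)
print(best)  # GILGFVFTL
```

In this toy case the overall top class (YVLDHLIVV) is excluded, so the TCR is assigned the best peptide among the candidates instead. A candidate outside the vocabulary raises an error, matching the constraint stated above.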
Predict from CSV
from peptcrnet import predict_from_file
results = predict_from_file("my_data.csv", scenario=16)
results.to_csv("predictions.csv", index=False)
See USAGE_EXAMPLES.md and documentation for more.
Train on your own peptides (custom model)
To predict peptides outside the built-in vocabulary, train a new classifier on
your own labeled data (columns: CDR3, Peptide, and MHC/V/J for HLA/VJ
scenarios):
import pandas as pd
from peptcrnet import PepTCRNetTrainer
train_df = pd.read_csv("my_training_data.csv")
trainer = PepTCRNetTrainer(scenario=16)
trainer.fit(
    train_df,
    num_peptides=None,  # None = use all peptides in your data
    epochs=100,
    output_checkpoint="my_model.h5",
    output_labels="my_labels.json",
)
# Predict with your trained model
test_df = pd.read_csv("my_test_data.csv")  # hypothetical file name; same columns as the input CSV
predictor = trainer.to_predictor()
results = predictor.predict_with_uncertainty(test_df)
# ...or load it later
from peptcrnet import PepTCRNetPredictor
predictor = PepTCRNetPredictor(
    scenario=16, checkpoint="my_model.h5", labels="my_labels.json",
)
Data Format
Input CSV for predict_from_file and the predictor API:
| Column | Required | Description | Example |
|---|---|---|---|
| CDR3 | Yes | TCR CDR3β sequence | CASSRGQGNEQFF |
| MHC | Scenario-dependent | HLA allele (single column) | HLA-A*02:01 |
| V | Scenario-dependent | V gene segment | TRBV7-2 |
| J | Scenario-dependent | J gene segment | TRBJ2-1 |
| Peptide | Optional | True peptide (evaluation only) | GILGFVFTL |
Note: The prediction API uses a single MHC column. Some training notebooks split HLA into HLA-A, HLA-B, HLA-C; merge to MHC for prediction or use the Zenodo CSV format.
Default scenario 16 uses ED + HLA + VJ features — provide CDR3, MHC, V, and J.
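Before calling the predictor, it can help to check that a CSV has the columns scenario 16 needs. The sketch below does this with the standard library only. `check_input_rows` is a hypothetical helper, not part of peptcrnet, and the required-column tuple simply restates the list above.

```python
import csv
import io

# Columns the text above lists for the default scenario 16.
REQUIRED_COLUMNS = ("CDR3", "MHC", "V", "J")


def check_input_rows(csv_text, required=REQUIRED_COLUMNS):
    """Parse CSV text and report missing columns or empty required cells.

    Hypothetical helper for illustration; not part of the peptcrnet API.
    Returns (rows, problems), where problems is a list of messages.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = ["missing column: %s" % c for c in required if c not in header]
    rows = list(reader)
    if not problems:
        for index, row in enumerate(rows, start=1):
            for column in required:
                # DictReader fills short rows with None, so guard against it.
                if not (row[column] or "").strip():
                    problems.append("row %d: empty %s" % (index, column))
    return rows, problems


example = (
    "CDR3,MHC,V,J\n"
    "CASSRGQGNEQFF,HLA-A*02:01,TRBV7-2,TRBJ2-1\n"
)
rows, problems = check_input_rows(example)
print(len(rows), problems)  # 1 []
```

Scenarios that use fewer features would need a shorter `required` tuple. Which columns each scenario expects is not specified here, so that mapping is left to the caller.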
Demo notebook (source install)
pip install "peptcrnet[notebooks]"
jupyter notebook DEMO_Complete_Pipeline.ipynb
Citation
@article{le2025peptcrnet,
title={PepTCR-Net: prediction of multi-class antigen peptides by T-cell receptor sequences with deep learning},
author={Le, Phi and Ung, Leah and Yang, Hai and Huang, Anwen and He, Tao and Bruno, Peter and Oh, David Y and Keenan, Bridget P and Zhang, Li},
journal={Briefings in Bioinformatics},
volume={26},
number={4},
pages={bbaf351},
year={2025},
doi={10.1093/bib/bbaf351}
}
License
MIT — see LICENSE.
Contact