A Deep Learning Framework for TCR-Peptide Recognition Prediction

PepTCR-Net: Deep Learning for TCR-Peptide Recognition Prediction

PepTCR-Net predicts T-cell receptor (TCR) recognition of peptide antigens using deep learning with uncertainty quantification.

Quick Start

pip install -U peptcrnet
peptcrnet-download-models
peptcrnet-demo

For notebooks: pip install "peptcrnet[notebooks]" (the quotes keep shells such as zsh from expanding the brackets)

Requirements: Python 3.8–3.12 (not 3.13).
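The version requirement can be checked before installing. The following is a minimal sketch, not part of the package; the helper name is_supported is an assumption introduced here for illustration.

```python
import sys

# Supported range taken from the requirement above: 3.8 through 3.12 inclusive.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 12)


def is_supported(version=None):
    """Return True if (major, minor) lies in the supported range.

    is_supported is a hypothetical helper, not a peptcrnet function.
    """
    major, minor = (version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) <= MAX_VERSION


if not is_supported():
    print("Warning: this interpreter is outside the supported 3.8-3.12 range.")
```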

Installation

From PyPI (recommended)

pip install -U peptcrnet
peptcrnet-download-models   # downloads ~283 MB from Zenodo to ~/.peptcrnet/

From source

git clone https://github.com/mlizhangx/Pep-TCRNet.git
cd Pep-TCRNet
pip install -e ".[notebooks]"
peptcrnet-download-models

Pretrained model checkpoints (required for prediction)

Checkpoints are not included in the pip package. Download them once from Zenodo:

Automatic (recommended):

peptcrnet-download-models

Manual:

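# -o names the local file explicitly so it matches the unzip command, however curl handles the ?download=1 query
curl -L -o peptcrnet-pretrained-checkpoints-v1.zip "https://zenodo.org/records/14194846/files/peptcrnet-pretrained-checkpoints-v1.zip?download=1"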
unzip peptcrnet-pretrained-checkpoints-v1.zip -d ~/.peptcrnet/

Files are cached under ~/.peptcrnet/checkpoints/ and ~/.peptcrnet/datasets/atchley.txt.
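To confirm that the download populated the cache, the two paths named above can be checked directly. This is a sketch under the assumption that the layout is exactly as described; missing_cache_paths is a hypothetical helper, not part of peptcrnet.

```python
from pathlib import Path


def missing_cache_paths(base_dir=None):
    """Return the expected cache paths that do not exist under base_dir.

    Hypothetical helper: it only checks the two locations named in this README.
    """
    base = Path(base_dir) if base_dir is not None else Path.home() / ".peptcrnet"
    expected = [base / "checkpoints", base / "datasets" / "atchley.txt"]
    return [path for path in expected if not path.exists()]


missing = missing_cache_paths()
if missing:
    print("Missing:", ", ".join(str(path) for path in missing))
else:
    print("Cache looks complete.")
```

An empty list means both locations exist; it does not verify that the checkpoint files themselves are intact.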

If a previous download failed, delete the partial archive and any extracted files, then run the download again:

rm -f ~/.peptcrnet/peptcrnet-pretrained-checkpoints-v1.zip
rm -rf ~/.peptcrnet/checkpoints ~/.peptcrnet/datasets

Basic Usage

How prediction works (important)

The pretrained model is a fixed multi-class classifier: it assigns each TCR to the single best-matching peptide from a small vocabulary of known peptides (the most frequent in-distribution peptides). It does not score arbitrary peptides. To predict peptides outside this vocabulary you must retrain (see Train on your own peptides).

The default vocabulary is top-5:

YVLDHLIVV, NLVPMVATV, GILGFVFTL, GLCTLVAML, KLGGALQAK

Use top_k=10 (or 15/20) for larger vocabularies. Each setting requires its matching checkpoint, for example top-10_case-16.h5 for top_k=10.
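Conceptually, a fixed multi-class classifier produces one score per vocabulary entry and reports the highest-scoring one. The sketch below illustrates that idea only; it is not the package's implementation, the index order of the vocabulary is assumed, and the probabilities are made up.

```python
# The default top-5 vocabulary listed above. The index order here is an assumption.
VOCAB = ["YVLDHLIVV", "NLVPMVATV", "GILGFVFTL", "GLCTLVAML", "KLGGALQAK"]


def best_peptide(probs, vocab=VOCAB):
    """Return (peptide, probability) for the highest-scoring class."""
    if len(probs) != len(vocab):
        raise ValueError("expected one probability per peptide in the vocabulary")
    idx = max(range(len(probs)), key=probs.__getitem__)
    return vocab[idx], probs[idx]


# Made-up class probabilities for one TCR, for illustration only.
example_probs = [0.05, 0.70, 0.10, 0.10, 0.05]
print(best_peptide(example_probs))  # ('NLVPMVATV', 0.7)
```

Because the output is always one of the vocabulary entries, a TCR that recognizes a peptide outside the vocabulary is still assigned to one of these classes.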

One-line prediction

from peptcrnet import quick_predict

results = quick_predict(
    tcr_sequences=["CASSLAPGATNEKLFF", "CASSLKPSYNEQFF"],
    mhc_alleles=["HLA-A*02:01", "HLA-A*02:01"],
    v_genes=["TRBV19", "TRBV7-9"],
    j_genes=["TRBJ1-4", "TRBJ2-3"],
    scenario=16,
)
print(results)  # the predicted_peptide column holds peptide sequences, e.g. NLVPMVATV

Predict against your own list of peptides

Restrict the prediction to peptides of interest (must be within the model's vocabulary). Each TCR is assigned its best match from your list:

from peptcrnet import quick_predict

results = quick_predict(
    tcr_sequences=["CASSLAPGATNEKLFF", "CASSLKPSYNEQFF"],
    mhc_alleles=["HLA-A*02:01", "HLA-A*02:01"],
    v_genes=["TRBV19", "TRBV7-9"],
    j_genes=["TRBJ1-4", "TRBJ2-3"],
    scenario=16,
    candidate_peptides=["GILGFVFTL", "NLVPMVATV", "KLGGALQAK"],
)
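One way to picture restricting a prediction to a candidate list is to discard the classes that are not candidates and pick the best of those that remain. The sketch below assumes this masking-and-renormalizing view; whether quick_predict renormalizes internally is not stated in this README, and restrict_to_candidates is a hypothetical helper with made-up probabilities.

```python
def restrict_to_candidates(probs_by_peptide, candidates):
    """Keep only candidate peptides, renormalize, and return the best match.

    Hypothetical helper illustrating the idea; not the peptcrnet implementation.
    """
    unknown = [p for p in candidates if p not in probs_by_peptide]
    if unknown:
        raise ValueError(f"not in the model vocabulary: {unknown}")
    kept = {p: probs_by_peptide[p] for p in candidates}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("candidates have zero total probability")
    renormalized = {p: value / total for p, value in kept.items()}
    best = max(renormalized, key=renormalized.get)
    return best, renormalized


# Made-up probabilities over the default top-5 vocabulary.
probs = {
    "YVLDHLIVV": 0.50,
    "NLVPMVATV": 0.25,
    "GILGFVFTL": 0.15,
    "GLCTLVAML": 0.05,
    "KLGGALQAK": 0.05,
}
best, renormalized = restrict_to_candidates(
    probs, ["GILGFVFTL", "NLVPMVATV", "KLGGALQAK"]
)
print(best)  # NLVPMVATV
```

In this example the overall top class (YVLDHLIVV) is excluded, so the best remaining candidate is reported instead. A candidate outside the vocabulary raises an error, mirroring the constraint stated above.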

Predict from CSV

from peptcrnet import predict_from_file

results = predict_from_file("my_data.csv", scenario=16)
results.to_csv("predictions.csv", index=False)
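An input file such as my_data.csv can be produced with the standard library. The sketch below writes the CDR3, MHC, V, and J columns described under Data Format, reusing the sequences from the one-line prediction example; write_input_csv is a hypothetical helper.

```python
import csv

# Same TCRs as in the one-line prediction example, arranged as CSV rows.
rows = [
    {"CDR3": "CASSLAPGATNEKLFF", "MHC": "HLA-A*02:01", "V": "TRBV19", "J": "TRBJ1-4"},
    {"CDR3": "CASSLKPSYNEQFF", "MHC": "HLA-A*02:01", "V": "TRBV7-9", "J": "TRBJ2-3"},
]


def write_input_csv(path, rows):
    """Write rows with the CDR3, MHC, V, J header (hypothetical helper)."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["CDR3", "MHC", "V", "J"])
        writer.writeheader()
        writer.writerows(rows)


write_input_csv("my_data.csv", rows)
```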

See USAGE_EXAMPLES.md and the documentation for more examples.

Train on your own peptides (custom model)

To predict peptides outside the built-in vocabulary, train a new classifier on your own labeled data. The training table needs CDR3 and Peptide columns, plus MHC, V, and J for scenarios that use HLA or V/J features:

import pandas as pd
from peptcrnet import PepTCRNetTrainer

train_df = pd.read_csv("my_training_data.csv")
trainer = PepTCRNetTrainer(scenario=16)
trainer.fit(
    train_df,
    num_peptides=None,           # None = use all peptides in your data
    epochs=100,
    output_checkpoint="my_model.h5",
    output_labels="my_labels.json",
)

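# Predict with your trained model
# test_df must have the same input columns as train_df; the file name is a placeholder
test_df = pd.read_csv("my_test_data.csv")
predictor = trainer.to_predictor()
results = predictor.predict_with_uncertainty(test_df)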

# ...or load it later
from peptcrnet import PepTCRNetPredictor
predictor = PepTCRNetPredictor(
    scenario=16, checkpoint="my_model.h5", labels="my_labels.json",
)
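With num_peptides=None every peptide in the training table becomes a class, so it can help to inspect how many examples each peptide has before training. The sketch below is an assumption-laden illustration: peptide_counts and rare_peptides are hypothetical helpers, the min_examples threshold is arbitrary, and the CDR3/peptide pairings are placeholders rather than real labels.

```python
from collections import Counter


def peptide_counts(rows):
    """Count training examples per peptide label."""
    return Counter(row["Peptide"] for row in rows)


def rare_peptides(rows, min_examples=10):
    """Return peptides with fewer than min_examples rows (threshold is arbitrary)."""
    counts = peptide_counts(rows)
    return sorted(p for p, n in counts.items() if n < min_examples)


# Placeholder rows for illustration; the pairings are not real annotations.
training_rows = [
    {"CDR3": "CASSLAPGATNEKLFF", "Peptide": "GILGFVFTL"},
    {"CDR3": "CASSLKPSYNEQFF", "Peptide": "NLVPMVATV"},
    {"CDR3": "CASSRGQGNEQFF", "Peptide": "GILGFVFTL"},
]

print(peptide_counts(training_rows)["GILGFVFTL"])     # 2
print(rare_peptides(training_rows, min_examples=2))   # ['NLVPMVATV']
```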

Data Format

Input CSV for predict_from_file and the predictor API:

| Column | Required | Description | Example |
| --- | --- | --- | --- |
| CDR3 | Yes | TCR CDR3β sequence | CASSRGQGNEQFF |
| MHC | Scenario-dependent | HLA allele (single column) | HLA-A*02:01 |
| V | Scenario-dependent | V gene segment | TRBV7-2 |
| J | Scenario-dependent | J gene segment | TRBJ2-1 |
| Peptide | Optional | True peptide (evaluation only) | GILGFVFTL |

Note: The prediction API uses a single MHC column. Some training notebooks split HLA into HLA-A, HLA-B, HLA-C; merge to MHC for prediction or use the Zenodo CSV format.

Default scenario 16 uses ED + HLA + VJ features — provide CDR3, MHC, V, and J.
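A quick header check can catch a missing column before calling predict_from_file. The sketch below covers only the scenario 16 requirement stated above; missing_columns is a hypothetical helper, and the required sets for other scenarios are not listed in this README.

```python
import csv
import io

# Columns required for the default scenario 16, as stated above.
REQUIRED_SCENARIO_16 = ("CDR3", "MHC", "V", "J")


def missing_columns(csv_text, required=REQUIRED_SCENARIO_16):
    """Return required column names absent from the CSV header (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [name for name in required if name not in header]


sample = "CDR3,MHC,V,J\nCASSRGQGNEQFF,HLA-A*02:01,TRBV7-2,TRBJ2-1\n"
print(missing_columns(sample))        # []
print(missing_columns("CDR3,V,J\n"))  # ['MHC']
```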

Demo notebook (source install)

pip install "peptcrnet[notebooks]"
jupyter notebook DEMO_Complete_Pipeline.ipynb

Citation

@article{le2025peptcrnet,
  title={PepTCR-Net: prediction of multi-class antigen peptides by T-cell receptor sequences with deep learning},
  author={Le, Phi and Ung, Leah and Yang, Hai and Huang, Anwen and He, Tao and Bruno, Peter and Oh, David Y and Keenan, Bridget P and Zhang, Li},
  journal={Briefings in Bioinformatics},
  volume={26},
  number={4},
  pages={bbaf351},
  year={2025},
  doi={10.1093/bib/bbaf351}
}

License

MIT — see LICENSE.
