Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor-antigen recognition
EPACT: Epitope-anchored Contrastive Transfer Learning for Paired CD8+ T Cell Receptor-antigen Recognition
This repository contains the source code for the paper Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor-antigen recognition.
EPACT follows a divide-and-conquer paradigm: it pre-trains on TCR and pMHC data separately, then applies transfer learning with epitope-anchored contrastive learning to predict TCR$\alpha\beta$-pMHC binding specificity and interaction conformation.
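To illustrate what contrasting TCRs against an epitope anchor can look like, here is a minimal sketch of a generic InfoNCE-style loss in NumPy. This README does not specify the loss EPACT uses, so the function name `info_nce`, the temperature value, and the toy embeddings are all assumptions made for illustration, not the repository's implementation.

```python
import numpy as np

def info_nce(epitope_emb, tcr_emb, temperature=0.1):
    """Mean InfoNCE loss; row i of tcr_emb is the positive for epitope i.

    All other rows of tcr_emb act as negatives for that epitope anchor.
    (Generic formulation, assumed here for illustration only.)
    """
    # L2-normalise so the dot product is a cosine similarity.
    e = epitope_emb / np.linalg.norm(epitope_emb, axis=1, keepdims=True)
    t = tcr_emb / np.linalg.norm(tcr_emb, axis=1, keepdims=True)
    logits = e @ t.T / temperature                # (n_epitopes, n_tcrs)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The diagonal holds the log-probability of each matched pair.
    return float(-np.mean(np.diag(log_prob)))

# Toy embeddings: four orthogonal epitope anchors.
epitopes = np.eye(4)
matched = info_nce(epitopes, np.eye(4))          # each TCR sits on its anchor
shuffled = info_nce(epitopes, np.eye(4)[::-1])   # each TCR sits on a wrong anchor
print(matched < shuffled)
```

The loss is low when each TCR embedding lies close to its own epitope anchor and high when the pairs are mismatched, which is the behaviour a contrastive objective is meant to reward.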
Colab Notebook 
Installation
- Clone the repository.
git clone https://github.com/zhangyumeng1sjtu/EPACT.git
- Create a virtual environment with conda.
conda create -n EPACT_env python=3.10.12
conda activate EPACT_env
- Install PyTorch (>=2.0.1) built for your CUDA version, then install the other Python packages.
conda install pytorch==2.0.1 pytorch-cuda=11.7 -c pytorch -c nvidia  # for CUDA 11.7
pip install -r requirements.txt
Data and model checkpoints
The following data and model checkpoints are available at Zenodo.
- data/binding: binding data between paired TCR$\alpha\beta$ and pMHC derived from IEDB, VDJdb, McPAS, TBAdb, 10X, and Francis et al.
- data/pretrained: human peptides from IEDB, human CD8+ TCRs from 10X Genomics Datasets and STAPLER, peptide-MHC-I binding affinity data from NetMHCpan4.1, and peptide-MHC-I eluted ligand data from BigMHC.
- data/structure: crystal structures of TCR-pMHC protein complexes in STCRDab. Distance matrices were calculated according to the closest distance between heavy atoms from two amino acid residues.
- checkpoints/paired-cdr3-pmhc-binding: model checkpoints for predicting TCR$\alpha\beta$-pMHC binding specificity from CDR3 sequences.
- checkpoints/paired-cdr123-pmhc-binding: model checkpoints for predicting TCR$\alpha\beta$-pMHC binding specificity from CDR1, CDR2, and CDR3 sequences.
- checkpoints/paired-cdr123-pmhc-interaction: model checkpoints for predicting CDR-epitope residue-level distance matrix and contact sites.
- checkpoints/pretrained: model checkpoints for pre-trained language models of TCRs and peptides, and peptide-MHC models (binding affinity & eluted ligand).
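The distance-matrix definition used for data/structure (closest distance between heavy atoms of two residues) can be sketched as follows. The function name `residue_distance_matrix` and the input layout (one coordinate array per residue) are hypothetical choices for this example; the repository's own preprocessing code may be organised differently.

```python
import numpy as np

def residue_distance_matrix(res_a, res_b):
    """Closest heavy-atom distance between every residue pair.

    res_a, res_b: lists of (n_atoms, 3) arrays, one per residue, holding
    heavy-atom coordinates in angstroms (hypothetical input layout).
    """
    dist = np.zeros((len(res_a), len(res_b)))
    for i, atoms_i in enumerate(res_a):
        for j, atoms_j in enumerate(res_b):
            # All pairwise atom-atom differences between the two residues.
            diff = atoms_i[:, None, :] - atoms_j[None, :, :]
            # Keep the smallest Euclidean distance over all atom pairs.
            dist[i, j] = np.sqrt((diff ** 2).sum(axis=-1)).min()
    return dist

# Toy coordinates: two CDR residues and one epitope residue.
cdr = [np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
       np.array([[10.0, 0.0, 0.0]])]
epitope = [np.array([[4.0, 0.0, 0.0], [0.0, 8.0, 0.0]])]
dist = residue_distance_matrix(cdr, epitope)
print(dist)  # rows: CDR residues, columns: epitope residues
```

In the toy example the first CDR residue is 3 Å from the epitope residue (its second atom to the epitope's first atom) and the second CDR residue is 6 Å away.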
Usage
1. Pre-training
- Pre-train peptide and TCR$\alpha\beta$ language models.
# pretrain epitope masked language model.
python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-epitope-lm.yml
# pretrain paired cdr3 masked language model.
python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-cdr3-lm.yml
# pretrain paired cdr123 masked language model.
python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-cdr123-lm.yml
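For readers unfamiliar with masked language modelling, the sketch below shows the general idea on a single amino-acid sequence: a fraction of residues is hidden and the model is trained to recover them. The mask token name, the 15% masking rate, and the function `mask_sequence` are assumptions borrowed from common BERT-style setups, not values read from the EPACT configs.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical token name

def mask_sequence(seq, mask_rate=0.15, seed=None):
    """Replace a random subset of residues with MASK_TOKEN.

    Returns the masked token list and a dict mapping each masked
    position to the residue the model would have to predict.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(seq) * mask_rate))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    tokens = list(seq)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        tokens[pos] = MASK_TOKEN
    return tokens, targets

# Example CDR3-like sequence (illustrative only).
tokens, targets = mask_sequence("CASSLGQAYEQYF", seed=0)
print(tokens)
print(targets)
```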
- Train peptide-MHC binding affinity or eluted ligand models.
# pretrain peptide-MHC binding affinity model.
python scripts/pretrain/pretrain_pmhc_model.py --config configs/config-pmhc-binding.yml
# pretrain peptide-MHC eluted ligand model.
python scripts/pretrain/pretrain_pmhc_model.py --config configs/config-pmhc-elution.yml
2. Predict binding specificity
- Train TCR$\alpha\beta$-pMHC binding models.
# finetune Paired TCR-pMHC binding model (CDR3).
python scripts/train/train_tcr_pmhc_binding.py --config configs/config-paired-cdr3-pmhc-binding.yml
# finetune Paired TCR-pMHC binding model (CDR123).
python scripts/train/train_tcr_pmhc_binding.py --config configs/config-paired-cdr123-pmhc-binding.yml
- Predict TCR$\alpha\beta$-pMHC binding specificity.
# predict cross-validation results
for i in {1..5}
do
    python scripts/predict/predict_tcr_pmhc_binding.py \
        --config configs/config-paired-cdr123-pmhc-binding.yml \
        --input_data_path data/binding/Full-TCR/k-fold-data/val_fold_${i}.csv \
        --model_location checkpoints/paired-cdr123-pmhc-binding/paired-cdr123-pmhc-binding-model-fold-${i}.pt \
        --log_dir results/preds-cdr123-pmhc-binding/Fold_${i}/
done
- Predict TCR$\alpha\beta$-pMHC binding ranks compared to background TCRs.
# predict binding ranks for SARS-CoV-2 responsive TCR clonotypes
python scripts/predict/predict_tcr_pmhc_binding_rank.py \
    --config configs/config-paired-cdr123-pmhc-binding.yml \
    --log_dir results/ranking-covid-cdr123/ \
    --input_data_path data/binding/covid_clonotypes.csv \
    --model_location checkpoints/paired-cdr123-pmhc-binding/paired-cdr123-pmhc-binding-model-all.pt \
    --bg_tcr_path data/pretrained/10x-paired-healthy-human-tcr-repertoire.csv \
    --num_bg_tcrs 20000
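One plausible reading of a binding rank "compared to background TCRs" is the fraction of background TCRs that score at least as high as the query TCR against the same pMHC. The sketch below implements that reading; the function `background_rank` and the synthetic scores are assumptions for illustration, and the script's actual rank definition may differ.

```python
import numpy as np

def background_rank(query_score, bg_scores):
    """Fraction of background TCRs scoring at least as high as the query.

    Smaller values mean the query TCR outranks more of the background.
    (Assumed definition; check the script for the one EPACT reports.)
    """
    bg_scores = np.asarray(bg_scores, dtype=float)
    return float((bg_scores >= query_score).mean())

# Made-up background scores standing in for 20000 background TCRs.
rng = np.random.default_rng(0)
bg = rng.uniform(0.0, 0.5, size=20000)

print(background_rank(0.9, bg))   # query beats every background TCR
print(background_rank(-1.0, bg))  # query beats none of them
```

Ranking against a background repertoire makes scores comparable across epitopes, since a raw score of a given size may be common for one pMHC and rare for another.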
3. Predict interaction conformation
- Train TCR$\alpha\beta$-pMHC interaction model.
# finetune Paired TCR-pMHC interaction model (CDR123).
python scripts/train/train_tcr_pmhc_interact.py --config configs/config-paired-cdr123-pmhc-interact.yml
- Predict TCR$\alpha\beta$-pMHC interaction conformations.
# predict distance matrices and contact sites between MEL8 TCR and HLA-A2-presented peptides.
for i in {1..5}
do
    python scripts/predict/predict_tcr_pmhc_interact.py \
        --config configs/config-paired-cdr123-pmhc-interact.yml \
        --input_data_path data/MEL8_A0201_peptides.csv \
        --model_location checkpoints/paired-cdr123-pmhc-interaction/paired-cdr123-pmhc-interaction-model-fold-${i}.pt \
        --log_dir results/interaction-MEL8-bg-cdr123-closest/Fold_${i}/
done
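Because the loop above produces one prediction per fold, a natural follow-up is to combine the five outputs. The sketch below averages per-fold distance matrices and flags residue pairs below a distance cutoff as contacts. The 5 Å cutoff, the function `ensemble_contacts`, and the in-memory arrays are assumptions; the output file format of the prediction script is not described here, and the model also predicts contact sites directly.

```python
import numpy as np

CONTACT_CUTOFF = 5.0  # angstroms; assumed threshold, not taken from EPACT

def ensemble_contacts(fold_dists, cutoff=CONTACT_CUTOFF):
    """Average per-fold distance matrices and threshold into contacts."""
    mean_dist = np.mean(np.stack(fold_dists), axis=0)
    contacts = mean_dist < cutoff
    return mean_dist, contacts

# Toy predictions from two folds (rows: CDR residues, columns: epitope residues).
folds = [np.array([[3.0, 9.0], [6.0, 4.0]]),
         np.array([[5.0, 11.0], [8.0, 4.0]])]
mean_dist, contacts = ensemble_contacts(folds)
print(mean_dist)
print(np.argwhere(contacts))  # index pairs of predicted contacts
```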
Citation
@article {Zhang2024.04.05.588255,
author = {Yumeng Zhang and Zhikang Wang and Yunzhe Jiang and Dene R Littler and Mark Gerstein and Anthony W Purcell and Jamie Rossjohn and Hong-Yu Ou and Jiangning Song},
title = {Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor-antigen recognition},
elocation-id = {2024.04.05.588255},
year = {2024},
doi = {10.1101/2024.04.05.588255},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2024/04/07/2024.04.05.588255},
eprint = {https://www.biorxiv.org/content/early/2024/04/07/2024.04.05.588255.full.pdf},
journal = {bioRxiv}
}
Contact
If you have any questions, please contact us at zhangyumeng1@sjtu.edu.cn or jiangning.song@monash.edu.