
Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor-antigen recognition


EPACT: Epitope-anchored Contrastive Transfer Learning for Paired CD8+ T Cell Receptor-antigen Recognition

This repository contains the source code for the paper "Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor-antigen recognition".


EPACT follows a divide-and-conquer paradigm: it combines pre-training on TCR or pMHC data with transfer learning to predict TCR$\alpha\beta$-pMHC binding specificity and interaction conformation via epitope-anchored contrastive learning.
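
To make the idea of an epitope-anchored contrastive objective more concrete, here is a minimal sketch of a generic InfoNCE-style loss in which an epitope embedding acts as the anchor, a binding TCR embedding as the positive, and non-binding TCR embeddings as negatives. This is not taken from the EPACT source code: the function names, the cosine similarity, the temperature value, and the toy vectors are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log(exp(s_pos/t) / sum_k exp(s_k/t)).

    `anchor` stands in for an epitope embedding, `positive` for a binding
    TCR embedding, `negatives` for non-binding TCR embeddings.
    The temperature of 0.1 is an assumed value, not EPACT's setting.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


# Toy 2-D embeddings, invented for illustration only.
epitope = [1.0, 0.0]
binder = [0.9, 0.1]
non_binders = [[0.0, 1.0], [-1.0, 0.0]]

loss_aligned = info_nce(epitope, binder, non_binders)
loss_misaligned = info_nce(epitope, non_binders[0], [binder, non_binders[1]])
```

In this toy setup the loss is lower when the binder lies close to the epitope anchor than when a non-binder is treated as the positive, which is the behavior a contrastive objective of this kind is meant to reward.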

Colab Notebook Open In Colab

Installation

  1. Clone the repository.

    git clone https://github.com/zhangyumeng1sjtu/EPACT.git
    
  2. Create a virtual environment with conda.

    conda create -n EPACT_env python=3.10.12
    conda activate EPACT_env
    
  3. Install PyTorch>=2.0.1 built for your CUDA version, then install the other Python packages.

    conda install pytorch==2.0.1 pytorch-cuda=11.7 -c pytorch -c nvidia # for CUDA 11.7
    pip install -r requirements.txt
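
After installation, it can help to confirm that the environment sees PyTorch and a CUDA device before starting a long training run. The following is a small sketch of such a check; the helper name `check_environment` is hypothetical and not part of EPACT. It relies only on `torch.__version__` and `torch.cuda.is_available()`, and it reports a missing PyTorch installation instead of raising an error.

```python
import importlib.util
import sys


def check_environment():
    """Report Python and PyTorch status without failing if torch is missing."""
    report = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": None,
    }
    if report["torch_installed"]:
        # Imported lazily so the check also runs before PyTorch is installed.
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False` on a GPU machine, the installed PyTorch build likely does not match the local CUDA version.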
    

Data and model checkpoints

The following data and model checkpoints are available at Zenodo.

  • data/binding: binding data between paired TCR$\alpha\beta$ and pMHC derived from IEDB, VDJdb, McPAS, TBAdb, 10X, and Francis et al.
  • data/pretrained: human peptides from IEDB, human CD8+ TCRs from 10X Genomics Datasets and STAPLER, peptide-MHC-I binding affinity data from NetMHCpan4.1, and peptide-MHC-I eluted ligand data from BigMHC.
  • data/structure: crystal structures of TCR-pMHC protein complexes from STCRDab. Distance matrices were calculated as the closest distance between heavy atoms of each pair of amino acid residues.
  • checkpoints/paired-cdr3-pmhc-binding: model checkpoints for predicting TCR$\alpha\beta$-pMHC binding specificity from CDR3 sequences.
  • checkpoints/paired-cdr123-pmhc-binding: model checkpoints for predicting TCR$\alpha\beta$-pMHC binding specificity from CDR1, CDR2, and CDR3 sequences.
  • checkpoints/paired-cdr123-pmhc-interaction: model checkpoints for predicting CDR-epitope residue-level distance matrix and contact sites.
  • checkpoints/pretrained: model checkpoints for the pre-trained language models of TCRs and peptides, and for the peptide-MHC models (binding affinity & eluted ligand).
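
The distance-matrix definition given for data/structure (closest distance between heavy atoms of two residues) can be sketched as follows. This is an illustrative reimplementation rather than the code used to build the dataset: the function names are hypothetical, the coordinates are invented, and the 5.0 Å contact cutoff is an assumed value that the text above does not specify.

```python
import math


def residue_distance_matrix(cdr_residues, epitope_residues):
    """Closest heavy-atom distance for every (CDR residue, epitope residue) pair.

    Each residue is a list of (x, y, z) heavy-atom coordinates in angstroms.
    """
    return [
        [
            min(math.dist(a, b) for a in cdr_res for b in epi_res)
            for epi_res in epitope_residues
        ]
        for cdr_res in cdr_residues
    ]


def contact_map(distances, cutoff=5.0):
    """Mark residue pairs closer than `cutoff` angstroms (assumed cutoff)."""
    return [[d < cutoff for d in row] for row in distances]


# Toy coordinates, invented for illustration only.
cdr = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)]]
epitope = [[(4.0, 0.0, 0.0)], [(10.0, 3.0, 4.0), (20.0, 0.0, 0.0)]]

dist = residue_distance_matrix(cdr, epitope)
contacts = contact_map(dist)
```

Rows index CDR residues and columns index epitope residues, so `dist[i][j]` is the closest heavy-atom distance between CDR residue `i` and epitope residue `j`.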

Usage

1. Pre-training

  • Pre-train peptide and TCR$\alpha\beta$ language models.

    # pretrain epitope masked language model.
    python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-epitope-lm.yml
    
    # pretrain paired cdr3 masked language model.
    python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-cdr3-lm.yml
    
    # pretrain paired cdr123 masked language model.
    python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-cdr123-lm.yml
    
  • Train peptide-MHC binding affinity or eluted ligand models.

    # pretrain peptide-MHC binding affinity model.
    python scripts/pretrain/pretrain_pmhc_model.py --config configs/config-pmhc-binding.yml
    
    # pretrain peptide-MHC eluted ligand model.
    python scripts/pretrain/pretrain_pmhc_model.py --config configs/config-pmhc-elution.yml
    

2. Predict binding specificity

  • Train TCR$\alpha\beta$-pMHC binding models.

    # finetune Paired TCR-pMHC binding model (CDR3).
    python scripts/train/train_tcr_pmhc_binding.py --config configs/config-paired-cdr3-pmhc-binding.yml 
    
    # finetune Paired TCR-pMHC binding model (CDR123).
    python scripts/train/train_tcr_pmhc_binding.py --config configs/config-paired-cdr123-pmhc-binding.yml
    
  • Predict TCR$\alpha\beta$-pMHC binding specificity.

    # predict cross-validation results
    for i in {1..5}
    do
        python scripts/predict/predict_tcr_pmhc_binding.py \
            --config configs/config-paired-cdr123-pmhc-binding.yml \
            --input_data_path data/binding/Full-TCR/k-fold-data/val_fold_${i}.csv \
            --model_location checkpoints/paired-cdr123-pmhc-binding/paired-cdr123-pmhc-binding-model-fold-${i}.pt \
            --log_dir results/preds-cdr123-pmhc-binding/Fold_${i}/
    done
    
  • Predict TCR$\alpha\beta$-pMHC binding ranks relative to background TCRs.

    # predict binding ranks for SARS-CoV-2 responsive TCR clonotypes
    python scripts/predict/predict_tcr_pmhc_binding_rank.py --config configs/config-paired-cdr123-pmhc-binding.yml \
                                            --log_dir results/ranking-covid-cdr123/ \
                                            --input_data_path data/binding/covid_clonotypes.csv \
                                            --model_location checkpoints/paired-cdr123-pmhc-binding/paired-cdr123-pmhc-binding-model-all.pt \
                                            --bg_tcr_path data/pretrained/10x-paired-healthy-human-tcr-repertoire.csv \
                                            --num_bg_tcrs 20000
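
One common way to turn a raw binding score into a rank against background TCRs is a percentile rank: the share of background scores that are at least as high as the query score. The sketch below illustrates that idea only. The exact rank definition used by `predict_tcr_pmhc_binding_rank.py` is not stated here, so the function name, the tie handling, and the toy scores are assumptions.

```python
from bisect import bisect_left


def binding_rank_percent(query_score, background_scores):
    """Percentage of background scores that are >= the query score.

    Lower values mean the query TCR scores higher than most background TCRs.
    """
    ordered = sorted(background_scores)
    n_at_least = len(ordered) - bisect_left(ordered, query_score)
    return 100.0 * n_at_least / len(ordered)


# Invented scores, for illustration only.
background = [0.05, 0.10, 0.20, 0.40, 0.80]
queries = {"tcr_a": 0.90, "tcr_b": 0.30, "tcr_c": 0.01}

ranks = {name: binding_rank_percent(score, background)
         for name, score in queries.items()}
```

With a larger background set, such as the 20000 TCRs requested by `--num_bg_tcrs` above, the rank resolution becomes finer, since each background TCR then accounts for 0.005% of the scale.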
    

3. Predict interaction conformation

  • Train TCR$\alpha\beta$-pMHC interaction model.

    # finetune Paired TCR-pMHC interaction model (CDR123).
    python scripts/train/train_tcr_pmhc_interact.py --config configs/config-paired-cdr123-pmhc-interact.yml
    
  • Predict TCR$\alpha\beta$-pMHC interaction conformations.

    # predict distance matrices and contact sites between MEL8 TCR and HLA-A2-presented peptides.
    for i in {1..5}
    do
        python scripts/predict/predict_tcr_pmhc_interact.py --config configs/config-paired-cdr123-pmhc-interact.yml \
            --input_data_path data/MEL8_A0201_peptides.csv \
            --model_location checkpoints/paired-cdr123-pmhc-interaction/paired-cdr123-pmhc-interaction-model-fold-${i}.pt \
            --log_dir results/interaction-MEL8-bg-cdr123-closest/Fold_${i}/
    done
    

Citation

@article {Zhang2024.04.05.588255,
	author = {Yumeng Zhang and Zhikang Wang and Yunzhe Jiang and Dene R Littler and Mark Gerstein and Anthony W Purcell and Jamie Rossjohn and Hong-Yu Ou and Jiangning Song},
	title = {Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor-antigen recognition},
	elocation-id = {2024.04.05.588255},
	year = {2024},
	doi = {10.1101/2024.04.05.588255},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2024/04/07/2024.04.05.588255},
	eprint = {https://www.biorxiv.org/content/early/2024/04/07/2024.04.05.588255.full.pdf},
	journal = {bioRxiv}
}

Contact

If you have any questions, please contact us at zhangyumeng1@sjtu.edu.cn or jiangning.song@monash.edu.
