ESM-Cambrian Binding Affinity prediction for peptide-MHC binding
Project description
ESM-Cambrian Binding Affinity (ESMCBA)
This repository bundles code, data, notebooks, and trained models for exploring peptide–MHC (pMHC) binding with ESM Cambrian protein language models.
Code: https://github.com/sermare/ESMCBA
Models: https://huggingface.co/smares/ESMCBA
Quick facts
| Item | Details |
|---|---|
| Main package | ESMCBA/ (Python 3.10 modules and utilities) |
| Core tasks | • Generate ESM embeddings • Fine‑tune / evaluate binding‑affinity (BA) regressors and classifiers • Compare to external predictors (MHCFlurry, HLAthena, MixMHCpred, MHCnuggets) • Visualise embeddings (UMAP) • Analyse RFdiffusion pMHC designs & contact maps |
| Key data sources | IEDB IC₅₀ tables, HLA sequences, Apollo test sets, RFdiffusion outputs |
| Model checkpoints | Available on Hugging Face: smares/ESMCBA |
| Figures | Publication‑ready PDFs under figures/ and figures_manuscript/ |
| Environment | Conda env ESM_cambrian (Python 3.10, PyTorch 2.6, transformers 4.46, esm 3.1.3) |
Directory outline
ESMCBA/                    # importable package: modelling & utilities
├─ models/
│  ├─ ESM_Supervised/      # model definitions + checkpoints
│  └─ ESM_Unsupervised/
data/                      # CSV/TSV inputs and intermediate results
├─ Amino_Acid_Properties.csv
├─ IEDB_full_subset_filtered_out_MHCFlurry.csv
└─ ... (predictions_*.tsv, evaluation_*.csv, etc.)
figures/                   # exploratory plots (logos, ROC curves, etc.)
figures_manuscript/        # final manuscript figures
performances/              # aggregated model-metric CSVs
jupyter_notebooks/         # reproducible analysis notebooks
└─ (GIFs, RFdiffusion outputs, misc.)
Quick Start
You can run the quick-start notebook on Google Colab:
Installation
Step 1: Clone the repository
git clone https://github.com/sermare/ESMCBA
cd ESMCBA
Step 2: Create and activate the conda environment
# Create environment
conda create -n ESM_cambrian python=3.10 -y
conda activate ESM_cambrian
Step 3: Install required packages
# Install core PyTorch and Transformers ecosystem
# (reference environment: torch 2.6, transformers 4.46, esm 3.1.3)
pip install torch
pip install transformers
pip install esm
# Install Hugging Face Hub utilities
pip install "huggingface-hub<1.0"
# Optional: Install hf_transfer for faster large file downloads
pip install hf_transfer
pip install biopython umap-learn scikit-learn seaborn pandas matplotlib
Note: The esm and umap-learn packages are essential for running the embeddings generation and visualization scripts.
(Install predictors like mhcflurry separately if you intend to rerun benchmarking notebooks.)
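Before running anything, it can help to confirm the environment is complete. The sketch below is our own helper (not part of the repo); it reports which of the required packages are installed, using the distribution names passed to `pip` above:

```python
import importlib.metadata as md

# Distribution names as installed by pip; versions in the reference conda env
# are torch 2.6, transformers 4.46, esm 3.1.3.
REQUIRED = ["torch", "transformers", "esm", "umap-learn", "scikit-learn", "biopython"]

def report(packages):
    """Return {package: installed version, or None if missing}."""
    out = {}
    for pkg in packages:
        try:
            out[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            out[pkg] = None
    return out

if __name__ == "__main__":
    for pkg, ver in report(REQUIRED).items():
        print(f"{pkg}: {ver or 'MISSING'}")
```

Any `MISSING` entry points at a `pip install` step that still needs to run.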
Download Model Checkpoints
All trained model checkpoints are hosted on Hugging Face: https://huggingface.co/smares/ESMCBA
Available checkpoints (examples):
- ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAB1402_2_1e-05_1e-06__1_B1402_0404_Hubber_B1402_final.pth
- ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_6_HLAB1503_2_0.0001_1e-05__2_B1503_0404_Hubber_B1503_final.pth
- ESMCBA_epitope_0.5_20_ESMMASK_epitope_FT_15_0.0001_1e-05_AUG_6_HLAB5101_5_0.001_1e-06__3_B5101_Hubber_B5101_final.pth
Browse all files: https://huggingface.co/smares/ESMCBA
Download options:
Option A: Download all checkpoints to a local folder
# Download everything to ./models
hf download smares/ESMCBA --repo-type model --local-dir ./models
Option B: Download a specific checkpoint
# or download just one checkpoint
hf download smares/ESMCBA \
  "ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_5_0.001_1e-06_AUG_6_HLAA0201_2_0.001_1e-06__2_A0201_Hubber_A0201_final.pth" \
  --repo-type model \
  --local-dir ./models
Option C: Use Hugging Face cache (automatic)
If you omit --local-dir, files will be downloaded to your HF cache (e.g., ~/.cache/huggingface/hub/).
To change the cache location:
export HF_HOME=/path/to/cache
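The cache-based download can also be done programmatically with `hf_hub_download` from `huggingface_hub`. The helper below is a sketch of ours (`looks_like_checkpoint` and `fetch_checkpoint` are illustrative names, not repo API); the first call downloads into the HF cache, later calls reuse it:

```python
REPO_ID = "smares/ESMCBA"

def looks_like_checkpoint(name: str) -> bool:
    # All published checkpoints follow the ESMCBA_*..._final.pth naming scheme
    return name.startswith("ESMCBA_") and name.endswith(".pth")

def fetch_checkpoint(filename: str) -> str:
    """Download one checkpoint into the HF cache (or reuse it); return local path."""
    # requires: pip install "huggingface-hub<1.0"
    from huggingface_hub import hf_hub_download
    if not looks_like_checkpoint(filename):
        raise ValueError(f"unexpected checkpoint name: {filename!r}")
    return hf_hub_download(repo_id=REPO_ID, filename=filename)
```

The returned path can then be passed to `--model_path` in the prediction scripts below.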
Typical workflow
| Step | Script / notebook | Output |
|---|---|---|
| 1 | embeddings_generation.py | Embedding files in data/ |
| 2 | make_ESMCBA_models.py (supervised) or forward_pass_unsupervised.py | Checkpoints in models/ |
| 3 | evaluation_IEDB_qual.py | Metric CSVs + ROC/AUC PDFs |
| 4 | HLA_full_sequences_UMAP.py | UMAP plots in figures/ |
| 5 | Notebooks under jupyter_notebooks/rdfiffusion/ | Contact maps, hit-rate tables |
Run any script with -h to see its arguments.
To Run Predictions (run embeddings_generation.py)
The embeddings_generation.py script generates ESM predictions and embeddings for peptide sequences.
Example 1: Using a downloaded checkpoint
cd ESMCBA/ESMCBA
python3 embeddings_generation.py \
--model_path ./models/ESMCBA_epitope_0.5_20_ESMMASK_epitope_FT_15_0.0001_1e-05_AUG_6_HLAB5101_5_0.001_1e-06__3_B5101_Hubber_B5101_final.pth \
--name B5101-ESMCBA \
--hla B5101 \
--encoding epitope \
--output_dir ./outputs \
--peptides ASCQQQRAGHS ASCQQQRAGH ASCQQQRAG DVRLSAHHHR DVRLSAHHHRM GHSDVRLSAHH
Example 2: Auto-download from Hugging Face
If the script supports Hugging Face paths, you can specify just the filename or an hf:// path:
python3 embeddings_generation.py \
--model_path "ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAB1402_2_1e-05_1e-06__1_B1402_0404_Hubber_B1402_final.pth" \
--name B1402-ESMCBA \
--hla B1402 \
--encoding epitope \
--output_dir ./outputs \
--peptides ASCQQQRAGHS ASCQQQRAGH ASCQQQRAG DVRLSAHHHR DVRLSAHHHRM GHSDVRLSAHH
or with explicit hf:// prefix:
python3 embeddings_generation.py \
--model_path "hf://smares/ESMCBA/ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAB1402_2_1e-05_1e-06__1_B1402_0404_Hubber_B1402_final.pth" \
--name B1402-ESMCBA \
--hla B1402 \
--encoding epitope \
--output_dir ./outputs \
--peptides ASCQQQRAGHS ASCQQQRAGH ASCQQQRAG DVRLSAHHHR DVRLSAHHHRM GHSDVRLSAHH
GPU vs CPU
- By default, PyTorch will use GPU if available
- To force CPU:
export CUDA_VISIBLE_DEVICES=""
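A sketch of the device selection these two behaviours imply (`pick_device` is our illustrative helper, not a function from the repo):

```python
import os

def pick_device() -> str:
    """GPU when visible, otherwise CPU (mirrors PyTorch's usual default)."""
    # CUDA_VISIBLE_DEVICES="" hides all GPUs from CUDA, forcing CPU
    if os.environ.get("CUDA_VISIBLE_DEVICES") == "":
        return "cpu"
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"
```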
Troubleshooting
Model downloads
- "huggingface-cli download is deprecated": use `hf download` instead
- Permission errors: public models don't require login; for private models, run `hf login`
- Slow transfers: install `hf_transfer` and export `HF_HUB_ENABLE_HF_TRANSFER=1`
- File not found: double-check the exact filename on the Hub (filenames are long; copy and paste)
Import errors
- "No module named 'esm'": run `pip install esm==3.1.3`
- "No module named 'umap'": run `pip install umap-learn==0.5.7`
Reproducibility tips
Record the exact commit of the code and the model snapshot for papers and reviews:
- Code commit: <git SHA from ESMCBA repo>
- Model snapshot: <commit SHA from HF snapshots path>
- HLA: B5101
- Encoding: epitope
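One way to capture those fields programmatically. This is a sketch under stated assumptions: `code_commit`, `model_snapshot`, and `provenance` are our illustrative names, and `model_snapshot` makes a network call to the Hub:

```python
import subprocess

def code_commit(repo_dir: str = ".") -> str:
    """Exact git SHA of the checked-out ESMCBA code."""
    return subprocess.check_output(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"], text=True
    ).strip()

def model_snapshot(repo_id: str = "smares/ESMCBA") -> str:
    """Commit SHA of the current model snapshot on the Hub (network call)."""
    from huggingface_hub import HfApi
    return HfApi().model_info(repo_id).sha

def provenance(code_sha: str, model_sha: str, hla: str, encoding: str) -> dict:
    """Bundle the four fields into one record for a methods section."""
    return {"code_commit": code_sha, "model_snapshot": model_sha,
            "hla": hla, "encoding": encoding}
```

For example, `provenance(code_commit(), model_snapshot(), "B5101", "epitope")` yields a record you can paste into a paper or review response.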
Citing
S. Mares (2025). Continued domain-specific pre-training of protein language models for pMHC-I binding prediction.
DOI / preprint.
Model paths
| HLA | Model checkpoint |
|---|---|
| B5101 | ESMCBA_epitope_0.5_20_ESMMASK_epitope_FT_15_0.0001_1e-05_AUG_6_HLAB5101_5_0.001_1e-06__3_B5101_Hubber_B5101_final.pth |
| A0206 | ESMCBA_epitope_0.5_20_ESMMASK_epitope_FT_25_0.0001_1e-06_AUG_1_HLAA0206_2_0.001_1e-06__1_A0206_Hubber_A0206_final.pth |
| B3701 | ESMCBA_epitope_0.5_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_3_HLAB3701_1_0.0001_1e-05__1_B3701_0404_Hubber_B3701_final.pth |
| B5301 | ESMCBA_epitope_0.5_30_ESMMASK_epitope_FT_15_0.001_1e-06_AUG_6_HLAB5301_1_0.0001_1e-05__1_B5301_0404_Hubber_B5301_final.pth |
| A2402 | ESMCBA_epitope_0.5_30_ESMMASK_epitope_FT_20_0.001_1e-06_AUG_1_HLAA2402_1_0.0001_1e-06__2_A2402_0404_Hubber_A2402_final.pth |
| C0802 | ESMCBA_epitope_0.5_30_ESMMASK_epitope_FT_20_0.001_5e-05_AUG_1_HLAC0802_2_0.0001_1e-05__2_C0802_0404_Hubber_C0802_final.pth |
| A0301 | ESMCBA_epitope_0.5_30_ESMMASK_epitope_FT_25_0.001_0.001_AUG_1_HLAA0301_1_0.001_1e-06__1_A0301_Hubber_A0301_final.pth |
| B3501 | ESMCBA_epitope_0.5_30_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_6_HLAB3501_2_0.001_0.001__4_B3501_Hubber_B3501_final.pth |
| C1502 | ESMCBA_epitope_0.5_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAC1502_2_0.0001_1e-06__1_C1502_0404_Hubber_C1502_final.pth |
| B4601 | ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_15_0.001_1e-06_AUG_6_HLAB4601_1_0.0001_1e-05__2_B4601_0404_Hubber_B4601_final.pth |
| C0501 | ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_15_0.001_1e-06_AUG_6_HLAC0501_2_0.0001_1e-06__2_C0501_0404_Hubber_C0501_final.pth |
| A3201 | ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_15_0.001_5e-05_AUG_1_HLAA3201_2_0.0001_1e-06__1_A3201_0404_Hubber_A3201_final.pth |
| A0205 | ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_15_0.001_5e-05_AUG_3_HLAA0205_2_0.0001_1e-06__2_A0205_0404_Hubber_A0205_final.pth |
| A3001 | ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_25_0.0001_1e-06_AUG_3_HLAA3001_4_0.0001_0.001__3_A3001_Hubber_A3001_final.pth |
| A0101 | ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_25_0.001_1e-05_AUG_6_HLAA0101_2_0.001_0.001__3_A0101_Hubber_A0101_final.pth |
| C1203 | ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_1_HLAC1203_1_0.0001_1e-05__2_C1203_0404_Hubber_C1203_final.pth |
| A0207 | ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_3_HLAA0207_1_0.0001_1e-06__2_A0207_0404_Hubber_A0207_final.pth |
| A0211 | ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_6_HLAA0211_2_0.0001_1e-06__1_A0211_0404_Hubber_A0211_final.pth |
| B5801 | ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_6_HLAB5801_2_0.0001_1e-06__2_B5801_0404_Hubber_B5801_final.pth |
| B0702 | ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_15_0.0001_0.001_AUG_6_HLAB0702_3_0.001_1e-06__4_B0702_Hubber_B0702_final.pth |
| C0701 | ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_15_0.001_5e-05_AUG_1_HLAC0701_2_0.0001_1e-05__1_C0701_0404_Hubber_C0701_final.pth |
| B3801 | ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_20_0.001_1e-06_AUG_3_HLAB3801_2_0.0001_1e-06__1_B3801_0404_Hubber_B3801_final.pth |
| C0303 | ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_20_0.001_1e-06_AUG_3_HLAC0303_1_0.0001_1e-05__2_C0303_0404_Hubber_C0303_final.pth |
| B4501 | ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_1_HLAB4501_2_0.0001_1e-05__2_B4501_0404_Hubber_B4501_final.pth |
| B4001 | ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_6_HLAB4001_1_0.0001_1e-06__2_B4001_0404_Hubber_B4001_final.pth |
| A0201 | ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_5_0.001_1e-06_AUG_6_HLAA0201_2_0.001_1e-06__2_A0201_Hubber_A0201_final.pth |
| C0602 | ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_15_0.001_5e-05_AUG_1_HLAC0602_2_0.0001_1e-06__1_C0602_0404_Hubber_C0602_final.pth |
| A2501 | ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_20_0.001_1e-06_AUG_1_HLAA2501_1_0.0001_1e-06__1_A2501_0404_Hubber_A2501_final.pth |
| B5401 | ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_20_0.001_5e-05_AUG_1_HLAB5401_2_0.0001_1e-06__2_B5401_0404_Hubber_B5401_final.pth |
| A1101 | ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.0001_1e-05_AUG_3_HLAA1101_5_0.001_1e-06__2_A1101_Hubber_A1101_final.pth |
| B1801 | ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.0001_1e-05_AUG_6_HLAB1801_1_0.001_1e-06__4_B1801_Hubber_B1801_final.pth |
| B1501 | ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.001_0.001_AUG_3_HLAB1501_2_0.001_0.001__2_B1501_Hubber_B1501_final.pth |
| A6801 | ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.001_1e-05_AUG_1_HLAA6801_2_0.0001_1e-06__4_A6801_Hubber_A6801_final.pth |
| B2705 | ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_3_HLAB2705_2_0.0001_1e-06__2_B2705_0404_Hubber_B2705_final.pth |
| C0401 | ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_3_HLAC0401_2_0.0001_1e-06__1_C0401_0404_Hubber_C0401_final.pth |
| B1502 | ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAB1502_1_1e-05_1e-05__1_B1502_0404_Hubber_B1502_final.pth |
| A0202 | ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_6_HLAA0202_1_0.0001_1e-05__2_A0202_0404_Hubber_A0202_final.pth |
| A2601 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_15_0.0001_1e-05_AUG_1_HLAA2601_5_0.001_0.001__4_A2601_Hubber_A2601_final.pth |
| C0702 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_15_0.001_5e-05_AUG_1_HLAC0702_1_0.0001_1e-05__1_C0702_0404_Hubber_C0702_final.pth |
| A3301 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_20_0.001_0.001_AUG_1_HLAA3301_5_0.001_1e-06__4_A3301_Hubber_A3301_final.pth |
| B0801 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_20_0.001_1e-06_AUG_1_HLAB0801_1_0.0001_1e-06__1_B0801_0404_Hubber_B0801_final.pth |
| B1517 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_20_0.001_5e-05_AUG_3_HLAB1517_1_0.0001_1e-05__2_B1517_0404_Hubber_B1517_final.pth |
| A0203 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_0.001_AUG_6_HLAA0203_2_0.001_0.001__2_A0203_Hubber_A0203_final.pth |
| B5701 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_1e-05_AUG_1_HLAB5701_2_0.0001_1e-05__1_B5701_Hubber_B5701_final.pth |
| B4402 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_1e-05_AUG_3_HLAB4402_1_0.001_0.001__2_B4402_Hubber_B4402_final.pth |
| A6802 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_6_HLAA6802_2_0.001_1e-06__4_A6802_Hubber_A6802_final.pth |
| B4403 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_3_HLAB4403_1_0.0001_1e-06__1_B4403_0404_Hubber_B4403_final.pth |
| C1402 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_3_HLAC1402_1_0.0001_1e-06__1_C1402_0404_Hubber_C1402_final.pth |
| B4002 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_6_HLAB4002_2_0.0001_1e-05__1_B4002_0404_Hubber_B4002_final.pth |
| A3101 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAA3101_2_0.0001_1e-06__2_A3101_0404_Hubber_A3101_final.pth |
| B1402 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAB1402_2_1e-05_1e-06__1_B1402_0404_Hubber_B1402_final.pth |
| B1503 | ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_6_HLAB1503_2_0.0001_1e-05__2_B1503_0404_Hubber_B1503_final.pth |
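For scripting, the table above can be turned into a simple lookup. The snippet maps two alleles copied from the table (extend with the rest as needed; `checkpoint_for` is an illustrative helper, not repo API):

```python
# Allele -> checkpoint filename; subset of the table above.
CHECKPOINTS = {
    "B5101": "ESMCBA_epitope_0.5_20_ESMMASK_epitope_FT_15_0.0001_1e-05_AUG_6_HLAB5101_5_0.001_1e-06__3_B5101_Hubber_B5101_final.pth",
    "A0201": "ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_5_0.001_1e-06_AUG_6_HLAA0201_2_0.001_1e-06__2_A0201_Hubber_A0201_final.pth",
}

def checkpoint_for(hla: str) -> str:
    """Return the checkpoint filename for a four-digit HLA code, e.g. 'B5101'."""
    try:
        return CHECKPOINTS[hla]
    except KeyError:
        raise KeyError(f"no checkpoint mapped for HLA {hla}; see the table") from None
```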
Maintenance checklist
- Remove `__pycache__/` and large binaries from Git; ignore via `.gitignore` or track via Git-LFS
- Consolidate duplicate CSVs in `performances/`
- Standardise file names with stray colon or non-ASCII characters (e.g. `input_B_15:01_output.csv`)
License
Follow the license in the GitHub repo for code and the model card in the Hugging Face repo for model weights.
File details
Details for the file esmcba-0.1.1.tar.gz.
File metadata
- Download URL: esmcba-0.1.1.tar.gz
- Size: 43.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1239211210196fd2d9c2470ecd2d3a6c51acc44211dc648e9e33f315265606a9 |
| MD5 | dde69a68521e8080481e6f3c29026391 |
| BLAKE2b-256 | 4ee8c9a90c1af5ea9093d4e8333e1d38fe4d4c2fa75cf1758beabcb0dbc513c7 |
File details
Details for the file esmcba-0.1.1-py3-none-any.whl.
File metadata
- Download URL: esmcba-0.1.1-py3-none-any.whl
- Size: 61.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a8111cca9ba91489ce700354c0e6af0b8d5866de36a53fe479f1fd001da1de9f |
| MD5 | 68883c2f5795bb36b6ee87f8e4695854 |
| BLAKE2b-256 | a7e3cca697d2055255767c0b80e6ccee267987ac287095a0c1350c16c80c370e |
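After downloading either file, its integrity can be checked against the SHA256 digests above. A minimal sketch (`sha256_of` is our own helper):

```python
import hashlib

# Expected digest for esmcba-0.1.1.tar.gz, copied from the table above.
SDIST_SHA256 = "1239211210196fd2d9c2470ecd2d3a6c51acc44211dc648e9e33f315265606a9"

def sha256_of(path: str) -> str:
    """Stream a file in 1 MiB chunks and return its hex SHA256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Verify after download:
# assert sha256_of("esmcba-0.1.1.tar.gz") == SDIST_SHA256
```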