AnthroAb: Human antibody language model based on RoBERTa for humanization
```
█████ ███ ██ ████████ ██ ██ ██████ ██████ █████ ██████
██ ██ ████ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██
███████ ██ ██ ██ ██ ███████ ██████ ██ ██ ██ ███████ ██████
██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██
██ ██ ██ ████ ██ ██ ██ ██ ██ ██████ ██ ██ ██████
```
AnthroAb is a human antibody language model based on RoBERTa, specifically trained for antibody humanization tasks.
Features
- Antibody Humanization: Predict humanized versions of antibody sequences
- Sequence Infilling: Fill masked positions with human-like residues
- Mutation Suggestions: Suggest humanizing mutations for frameworks and CDRs
- Embedding Generation: Create vector representations of residues or sequences
- Dual Chain Support: Separate models for Variable Heavy (VH) and Variable Light (VL) chains
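As a hypothetical illustration of the infilling input format (assuming, as the Quick Start sequences suggest, that `*` marks a position for the model to fill; the helper below is not part of `anthroab`), positions of a sequence can be masked like this:

```python
def mask_positions(seq: str, positions: list[int]) -> str:
    """Replace the given 0-based positions with '*' so the model can infill them."""
    chars = list(seq)
    for i in positions:
        chars[i] = "*"
    return "".join(chars)

# Mask the first two residues of a light-chain fragment
print(mask_positions("DIQMTQSPSS", [0, 1]))  # **QMTQSPSS
```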
Installation
```shell
# Install from PyPI
conda create -n anthroab python=3.10
conda activate anthroab
pip install anthroab

# Or install from source
git clone https://github.com/nagarh/AnthroAb
cd AnthroAb
pip install -e .
```
Quick Start
Antibody Sequence Humanization
```python
import anthroab

# Humanize a heavy chain sequence
vh_sequence = "**QLV*SGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS"
humanized_vh = anthroab.predict_best_score(vh_sequence, 'H')
print(f"Humanized VH: {humanized_vh}")

# Humanize a light chain sequence
vl_sequence = "DIQMTQSPSSLSASV*DRVTITCRASQSISSYLNWYQQKPGKAPKLLIYSASTLASGVPSRFSGSGSGTDF*LTISSLQPEDFATYYCQQSYSTPRTFGQGTKVEIK"
humanized_vl = anthroab.predict_best_score(vl_sequence, 'L')
print(f"Humanized VL: {humanized_vl}")
```
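To inspect what changed during humanization, a small helper (a sketch, not part of the `anthroab` API) can list the position-wise differences between the input and the humanized output in conventional 1-based substitution notation:

```python
def list_mutations(original: str, humanized: str) -> list[str]:
    """List substitutions (e.g. 'K3Q') between two equal-length aligned sequences.

    Positions masked with '*' in the input are skipped, since those are
    infilled rather than mutated.
    """
    return [
        f"{o}{i + 1}{h}"
        for i, (o, h) in enumerate(zip(original, humanized))
        if o != h and o != "*"
    ]

print(list_mutations("QVKLV*SG", "QVQLVQSG"))  # ['K3Q']
```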
Model Details
Architecture
- Base Model: RoBERTa (trained from scratch)
- Architecture: RobertaForMaskedLM
- Model Type: Masked Language Model for antibody sequences
Model Specifications
- Hidden Size: 768
- Number of Layers: 12
- Number of Attention Heads: 12
- Intermediate Size: 3072
- Max Position Embeddings: 192 (VH), 145 (VL)
- Vocabulary Size: 25 tokens
- Model Size: ~164 MB per model
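As a rough sanity check on these numbers, a back-of-the-envelope count of the weight matrices implied by the specifications (ignoring biases, LayerNorm parameters, and the LM head) lands where a RoBERTa-base encoder should:

```python
hidden, layers, ffn = 768, 12, 3072
vocab, max_pos_vh = 25, 192

# Token and position embedding tables
embeddings = vocab * hidden + max_pos_vh * hidden
# Per encoder layer: Q/K/V/output projections plus the two feed-forward matrices
per_layer = 4 * hidden * hidden + 2 * hidden * ffn
total = embeddings + layers * per_layer
print(f"~{total / 1e6:.0f}M weights")  # ~85M weights
```

At 16-bit precision, ~85M weights would occupy roughly 170 MB, which is consistent with the stated ~164 MB per-model size.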
Available Models
- VH Model: hemantn/roberta-base-humAb-vh (for Variable Heavy chains)
- VL Model: hemantn/roberta-base-humAb-vl (for Variable Light chains)
Citation
If you use AnthroAb in your research, please cite:
```bibtex
@misc{anthroab,
  author = {Hemant N},
  title = {AnthroAb: Human Antibody Language Model for Humanization},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/hemantn/roberta-base-humAb-vh}
}
```
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Acknowledgments
Note: This codebase and API design are adopted from the Sapiens model by Merck. AnthroAb maintains the same interface and functionality as Sapiens but uses a RoBERTa-base model trained on human antibody sequences from the OAS database (up to year 2025) for antibody humanization.
Original Sapiens Citation
David Prihoda, Jad Maamary, Andrew Waight, Veronica Juan, Laurence Fayadat-Dilman, Daniel Svozil & Danny A. Bitton (2022) BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning, mAbs, 14:1, DOI: https://doi.org/10.1080/19420862.2021.2020203
Related Projects
- Sapiens: Original antibody language model by Merck (this codebase is based on Sapiens)
- BioPhi: Antibody design and humanization platform
- OAS: Observed Antibody Space database
Support
For questions, issues, or contributions, please open an issue on the GitHub repository.
File details
Details for the file anthroab-1.1.0.tar.gz.
File metadata
- Download URL: anthroab-1.1.0.tar.gz
- Upload date:
- Size: 8.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 330a63df7633a325aa653afa025104c0f3627781d8ec31997cae8cadd77fd84c |
| MD5 | c25ebd4c5d93ee7e2ba4ab148399ce28 |
| BLAKE2b-256 | 269a182b8a4a1456782f9b98632a9eb78353a16645ac7792c7cba6db9a377bda |
File details
Details for the file anthroab-1.1.0-py3-none-any.whl.
File metadata
- Download URL: anthroab-1.1.0-py3-none-any.whl
- Upload date:
- Size: 8.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cbb018d3fc5c76d5dfcdd0b87646907805dff10b8c61bc9f53b81b12320e6afa |
| MD5 | a91db75d70752283b0cf1161ffd9437e |
| BLAKE2b-256 | 4220291977ffb1f06718324b2109e234b542b7e21a4e4ab95a646b0af9456995 |