AnthroAb: Human antibody language model based on RoBERTa for humanization

Project description

AnthroAb: Antibody Humanization Language Model

              █████  ███    ██ ████████ ██   ██ ██████   ██████      █████  ██████  
             ██   ██ ████   ██    ██    ██   ██ ██   ██ ██    ██    ██   ██ ██   ██ 
             ███████ ██ ██  ██    ██    ███████ ██████  ██    ██ ██ ███████ ██████  
             ██   ██ ██  ██ ██    ██    ██   ██ ██   ██ ██    ██    ██   ██ ██   ██ 
             ██   ██ ██   ████    ██    ██   ██ ██   ██  ██████     ██   ██ ██████

AnthroAb is a human antibody language model based on RoBERTa, specifically trained for antibody humanization tasks.

Features

  • Antibody Humanization: Predict humanized versions of antibody sequences
  • Sequence Infilling: Fill masked positions with human-like residues
  • Mutation Suggestions: Suggest humanizing mutations for frameworks and CDRs
  • Embedding Generation: Create vector representations of residues or sequences
  • Dual Chain Support: Separate models for Variable Heavy (VH) and Variable Light (VL) chains

Installation

# Install from PyPI 
conda create -n anthroab python=3.10
conda activate anthroab
pip install anthroab

# Or install from source
git clone https://github.com/nagarh/AnthroAb
cd AnthroAb
pip install -e .

Quick Start

Antibody Sequence Humanization

import anthroab

# Humanize a heavy chain sequence ('*' marks masked positions to be infilled)
vh_sequence = "**QLV*SGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS"
humanized_vh = anthroab.predict_best_score(vh_sequence, 'H')
print(f"Humanized VH: {humanized_vh}")

# Humanize a light chain sequence
vl_sequence = "DIQMTQSPSSLSASV*DRVTITCRASQSISSYLNWYQQKPGKAPKLLIYSASTLASGVPSRFSGSGSGTDF*LTISSLQPEDFATYYCQQSYSTPRTFGQGTKVEIK"
humanized_vl = anthroab.predict_best_score(vl_sequence, 'L')
print(f"Humanized VL: {humanized_vl}")
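In the sequences above, '*' marks residues the masked language model will infill with human-like predictions. As a purely illustrative sketch (the helper below is hypothetical and not part of the anthroab API), the markers map to 0-based mask indices like so:

```python
# Hypothetical helper (not part of the anthroab API): locate the
# positions marked with '*' that the masked language model will infill.
def masked_positions(sequence: str) -> list[int]:
    """Return the 0-based indices of residues marked with '*'."""
    return [i for i, aa in enumerate(sequence) if aa == "*"]

# The VL example above masks two framework residues:
vl_sequence = "DIQMTQSPSSLSASV*DRVTITCRASQSISSYLNWYQQKPGKAPKLLIYSASTLASGVPSRFSGSGSGTDF*LTISSLQPEDFATYYCQQSYSTPRTFGQGTKVEIK"
print(masked_positions(vl_sequence))  # → [15, 71]
```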

Model Details

Architecture

  • Base Model: RoBERTa (trained from scratch)
  • Architecture: RobertaForMaskedLM
  • Model Type: Masked Language Model for antibody sequences

Model Specifications

  • Hidden Size: 768
  • Number of Layers: 12
  • Number of Attention Heads: 12
  • Intermediate Size: 3072
  • Max Position Embeddings: 192 (VH), 145 (VL)
  • Vocabulary Size: 25 tokens
  • Model Size: ~164 MB per model
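A quick sanity check on the figures above: plugging the listed specifications into a standard RoBERTa-base masked-LM parameter count reproduces the ~164 MB model size at 16-bit precision. This is a back-of-envelope sketch; the layer-norm and bias bookkeeping assumes the usual RoBERTa layout:

```python
# Back-of-envelope parameter count for a RoBERTa-base masked LM using the
# VH model's specifications listed above (vocab 25, max positions 192).
hidden, layers, inter = 768, 12, 3072
vocab, max_pos = 25, 192

# Embeddings: word + position + token-type tables, plus one LayerNorm
embeddings = (vocab + max_pos + 1) * hidden + 2 * hidden

# One encoder layer: Q/K/V/O projections with biases, two feed-forward
# matrices with biases, and two LayerNorms
attention = 4 * (hidden * hidden + hidden)
ffn = hidden * inter + inter + inter * hidden + hidden
layer_norms = 2 * 2 * hidden
per_layer = attention + ffn + layer_norms

# MLM head: dense transform + LayerNorm + decoder bias (decoder weight is
# typically tied to the word embeddings, so it is not counted again)
mlm_head = hidden * hidden + hidden + 2 * hidden + vocab

total = embeddings + layers * per_layer + mlm_head
print(f"~{total / 1e6:.1f}M parameters")       # ≈ 85.8M
print(f"~{total * 2 / 2**20:.0f} MiB at fp16")  # ≈ 164 MiB
```

At 2 bytes per parameter (fp16), ~85.8M parameters come to roughly 164 MiB, matching the listed model size.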

Available Models

  • VH Model: hemantn/roberta-base-humAb-vh - For Variable Heavy chains
  • VL Model: hemantn/roberta-base-humAb-vl - For Variable Light chains

Citation

If you use AnthroAb in your research, please cite:

@misc{anthroab,
  author = {Hemant N},
  title = {AnthroAb: Human Antibody Language Model for Humanization},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/hemantn/roberta-base-humAb-vh}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

Note: This codebase and API design are adapted from the Sapiens model by Merck. AnthroAb maintains the same interface and functionality as Sapiens but uses a RoBERTa-base model trained for antibody humanization on human antibody sequences from the OAS database (through 2025).

Original Sapiens Citation

David Prihoda, Jad Maamary, Andrew Waight, Veronica Juan, Laurence Fayadat-Dilman, Daniel Svozil & Danny A. Bitton (2022). BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning. mAbs, 14:1. DOI: 10.1080/19420862.2021.2020203

Related Projects

  • Sapiens: Original antibody language model by Merck, on which this codebase is based
  • BioPhi: Antibody design and humanization platform
  • OAS: Observed Antibody Space database

Support

For questions, issues, or contributions, please open an issue on the GitHub repository.

Project details


Download files

Download the file for your platform.

Source Distribution

anthroab-1.1.0.tar.gz (8.6 kB)

Uploaded Source

Built Distribution

anthroab-1.1.0-py3-none-any.whl (8.5 kB)

Uploaded Python 3

File details

Details for the file anthroab-1.1.0.tar.gz.

File metadata

  • Download URL: anthroab-1.1.0.tar.gz
  • Upload date:
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for anthroab-1.1.0.tar.gz
Algorithm Hash digest
SHA256 330a63df7633a325aa653afa025104c0f3627781d8ec31997cae8cadd77fd84c
MD5 c25ebd4c5d93ee7e2ba4ab148399ce28
BLAKE2b-256 269a182b8a4a1456782f9b98632a9eb78353a16645ac7792c7cba6db9a377bda

File details

Details for the file anthroab-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: anthroab-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for anthroab-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 cbb018d3fc5c76d5dfcdd0b87646907805dff10b8c61bc9f53b81b12320e6afa
MD5 a91db75d70752283b0cf1161ffd9437e
BLAKE2b-256 4220291977ffb1f06718324b2109e234b542b7e21a4e4ab95a646b0af9456995
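
The digests above can be checked locally after downloading a file; a minimal sketch using Python's standard hashlib:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the digest published above, e.g. for the sdist:
expected = "330a63df7633a325aa653afa025104c0f3627781d8ec31997cae8cadd77fd84c"
# assert sha256_of("anthroab-1.1.0.tar.gz") == expected
```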
