
AnthroAb: Human antibody language model based on RoBERTa for humanization

Project description

AnthroAb: Human Antibody Language Model

             █████  ███    ██ ████████ ██   ██ ██████   ██████      █████  ██████  
             ██   ██ ████   ██    ██    ██   ██ ██   ██ ██    ██    ██   ██ ██   ██ 
             ███████ ██ ██  ██    ██    ███████ ██████  ██    ██ ██ ███████ ██████  
             ██   ██ ██  ██ ██    ██    ██   ██ ██   ██ ██    ██    ██   ██ ██   ██ 
             ██   ██ ██   ████    ██    ██   ██ ██   ██  ██████     ██   ██ ██████

AnthroAb is a human antibody language model based on RoBERTa, specifically trained for antibody humanization tasks.

Features

  • Antibody Humanization: Predict humanized versions of antibody sequences
  • Sequence Infilling: Fill masked positions with human-like residues
  • Mutation Suggestions: Suggest humanizing mutations for frameworks and CDRs
  • Embedding Generation: Create vector representations of residues or sequences
  • Dual Chain Support: Separate models for Variable Heavy (VH) and Variable Light (VL) chains

Installation

# Install from PyPI
pip install anthroab

# Or install from source
git clone https://github.com/your-username/AnthroAb
cd AnthroAb
pip install -e .

Quick Start

Antibody Sequence Humanization

import anthroab

# Humanize a heavy chain sequence ('*' marks masked positions for the model to infill)
vh_sequence = "***LV*SGAEVKKPGASVKVSCKASGYTFTDYYIHWVKQRPEQGLEWIGWIDPENGDTEYAPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCARNLGPSFYFDYWGQGTLVTVSS"
humanized_vh = anthroab.predict_best_score(vh_sequence, 'H')
print(f"Humanized VH: {humanized_vh}")

# Humanize a light chain sequence
vl_sequence = "DIQMTQSPSSLSASV*DRVTITCRASQSISSYLNWYQQKPGKAPKLLIYSASTLASGVPSRFSGSGSGTDF*LTISSLQPEDFATYYCQQSYSTPRTFGQGTKVEIK"
humanized_vl = anthroab.predict_best_score(vl_sequence, 'L')
print(f"Humanized VL: {humanized_vl}")
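Under the hood, humanization is a masked-language-modeling task: each `*` is a masked position, and the model scores candidate amino acids there. The sketch below illustrates the idea with a stand-in scorer; the real package uses the RoBERTa MLM head, and `fill_masks`/`score` are hypothetical names, not part of the AnthroAb API.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def fill_masks(sequence: str, score) -> str:
    """Replace each '*' with the highest-scoring residue.

    `score(seq, pos, aa)` stands in for the model's per-residue
    likelihood; in AnthroAb this comes from the masked LM head.
    """
    residues = list(sequence)
    for pos, aa in enumerate(residues):
        if aa == "*":
            residues[pos] = max(AMINO_ACIDS, key=lambda r: score(sequence, pos, r))
    return "".join(residues)

# Toy scorer that always prefers glutamine (Q) -- illustration only.
print(fill_masks("***LV*SG", lambda seq, pos, aa: 1.0 if aa == "Q" else 0.0))
# QQQLVQSG
```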

Model Details

Architecture

  • Base Model: RoBERTa (trained from scratch)
  • Architecture: RobertaForMaskedLM
  • Model Type: Masked Language Model for antibody sequences

Model Specifications

  • Hidden Size: 768
  • Number of Layers: 12
  • Number of Attention Heads: 12
  • Intermediate Size: 3072
  • Max Position Embeddings: 192 (VH), 145 (VL)
  • Vocabulary Size: 25 tokens
  • Model Size: ~164 MB per model
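As a rough sanity check, the specifications above imply a parameter count of about 85M: standard RoBERTa-base encoder layers, but with a tiny 25-token embedding table instead of the usual ~50K vocabulary. A back-of-the-envelope estimate (assuming standard RoBERTa layer shapes):

```python
# Back-of-the-envelope parameter count from the stated specifications.
hidden, layers, ffn = 768, 12, 3072
vocab, max_pos = 25, 192  # VH model

# Embeddings: token + position tables, plus a layer norm.
embeddings = vocab * hidden + max_pos * hidden + 2 * hidden

# Per encoder layer: Q/K/V/O projections, feed-forward, two layer norms.
attention = 4 * (hidden * hidden + hidden)
feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
layer_norms = 2 * 2 * hidden
per_layer = attention + feed_forward + layer_norms

total = embeddings + layers * per_layer
print(f"~{total / 1e6:.1f}M parameters")  # ~85.2M parameters
```

At 2 bytes per parameter (fp16), ~85M parameters is roughly 170 MB, consistent with the ~164 MB per-model size stated above.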

Available Models

  • VH Model: hemantn/roberta-base-humAb-vh - For Variable Heavy chains
  • VL Model: hemantn/roberta-base-humAb-vl - For Variable Light chains
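If you load the checkpoints directly (for example with Hugging Face `transformers`), the chain type selects the checkpoint. A small helper, assuming the repository names above (`model_id_for_chain` is a hypothetical name, not part of the AnthroAb API):

```python
MODEL_IDS = {
    "H": "hemantn/roberta-base-humAb-vh",  # Variable Heavy
    "L": "hemantn/roberta-base-humAb-vl",  # Variable Light
}

def model_id_for_chain(chain_type: str) -> str:
    """Map a chain type ('H' or 'L') to its Hugging Face model id."""
    try:
        return MODEL_IDS[chain_type.upper()]
    except KeyError:
        raise ValueError(f"chain_type must be 'H' or 'L', got {chain_type!r}")

print(model_id_for_chain("H"))  # hemantn/roberta-base-humAb-vh
```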

Citation

If you use AnthroAb in your research, please cite:

@misc{anthroab,
  author = {Hemant N},
  title = {AnthroAb: Human Antibody Language Model for Humanization},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/hemantn/roberta-base-humAb-vh}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

Note: This codebase and API design are adapted from the Sapiens model by Merck. AnthroAb maintains the same interface and functionality as Sapiens but uses a RoBERTa-base model trained on human antibody sequences from the OAS database (up to 2025) for antibody humanization.

Original Sapiens Citation

David Prihoda, Jad Maamary, Andrew Waight, Veronica Juan, Laurence Fayadat-Dilman, Daniel Svozil & Danny A. Bitton (2022) BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning, mAbs, 14:1, DOI: https://doi.org/10.1080/19420862.2021.2020203

Related Projects

  • Sapiens: The original antibody language model by Merck, on which this codebase is based
  • BioPhi: Antibody design and humanization platform
  • OAS: Observed Antibody Space database

Support

For questions, issues, or contributions, please open an issue on the GitHub repository.


