Structure-based Antibody Renumbering

This repository is under active development. If you encounter any bugs, please report them on the GitHub issue tracker (https://github.com/delalamo/SAbR/issues).

SAbR (Structure-based Antibody Renumbering) renumbers antibody PDB files using the 3D coordinates of backbone atoms. It uses custom forks of SoftAlign, to align structures to consensus embeddings derived from SAbDab, and of ANARCI, to renumber into various antibody numbering schemes.
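SAbR's own parsing code is not shown in this README. As a rough illustration of what "the 3D coordinates of backbone atoms" refers to, the sketch below collects N, CA and C coordinates per residue for one chain from fixed-column PDB ATOM records. The helper name `backbone_coords` and the choice of N/CA/C as the backbone set are assumptions for illustration, not SAbR's API.

```python
from typing import Dict, List, Tuple

# Assumed backbone set for this sketch; the README does not say which atoms SAbR uses.
BACKBONE_ATOMS = ("N", "CA", "C")

Coord = Tuple[float, float, float]


def backbone_coords(pdb_lines: List[str], chain_id: str) -> Dict[Tuple[int, str], Dict[str, Coord]]:
    """Hypothetical helper: backbone coordinates per residue for one chain.

    Keys are (residue number, insertion code); values map atom name -> (x, y, z).
    Relies on the fixed-column layout of PDB ATOM records (0-based slices below).
    """
    residues: Dict[Tuple[int, str], Dict[str, Coord]] = {}
    for line in pdb_lines:
        if not line.startswith("ATOM") or line[21] != chain_id:
            continue
        atom_name = line[12:16].strip()
        if atom_name not in BACKBONE_ATOMS:
            continue
        # Residue number (cols 23-26) plus insertion code (col 27) identify a residue.
        res_key = (int(line[22:26]), line[26].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(res_key, {})[atom_name] = xyz
    return residues
```

Because insertion codes are part of the residue key, residues such as 111A and 111B stay distinct, which matters for antibody files that are already numbered.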

Installation and use

  1. SAbR can be installed into a virtual environment via pip:
# Latest release
pip install sabr-kit

# Most recent version from Github
git clone --recursive https://github.com/delalamo/SAbR.git
cd SAbR/
pip install -e .

It can then be run using the sabr command (see below).

  2. Alternatively, SAbR can be run directly from the latest Docker container:

Note: the Docker image does not currently work. Please check back soon!

docker run --rm ghcr.io/delalamo/sabr:latest -i input.pdb -o output.pdb -c CHAIN_ID
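The sabr command can also be scripted from Python. The hypothetical helper below (not part of sabr-kit) assembles an argument list using only flags that appear in the `sabr --help` output further down, and checks the one documented constraint: `--extended-insertions` requires a `.cif` output file.

```python
from pathlib import Path
from typing import List, Optional

# Choices copied from the `sabr --help` output in this README.
SCHEMES = {"imgt", "chothia", "kabat", "martin", "aho", "wolfguy"}
CHAIN_TYPES = {"heavy", "light", "auto"}


def build_sabr_command(
    input_pdb: str,
    output: str,
    chain: Optional[str] = None,
    scheme: str = "imgt",
    chain_type: str = "auto",
    extended_insertions: bool = False,
    overwrite: bool = False,
) -> List[str]:
    """Hypothetical helper: build an argv list for the sabr CLI."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown numbering scheme: {scheme}")
    if chain_type not in CHAIN_TYPES:
        raise ValueError(f"unknown chain type: {chain_type}")
    # Help text: mmCIF output is required with --extended-insertions (case-insensitive check).
    if extended_insertions and Path(output).suffix.lower() != ".cif":
        raise ValueError("--extended-insertions requires a .cif output file")
    cmd = ["sabr", "-i", input_pdb, "-o", output, "-n", scheme, "-t", chain_type]
    if chain is not None:
        cmd += ["-c", chain]
    if extended_insertions:
        cmd.append("--extended-insertions")
    if overwrite:
        cmd.append("--overwrite")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building an argv list rather than a single shell string avoids quoting problems with unusual file names.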

Running SAbR

Practical considerations:

  • Heavy and light chain structures are similar enough that SAbR can confuse them, so declare the chain type with --chain-type when it is known (leave it at the default, auto, if uncertain).
  • For now, truncate the query structure to the Fv before running SAbR; otherwise it will sometimes align variable-region beta-strands to those in the constant region.
  • For scFvs, run each variable domain independently.
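A minimal sketch for preparing such inputs, assuming you already know which residue range covers the Fv (or each scFv domain) in your file's original numbering. `select_residue_range` is a hypothetical helper that keeps the ATOM records of one chain within a residue-number range; it does not detect domain boundaries itself.

```python
from typing import Iterable, List


def select_residue_range(pdb_lines: Iterable[str], chain_id: str,
                         first: int, last: int) -> List[str]:
    """Hypothetical helper: keep ATOM records of `chain_id` with first <= resseq <= last.

    The range must be chosen by the user (e.g. by inspecting the sequence);
    nothing here locates the end of the Fv or an scFv linker.
    """
    kept: List[str] = []
    for line in pdb_lines:
        # 0-based index 21 is the chain ID; slice 22:26 is the residue number.
        if line.startswith("ATOM") and line[21] == chain_id:
            if first <= int(line[22:26]) <= last:
                kept.append(line.rstrip("\n"))
    kept.append("END")
    return kept
```

For an scFv, call it once per domain with your own ranges, write each result to its own file (for example with `"\n".join(lines) + "\n"`), and run sabr on each file separately.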

If running on a Mac with Apple silicon, set the environment variable JAX_PLATFORMS to cpu.
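When SAbR is launched from a Python script, the variable can be set for the child process only. The sketch below uses a hypothetical helper, `sabr_env`, that adds JAX_PLATFORMS=cpu when the standard-library `platform` module reports macOS on arm64. Note that an x86_64 Python running under Rosetta reports `x86_64` and would not be matched.

```python
import os
import platform
from typing import Dict, Mapping, Optional


def sabr_env(base: Optional[Mapping[str, str]] = None,
             system: Optional[str] = None,
             machine: Optional[str] = None) -> Dict[str, str]:
    """Copy `base` (default: os.environ) and force JAX onto the CPU on Apple-silicon Macs."""
    env = dict(os.environ if base is None else base)
    system = system or platform.system()     # "Darwin" on macOS
    machine = machine or platform.machine()  # "arm64" for a native Apple-silicon Python
    if system == "Darwin" and machine == "arm64":
        env["JAX_PLATFORMS"] = "cpu"
    return env
```

Pass `env=sabr_env()` to `subprocess.run` when invoking sabr. From a shell, `export JAX_PLATFORMS=cpu` before calling sabr has the same effect.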

Usage: sabr [OPTIONS]

  Structure-based Antibody Renumbering (SAbR) renumbers antibody PDB files
  using the 3D coordinates of backbone atoms.

Options:
  -i, --input-pdb FILE            Input PDB file.  [required]
  -c, --input-chain TEXT          Chain identifier to renumber.
  -o, --output FILE               Destination structure file. Use .pdb
                                  extension for PDB format or .cif extension
                                  for mmCIF format. mmCIF is required when
                                  using --extended-insertions.  [required]
  -n, --numbering-scheme [imgt|chothia|kabat|martin|aho|wolfguy]
                                  Numbering scheme.  [default: (IMGT)]
  --overwrite                     Overwrite the output PDB if it already
                                  exists.
  -v, --verbose                   Enable verbose logging.
  --max-residues INTEGER          Maximum number of residues to process from
                                  the chain. If 0 (default), process all
                                  residues.
  -t, --chain-type [heavy|light|auto]
                                  Restrict alignment to specific chain type
                                  embeddings. 'heavy' searches only heavy
                                  chain (H) embeddings, 'light' searches only
                                  light chain (K and L) embeddings, 'auto'
                                  searches all embeddings and picks the best
                                  match.  [default: auto]
  --extended-insertions           Enable extended insertion codes (AA, AB,
                                  ..., ZZ, AAA, etc.) for antibodies with very
                                  long CDR loops. Requires mmCIF output format
                                  (.cif extension). Standard PDB format only
                                  supports single-character insertion codes
                                  (A-Z, max 26 insertions per position)
  -v, --verbose         Verbose output
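The --extended-insertions description suggests the order A to Z, then AA, AB, ..., ZZ, then AAA. The generator below reproduces that order under this reading; it is an illustration only, and SAbR's actual implementation may differ. It also shows why standard PDB output runs out: a 30-residue insertion at a single position needs four codes beyond Z.

```python
from itertools import count, islice, product
from string import ascii_uppercase
from typing import Iterator


def insertion_codes() -> Iterator[str]:
    """Yield A..Z, then AA..ZZ, then AAA, ... (lexicographic order within each length)."""
    for length in count(1):
        # product() varies the last letter fastest: AA, AB, ..., AZ, BA, ...
        for letters in product(ascii_uppercase, repeat=length):
            yield "".join(letters)


# A 30-residue insertion at one position needs 30 codes; the last four exceed A-Z.
first_30 = list(islice(insertion_codes(), 30))
print(first_30[-4:])  # ['AA', 'AB', 'AC', 'AD']
```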

Known issues

  • SAbR currently struggles with scFvs for two reasons. First, it is unclear how to assign canonical numbering to multiple domains within a single chain without accepting a spacer (e.g., starting the second domain at 201 instead of 1). Second, it will sometimes align across both domains, introducing a massive insertion between them. It is unclear how to prevent this; please see issue #2 for details.
  • SAbR sometimes mistakenly includes beta-strands from other parts of the Fab (e.g., the constant region) in the VH.
  • The CDR renumbering algorithm, which is the same one used for IMGT, does not account for unassigned residues. If a residue is missing from the structure (e.g., due to conformational heterogeneity), other residues in that CDR will be misnumbered.
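Because missing residues can silently shift CDR numbering, it may help to screen structures for chain breaks before renumbering. A common heuristic, assumed here rather than taken from SAbR, flags consecutive CA atoms more than about 4.2 Å apart (bonded neighbours sit about 3.8 Å apart). The helper `find_chain_breaks` is hypothetical; its input can come from any PDB parser. A break reported inside a CDR suggests that loop's numbering should be checked by hand.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]

# Heuristic cutoff (Å) for a chain break; not a SAbR parameter.
MAX_CA_CA = 4.2


def find_chain_breaks(ca_coords: List[Tuple[int, Coord]],
                      cutoff: float = MAX_CA_CA) -> List[Tuple[int, int, float]]:
    """Return (residue_i, residue_j, distance) for consecutive CA pairs farther apart than cutoff.

    `ca_coords` is a list of (residue number, CA coordinate) in chain order.
    """
    breaks = []
    for (res_i, xyz_i), (res_j, xyz_j) in zip(ca_coords, ca_coords[1:]):
        d = math.dist(xyz_i, xyz_j)  # Euclidean distance (Python 3.8+)
        if d > cutoff:
            breaks.append((res_i, res_j, d))
    return breaks
```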

Download files

Source Distribution

sabr_kit-0.2.0.tar.gz (2.4 MB)

Built Distribution

sabr_kit-0.2.0-py3-none-any.whl (2.4 MB)

File details

Details for the file sabr_kit-0.2.0.tar.gz.

File metadata

  • Download URL: sabr_kit-0.2.0.tar.gz
  • Upload date:
  • Size: 2.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sabr_kit-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        96fa96f66e29d24e20b76c3e14fed39c1a4a0faf3f93f5f8ab9605e710f2fcad
MD5           d42889f474437b2c4a81be1a26e135a2
BLAKE2b-256   893aa8a0b9a958f31a3062840e043cde86e980f794afcb2ea923d56a19bf7ad5
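To check a downloaded file against the SHA256 digest listed above, a short sketch using the standard-library `hashlib` module. The helper names are illustrative; the expected digest is the one published here for the source distribution.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest published above for sabr_kit-0.2.0.tar.gz.
EXPECTED_SDIST_SHA256 = "96fa96f66e29d24e20b76c3e14fed39c1a4a0faf3f93f5f8ab9605e710f2fcad"


def matches_expected(path: str, expected: str) -> bool:
    """Compare case-insensitively, since hex digests may be written in either case."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `matches_expected("sabr_kit-0.2.0.tar.gz", EXPECTED_SDIST_SHA256)` should return True for an intact download. pip can also enforce digests itself in hash-checking mode (`pip install --require-hashes -r requirements.txt`), provided the requirements file lists a `--hash=sha256:...` entry for every file pip might download.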

File details

Details for the file sabr_kit-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: sabr_kit-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 2.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sabr_kit-0.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        e5edbd1759b4c9c7c447931b99c85a07afdc3268468d97b4caa838df6b5949f8
MD5           8705c0513196632b2c9da38052ed7baf
BLAKE2b-256   b798cae13fbee3b65f0d5392e6b7e68ceeed4e110ec5c8904fc4d83adf382ff5
