Structure-based Antibody Renumbering

License: MIT

SAbR (Structure-based Antibody Renumbering) renumbers antibody PDB files using the 3D coordinates of backbone atoms. It uses custom forked versions of SoftAlign and ANARCI to align structures to SAbDab-derived consensus embeddings and renumber to various antibody schemes, respectively.

Documentation

Full API documentation is available at sabr.readthedocs.io.

Installation and use

Requirements: Python 3.11 or higher

  1. SAbR can be installed into a virtual environment via pip:
# Latest release
pip install sabr-kit

# Most recent version from GitHub
git clone --recursive https://github.com/delalamo/SAbR.git
cd SAbR/
pip install -e .

It can then be run using the sabr command (see below).

  2. Alternatively, SAbR can be run directly with the latest Docker container:
docker run --rm ghcr.io/delalamo/sabr:latest -i input.pdb -o output.pdb -c CHAIN_ID

Running SAbR

Practical considerations:

  • Heavy and light chain structures are similar enough that the chain type should be declared manually with --chain-type if possible (omit the option to use the default, auto, if uncertain).
  • For now, truncate the query structure to contain only the Fv before running SAbR, because it sometimes aligns variable region beta-strands to those in the constant region.
  • When running scFvs, run each variable domain independently.

If running on a Mac with Apple silicon, set the environment variable JAX_PLATFORMS to cpu.
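
When SAbR is used from Python rather than from the shell, the same variable can be set in the script itself. The sketch below assumes that SAbR imports JAX when it is first imported, so the variable has to be set before that import; the commented-out import line is only illustrative.

```python
import os

# Set the platform before anything imports JAX (and therefore before
# importing SAbR), since JAX reads this variable when it is imported.
os.environ["JAX_PLATFORMS"] = "cpu"

# Import SAbR only after the variable is set, for example:
# from sabr import renumber
```

On the command line, the equivalent is to export JAX_PLATFORMS=cpu in the shell before calling sabr.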

Usage: sabr [OPTIONS]

  Structure-based Antibody Renumbering (SAbR) renumbers antibody structure
  files using the 3D coordinates of backbone atoms. Supports both PDB and
  mmCIF input formats.

Options:
  -i, --input-pdb FILE            Input structure file (PDB or mmCIF format).
                                  [required]
  -c, --input-chain TEXT          Chain identifier to renumber (single
                                  character).  [required]
  -o, --output FILE               Destination structure file. Use .pdb
                                  extension for PDB format or .cif extension
                                  for mmCIF format. mmCIF is required when
                                  using --extended-insertions.  [required]
  -n, --numbering-scheme [imgt|chothia|kabat|martin|aho|wolfguy]
                                  Numbering scheme.  [default: IMGT]
  --overwrite                     Overwrite the output file if it already
                                  exists.
  -v, --verbose                   Enable verbose logging.
  --residue-range START END       Range of residues to process in PDB
                                  numbering (inclusive). Use '0 0' (default)
                                  to process all residues. Example:
                                  --residue-range 1 120 processes residues
                                  1-120.
  --extended-insertions           Enable extended insertion codes (AA, AB,
                                  ..., ZZ, AAA, etc.) for antibodies with very
                                  long CDR loops. Requires mmCIF output format
                                  (.cif extension). Standard PDB format only
                                  supports single-character insertion codes
                                  (A-Z, max 26 insertions per position).
  --disable-deterministic-renumbering
                                  Disable deterministic renumbering corrections
                                  for loop regions. By default, corrections are
                                  applied for FR1, DE loop, and CDR loops.
  -t, --chain-type [H|K|L|heavy|kappa|lambda|auto]
                                  Chain type for ANARCI numbering.
                                  H/heavy=heavy chain, K/kappa=kappa light,
                                  L/lambda=lambda light. Use 'auto' (default)
                                  to detect from DE loop occupancy.
                                  [default: auto]
  -h, --help                      Show this message and exit.
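
To illustrate the ordering of extended insertion codes described under --extended-insertions (A to Z, then AA, AB, ..., ZZ, AAA), here is a minimal sketch that maps a zero-based insertion index to a code. It is an independent illustration of that ordering, not SAbR's own implementation; the function names insertion_code and fits_pdb_format are hypothetical.

```python
from string import ascii_uppercase


def insertion_code(index: int) -> str:
    """Return the insertion code for a zero-based index (0 is 'A', 26 is 'AA')."""
    if index < 0:
        raise ValueError("index must be non-negative")
    n = index + 1  # bijective base-26 is easiest to compute on 1-based numbers
    letters = []
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters.append(ascii_uppercase[remainder])
    return "".join(reversed(letters))


def fits_pdb_format(index: int) -> bool:
    """Single-character codes are the only ones the PDB format can hold."""
    return len(insertion_code(index)) == 1


codes = [insertion_code(i) for i in (0, 25, 26, 27, 701, 702)]
print(codes)
```

Any index of 26 or more yields a multi-character code, which is the case where mmCIF output is required.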

Python API

SAbR can also be used programmatically to renumber BioPython Structure objects directly in memory:

from Bio.PDB import PDBParser, PDBIO
from sabr import renumber

# Load a structure
parser = PDBParser(QUIET=True)
structure = parser.get_structure("antibody", "input.pdb")

# Renumber the structure (returns a new BioPython Structure)
renumbered = renumber.renumber_structure(
    structure,
    chain="H",                      # Chain identifier
    numbering_scheme="imgt",        # imgt, chothia, kabat, martin, aho, wolfguy
    chain_type="auto",              # H, K, L, or auto
)

# Optionally specify a residue range
renumbered = renumber.renumber_structure(
    structure,
    chain="H",
    res_start=1,                    # Start at residue 1
    res_end=128,                    # End at residue 128
)

# Save the renumbered structure
io = PDBIO()
io.set_structure(renumbered)
io.save("output.pdb")
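
The command-line tool picks the output format from the file extension (.pdb or .cif). A similar convention can be sketched for the Python API using BioPython's PDBIO and MMCIFIO writers. The helper names writer_class_name and save_structure are hypothetical and not part of SAbR; BioPython is imported lazily so the extension check can be used on its own.

```python
from pathlib import Path


def writer_class_name(output_path: str) -> str:
    """Map a file extension to the name of the matching Bio.PDB writer class."""
    suffix = Path(output_path).suffix.lower()
    if suffix == ".pdb":
        return "PDBIO"
    if suffix == ".cif":
        return "MMCIFIO"
    raise ValueError(f"Unsupported output extension: {suffix!r}")


def save_structure(structure, output_path: str) -> None:
    """Write a BioPython Structure as PDB or mmCIF, chosen by extension."""
    import Bio.PDB  # lazy import: only needed when a file is actually written

    writer = getattr(Bio.PDB, writer_class_name(output_path))()
    writer.set_structure(structure)
    writer.save(str(output_path))
```

For example, save_structure(renumbered, "output.cif") would write the renumbered structure as mmCIF.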

Known issues

  • SAbR currently struggles with scFvs for two reasons. First, it is unclear how to assign canonical numbering to multiple domains within a single chain, unless we accept a spacer (e.g., starting the second domain at 201 instead of 1). Second, it will sometimes align across both domains, introducing a massive insertion in between. It is unclear how to prevent this; please see issue #2 for details.
  • SAbR sometimes mistakenly includes sheets from elsewhere in the Fab (e.g., the constant region) in the VH.
  • The algorithm for renumbering CDRs, which is the same as the one for IMGT, does not account for unassigned residues. As a result, if a residue is missing due to heterogeneity, the other residues in that CDR will be misnumbered.
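
Given the scFv issues above, the recommendation to run each variable domain independently can be scripted. The sketch below only assembles sabr command lines from options listed in the usage text (-i, -o, -c, -n, -t, --residue-range). The file names, chain ID, and residue ranges are hypothetical and must be replaced with values for the actual scFv; how residues outside the range are written to the output is not covered here.

```python
import shlex


def build_sabr_command(input_path, output_path, chain,
                       residue_range=None, chain_type=None, scheme="imgt"):
    """Assemble a sabr command line as an argument list."""
    cmd = ["sabr", "-i", input_path, "-o", output_path, "-c", chain, "-n", scheme]
    if residue_range is not None:
        start, end = residue_range
        cmd += ["--residue-range", str(start), str(end)]
    if chain_type is not None:
        cmd += ["-t", chain_type]
    return cmd


# Hypothetical scFv on chain A: VH at residues 1-120, VL (kappa) at 140-250.
domains = {"vh": ((1, 120), "H"), "vl": ((140, 250), "K")}
commands = [
    build_sabr_command("scfv.pdb", f"scfv_{name}.pdb", "A",
                       residue_range=rng, chain_type=ctype)
    for name, (rng, ctype) in domains.items()
]
for cmd in commands:
    # Printed for inspection; pass cmd to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Declaring the chain type per domain follows the practical consideration that heavy and light chains are hard to tell apart automatically.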
