Protein structure superposition package and CLI tool
Project description
pyprotalign
Protein structure superposition using sequence alignment and iterative refinement.
Features
- Sequence-based alignment: Automatically identifies corresponding atoms via sequence alignment
- Kabsch algorithm: Optimal least-squares superposition
- Iterative refinement: Outlier rejection for improved accuracy
- Quality filtering: Filter CA atoms by B-factor and occupancy
- Multi-chain support:
- Single-chain alignment with specified or default chains
- Global alignment of all matching chains (by ID or by order)
- Quaternary alignment with smart chain matching by proximity
- Batch processing: Align multiple mobile structures to a single reference
Installation
Using uv/pip
uv pip install pyprotalign
From source
git clone https://github.com/ugSUBMARINE/pyprotalign.git
cd pyprotalign
uv venv
uv sync
Quick Start
CLI Tool
# Basic superposition (uses first protein chain from each structure)
uv run protalign fixed.cif mobile.cif -o superposed.cif
# Specify chains to align
uv run protalign fixed.cif mobile.cif --fixed-chain A --mobile-chain B
# Global alignment (align all matching chains: A-A, B-B, etc.)
uv run protalign fixed.cif mobile.cif --global
# Global alignment by chain order (ignore chain IDs)
uv run protalign fixed.cif mobile.cif --global --by-order
# Quaternary alignment (smart chain matching by proximity)
uv run protalign fixed.cif mobile.cif --quaternary --distance-threshold 8.0
# Quaternary alignment with chain renaming
uv run protalign fixed.cif mobile.cif --quaternary --rename-chains
# With iterative refinement (reject outliers)
uv run protalign fixed.cif mobile.cif --refine --cutoff 2.0 --cycles 5
# Filter by B-factor / occupancy (for well-defined regions)
uv run protalign fixed.cif mobile.cif --filter --max-bfac 30 --min-occ 0.9
# Output as PDB
uv run protalign fixed.cif mobile.cif -o superposed.pdb
# Batch alignment: multiple mobile files (outputs <stem>_<suffix>.<ext>)
uv run protalign reference.cif mobile1.cif mobile2.cif mobile3.cif
# Custom output suffix (e.g., <stem>_aligned.cif)
uv run protalign reference.cif *.cif --output aligned
# Batch with quaternary mode (e.g., for AlphaFold/Boltz multi-chain models)
uv run protalign reference.cif *.cif --quaternary --output aligned
All-to-All Chain Alignment
The protalign-all2all CLI aligns every extracted protein chain against all others, across one or more structures.
With multiple input files, chain IDs are prefixed with the file stem (e.g., 9ebk_A).
# Table output to screen (default)
uv run protalign-all2all 9ebk.cif 9jn4.cif
# CSV output to a file (default name: all2all.csv for multiple inputs)
uv run protalign-all2all 9ebk.cif 9jn4.cif --format csv
# Custom CSV filename
uv run protalign-all2all 9ebk.cif 9jn4.cif --format csv --output pairs.csv
# Tighten filters
uv run protalign-all2all 9ebk.cif 9jn4.cif --min-aligned 50 --max-rmsd 5.0
Notes:
- Table output prints to stdout; CSV output always writes to a file.
- Pairs below
--min-alignedor above--max-rmsdare still reported withNaNand astatusin CSV.
Batch mode:
- Activated when multiple mobile files provided
- Outputs
<stem>_<suffix>.<ext>for each mobile file (extension comes from--output, defaults to.cif) - Reports progress per structure with RMSD values
- Continues on errors
Usage
usage: protalign [-h] [--version] [-o OUTPUT] [--fixed-chain FIXED_CHAIN] [--mobile-chain MOBILE_CHAIN]
[--filter] [--min-bfac MIN_BFAC] [--max-bfac MAX_BFAC] [--min-occ MIN_OCC]
[--refine] [--cycles CYCLES] [--cutoff CUTOFF]
[--global] [--by-order]
[--quaternary] [--distance-threshold DISTANCE_THRESHOLD] [--rename-chains]
[--verbose]
fixed mobile [mobile ...]
Protein structure superposition tool
positional arguments:
fixed Fixed structure file (PDB or mmCIF)
mobile Mobile structure file(s) (PDB or mmCIF). If multiple files provided, batch mode is activated.
options:
-h, --help show this help message and exit
--version show program's version number and exit
-o, --output OUTPUT Output file (single mode) or suffix (batch mode) (default: superposed.cif)
--fixed-chain FIXED_CHAIN
Chain ID for fixed structure (e.g., A). Also used as 'seed' chain in quaternary mode. If not specified, uses first protein chain.
--mobile-chain MOBILE_CHAIN
Chain ID for mobile structure (e.g., A). Also used as 'seed' chain in quaternary mode. If not specified, uses first protein chain.
--filter Apply filtering of CA atoms based on B-factors and occupancies
--min-bfac MIN_BFAC Minimum B-factor threshold - filter CA atoms with B-factors < threshold (default: -inf)
--max-bfac MAX_BFAC Maximum B-factor threshold - filter CA atoms with B-factors > threshold (default: inf)
--min-occ MIN_OCC Minimum occupancy threshold - filter CA atoms with occupancies < threshold (default: -inf)
--refine Use iterative refinement to reject outliers
--cycles CYCLES Maximum refinement cycles (default: 5)
--cutoff CUTOFF Outlier rejection cutoff (distance > cutoff * RMSD) (default: 2.0)
--global Align all protein chains by matching chain IDs or their order in the structure and pooling coordinates
--by-order Use the order of the chains in the structure for chain matching
--quaternary Quaternary alignment: match chains by proximity, rename to match fixed
--distance-threshold DISTANCE_THRESHOLD
Distance threshold (Å) for chain matching in quaternary mode (default: 8.0)
--rename-chains Rename mobile chains to match fixed (only with --quaternary)
--verbose Enable verbose output (show refinement cycles, chain matching details)
Output
The tool reports:
- Chain(s) and number of residues (single-chain mode)
- Chains aligned and total pairs (global mode)
- Number of aligned CA atom pairs
- Quality filtering stats (if using
--filter) - Final RMSD in Ångströms
- If using
--refine: number of pairs retained/rejected - Output file path (single mode defaults to
.cifwhen no extension is provided)
Examples
Single-chain alignment:
$ uv run protalign 9jn4.cif 9ebk.cif --refine
=== Single-Chain Alignment ===
## Fixed structure loaded: 9jn4.cif
Using chain 'B' as target for superposition.
## Processing: 9ebk.cif
Fixed: chain B, 213 residues
Mobile: chain B, 219 residues
Aligned 180 CA pairs with RMSD 0.649 Å
Superposed structure written to: superposed.cif
Chain selection:
$ uv run protalign 9jn4.cif 9ebk.cif --fixed-chain A --mobile-chain B
=== Single-Chain Alignment ===
## Fixed structure loaded: 9jn4.cif
Using chain 'A' as target for superposition.
## Processing: 9ebk.cif
Fixed: chain A, 213 residues
Mobile: chain B, 219 residues
Aligned 211 CA pairs with RMSD 1.421 Å
Superposed structure written to: superposed.cif
Global multi-chain alignment:
$ uv run protalign 9jn4.cif 9jn6.cif --global
=== Global Multi-Chain Alignment ===
## Fixed structure loaded: 9jn4.cif
## Processing: 9jn6.cif
Aligned CA pairs across 4 chains: A, B, C, D
Aligned 850 CA pairs with RMSD 33.550 Å
Superposed structure written to: superposed.cif
Global alignment by order (chain IDs differ):
$ uv run protalign 9jn4.cif 9jn6.cif --global --by-order
=== Global Multi-Chain Alignment ===
## Fixed structure loaded: 9jn4.cif
## Processing: 9jn6.cif
Aligned CA pairs across 4 chain pairings (fixed → mobile): B → B, D → C, A → D, C → A
Aligned 850 CA pairs with RMSD 26.013 Å
Superposed structure written to: superposed.cif
Quaternary alignment (chain labels differ):
$ uv run protalign 9jn4.cif 9jn6.cif --quaternary
=== Quaternary Alignment ===
## Fixed structure loaded: 9jn4.cif
Using chain 'B' as seed for quaternary alignment.
## Processing: 9jn6.cif
Chain mapping after seed alignment: B → B, D → C, A → A, C → D
Aligned CA pairs across 4 chain pairings (fixed → mobile): B → B, D → C, A → A, C → D
Aligned 850 CA pairs with RMSD 0.180 Å
Superposed structure written to: superposed.cif
Verbose output (detailed progress):
$ uv run protalign 9jn4.cif 9jn6.cif --quaternary --refine --verbose
=== Quaternary Alignment ===
## Fixed structure loaded: 9jn4.cif
Using chain 'B' as seed for quaternary alignment.
## Processing: 9jn6.cif
-- Seed alignment: B → B --
Sequence alignment:
Fixed: GAMFVPGPYHAPEDRWLVDLVRGHPLAQLASNGAGGAAPHITHVPIIVDPELDGPVDRLV
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: GAMFVPGPYHAPEDRWLVDLVRGHPLAQLASNGAGGAAPHITHVPIIVDPELDGPVDRLV
Fixed: GITLWGHMNRANPHWAALGGAANVVATFAGPNAYVSPAVYRTAPAAPTWNFTSVQVRGEL
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: GITLWGHMNRANPHWAALGGAANVVATFAGPNAYVSPAVYRTAPAAPTWNFTSVQVRGEL
Fixed: RKVESADDTLATVRATVAALESRFGAGWDMTGSLDYFRRILPGVGAFRLRVAEADGMFKL
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: RKVESADDTLATVRATVAALESRFGAGWDMTGSLDYFRRILPGVGAFRLRVAEADGMFKL
Fixed: SQEQQPAIRRRVRHSFGGAEATRAVAGLMDRLP
|||||||||||||||||||||||||||||||||
Mobile: SQEQQPAIRRRVRHSFGGAEATRAVAGLMDRLP
Score: 1119
Indentity: 100.0%
213 CA pairs after sequence alignment.
Refinement cycles:
Cycle 1: 213 pairs, RMSD = 0.110 Å
Cycle 2: 205 pairs, RMSD = 0.101 Å
Cycle 3: 197 pairs, RMSD = 0.093 Å
Cycle 4: 195 pairs, RMSD = 0.092 Å
Converged (no more outliers)
Aligned 195 CA pairs with RMSD 0.092 Å
-- Chain center distances after seed alignment. --
B ↔ B: 0.01 Å ✓
B ↔ C: 17.08 Å ✗
B ↔ D: 40.32 Å ✗
B ↔ A: 49.80 Å ✗
D ↔ B: 17.09 Å ✗
D ↔ C: 0.05 Å ✓
D ↔ D: 33.91 Å ✗
D ↔ A: 40.36 Å ✗
A ↔ B: 49.76 Å ✗
A ↔ C: 40.28 Å ✗
A ↔ D: 17.00 Å ✗
A ↔ A: 0.19 Å ✓
C ↔ B: 40.26 Å ✗
C ↔ C: 33.90 Å ✗
C ↔ D: 0.23 Å ✓
C ↔ A: 17.22 Å ✗
Chain mapping after seed alignment: B → B, D → C, A → A, C → D
-- Chain mapping B → B --
Sequence alignment:
Fixed: GAMFVPGPYHAPEDRWLVDLVRGHPLAQLASNGAGGAAPHITHVPIIVDPELDGPVDRLV
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: GAMFVPGPYHAPEDRWLVDLVRGHPLAQLASNGAGGAAPHITHVPIIVDPELDGPVDRLV
Fixed: GITLWGHMNRANPHWAALGGAANVVATFAGPNAYVSPAVYRTAPAAPTWNFTSVQVRGEL
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: GITLWGHMNRANPHWAALGGAANVVATFAGPNAYVSPAVYRTAPAAPTWNFTSVQVRGEL
Fixed: RKVESADDTLATVRATVAALESRFGAGWDMTGSLDYFRRILPGVGAFRLRVAEADGMFKL
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: RKVESADDTLATVRATVAALESRFGAGWDMTGSLDYFRRILPGVGAFRLRVAEADGMFKL
Fixed: SQEQQPAIRRRVRHSFGGAEATRAVAGLMDRLP
|||||||||||||||||||||||||||||||||
Mobile: SQEQQPAIRRRVRHSFGGAEATRAVAGLMDRLP
Score: 1119
Indentity: 100.0%
213 CA pairs after sequence alignment.
-- Chain mapping D → C --
Sequence alignment:
Fixed: AMFVPGPYHAPEDRWLVDLVRGHPLAQLASNGAGGAAPHITHVPIIVDPELDGPVDRLVG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: AMFVPGPYHAPEDRWLVDLVRGHPLAQLASNGAGGAAPHITHVPIIVDPELDGPVDRLVG
Fixed: ITLWGHMNRANPHWAALGGAANVVATFAGPNAYVSPAVYRTAPAAPTWNFTSVQVRGELR
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: ITLWGHMNRANPHWAALGGAANVVATFAGPNAYVSPAVYRTAPAAPTWNFTSVQVRGELR
Fixed: KVESADDTLATVRATVAALESRFGAGWDMTGSLDYFRRILPGVGAFRLRVAEADGMFKLS
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: KVESADDTLATVRATVAALESRFGAGWDMTGSLDYFRRILPGVGAFRLRVAEADGMFKLS
Fixed: QEQQPAIRRRVRHSFGGAEATRAVAGLMDRLP
||||||||||||||||||||||||||||||||
Mobile: QEQQPAIRRRVRHSFGGAEATRAVAGLMDRLP
Score: 1113
Indentity: 100.0%
212 CA pairs after sequence alignment.
-- Chain mapping A → A --
Sequence alignment:
Fixed: GAMFVPGPYHAPEDRWLVDLVRGHPLAQLASNGAGGAAPHITHVPIIVDPELDGPVDRLV
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: GAMFVPGPYHAPEDRWLVDLVRGHPLAQLASNGAGGAAPHITHVPIIVDPELDGPVDRLV
Fixed: GITLWGHMNRANPHWAALGGAANVVATFAGPNAYVSPAVYRTAPAAPTWNFTSVQVRGEL
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: GITLWGHMNRANPHWAALGGAANVVATFAGPNAYVSPAVYRTAPAAPTWNFTSVQVRGEL
Fixed: RKVESADDTLATVRATVAALESRFGAGWDMTGSLDYFRRILPGVGAFRLRVAEADGMFKL
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: RKVESADDTLATVRATVAALESRFGAGWDMTGSLDYFRRILPGVGAFRLRVAEADGMFKL
Fixed: SQEQQPAIRRRVRHSFGGAEATRAVAGLMDRLP-
|||||||||||||||||||||||||||||||||
Mobile: SQEQQPAIRRRVRHSFGGAEATRAVAGLMDRLPT
Score: 1108
Indentity: 100.0%
213 CA pairs after sequence alignment.
-- Chain mapping C → D --
Sequence alignment:
Fixed: -AMFVPGPYHAPEDRWLVDLVRGHPLAQLASNGAGGAAPHITHVPIIVDPELDGPVDRLV
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: GAMFVPGPYHAPEDRWLVDLVRGHPLAQLASNGAGGAAPHITHVPIIVDPELDGPVDRLV
Fixed: GITLWGHMNRANPHWAALGGAANVVATFAGPNAYVSPAVYRTAPAAPTWNFTSVQVRGEL
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: GITLWGHMNRANPHWAALGGAANVVATFAGPNAYVSPAVYRTAPAAPTWNFTSVQVRGEL
Fixed: RKVESADDTLATVRATVAALESRFGAGWDMTGSLDYFRRILPGVGAFRLRVAEADGMFKL
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mobile: RKVESADDTLATVRATVAALESRFGAGWDMTGSLDYFRRILPGVGAFRLRVAEADGMFKL
Fixed: SQEQQPAIRRRVRHSFGGAEATRAVAGLMDRLP
|||||||||||||||||||||||||||||||||
Mobile: SQEQQPAIRRRVRHSFGGAEATRAVAGLMDRLP
Score: 1102
Indentity: 100.0%
212 CA pairs after sequence alignment.
-- Total number of aligned CA pairs across 4 chains: 850 --
Refinement cycles:
Cycle 1: 850 pairs, RMSD = 0.180 Å
Cycle 2: 831 pairs, RMSD = 0.161 Å
Cycle 3: 813 pairs, RMSD = 0.155 Å
Cycle 4: 803 pairs, RMSD = 0.152 Å
Cycle 5: 801 pairs, RMSD = 0.151 Å
Converged (no more outliers)
-- Summary --
Aligned CA pairs across 4 chain pairings (fixed → mobile): B → B, D → C, A → A, C → D
Aligned 801 CA pairs with RMSD 0.151 Å
Superposed structure written to: superposed.cif
Batch alignment (multiple mobile structures):
$ uv run protalign 9jn4.cif 9jn5.cif 9jn6.cif 9ebk.cif --fixed-chain D --mobile-chain A --output aligned
=== Single-Chain Alignment ===
## Fixed structure loaded: 9jn4.cif
Using chain 'D' as target for superposition.
## Processing 1/3: 9jn5.cif
Fixed: chain D, 212 residues
Mobile: chain A, 211 residues
Aligned 211 CA pairs with RMSD 0.142 Å
Superposed structure written to: 9jn5_aligned.cif
## Processing 2/3: 9jn6.cif
Fixed: chain D, 212 residues
Mobile: chain A, 214 residues
Aligned 212 CA pairs with RMSD 0.302 Å
Superposed structure written to: 9jn6_aligned.cif
## Processing 3/3: 9ebk.cif
Fixed: chain D, 212 residues
Mobile: chain A, 219 residues
Aligned 211 CA pairs with RMSD 1.359 Å
Superposed structure written to: 9ebk_aligned.cif
All-to-all chains alignment:
$ uv run protalign-all2all 9jn4.cif
=== All-to-All Chain Alignment ===
Loaded structure: 9jn4.cif
Found 4 protein chain(s) total.
Chain 1 Chain 2 Aligned RMSD (Å) Status
----------------------------------------------------------
A B 213 0.251
A C 212 0.305
A D 212 0.298
B C 212 0.350
B D 212 0.336
C D 212 0.139
Total pairs aligned: 6 (of 6 total)
Algorithm
Single-chain mode (default)
- Load structures: Reads PDB or mmCIF files
- Extract chains: Selects specified chain or first protein chain
- Sequence alignment: Aligns sequences using gemmi's implementation
- Extract CA atoms: Gets Cα coordinates from aligned residues
- Quality filtering (optional): Filters CA atom pairs by B-factor and occupancy
- Superposition: Applies Kabsch algorithm for optimal transformation
- Refinement (optional): Iteratively rejects outliers beyond
cutoff × RMSD - Transform: Applies transformation to entire mobile structure
- Output: Writes superposed structure in requested format
Global mode (--global)
- Load structures: Reads PDB or mmCIF files
- Match chains: Identifies common chain IDs (A-A, B-B, etc.). Use
--by-orderto ignore IDs and match by chain order. - Align per chain: Sequence alignment for each chain pair
- Quality filtering (optional): Filters CA atom pairs per chain by B-factor and occupancy
- Pool coordinates: Combines CA atoms from all matched chains
- Single transformation: Computes one transformation for all pooled coordinates
- Refinement (optional): Iteratively rejects outliers across all chains
- Transform: Applies transformation to entire mobile structure
- Output: Writes superposed structure in requested format
Quaternary mode (--quaternary)
- Load structures: Reads PDB or mmCIF files
- Seed alignment: Aligns specified or first chain pair with optional quality filtering and refinement
- Proximity matching: Transforms mobile copy, matches remaining chains by distance between chain centers
- Pool coordinates: Sequence aligns all matched chain pairs, pools CA atoms
- Quality filtering (optional): Filters CA atom pairs per chain by B-factor and occupancy
- Final transformation: Computes transformation on pooled coords with optional refinement
- Transform: Applies transformation to mobile structure
- Rename (optional with
--rename-chains): Renames mobile chains to match fixed - Output: Writes superposed structure
Development
Setup
uv venv # Create virtual environment
uv sync --group dev # Install with dev dependencies
Testing
uv run pytest # Run all tests
uv run pytest --cov # With coverage report
Code Quality
uv run mypy src tests # Type checking (strict mode)
uv run ruff check . # Linting
uv run ruff format . # Auto-formatting
Dependencies
- numpy (≥1.26): Numerical operations
- gemmi (≥0.7.4): Structure I/O and sequence alignment
Requirements
- Python ≥3.12
License
This project is licensed under the MIT License.
Contributing
Contributions are welcome! Please open an issue or submit a pull request on GitHub.
Acknowledgements
Thanks to the developers of gemmi for their excellent library. Coding was supported by warp.dev and codex.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pyprotalign-0.3.0-py3-none-any.whl.
File metadata
- Download URL: pyprotalign-0.3.0-py3-none-any.whl
- Upload date:
- Size: 29.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bcc3bdd3110e3eb15f56c3b81d1eff4cd5a01adf6aa2fd78d7177c5321afbc7e
|
|
| MD5 |
27acde1fc68049828ed09613d99d7a16
|
|
| BLAKE2b-256 |
b0998137fbd750723dbaf2c39cbc7338b14fe0d2fcebd13502bec462415697dc
|