Protein structure superposition package and CLI tool
pyprotalign
Protein structure superposition using sequence alignment and iterative refinement.
Features
- Sequence-based alignment: Automatically identifies corresponding atoms via sequence alignment
- Kabsch algorithm: Optimal least-squares superposition
- Iterative refinement: Outlier rejection for improved accuracy
- Multi-chain support:
- Single-chain alignment with specified or default chains
- Global alignment of all matching chains
- Quaternary alignment with smart chain matching by proximity
- Batch processing: Align multiple mobile structures to a single reference
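Corresponding atoms are found by aligning the two chain sequences. pyprotalign uses gemmi for this step. The sketch below does not call gemmi. It assumes the pairwise alignment is already available as two gapped strings of equal length, and it shows how such an alignment becomes pairs of residue indices whose Cα atoms can then be superposed. The helper name `aligned_index_pairs` is hypothetical.

```python
def aligned_index_pairs(fixed_aln: str, mobile_aln: str) -> list[tuple[int, int]]:
    """Map two gapped alignment rows to (fixed, mobile) residue index pairs.

    Columns where either row has a gap ('-') are skipped. Every other column
    pairs the i-th residue of the fixed chain with the j-th residue of the
    mobile chain (0-based, counting residues only, not gaps).
    Hypothetical helper, not part of pyprotalign's API.
    """
    if len(fixed_aln) != len(mobile_aln):
        raise ValueError("alignment rows must have equal length")
    pairs: list[tuple[int, int]] = []
    i = j = 0  # running residue counters for the fixed and mobile chains
    for a, b in zip(fixed_aln, mobile_aln):
        if a != "-" and b != "-":
            pairs.append((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs
```

For example, `aligned_index_pairs("MK-VLA", "MKQV-A")` yields `[(0, 0), (1, 1), (2, 3), (4, 4)]`: the inserted Q and the deleted L are left out of the superposition.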
Installation
Using uv/pip
uv pip install pyprotalign
From source
git clone https://github.com/ugSUBMARINE/pyprotalign.git
cd pyprotalign
uv venv
uv sync
Quick Start
CLI Tool
# Basic superposition (uses first protein chain from each structure)
uv run protalign fixed.cif mobile.cif -o superposed.cif
# Specify chains to align
uv run protalign fixed.cif mobile.cif --fixed-chain A --mobile-chain B
# Global alignment (align all matching chains: A-A, B-B, etc.)
uv run protalign fixed.cif mobile.cif --global
# Quaternary alignment (smart chain matching by proximity)
uv run protalign fixed.cif mobile.cif --quaternary --distance-threshold 8.0
# Quaternary alignment with chain renaming
uv run protalign fixed.cif mobile.cif --quaternary --rename-chains
# With iterative refinement (reject outliers)
uv run protalign fixed.cif mobile.cif --refine --cutoff 2.0 --cycles 5
# Output as PDB
uv run protalign fixed.cif mobile.cif -o superposed.pdb
# Batch alignment: multiple mobile files (outputs <stem>_superposed.cif)
uv run protalign reference.cif mobile1.cif mobile2.cif mobile3.cif
# Custom output suffix (e.g., <stem>_aligned.cif)
uv run protalign reference.cif *.cif --output aligned
# Batch with quaternary mode (e.g., for AlphaFold/Boltz multi-chain models)
uv run protalign reference.cif *.cif --quaternary --output aligned
Batch mode:
- Activated when more than one mobile file is provided
- Writes <stem>_<suffix>.cif for each mobile file
- Reports progress and a summary with RMSD values
- Continues with the remaining files if one of them fails
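The batch naming rule can be sketched as follows. `batch_output_name` is a hypothetical helper, not part of pyprotalign. It only reproduces the documented `<stem>_<suffix>.cif` pattern and makes no assumption about which directory the tool writes to.

```python
from pathlib import Path


def batch_output_name(mobile_file: str, suffix: str = "superposed") -> str:
    """Build '<stem>_<suffix>.cif' from a mobile file name (directory dropped)."""
    return f"{Path(mobile_file).stem}_{suffix}.cif"
```

With `--output aligned`, the mobile file `9jn5.cif` maps to `9jn5_aligned.cif`, matching the batch example below.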
Usage
usage: protalign [-h] [--version] [-o OUTPUT] [--fixed-chain FIXED_CHAIN] [--mobile-chain MOBILE_CHAIN] [--refine] [--cycles CYCLES] [--cutoff CUTOFF] [--global] [--quaternary] [--distance-threshold DISTANCE_THRESHOLD] [--rename-chains] [--verbose]
fixed mobile [mobile ...]
Protein structure superposition tool
positional arguments:
fixed Fixed structure file (PDB or mmCIF)
mobile Mobile structure file(s) (PDB or mmCIF). If multiple files provided, batch mode is activated.
options:
-h, --help show this help message and exit
--version show program's version number and exit
-o, --output OUTPUT Output file (single mode) or suffix (batch mode) (default: superposed.cif)
--fixed-chain FIXED_CHAIN
Chain ID for fixed structure (e.g., A). Also used as 'seed' chain in quaternary mode. If not specified, uses first protein chain.
--mobile-chain MOBILE_CHAIN
Chain ID for mobile structure (e.g., A). Also used as 'seed' chain in quaternary mode. If not specified, uses first protein chain.
--refine Use iterative refinement to reject outliers
--cycles CYCLES Maximum refinement cycles (default: 5)
--cutoff CUTOFF Outlier rejection cutoff (distance > cutoff * RMSD) (default: 2.0)
--global Align all protein chains by matching chain IDs (A-A, B-B, etc.) and pooling coordinates
--quaternary Quaternary alignment: match chains by proximity, rename to match fixed
--distance-threshold DISTANCE_THRESHOLD
Distance threshold (Å) for chain matching in quaternary mode (default: 8.0)
--rename-chains Rename mobile chains to match fixed (only with --quaternary)
--verbose Enable verbose output (show refinement cycles, chain matching details)
Output
The tool reports:
- Chain(s) and number of residues (single-chain mode)
- Chains aligned and total pairs (global mode)
- Number of aligned CA atom pairs
- Final RMSD in Ångströms
- If using --refine: number of pairs retained/rejected
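The reported RMSD is the root-mean-square distance between aligned Cα pairs after superposition. Below is a minimal sketch of that standard definition. It assumes the coordinates are already paired and superposed. The function name `rmsd` is illustrative only.

```python
import math
from collections.abc import Sequence

Coord = tuple[float, float, float]


def rmsd(fixed: Sequence[Coord], mobile: Sequence[Coord]) -> float:
    """sqrt(mean squared distance) over paired, already superposed coordinates (Å)."""
    if not fixed or len(fixed) != len(mobile):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(fixed, mobile))
    return math.sqrt(squared / len(fixed))
```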
Examples
Single-chain alignment:
$ uv run protalign 9jn4.cif 9ebk.cif --refine
Fixed: chain B, 213 residues
Mobile: chain B, 219 residues
Aligned: 207 CA atom pairs
Refinement: 167 pairs retained, 40 rejected
RMSD: 0.637 Å
Superposed structure written to: superposed.cif
Chain selection:
$ uv run protalign 9jn4.cif 9ebk.cif --fixed-chain A --mobile-chain B
Fixed: chain A, 213 residues
Mobile: chain B, 219 residues
Aligned: 207 CA atom pairs
RMSD: 1.807 Å
Superposed structure written to: superposed.cif
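Global multi-chain alignment (chains are paired by ID; the high RMSD arises because the chain labels of these two structures do not correspond, as the quaternary example below shows):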
$ uv run protalign 9jn4.cif 9jn6.cif --global
Chains: A, B, C, D
Aligned: 850 CA atom pairs across 4 chains
RMSD: 33.550 Å
Superposed structure written to: superposed.cif
Quaternary alignment (chain labels differ):
$ uv run protalign 9jn4.cif 9jn6.cif --quaternary
Quaternary alignment:
B → B (matched)
D → C (matched)
A → A (matched)
C → D (matched)
Aligned: 850 CA pairs across 4 chain pairs
RMSD: 0.180 Å
Superposed structure written to: superposed.cif
Verbose output (detailed progress):
$ uv run protalign 9jn4.cif 9jn6.cif --quaternary --refine --verbose
=== Quaternary Alignment ===
Seed alignment: B → B
Refinement cycles:
Cycle 1: 213 pairs, RMSD = 0.110 Å
Cycle 2: 205 pairs, RMSD = 0.101 Å
Cycle 3: 197 pairs, RMSD = 0.093 Å
Cycle 4: 195 pairs, RMSD = 0.092 Å
Converged (no more outliers)
Chain center distances after seed alignment:
D ↔ C: 0.05 Å ✓
D ↔ D: 33.91 Å ✗
D ↔ A: 40.36 Å ✗
A ↔ D: 17.00 Å ✗
A ↔ A: 0.19 Å ✓
C ↔ D: 0.23 Å ✓
Quaternary alignment:
B → B (matched)
D → C (matched)
A → A (matched)
C → D (matched)
Aligned: 850 CA pairs across 4 chain pairs
=== Final Refinement ===
Refinement cycles:
Cycle 1: 850 pairs, RMSD = 0.180 Å
Cycle 2: 831 pairs, RMSD = 0.161 Å
Cycle 3: 813 pairs, RMSD = 0.155 Å
Cycle 4: 803 pairs, RMSD = 0.152 Å
Cycle 5: 801 pairs, RMSD = 0.151 Å
Converged (no more outliers)
RMSD: 0.151 Å
Superposed structure written to: superposed.cif
Batch alignment (multiple mobile structures):
$ uv run protalign 9jn4.cif 9jn5.cif 9jn6.cif 9ebk.cif --fixed-chain D --mobile-chain A --output aligned
Processing 1/3: 9jn5.cif
Fixed: chain D, 212 residues
Mobile: chain A, 211 residues
Aligned: 211 CA atom pairs
RMSD: 0.142 Å
Output: 9jn5_aligned.cif
Processing 2/3: 9jn6.cif
Fixed: chain D, 212 residues
Mobile: chain A, 214 residues
Aligned: 212 CA atom pairs
RMSD: 0.302 Å
Output: 9jn6_aligned.cif
Processing 3/3: 9ebk.cif
Fixed: chain D, 212 residues
Mobile: chain A, 219 residues
Aligned: 207 CA atom pairs
RMSD: 1.754 Å
Output: 9ebk_aligned.cif
================================================================================
SUMMARY
================================================================================
Total: 3 | Successful: 3 | Failed: 0
Successful alignments:
9jn5.cif RMSD: 0.142 Å → 9jn5_aligned.cif
9jn6.cif RMSD: 0.302 Å → 9jn6_aligned.cif
9ebk.cif RMSD: 1.754 Å → 9ebk_aligned.cif
Algorithm
Single-chain mode (default)
- Load structures: Reads PDB or mmCIF files
- Extract chains: Selects specified chain or first protein chain
- Sequence alignment: Aligns sequences using gemmi's implementation
- Extract CA atoms: Gets Cα coordinates from aligned residues
- Superposition: Applies Kabsch algorithm for optimal transformation
- Refinement (optional): Iteratively rejects outliers beyond cutoff × RMSD
- Transform: Applies transformation to entire mobile structure
- Output: Writes superposed structure in requested format
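The superposition and refinement steps can be sketched with numpy as below. This sketch shows the standard SVD-based Kabsch algorithm plus a simple outlier-rejection loop. It is not pyprotalign's own code. The names `kabsch` and `superpose_refined` are hypothetical, and the defaults mirror the documented `--cutoff 2.0` and `--cycles 5`. The sketch makes the following assumptions:

- Inputs are N×3 arrays of already paired Cα coordinates.
- A pair that has been rejected is never re-admitted.
- Refinement stops when no further pairs are rejected, or when fewer than three pairs would remain.

```python
import numpy as np


def kabsch(mobile: np.ndarray, fixed: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Rotation r and translation t minimising the RMSD of mobile @ r.T + t to fixed."""
    mc = mobile.mean(axis=0)
    fc = fixed.mean(axis=0)
    h = (mobile - mc).T @ (fixed - fc)  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if the optimal orthogonal matrix would be a reflection
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = fc - mc @ r.T
    return r, t


def superpose_refined(
    mobile: np.ndarray,
    fixed: np.ndarray,
    cutoff: float = 2.0,
    cycles: int = 5,
) -> tuple[np.ndarray, np.ndarray, float, np.ndarray]:
    """Kabsch fit with iterative rejection of pairs farther than cutoff * RMSD."""
    keep = np.ones(len(fixed), dtype=bool)
    r, t = kabsch(mobile, fixed)
    for _ in range(cycles):
        dist = np.linalg.norm(mobile @ r.T + t - fixed, axis=1)
        rmsd = float(np.sqrt(np.mean(dist[keep] ** 2)))
        if rmsd < 1e-6:  # already a (numerically) exact fit, nothing to reject
            break
        new_keep = keep & (dist <= cutoff * rmsd)
        if new_keep.sum() == keep.sum() or new_keep.sum() < 3:
            break  # converged, or too few pairs left for a stable fit
        keep = new_keep
        r, t = kabsch(mobile[keep], fixed[keep])
    # Final RMSD over the retained pairs only
    dist = np.linalg.norm(mobile @ r.T + t - fixed, axis=1)
    rmsd = float(np.sqrt(np.mean(dist[keep] ** 2)))
    return r, t, rmsd, keep
```

The `keep` mask gives the retained/rejected counts. Applying `r` and `t` as `xyz @ r.T + t` to every atom of the mobile structure corresponds to the Transform step.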
Global mode (--global)
- Load structures: Reads PDB or mmCIF files
- Match chains: Identifies common chain IDs (A-A, B-B, etc.)
- Align per chain: Sequence alignment for each chain pair
- Pool coordinates: Combines CA atoms from all matched chains
- Single transformation: Computes one transformation for all pooled coordinates
- Refinement (optional): Iteratively rejects outliers across all chains
- Transform: Applies transformation to entire mobile structure
- Output: Writes superposed structure in requested format
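The chain matching and pooling steps of global mode can be sketched as follows. The sketch assumes each chain is given as a (sequence, Cα coordinates) tuple keyed by chain ID. `pool_common_chains` is hypothetical. `align` stands in for the per-chain sequence alignment, which pyprotalign performs with gemmi. Any function that returns residue index pairs works here, such as the toy `positional_aligner`. The pooled lists would then feed one Kabsch fit for the whole complex.

```python
from collections.abc import Callable

Coord = tuple[float, float, float]
Chain = tuple[str, list[Coord]]  # (one-letter sequence, Cα coordinates)
Aligner = Callable[[str, str], list[tuple[int, int]]]


def positional_aligner(a: str, b: str) -> list[tuple[int, int]]:
    """Toy stand-in for sequence alignment: pairs residues position by position."""
    return [(k, k) for k in range(min(len(a), len(b)))]


def pool_common_chains(
    fixed: dict[str, Chain],
    mobile: dict[str, Chain],
    align: Aligner,
) -> tuple[list[str], list[Coord], list[Coord]]:
    """Pair chains by identical ID and pool their aligned Cα coordinates."""
    common = [cid for cid in fixed if cid in mobile]  # keep fixed-structure order
    fixed_pool: list[Coord] = []
    mobile_pool: list[Coord] = []
    for cid in common:
        f_seq, f_xyz = fixed[cid]
        m_seq, m_xyz = mobile[cid]
        for i, j in align(f_seq, m_seq):  # per-chain alignment, then pooling
            fixed_pool.append(f_xyz[i])
            mobile_pool.append(m_xyz[j])
    return common, fixed_pool, mobile_pool
```

Chains are paired purely by ID. If the two files label their chains differently, this mode pairs the wrong chains, as in the global example above. Quaternary mode handles that case.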
Quaternary mode (--quaternary)
- Load structures: Reads PDB or mmCIF files
- Seed alignment: Aligns specified or first chain pair with optional refinement
- Proximity matching: Applies the seed transformation to a copy of the mobile structure, then matches the remaining chains by the distance between chain centers (within --distance-threshold)
- Pool coordinates: Sequence aligns all matched chain pairs, pools CA atoms
- Final transformation: Computes transformation on pooled coords with optional refinement
- Transform: Applies transformation to mobile structure
- Rename (optional, with --rename-chains): Renames mobile chains to match fixed
- Output: Writes superposed structure
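The proximity-matching step can be sketched as below. The documentation does not define "chain center" or say how conflicts between candidate pairs are resolved. This sketch therefore makes the following assumptions:

- A chain's center is its mean Cα coordinate.
- The mobile centers are computed after the seed transformation.
- Pairs are assigned greedily, closest first, within the distance threshold (default 8.0 Å, as for `--distance-threshold`).

`chain_center` and `match_chains_by_proximity` are hypothetical names.

```python
import math

Coord = tuple[float, float, float]


def chain_center(ca_coords: list[Coord]) -> Coord:
    """Mean of a chain's Cα coordinates (assumed definition of 'chain center')."""
    n = len(ca_coords)
    x = sum(c[0] for c in ca_coords) / n
    y = sum(c[1] for c in ca_coords) / n
    z = sum(c[2] for c in ca_coords) / n
    return (x, y, z)


def match_chains_by_proximity(
    fixed_centers: dict[str, Coord],
    mobile_centers: dict[str, Coord],
    threshold: float = 8.0,
) -> dict[str, str]:
    """Map mobile chain IDs to fixed chain IDs, assigning the closest pairs first.

    mobile_centers are assumed to be taken after the seed transformation,
    so chains that correspond end up close together.
    """
    candidates = sorted(
        (math.dist(mc, fc), m_id, f_id)
        for m_id, mc in mobile_centers.items()
        for f_id, fc in fixed_centers.items()
        if math.dist(mc, fc) <= threshold
    )
    matches: dict[str, str] = {}
    used_fixed: set[str] = set()
    for _, m_id, f_id in candidates:
        if m_id not in matches and f_id not in used_fixed:
            matches[m_id] = f_id
            used_fixed.add(f_id)
    return matches
```

The resulting mapping from mobile chain ID to fixed chain ID is the information a renaming step such as `--rename-chains` needs. Mobile chains with no fixed chain within the threshold stay unmatched.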
Development
Setup
uv venv # Create virtual environment
uv sync --group dev # Install with dev dependencies
Testing
uv run pytest # Run all tests
uv run pytest --cov # With coverage report
Code Quality
uv run mypy src tests # Type checking (strict mode)
uv run ruff check . # Linting
uv run ruff format . # Auto-formatting
Dependencies
- numpy (≥1.26): Numerical operations
- gemmi (≥0.7.4): Structure I/O and sequence alignment
Requirements
- Python ≥3.12
License
This project is licensed under the MIT License.
Contributing
Contributions are welcome! Please open an issue or submit a pull request on GitHub.
Acknowledgements
Thanks to the developers of gemmi for their excellent library. Coding was supported by warp.dev.
File details
Details for the file pyprotalign-0.1.0.tar.gz.
File metadata
- Download URL: pyprotalign-0.1.0.tar.gz
- Size: 15.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.27 on macOS
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf42d06386ce1f0e7a1ed15d5979ec7541638fc2efad1fcee46a5e6048fa44a1 |
| MD5 | f6429a1c0a2ee56c4a07a8dd1b2c62d7 |
| BLAKE2b-256 | 13d6c9e23fe56cc555271bb2df568884c7ecacef4dbeca16000be949d4a97c15 |
File details
Details for the file pyprotalign-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pyprotalign-0.1.0-py3-none-any.whl
- Size: 18.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.27 on macOS
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f4153e903d09c87fc69bb396c5d2892b4c34ad139429b9d679ce8d0ef29d360a |
| MD5 | 5e817b0db0fb0352f11c6e324ab8d2a7 |
| BLAKE2b-256 | 828fbb33d321e62378e47555672ffb88153cd9efd0a5ab57baece2836ae1c80b |