Tool to align protein structures with differing lengths
Structure Align
A minimal Python package for structural alignment of protein structures with different lengths.
Overview
Structure Align first performs a pairwise sequence alignment to identify matching residues, then structurally aligns the two proteins on those residues, so the structures do not need to have the same number of amino acids. It is designed to be simple, clean, and efficient for structural biology applications.
Key Features
- Sequence-based alignment: Uses BioPython's pairwise aligner to find matching residues
- Flexible selection: Support for different atom selections (CA, backbone, etc.)
- Structured results: Uses Pydantic models for clean, validated data structures
- RMSD calculation: Provides before/after RMSD values
- Position-wise analysis: Calculate per-residue distances after alignment
- Residue ID mapping: Query distances by original residue IDs (e.g., "residue 22")
- Gap handling: Automatically handles sequence alignment gaps
- Visualization: Built-in plotting functionality for distance analysis
- Interactive plots: Rich hover information with Plotly integration
- Chain-aware display: Visual separation and coloring of different protein chains
Installation
# Install dependencies
poetry install
# Or install the published package with pip
# (the distribution is named structurealign; the import name is structure_align)
pip install structurealign
Quick Start
import MDAnalysis as mda
from structure_align import StructuralAligner
# Load your structures
reference = mda.Universe("reference.pdb")
mobile = mda.Universe("mobile.pdb")
# Initialize aligner
aligner = StructuralAligner()
# Perform alignment
result = aligner.align(reference, mobile, selection="name CA")
# Print results
print(f"RMSD: {result.rmsd_before:.2f} → {result.rmsd_after:.2f} Å")
print(f"Aligned residues: {result.n_aligned_residues}")
# Query specific residue distances
distance = result.get_distance_by_residue(22) # Distance for residue 22
if distance is not None:
    print(f"Residue 22 distance: {distance:.2f} Å")
Residue ID Mapping
One of the key features is the ability to query distances by original residue IDs, even after sequence alignment with gaps:
# Get distance for specific residue
distance = result.get_distance_by_residue(22)
# Get all aligned residue pairs
pairs = result.get_aligned_residue_pairs()
for ref_resid, mob_resid, distance in pairs:
    print(f"Ref {ref_resid} ↔ Mob {mob_resid}: {distance:.2f} Å")
# Get formatted table of results
df = result.get_residue_info_table() # Returns pandas DataFrame
print(df.head())
# Or get formatted string (backward compatibility)
table_str = result.get_residue_info_table_formatted()
print(table_str)
# Get residue mappings
ref_mapping, mob_mapping = result.get_residue_mapping()
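The idea behind querying by original residue ID can be sketched with plain dictionaries. This is only an illustration under stated assumptions: the residue IDs, the distances, and the helper `distance_by_residue` are hypothetical and are not taken from the package's internals.

```python
from typing import Optional

# Hypothetical aligned data: one entry per aligned (gap-free) position.
ref_resids = [20, 21, 22, 24, 25]   # original reference residue IDs
mob_resids = [5, 6, 7, 8, 9]        # original mobile residue IDs
position_distances = [0.42, 0.51, 1.87, 3.25, 0.66]  # Å, after alignment

# Map each original residue ID to its index in the aligned arrays.
ref_mapping = {resid: pos for pos, resid in enumerate(ref_resids)}
mob_mapping = {resid: pos for pos, resid in enumerate(mob_resids)}


def distance_by_residue(ref_resid: int) -> Optional[float]:
    """Return the distance for a reference residue ID, or None if it was not aligned."""
    pos = ref_mapping.get(ref_resid)
    return None if pos is None else position_distances[pos]


print(distance_by_residue(22))  # 1.87
print(distance_by_residue(23))  # None (residue 23 is not in the alignment)
```

Because the lookup goes through the mapping, a residue that fell into a gap simply has no entry, which is why callers should check for a missing value before formatting the result.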
API Reference
StructuralAligner
Main class for performing structural alignments.
Methods
- __init__(gap_open=-10.0, gap_extend=-0.5): Initialize with gap penalties
- align(reference, mobile, selection="name CA"): Perform alignment
- calculate_position_distances(result): Get per-residue distances
- plot_distances(result, **kwargs): Create distance plot
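To give a feel for what the `gap_open` and `gap_extend` defaults mean, the sketch below scores the gaps in one row of an alignment. It assumes a common affine convention in which a gap of length n scores `gap_open + (n - 1) * gap_extend`; the source does not state which convention the package applies, and the helper names are hypothetical.

```python
import re


def gap_run_lengths(aligned_seq: str) -> list[int]:
    """Lengths of each consecutive run of '-' in one row of an alignment."""
    return [len(run) for run in re.findall(r"-+", aligned_seq)]


def affine_gap_score(aligned_seq: str,
                     gap_open: float = -10.0,
                     gap_extend: float = -0.5) -> float:
    """Total gap score, assuming a run of length n scores
    gap_open + (n - 1) * gap_extend (an assumed convention)."""
    return float(sum(gap_open + (n - 1) * gap_extend
                     for n in gap_run_lengths(aligned_seq)))


print(affine_gap_score("A-DEF-HIK"))  # -20.0: two separate single-residue gaps
print(affine_gap_score("AC---GHIK"))  # -11.0: one gap of length three
```

Under this convention, opening a new gap is much more expensive than extending an existing one, so the aligner prefers a few long gaps over many short ones.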
AlignmentResult
Complete alignment result with residue mapping capabilities.
Key Methods
- get_distance_by_residue(ref_resid): Get distance for specific residue ID
- get_aligned_residue_pairs(): Get all (ref_resid, mob_resid, distance) tuples
- get_residue_mapping(): Get residue ID to position mappings
- get_residue_info_table(): Get formatted table of aligned residues
Properties
- rmsd_before/rmsd_after: RMSD values before and after alignment
- n_aligned_residues: Number of successfully aligned residues
- position_distances: Per-position distances after alignment
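The relationship between the before/after RMSD values and the per-position distances can be illustrated with a small NumPy sketch. It uses the Kabsch algorithm for the superposition, which is an assumption: the source does not say which fitting method the package uses. The coordinates and function names are made up for the example.

```python
import numpy as np


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between two (N, 3) coordinate arrays."""
    diff = a - b
    return float(np.sqrt((diff * diff).sum() / len(a)))


def kabsch_superpose(reference: np.ndarray, mobile: np.ndarray) -> np.ndarray:
    """Return `mobile` rotated and translated onto `reference` (Kabsch algorithm)."""
    ref_center = reference.mean(axis=0)
    mob_center = mobile.mean(axis=0)
    ref_c = reference - ref_center
    mob_c = mobile - mob_center

    # SVD of the covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(mob_c.T @ ref_c)
    # Flip the last axis if needed so the result is a proper rotation,
    # not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    return (rotation @ mob_c.T).T + ref_center


# Hypothetical matched CA coordinates (Å): `mobile` is `reference`
# rotated by 30 degrees about z and shifted.
reference = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0],
                      [0.0, 1.5, 1.0], [2.0, 0.5, 2.0]])
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
mobile = reference @ rot_z.T + np.array([5.0, -2.0, 1.0])

fitted = kabsch_superpose(reference, mobile)
print(f"RMSD before: {rmsd(reference, mobile):.2f} Å")
print(f"RMSD after:  {rmsd(reference, fitted):.2f} Å")
# Per-position distances are the row-wise norms after superposition.
print(np.linalg.norm(reference - fitted, axis=1).round(3))
```

The RMSD summarizes the whole fit in one number, while the per-position distances show where along the chain the two structures still differ after superposition.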
Examples
Basic Usage
from structure_align import StructuralAligner
import MDAnalysis as mda
# Load structures
ref = mda.Universe("protein1.pdb")
mob = mda.Universe("protein2.pdb")
# Align using CA atoms
aligner = StructuralAligner()
result = aligner.align(ref, mob)
print(f"RMSD: {result.rmsd_before:.2f} → {result.rmsd_after:.2f} Å")
Query Specific Residues
# Query distance for residue 22
distance = result.get_distance_by_residue(22)
if distance is not None:
    print(f"Residue 22: {distance:.2f} Å")
else:
    print("Residue 22 not found in alignment")
# Find high-distance residues
pairs = result.get_aligned_residue_pairs()
high_distance = [(r1, r2, d) for r1, r2, d in pairs if d > 3.0]
print(f"Found {len(high_distance)} residues with distance > 3.0 Å")
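High-distance residues often cluster in loops or flexible regions, so it can help to group consecutive hits into segments. The sketch below works on tuples shaped like the (ref_resid, mob_resid, distance) pairs described above; the helper `high_distance_segments` and the sample values are hypothetical and not part of the package.

```python
def high_distance_segments(pairs, threshold=3.0):
    """Group consecutive aligned positions whose distance exceeds `threshold`.

    Returns a list of (first_ref_resid, last_ref_resid, max_distance) tuples.
    """
    segments = []
    current = None  # [first_resid, last_resid, max_distance] of the open segment
    for ref_resid, _mob_resid, distance in pairs:
        if distance > threshold:
            if current is None:
                current = [ref_resid, ref_resid, distance]
            else:
                current[1] = ref_resid
                current[2] = max(current[2], distance)
        elif current is not None:
            segments.append(tuple(current))
            current = None
    if current is not None:
        segments.append(tuple(current))
    return segments


# Hypothetical aligned pairs: (ref_resid, mob_resid, distance in Å)
example_pairs = [(20, 5, 0.4), (21, 6, 3.4), (22, 7, 4.1), (24, 8, 0.9), (25, 9, 3.2)]
print(high_distance_segments(example_pairs))
# [(21, 22, 4.1), (25, 25, 3.2)]
```

Segments are defined over consecutive aligned positions, so a segment may span a gap in the original numbering.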
Analysis and Visualization
# Get comprehensive analysis as DataFrame
df = result.get_residue_info_table()
print(f"Alignment shape: {df.shape}")
print(df.describe())
# Find high-distance residues
high_distance = df[df['distance'] > 3.0]
print(f"Found {len(high_distance)} residues with distance > 3.0 Å")
# Find specific amino acid combinations
cys_pairs = df[(df['ref_aa'] == 'C') & (df['mob_aa'] == 'C')]
print("Cysteine-Cysteine alignments:")
print(cys_pairs[['ref_resid', 'mob_resid', 'distance']])
# Sort by distance
worst_aligned = df.nlargest(10, 'distance')
print("10 worst aligned residues:")
print(worst_aligned)
# Plot distances with residue information
fig = aligner.plot_distances(result, title="Residue Distance Analysis")
fig.show()
# Statistical analysis
distances = result.get_distances_array()
print(f"Mean distance: {distances.mean():.2f} Å")
print(f"Std deviation: {distances.std():.2f} Å")
DataFrame Operations
The get_residue_info_table() method returns a pandas DataFrame with the following columns:
ref_resid: Reference residue IDmob_resid: Mobile residue IDdistance: Distance between aligned residues (Å)ref_aa: Reference amino acid (single letter)mob_aa: Mobile amino acid (single letter)
Because the result is a regular DataFrame, the usual pandas filtering, grouping, and export operations apply:
df = result.get_residue_info_table()
# Filter by distance threshold
high_distance = df[df['distance'] > 2.0]
# Filter by amino acid type
aromatics = df[df['ref_aa'].isin(['F', 'W', 'Y'])]
# Group by amino acid and get statistics
aa_stats = df.groupby('ref_aa')['distance'].agg(['mean', 'std', 'count'])
# Export to CSV for further analysis
df.to_csv('alignment_results.csv', index=False)
# Merge with other data
# df = df.merge(other_data, on='ref_resid')
Handling Sequence Gaps
The package automatically handles sequence alignment gaps:
- Sequence alignment: Creates optimal alignment with gaps (-)
- Residue mapping: Only aligned residues (no gaps) are used for structural alignment
- ID preservation: Original residue IDs are preserved for querying
- Gap tracking: You can see which residues were aligned vs. skipped
# Example with gaps
# Reference: ACDEFGHIK
# Mobile:    A-DEF-HIK
# Result: only the A, DEF, and HIK positions (7 residues) are structurally aligned;
# C and G in the reference have no partner and are skipped.
# You can still query by original residue IDs.
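The gap handling described above can be sketched in a few lines of plain Python: walk the two gapped rows column by column and keep only the columns where both rows have a residue. The function name and the residue numbering are hypothetical; the package's own implementation may differ.

```python
def aligned_residue_pairs(ref_aln, mob_aln, ref_resids, mob_resids):
    """Pair original residue IDs for every alignment column without a gap."""
    if len(ref_aln) != len(mob_aln):
        raise ValueError("aligned sequences must have the same length")
    pairs = []
    ref_i = mob_i = 0  # cursors into the ungapped residue ID lists
    for ref_char, mob_char in zip(ref_aln, mob_aln):
        if ref_char != "-" and mob_char != "-":
            pairs.append((ref_resids[ref_i], mob_resids[mob_i]))
        # Advance each cursor only when its row contributed a residue.
        if ref_char != "-":
            ref_i += 1
        if mob_char != "-":
            mob_i += 1
    return pairs


ref_aln = "ACDEFGHIK"
mob_aln = "A-DEF-HIK"
ref_ids = list(range(1, 10))     # reference residues numbered 1..9
mob_ids = list(range(101, 108))  # mobile residues numbered 101..107 (made up)

gap_free_pairs = aligned_residue_pairs(ref_aln, mob_aln, ref_ids, mob_ids)
print(gap_free_pairs)
# [(1, 101), (3, 102), (4, 103), (5, 104), (7, 105), (8, 106), (9, 107)]
```

Reference residues 2 (C) and 6 (G) do not appear in the output because they face a gap in the mobile sequence, while the original IDs of all paired residues are preserved.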
Dependencies
- MDAnalysis: Structure handling and analysis
- BioPython: Sequence alignment
- NumPy: Numerical calculations
- Pandas: Data analysis and DataFrame operations
- Matplotlib: Static plotting
- Plotly: Interactive plotting with rich hover information
- Pydantic: Data validation and models
License
MIT License
Interactive Plotting
The package provides both static (matplotlib) and interactive (Plotly) plotting options:
# Static matplotlib plot
fig_static = aligner.plot_distances(result)
fig_static.show()
# Interactive Plotly plot with rich hover information
fig_interactive = aligner.plot_distances_interactive(result)
fig_interactive.show()
# Save interactive plot as HTML
fig_interactive.write_html("interactive_plot.html")
Interactive Features
Rich Hover Tooltips: Each point shows:
- Residue information: LEU123 (Chain A)
- Amino acid conservation
- Precise distance measurements
- Alignment position
Interactive Controls:
- 🔍 Zoom: Click and drag to zoom into regions
- 🖱️ Pan: Shift+drag to navigate
- 📏 Range Slider: Navigate large proteins easily
- 🎨 Legend: Show/hide specific chains
- 💾 Export: HTML format for sharing
Chain Visualization:
- Color-coded chains with boundaries
- Custom color schemes
- Chain-specific statistics
# Custom chain colors
colors = {'A': 'red', 'B': 'blue', 'C': 'green'}
fig = aligner.plot_distances_interactive(result, chain_colors=colors)
# Large plot for detailed analysis
fig = aligner.plot_distances_interactive(result, height=800, width=1400)
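Chain-aware coloring and boundary markers boil down to finding where one chain ends and the next begins along the alignment. The following sketch shows one way to derive that from a per-position list of chain IDs; `chain_spans` and the sample data are hypothetical and only illustrate the idea, not the package's plotting code.

```python
from itertools import groupby


def chain_spans(chain_ids):
    """Return (chain_id, first_position, last_position) per contiguous run of a chain."""
    spans = []
    position = 0
    for chain_id, run in groupby(chain_ids):
        length = sum(1 for _ in run)
        spans.append((chain_id, position, position + length - 1))
        position += length
    return spans


# Hypothetical chain ID for each aligned position
chain_ids = ["A"] * 4 + ["B"] * 3 + ["C"] * 2
colors = {'A': 'red', 'B': 'blue', 'C': 'green'}

spans = chain_spans(chain_ids)
# x-positions halfway between the last point of one chain and the first of the next
boundaries = [start - 0.5 for _, start, _ in spans[1:]]
# One color per plotted point, looked up by chain
point_colors = [colors[chain] for chain in chain_ids]

print(spans)       # [('A', 0, 3), ('B', 4, 6), ('C', 7, 8)]
print(boundaries)  # [3.5, 6.5]
```

The spans can also be used to compute chain-specific statistics by slicing the distance array between each pair of positions.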