
Molecular Realm for Spatial Indexed Structures - Fast spatial operations for molecular data


MolR - Molecular Realm for Spatial Indexed Structures

A high-performance Python package that creates a spatial realm for molecular structures, providing fast neighbor searches, geometric queries, and spatial operations through integrated KDTree indexing.

Features

🚀 High-Performance Structure Representation

  • NumPy-based Structure class with a Structure of Arrays (SoA) layout
  • Efficient spatial indexing with scipy KDTree integration for O(log n) average-case nearest-neighbor queries
  • Memory-efficient trajectory handling with StructureEnsemble
  • Lazy initialization of optional annotations to minimize memory usage
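
To make the SoA layout and KDTree indexing concrete, here is a minimal sketch that uses NumPy and SciPy directly. It does not use molr itself, and the array names (coords, names) are illustrative assumptions rather than molr's internal attribute names.

```python
import numpy as np
from scipy.spatial import KDTree

# Structure of Arrays: one NumPy array per atom property, all of length n_atoms.
# These field names are illustrative, not molr's internal attribute names.
coords = np.array([
    [0.0, 0.0, 0.0],
    [1.5, 0.0, 0.0],
    [0.0, 1.5, 0.0],
    [10.0, 10.0, 10.0],
])
names = np.array(["N", "CA", "C", "O"])

# Build the spatial index once, then reuse it for every query.
tree = KDTree(coords)

# Indices of all atoms within 2.0 Å of atom 0 (the result includes atom 0 itself).
neighbor_idx = sorted(tree.query_ball_point(coords[0], r=2.0))
print(neighbor_idx)         # [0, 1, 2]
print(names[neighbor_idx])  # ['N' 'CA' 'C']
```

Because every property lives in its own contiguous array, the index list returned by the tree can be used to slice any property without touching the others.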

🔗 Comprehensive Bond Detection

  • Hierarchical bond detection with multiple providers:
    • File-based bonds from PDB CONECT records and mmCIF data
    • Template-based detection using standard residue topologies
    • Chemical Component Dictionary (CCD) lookup for ligands
    • Distance-based detection with van der Waals radii
  • Intelligent fallback system ensures complete bond coverage
  • Partial processing support for incremental bond detection
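
As an illustration of the distance-based fallback, the sketch below finds candidate bonds with a SciPy KDTree. The cutoff rule, distance ≤ vdw_factor × (r_i + r_j), and the helper name distance_bonds are assumptions made for this example. They are not taken from molr's implementation.

```python
import numpy as np
from scipy.spatial import KDTree

# Approximate van der Waals radii in Å (Bondi values); illustrative subset.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def distance_bonds(coords, elements, vdw_factor=0.75):
    """Return sorted (i, j) pairs closer than vdw_factor * (r_i + r_j).

    Hypothetical helper; the cutoff rule is an assumption, not molr's.
    """
    radii = np.array([VDW_RADII[e] for e in elements])
    tree = KDTree(coords)
    # Largest cutoff any pair could have, used to prefilter candidates.
    max_cutoff = vdw_factor * 2.0 * radii.max()
    bonds = []
    for i, j in tree.query_pairs(r=max_cutoff):
        dist = np.linalg.norm(coords[i] - coords[j])
        if dist <= vdw_factor * (radii[i] + radii[j]):
            bonds.append((int(i), int(j)))
    return sorted(bonds)


# C and O at a typical single-bond distance, plus a distant N.
coords = np.array([[0.0, 0.0, 0.0], [1.43, 0.0, 0.0], [5.0, 0.0, 0.0]])
bonds = distance_bonds(coords, ["C", "O", "N"])
print(bonds)  # [(0, 1)]
```

A purely geometric rule like this can over-assign bonds between close non-bonded atoms, which is one reason to try file, template, and CCD sources first and use distance only as a fallback.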

🎯 Powerful Selection Language

  • MDAnalysis/VMD-inspired syntax for complex atom queries
  • Spatial selections with within, around, and center-of-geometry queries
  • Boolean operations (and, or, not) for combining selections
  • Residue-based selections with byres modifier
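
The selection keywords above correspond to simple array operations. The following sketch shows one way the semantics could be evaluated with NumPy masks and a SciPy KDTree. It is not molr's parser, and the array names are hypothetical. Whether the reference atoms count as being "within" their own selection varies between tools; this sketch includes them.

```python
import numpy as np
from scipy.spatial import KDTree

# Toy per-atom arrays; the names are illustrative, not molr attributes.
coords = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [3.0, 0.0, 0.0],
    [6.0, 0.0, 0.0],
    [20.0, 0.0, 0.0],
])
atom_name = np.array(["CA", "CB", "CA", "CB", "CA"])
res_name = np.array(["HIS", "HIS", "ALA", "ALA", "GLY"])
res_id = np.array([1, 1, 2, 2, 3])

# "name CA and not resname GLY": boolean operators become element-wise mask logic.
ca_not_gly = (atom_name == "CA") & ~(res_name == "GLY")

# "within 2.5 of (resname HIS)": atoms within 2.5 Å of at least one HIS atom.
his = res_name == "HIS"
nearest_dist, _ = KDTree(coords[his]).query(coords, k=1)
within = nearest_dist <= 2.5

# "byres ...": expand a selection to every atom of any residue it touches.
byres = np.isin(res_id, res_id[within])

print(ca_not_gly.tolist())  # [True, False, True, False, False]
print(within.tolist())      # [True, True, True, False, False]
print(byres.tolist())       # [True, True, True, True, False]
```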

📁 Multi-Format I/O Support

  • PDB format with multi-model support and CONECT record parsing
  • mmCIF format with chemical bond information extraction
  • Auto-detection of single structures vs. trajectories
  • String-based parsing for in-memory structure creation

Installation

pip install molr

For development installation:

git clone https://github.com/abhishektiwari/molr.git
cd molr
pip install -e ".[dev]"

Quick Start

Basic Structure Loading and Analysis

import molr

# Load structure from PDB file
structure = molr.Structure.from_pdb("protein.pdb")
print(f"Loaded {structure.n_atoms} atoms")

# Detect bonds automatically
bonds = structure.detect_bonds()
print(f"Detected {len(bonds)} bonds")

# Use selection language
ca_atoms = structure.select("name CA")
active_site = structure.select("within 5.0 of (resname HIS)")
protein_backbone = structure.select("protein and backbone")

Spatial Queries and Neighbor Finding

# Fast spatial queries with built-in KDTree indexing
neighbors = structure.get_neighbors_within(atom_idx=100, radius=5.0)
atoms_in_sphere = structure.get_atoms_within_sphere([10, 15, 20], radius=8.0)

# Center of geometry-based selections
ligand = structure.select("resname LIG")
nearby = structure.get_atoms_within_cog_sphere(ligand, radius=10.0)

# Inter-selection contacts
protein = structure.select("protein")
contacts = structure.get_atoms_between_selections(protein, ligand, max_distance=4.0)
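
For intuition about what an inter-selection contact search involves, here is a minimal sketch using two SciPy KDTrees. It runs on plain coordinate arrays rather than molr objects, and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import KDTree

# Hypothetical coordinates for two selections.
protein_xyz = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [30.0, 0.0, 0.0]])
ligand_xyz = np.array([[1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])

protein_tree = KDTree(protein_xyz)
ligand_tree = KDTree(ligand_xyz)

# For each protein atom, list the ligand atoms within 2.5 Å.
hits = protein_tree.query_ball_tree(ligand_tree, r=2.5)

# Flatten into (protein_index, ligand_index) pairs local to each selection.
contacts = [(p, l) for p, ligand_hits in enumerate(hits) for l in sorted(ligand_hits)]
print(contacts)  # [(0, 0), (1, 0), (1, 1)]
```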

Bond Detection and Analysis

from molr import BondDetector

# Configure bond detection
detector = BondDetector(
    enable_residue_templates=True,
    enable_ccd_lookup=True,
    enable_distance_detection=True,
    vdw_factor=0.75
)

# Detect bonds with detailed statistics
bonds, stats = detector.detect_bonds_with_stats(structure)
print(f"Bond detection stats: {stats}")

# Access different bond sources
file_bonds = structure.file_bonds  # From PDB CONECT records
all_bonds = structure.bonds        # Complete bond set

Multi-Model Trajectories

# Load trajectory from multi-model PDB
ensemble = molr.StructureEnsemble.from_pdb("trajectory.pdb")
print(f"Loaded {ensemble.n_models} models with {ensemble.n_atoms} atoms each")

# Access individual frames
first_frame = ensemble[0]  # Returns Structure object
last_frame = ensemble[-1]

# Analyze trajectory
centers = [model.get_center() for model in ensemble]
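
If the trajectory coordinates are available as a single array of shape (n_models, n_atoms, 3), the per-frame loop can be replaced by one vectorized reduction. The sketch below uses a synthetic array and assumes that a center means the unweighted center of geometry; it does not call molr.

```python
import numpy as np

# Hypothetical trajectory: 3 frames, 4 atoms, xyz coordinates.
rng = np.random.default_rng(0)
frames = rng.normal(size=(3, 4, 3))

# Center of geometry of every frame at once: average over the atom axis.
centers = frames.mean(axis=1)  # shape (3, 3)

# Displacement of each frame's center from the first frame's center.
drift = np.linalg.norm(centers - centers[0], axis=1)
print(centers.shape, drift.shape)  # (3, 3) (3,)
```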

Advanced Usage

Custom Bond Detection

# BondList is assumed to be exported at the top level, like BondDetector
from molr import BondList
from molr.bond_detection import (
    ResidueBondProvider,
    CCDBondProvider,
    DistanceBondProvider
)

# Create custom detection pipeline
providers = [
    ResidueBondProvider(),      # Standard residues first
    CCDBondProvider(),          # CCD lookup for ligands  
    DistanceBondProvider(vdw_factor=0.8)  # Distance fallback
]

# Apply in sequence
final_bonds = BondList()
for provider in providers:
    if provider.is_applicable(structure):
        bonds = provider.detect_bonds(structure, existing_bonds=final_bonds)
        final_bonds.extend(bonds)

Structure Manipulation

# Subset structures
protein_only = structure[structure.select("protein")]
chain_a = structure[structure.chain_id == "A"]

# Coordinate transformations  
structure.translate([10.0, 0.0, 0.0])
structure.center_at_origin()

# Add custom annotations (requires NumPy for the dtype)
import numpy as np
structure.add_annotation("custom_prop", dtype=np.float32, default_value=1.0)

Performance

  • Spatial indexing: O(log n) average-case nearest-neighbor queries with a KDTree, versus O(n) per query (O(n²) for all pairs) with brute force
  • Memory efficient: SoA design minimizes memory overhead
  • Vectorized operations: NumPy-based computations throughout
  • Lazy evaluation: Optional data loaded only when needed
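
The spatial indexing claim can be checked independently of molr. This sketch compares a brute-force all-pairs search against SciPy's KDTree on random coordinates and confirms that both return the same neighbor pairs. The point count and radius are arbitrary choices for illustration.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(42)
coords = rng.uniform(0.0, 50.0, size=(500, 3))
radius = 5.0

# Brute force: build the full n x n distance matrix, O(n^2) time and memory.
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
i, j = np.where(np.triu(dist <= radius, k=1))  # keep each pair once, i < j
brute_pairs = set(zip(i.tolist(), j.tolist()))

# KDTree: visits only nearby regions of space, never the full matrix.
tree_pairs = KDTree(coords).query_pairs(r=radius)

print(brute_pairs == tree_pairs)
```

Wrapping each approach in time.perf_counter() calls and increasing the point count shows how the gap grows with system size.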

API Reference

Core Classes

  • Structure: Single molecular structure with spatial indexing
  • StructureEnsemble: Multi-model trajectory representation
  • BondList: Efficient bond storage and manipulation
  • BondDetector: Configurable hierarchical bond detection

I/O Parsers

  • PDBParser: PDB format with CONECT record support
  • mmCIFParser: mmCIF format with bond information

Selection System

  • select(): Parse and evaluate selection expressions
  • Spatial expressions: within, around, cog
  • Boolean operators: and, or, not

Requirements

  • Python ≥3.8
  • NumPy ≥1.20.0
  • SciPy ≥1.7.0 (for spatial indexing)
  • pyparsing ≥3.0.0 (for selection language)

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.
