Molecular Realm for Spatial Indexed Structures - Fast spatial operations for molecular data

Project description

MolR - Molecular Realm for Spatial Indexed Structures

A high-performance Python package that creates a spatial realm for molecular structures, using integrated KDTree indexing to provide fast neighbor searches, geometric queries, and spatial operations.

Features

🚀 High-Performance Structure Representation

  • NumPy-based Structure class using a Structure of Arrays (SoA) layout
  • Efficient spatial indexing with SciPy KDTree integration for neighbor queries in roughly O(log n) time per query
  • Memory-efficient trajectory handling with StructureEnsemble
  • Lazy initialization of optional annotations to minimize memory usage
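The Structure of Arrays layout and lazy annotations can be illustrated with plain NumPy. The sketch below is hypothetical: the class `MiniStructure` and its field names (`coord`, `atom_name`, `res_id`, `chain_id`, `b_factor`) are assumptions for illustration, not MolR's actual attributes. Each per-atom property lives in its own array, so a query such as "all CA atoms" is one vectorized comparison, and subsetting applies a single mask to every array.

```python
import numpy as np

class MiniStructure:
    """Hypothetical SoA container: one NumPy array per per-atom property."""

    def __init__(self, coord, atom_name, res_id, chain_id):
        self.coord = np.asarray(coord, dtype=np.float64)    # (n_atoms, 3)
        self.atom_name = np.asarray(atom_name, dtype="U4")   # (n_atoms,)
        self.res_id = np.asarray(res_id, dtype=np.int32)     # (n_atoms,)
        self.chain_id = np.asarray(chain_id, dtype="U1")     # (n_atoms,)
        self._b_factor = None  # optional annotation, not allocated yet

    @property
    def n_atoms(self):
        return self.coord.shape[0]

    @property
    def b_factor(self):
        # Lazy initialization: the optional array is allocated on first access only.
        if self._b_factor is None:
            self._b_factor = np.zeros(self.n_atoms, dtype=np.float32)
        return self._b_factor

    def __getitem__(self, mask):
        # Subsetting applies the same mask to every array
        # (optional annotations are not copied in this sketch).
        return MiniStructure(self.coord[mask], self.atom_name[mask],
                             self.res_id[mask], self.chain_id[mask])

s = MiniStructure(
    coord=[[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.5, 1.0, 0.0], [10.0, 0.0, 0.0]],
    atom_name=["N", "CA", "C", "CA"],
    res_id=[1, 1, 1, 2],
    chain_id=["A", "A", "A", "B"],
)
ca = s[s.atom_name == "CA"]   # one vectorized comparison over a single array
print(ca.n_atoms)             # 2
```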

🔗 Comprehensive Bond Detection

  • Hierarchical bond detection with multiple providers:
    • File-based bonds from PDB CONECT records and mmCIF data
    • Template-based detection using standard residue topologies
    • Chemical Component Dictionary (CCD) lookup for ligands
    • Distance-based detection with Van der Waals radii
  • Fallback system: later providers, ending with distance-based detection, cover atoms that earlier providers could not handle
  • Partial processing support for incremental bond detection

🎯 Powerful Selection Language

  • MDAnalysis/VMD-inspired syntax for complex atom queries
  • Spatial selections with within, around, and center-of-geometry queries
  • Boolean operations (and, or, not) for combining selections
  • Residue-based selections with byres modifier
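One common way to evaluate such a language, assumed here for illustration, is to turn each keyword term into a boolean mask over atoms: `and`, `or`, and `not` then become NumPy's `&`, `|`, and `~`, and `byres` widens a mask to every atom of any residue it touches. The arrays and the `byres` helper below are hypothetical, not MolR's evaluator.

```python
import numpy as np

# Hypothetical per-atom arrays: 6 atoms in 3 residues of chain A.
res_name = np.array(["ALA", "ALA", "HIS", "HIS", "HOH", "HOH"])
res_id = np.array([1, 1, 2, 2, 3, 3])
chain_id = np.array(["A", "A", "A", "A", "A", "A"])
atom_name = np.array(["N", "CA", "N", "CA", "O", "H1"])

# Each selection term evaluates to a boolean mask.
is_ca = atom_name == "CA"
is_his = res_name == "HIS"
is_water = res_name == "HOH"

# Boolean operators map onto element-wise mask logic.
ca_and_his = is_ca & is_his      # "name CA and resname HIS"
not_water = ~is_water            # "not resname HOH"
ca_or_water = is_ca | is_water   # "name CA or resname HOH"

def byres(mask, chain_id, res_id):
    """Hypothetical helper: select whole residues that contain any selected atom."""
    # Residues are identified here by (chain_id, res_id) pairs.
    touched = set(zip(chain_id[mask], res_id[mask]))
    return np.array([(c, r) in touched for c, r in zip(chain_id, res_id)])

print(np.flatnonzero(ca_and_his).tolist())                         # [3]
print(np.flatnonzero(byres(ca_and_his, chain_id, res_id)).tolist())  # [2, 3]
```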

📁 Multi-Format I/O Support

  • PDB format with multi-model support and CONECT record parsing
  • mmCIF format with chemical bond information extraction
  • Auto-detection of single structures vs. trajectories
  • String-based parsing for in-memory structure creation
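PDB is a fixed-column format, so reading ATOM/HETATM and CONECT records comes down to slicing each line at the column ranges defined by the wwPDB format. The sketch below is a minimal, standalone illustration that parses from a string; it ignores alternate locations, insertion codes, and multi-model files, and it is not MolR's PDBParser.

```python
# Minimal fixed-column PDB reader (hypothetical sketch, not MolR's PDBParser).
# Column ranges follow the wwPDB format (1-based columns -> 0-based slices).

def parse_pdb(text):
    atoms = []      # one dict per ATOM/HETATM record
    bonds = set()   # CONECT bonds as (smaller_serial, larger_serial) pairs
    for line in text.splitlines():
        record = line[0:6].strip()
        if record in ("ATOM", "HETATM"):
            atoms.append({
                "serial": int(line[6:11]),        # columns 7-11
                "name": line[12:16].strip(),      # columns 13-16
                "res_name": line[17:20].strip(),  # columns 18-20
                "chain_id": line[21],             # column 22
                "res_seq": int(line[22:26]),      # columns 23-26
                "xyz": (float(line[30:38]),       # columns 31-38
                        float(line[38:46]),       # columns 39-46
                        float(line[46:54])),      # columns 47-54
            })
        elif record == "CONECT":
            origin = int(line[6:11])
            # Bonded-atom serials sit in 5-character fields at columns 12-16, 17-21, 22-26, 27-31.
            for start in range(11, 31, 5):
                field = line[start:start + 5].strip()
                if field:
                    partner = int(field)
                    bonds.add((min(origin, partner), max(origin, partner)))
    return atoms, bonds

SAMPLE = """\
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      12.560   6.273  -6.742  1.00  0.00           C
CONECT    1    2
"""

atoms, bonds = parse_pdb(SAMPLE)
print(len(atoms), sorted(bonds))   # 2 [(1, 2)]
```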

Installation

pip install molr

For development installation:

git clone https://github.com/abhishektiwari/molr.git
cd molr
pip install -e .[dev]

Quick Start

Basic Structure Loading and Analysis

import molr

# Load structure from PDB file
structure = molr.Structure.from_pdb("protein.pdb")
print(f"Loaded {structure.n_atoms} atoms")

# Detect bonds automatically
bonds = structure.detect_bonds()
print(f"Detected {len(bonds)} bonds")

# Use selection language
ca_atoms = structure.select("name CA")
active_site = structure.select("within 5.0 of (resname HIS)")
protein_backbone = structure.select("protein and backbone")

Spatial Queries and Neighbor Finding

# Fast spatial queries with built-in KDTree indexing
neighbors = structure.get_neighbors_within(atom_idx=100, radius=5.0)
atoms_in_sphere = structure.get_atoms_within_sphere([10, 15, 20], radius=8.0)

# Center of geometry-based selections
ligand = structure.select("resname LIG")
nearby = structure.get_atoms_within_cog_sphere(ligand, radius=10.0)

# Inter-selection contacts
protein = structure.select("protein")
contacts = structure.get_atoms_between_selections(protein, ligand, max_distance=4.0)
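The queries above map naturally onto SciPy's `scipy.spatial.KDTree` and its `query_ball_point` method. The functions below borrow the README's method names only loosely; they are a hypothetical sketch on random coordinates, not MolR's implementation, and MolR's exact inclusion rule at the boundary (`<=` versus `<` the radius) is not stated in this README.

```python
import numpy as np
from scipy.spatial import KDTree

# Synthetic coordinates in Å, standing in for a structure's (n_atoms, 3) array.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 30.0, size=(500, 3))
tree = KDTree(coords)  # built once and reused by every query below

def neighbors_within(atom_idx, radius):
    """Hypothetical: indices of atoms within `radius` of one atom, excluding itself."""
    hits = tree.query_ball_point(coords[atom_idx], r=radius)
    return sorted(i for i in hits if i != atom_idx)

def atoms_within_sphere(center, radius):
    """Hypothetical: indices of atoms within `radius` of an arbitrary point."""
    return sorted(tree.query_ball_point(np.asarray(center, dtype=float), r=radius))

def atoms_within_cog_sphere(selection, radius):
    """Hypothetical: sphere centered on the unweighted center of geometry of `selection`."""
    cog = coords[selection].mean(axis=0)
    return atoms_within_sphere(cog, radius)

def atoms_between(sel_a, sel_b, max_distance):
    """Hypothetical: atoms of sel_a within max_distance of at least one atom of sel_b."""
    tree_b = KDTree(coords[sel_b])  # small tree over sel_b only
    hits = tree_b.query_ball_point(coords[sel_a], r=max_distance)
    return [int(a) for a, h in zip(sel_a, hits) if len(h) > 0]

ligand = np.arange(10)          # pretend the first 10 atoms form a ligand
protein = np.arange(10, 500)
nearby = atoms_within_cog_sphere(ligand, radius=10.0)
contacts = atoms_between(protein, ligand, max_distance=4.0)
```

Building the main tree once spreads its construction cost over many queries; the inter-selection helper builds a second, smaller tree over `sel_b` so each atom of `sel_a` is tested only against nearby candidates.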

Bond Detection and Analysis

from molr import BondDetector

# Configure bond detection
detector = BondDetector(
    enable_residue_templates=True,
    enable_ccd_lookup=True,
    enable_distance_detection=True,
    vdw_factor=0.75
)

# Detect bonds with detailed statistics
bonds, stats = detector.detect_bonds_with_stats(structure)
print(f"Bond detection stats: {stats}")

# Access different bond sources
file_bonds = structure.file_bonds  # From PDB CONECT records
all_bonds = structure.bonds        # Complete bond set
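Distance-based detection is usually a pairwise cutoff test. A common form, assumed here because this README does not spell out MolR's exact criterion, treats atoms i and j as bonded when their distance is below `vdw_factor × (r_i + r_j)`, where r is the van der Waals radius. The radii table is a small illustrative subset of Bondi values, and the function is hypothetical; check MolR's source for the formula and radii it actually uses. The full pairwise matrix is O(n²) and only suitable for small inputs; a practical version would first gather candidate pairs with a spatial index.

```python
import numpy as np

# Illustrative Bondi van der Waals radii in Å (subset only).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def distance_bonds(coords, elements, vdw_factor=0.75):
    """Hypothetical: (i, j) pairs, i < j, with distance < vdw_factor * (r_i + r_j)."""
    coords = np.asarray(coords, dtype=float)
    radii = np.array([VDW_RADII[e] for e in elements])
    # Pairwise distance matrix via broadcasting: shape (n, n).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    cutoff = vdw_factor * (radii[:, None] + radii[None, :])
    # Keep the strict upper triangle: each pair once, no self-pairs.
    i, j = np.nonzero(np.triu(dist < cutoff, k=1))
    return list(zip(i.tolist(), j.tolist()))

# Linear C-C-O chain plus a distant N atom.
coords = [[0.0, 0.0, 0.0], [1.53, 0.0, 0.0], [2.96, 0.0, 0.0], [10.0, 0.0, 0.0]]
elements = ["C", "C", "O", "N"]
print(distance_bonds(coords, elements))   # [(0, 1), (1, 2)]
```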

Multi-Model Trajectories

# Load trajectory from multi-model PDB
ensemble = molr.StructureEnsemble.from_pdb("trajectory.pdb")
print(f"Loaded {ensemble.n_models} models with {ensemble.n_atoms} atoms each")

# Access individual frames
first_frame = ensemble[0]  # Returns Structure object
last_frame = ensemble[-1]

# Analyze trajectory
centers = [model.get_center() for model in ensemble]
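A common way to keep multi-model data compact, assumed here for illustration, is to store the per-atom topology once and all coordinates as one `(n_models, n_atoms, 3)` array, so per-frame quantities such as the center of geometry become a single vectorized reduction. `MiniEnsemble` is hypothetical; unlike MolR's StructureEnsemble, its indexing returns a raw coordinate view rather than a Structure object.

```python
import numpy as np

class MiniEnsemble:
    """Hypothetical ensemble: shared per-atom topology plus stacked coordinates."""

    def __init__(self, atom_names, coords):
        self.atom_names = np.asarray(atom_names)        # (n_atoms,), stored once
        self.coords = np.asarray(coords, dtype=float)   # (n_models, n_atoms, 3)
        if self.coords.shape[1] != self.atom_names.shape[0]:
            raise ValueError("every model must contain the same atoms")

    @property
    def n_models(self):
        return self.coords.shape[0]

    @property
    def n_atoms(self):
        return self.coords.shape[1]

    def __getitem__(self, i):
        # Returns an (n_atoms, 3) view of one model; negative indices work as in NumPy.
        return self.coords[i]

    def centers(self):
        # Center of geometry of every model at once: shape (n_models, 3).
        return self.coords.mean(axis=1)

frames = np.array([
    [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    [[1.0, 1.0, 0.0], [3.0, 1.0, 0.0]],
])
ens = MiniEnsemble(["CA", "CB"], frames)
print(ens.n_models, ens.n_atoms)   # 2 2
print(ens.centers().tolist())      # [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
```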

Advanced Usage

Custom Bond Detection

from molr import BondList
from molr.bond_detection import (
    ResidueBondProvider,
    CCDBondProvider,
    DistanceBondProvider,
)

# Create custom detection pipeline
providers = [
    ResidueBondProvider(),      # Standard residues first
    CCDBondProvider(),          # CCD lookup for ligands  
    DistanceBondProvider(vdw_factor=0.8)  # Distance fallback
]

# Apply in sequence
final_bonds = BondList()
for provider in providers:
    if provider.is_applicable(structure):
        bonds = provider.detect_bonds(structure, existing_bonds=final_bonds)
        final_bonds.extend(bonds)

Structure Manipulation

import numpy as np  # needed for np.float32 below

# Subset structures
protein_only = structure[structure.select("protein")]
chain_a = structure[structure.chain_id == "A"]

# Coordinate transformations  
structure.translate([10.0, 0.0, 0.0])
structure.center_at_origin()

# Add custom annotations
structure.add_annotation("custom_prop", dtype=np.float32, default_value=1.0)
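With array-based storage, these manipulations reduce to NumPy broadcasting. The functions below are a hypothetical sketch of what `translate` and `center_at_origin` could do on a plain `(n_atoms, 3)` coordinate array; they are not MolR's implementation, and using the unweighted center of geometry (rather than a mass-weighted center) in `center_at_origin` is an assumption.

```python
import numpy as np

def translate(coords, vector):
    # Hypothetical: shift every atom in place by the same vector (broadcast over rows).
    coords += np.asarray(vector, dtype=coords.dtype)

def center_at_origin(coords):
    # Hypothetical: subtract the unweighted center of geometry so it lands at (0, 0, 0).
    coords -= coords.mean(axis=0)

coords = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
chain_id = np.array(["A", "B"])

translate(coords, [10.0, 0.0, 0.0])
print(coords.tolist())   # [[11.0, 2.0, 3.0], [13.0, 4.0, 5.0]]
center_at_origin(coords)
print(coords.tolist())   # [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]

# Boolean-mask subsetting, analogous to structure[structure.chain_id == "A"].
chain_a_coords = coords[chain_id == "A"]

# A custom per-atom annotation is simply another array of length n_atoms.
custom_prop = np.full(coords.shape[0], 1.0, dtype=np.float32)
```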

Performance

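  • Spatial indexing: KDTree neighbor queries take roughly O(log n) time per query (plus the neighbors returned) versus O(n) for a brute-force scan; all-pairs searches drop from O(n²) to roughly O(n log n)
  • Memory efficient: the SoA design stores each per-atom property in one contiguous NumPy array instead of per-atom Python objects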
  • Vectorized operations: NumPy-based computations throughout
  • Lazy evaluation: Optional data loaded only when needed
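The speedup of spatial indexing comes from pruning: a k-d tree splits space along one axis per level, so a radius query can skip entire subtrees whose splitting plane lies farther away than the radius. The pure-Python sketch below illustrates that idea on 3D points and counts visited nodes so the savings over an O(n) scan are visible; the function names are illustrative, and SciPy's KDTree, which MolR builds on, is far more optimized.

```python
import math
import random

def build(points, indices=None, depth=0):
    """Build a 3D k-d tree; each node is (point_index, split_axis, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % 3                                      # cycle x, y, z
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2                               # median point becomes the node
    return (indices[mid], axis,
            build(points, indices[:mid], depth + 1),      # coordinates <= median
            build(points, indices[mid + 1:], depth + 1))  # coordinates >= median

def radius_query(node, points, center, radius, found, visits):
    """Append indices of points within `radius` of `center`; count visited nodes."""
    if node is None:
        return
    idx, axis, left, right = node
    visits[0] += 1                                        # one distance evaluation per visit
    if math.dist(points[idx], center) <= radius:
        found.append(idx)
    delta = center[axis] - points[idx][axis]
    near, far = (left, right) if delta < 0 else (right, left)
    radius_query(near, points, center, radius, found, visits)
    # Pruning: the far side can only contain hits if the sphere crosses the split plane.
    if abs(delta) <= radius:
        radius_query(far, points, center, radius, found, visits)

random.seed(1)
pts = [tuple(random.uniform(0.0, 100.0) for _ in range(3)) for _ in range(2000)]
tree = build(pts)
center = (50.0, 50.0, 50.0)
found, visits = [], [0]
radius_query(tree, pts, center, 10.0, found, visits)
brute = [i for i, p in enumerate(pts) if math.dist(p, center) <= 10.0]
print(sorted(found) == brute, visits[0], "of", len(pts), "points checked")
```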

API Reference

Core Classes

  • Structure: Single molecular structure with spatial indexing
  • StructureEnsemble: Multi-model trajectory representation
  • BondList: Efficient bond storage and manipulation
  • BondDetector: Configurable hierarchical bond detection
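This README does not describe BondList internals. One compact layout, assumed here for illustration, is an `(n_bonds, 2)` integer array of atom indices kept in canonical order (smaller index first), so that duplicates reported by different detection providers can be removed with `np.unique`. `MiniBondList` is hypothetical.

```python
import numpy as np

class MiniBondList:
    """Hypothetical bond store: an (n_bonds, 2) int array with i < j in every row."""

    def __init__(self, pairs=()):
        self.pairs = np.empty((0, 2), dtype=np.int64)
        self.extend(pairs)

    def extend(self, pairs):
        new = np.asarray(pairs, dtype=np.int64).reshape(-1, 2)
        if new.size == 0:
            return
        new = np.sort(new, axis=1)                              # canonical (min, max) order
        merged = np.concatenate([self.pairs, new])
        self.pairs = np.unique(merged, axis=0)                  # drop duplicates, sort rows

    def __len__(self):
        return self.pairs.shape[0]

    def partners(self, atom):
        # Rows touching `atom`, then the other endpoint of each row.
        rows = self.pairs[np.any(self.pairs == atom, axis=1)]
        return np.sort(rows[rows != atom])

bonds = MiniBondList([(0, 1), (1, 2)])
bonds.extend([(2, 1), (2, 3)])      # (2, 1) duplicates (1, 2) and is dropped
print(len(bonds))                   # 3
print(bonds.partners(2).tolist())   # [1, 3]
```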

I/O Parsers

  • PDBParser: PDB format with CONECT record support
  • mmCIFParser: mmCIF format with bond information

Selection System

  • select(): Parse and evaluate selection expressions
  • Spatial expressions: within, around, cog
  • Boolean operators: and, or, not
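Selection expressions are commonly parsed with the precedence `not` over `and` over `or`, plus parentheses for grouping; that convention is assumed here. MolR lists pyparsing as its parser dependency, so the stdlib-only recursive-descent parser below is a hypothetical stand-in: it handles only field keywords followed by values (such as `name` and `resname`), the three boolean operators, and parentheses, and evaluates each term to a NumPy boolean mask.

```python
import re
import numpy as np

TOKEN = re.compile(r"\(|\)|[^\s()]+")

def select(expr, arrays):
    """Evaluate a tiny selection language over per-atom arrays (hypothetical sketch).

    Grammar:  or_expr  := and_expr ("or" and_expr)*
              and_expr := not_expr ("and" not_expr)*
              not_expr := "not" not_expr | atom
              atom     := "(" or_expr ")" | FIELD VALUE+
    """
    tokens = TOKEN.findall(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def or_expr():
        mask = and_expr()
        while peek() == "or":
            take()
            mask = mask | and_expr()
        return mask

    def and_expr():
        mask = not_expr()
        while peek() == "and":
            take()
            mask = mask & not_expr()
        return mask

    def not_expr():
        if peek() == "not":
            take()
            return ~not_expr()
        return atom()

    def atom():
        tok = take()
        if tok == "(":
            mask = or_expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return mask
        # Field keyword followed by one or more values, e.g. "resname ALA GLY".
        values = []
        while peek() not in (None, "and", "or", ")"):
            values.append(take())
        return np.isin(arrays[tok], values)

    mask = or_expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return mask

arrays = {
    "name": np.array(["N", "CA", "C", "N", "CA", "O"]),
    "resname": np.array(["ALA", "ALA", "ALA", "HOH", "GLY", "HOH"]),
}
mask = select("name CA and not (resname HOH or resname GLY)", arrays)
print(np.flatnonzero(mask).tolist())   # [1]
```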

Requirements

  • Python ≥3.8
  • NumPy ≥1.20.0
  • SciPy ≥1.7.0 (for spatial indexing)
  • pyparsing ≥3.0.0 (for selection language)

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.


Download files

Download the file for your platform.

Source Distribution

molr-0.0.1.dev2.tar.gz (3.1 MB)

Uploaded Source

Built Distribution

molr-0.0.1.dev2-py3-none-any.whl (3.2 MB)

Uploaded Python 3

File details

Details for the file molr-0.0.1.dev2.tar.gz.

File metadata

  • Download URL: molr-0.0.1.dev2.tar.gz
  • Upload date:
  • Size: 3.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for molr-0.0.1.dev2.tar.gz
Algorithm Hash digest
SHA256 8ba06343f5ff876ad8f6d5874988d14fc2b543e88fa522a4c7941860531d07f5
MD5 c152c9b058e159cba1b817a42d415a8f
BLAKE2b-256 b0e7b6a92f92e662984608a89f0836d996f792671163ce726e7047972627158b


Provenance

The following attestation bundles were made for molr-0.0.1.dev2.tar.gz:

Publisher: release.yml on abhishektiwari/molr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file molr-0.0.1.dev2-py3-none-any.whl.

File metadata

  • Download URL: molr-0.0.1.dev2-py3-none-any.whl
  • Upload date:
  • Size: 3.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for molr-0.0.1.dev2-py3-none-any.whl
Algorithm Hash digest
SHA256 e5ace77900c8b587e47dcb28d5a4ac289b88cb8d90dc8d8ca3735250ace32016
MD5 2560a58a857f1e4d2c96ba3e1d59c9ff
BLAKE2b-256 bace3fda7b431a5ddcb5d51f48e6b79b404e77aed03297b8c33e5f13f4e4e49f


Provenance

The following attestation bundles were made for molr-0.0.1.dev2-py3-none-any.whl:

Publisher: release.yml on abhishektiwari/molr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
