C++ bindings for Lahuta

Lahuta ⚡

Structural Biology has entered a data-rich era.

Lahuta is high-performance, open-source software for scalable structural biology analysis. It creates chemically informative topological descriptors from diverse structural inputs and has been optimized for generative-AI model outputs (such as AlphaFold models). On a modern multi-core laptop, Lahuta processes the AlphaFold DB Swiss-Prot subset (~540,000 proteins) in minutes. In benchmarks, Lahuta is orders of magnitude faster than comparable tools.

Lahuta scales in both dataset and system size, handling hundreds of millions of structures as well as assemblies with tens of millions of atoms, and offers native support for molecular-dynamics simulations.

Key Features

  • Scalability: Processes hundreds of millions of structures and assemblies with tens of millions of atoms
  • Performance: Sub- to low-millisecond computation times
  • Chemistry-aware topology: Detailed topological representation with bond connectivity, bond orders, hybridization states, and protonation states
  • Format support: PDB, PDBx/mmCIF, BinaryCIF, MMTF for structures; XTC and GRO for MD trajectories
  • MD simulation support: Native, first-class support for analyzing molecular dynamics trajectories
  • Optimized for AI models: Custom high-performance topology perception for AlphaFold models
  • Expressive selection language: Query system with logical and arithmetic operators for precise substructure identification
  • Structural analysis: Distance calculations, neighbor searching, native contact analysis
  • Extensive Contact Analysis: Supports contact analysis based on the most popular analysis tools (Arpeggio, MolStar, GetContacts)
  • Deep Python Integration: Python is fully supported as a first-class interface and extensibility layer
  • No HPC required: Runs efficiently on standard laptops
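One common way to implement the kind of neighbor searching listed above is a cell list: points are binned into cubic cells whose edge equals the cutoff, so each point only needs to be compared against points in its own and adjacent cells. The sketch below is a generic, pure-Python illustration of that technique. It is not Lahuta's API or implementation, and the function name `neighbor_pairs` is hypothetical.

```python
from collections import defaultdict
from itertools import product
import math


def neighbor_pairs(coords, cutoff):
    """Return sorted index pairs (i, j), i < j, whose distance is below `cutoff`."""
    # Bin every point into a cubic cell with edge length equal to the cutoff.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(idx)

    pairs = []
    offsets = list(product((-1, 0, 1), repeat=3))  # the cell itself plus 26 neighbors
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            # .get() avoids creating empty cells while we iterate over the dict.
            other = cells.get((cx + dx, cy + dy, cz + dz))
            if not other:
                continue
            for i in members:
                for j in other:
                    # i < j guarantees each pair is reported exactly once.
                    if i < j and math.dist(coords[i], coords[j]) < cutoff:
                        pairs.append((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0), (5.5, 5.0, 5.0)]
    print(neighbor_pairs(points, cutoff=1.5))
```

For roughly uniform point densities this reduces the number of distance checks from all pairs to a small constant number of candidates per point, which is why cell lists are a popular choice for contact searches on large systems.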

Design Principles

Lahuta development is guided by the following three principles:

  1. High Performance: All algorithms and data structures are (reasonably) optimized for speed.
  2. Scalability: Designed to handle ultra large-scale datasets and systems with high efficiency.
  3. Foundational core library: A composable, stable, high‑performance base that enables advanced structural analyses.

Motivation

Recent advances in structure prediction have produced an explosion of available structural data. The AlphaFold Database contains over 200 million models, and metagenomic predictions from ESMFold add over 600 million more, which is 3-4 orders of magnitude more than experimental archives hold. Meanwhile, new generators like BioEmu can synthesize vast conformational ensembles within hours. Existing tools in structural biology were not designed for this volume or heterogeneity, making chemistry-aware, ultra-large-scale analyses either infeasible or dependent on prohibitively large compute resources.

Lahuta enables screening of millions of models and long MD ensembles on standard hardware. This makes possible the discovery of novel conformational states, folds, and fold families, the systematic mapping of interfaces and ligandable pockets, and ensemble-level comparisons that were previously infeasible without dedicated compute resources.

Installation

We recommend using conda for installation. Lahuta requires Python 3.10 or higher.

conda install bisejdiu::lahuta

Or pip:

pip install lahuta
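After installing, a quick sanity check can confirm that the interpreter meets the Python 3.10 requirement and that the package is visible to it. The sketch below uses only the standard library and assumes the import name is `lahuta` (matching the package name above). The helper `check_environment` is hypothetical and not part of Lahuta.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the installation notes


def check_environment(module_name="lahuta"):
    """Return (python_ok, module_found) without importing the module."""
    python_ok = sys.version_info[:2] >= MIN_PYTHON
    # find_spec locates a top-level module without executing it.
    module_found = importlib.util.find_spec(module_name) is not None
    return python_ok, module_found


if __name__ == "__main__":
    python_ok, module_found = check_environment()
    print(f"Python >= 3.10: {python_ok}")
    print(f"lahuta importable: {module_found}")
```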

Build, Test, and Install (C++ Core, CLI, and Python bindings)

  • Configure, build, and install from the repository root using Ninja:

    cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DLAHUTA_BUILD_PYTHON=ON -DLAHUTA_GENERATE_PY_STUBS=ON -DBUILD_TESTING=ON -DLAHUTA_BUILD_CLI=ON -DLAHUTA_BUILD_EXAMPLES=ON
    cmake --build build -j 8 && cmake --install build
    
  • Run tests (via CTest):

    cd build && ctest --output-on-failure
    
  • Notes and options:

    • Compiler requirement: GCC 9.1+
    • LAHUTA_BUILD_PYTHON=ON builds the Python shared libraries and installs the Python package into the CMake install prefix. It does not install Python-level dependencies; use pip or conda to install those (see list in interop/python/pyproject.toml).
    • Switch between shared and static linkage for lahuta_core with:
      cmake -S . -B build -DLAHUTA_BUILD_SHARED_CORE=OFF
      
      The default (ON) produces a shared library for reuse by the Python bindings. Set to OFF for a static CLI.
    • Library-only builds (used by Python packaging) can disable the CLI:
      cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLAHUTA_BUILD_CLI=OFF
      
    • Optional test configurations:
      • -DENABLE_ASAN=ON - Enable AddressSanitizer and UndefinedBehaviorSanitizer
      • -DENABLE_TSAN=ON - Enable ThreadSanitizer
    • Run specific tests:
      cd build && ctest -R test_name_pattern --output-on-failure
      

Project Structure Overview

The core of Lahuta consists of these modules (see under core/src/):

  • analysis - Structural analysis routines including contact analysis, system-level properties, and topology perception.
  • bonds - Bond perception algorithms, bond order assignment, and connectivity validation.
  • chemistry - Chemical property calculations (formal charges, hydrophobicity, atom typing routines).
  • compute - Computation abstraction with dependency management, pipeline execution, and result caching.
  • contacts - Contact detection implementations from multiple methods (Arpeggio, MolStar, GetContacts).
  • db - Database interface for fast storage and retrieval of structural data and zero-copy reads.
  • distances - Distance calculations, neighbor search algorithms, and pairwise distance matrices.
  • entities - Core entity representations (atoms, residues, contacts) and their associated views and iterators.
  • entities/search - Entity query and retrieval with hit buffering.
  • md - Molecular dynamics trajectory parsing (XTC, GRO formats) and frame-by-frame analysis support.
  • models - Optimized system-level properties and topology perception for AI (currently AF2) models.
  • pipeline - High-level pipeline framework for processing, parallel execution, backpressure system, and progress tracking.
  • selections - Expressive selection language for querying atoms, residues, and substructures based on geometric, chemical, or topological criteria (WIP)
  • serialization - Data serialization and deserialization.
  • sinks - Data output handlers that write pipeline results to files, databases, or memory.
  • spatial - Spatial indexing structures (cell lists, KD-trees) for scalable neighbor queries and contact searches.
  • topology - Topology construction engine.
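The pipeline module above mentions a backpressure system. As a rough illustration of the general idea, and not of Lahuta's actual pipeline, the sketch below uses a bounded queue from the Python standard library: when the consumer falls behind, the producer blocks instead of buffering an unbounded number of items. The names `run_pipeline`, `work`, and `max_pending` are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def run_pipeline(items, work, max_pending=4):
    """Apply `work` to each item on a worker thread, using a bounded buffer."""
    pending = queue.Queue(maxsize=max_pending)
    results = []
    errors = []

    def consumer():
        while True:
            item = pending.get()
            if item is _DONE:
                break
            if errors:
                continue  # keep draining so the producer never blocks forever
            try:
                results.append(work(item))
            except Exception as exc:
                errors.append(exc)

    worker = threading.Thread(target=consumer)
    worker.start()
    for item in items:
        # put() blocks while the queue is full; this is the backpressure.
        pending.put(item)
    pending.put(_DONE)
    worker.join()
    if errors:
        raise errors[0]
    return results


if __name__ == "__main__":
    print(run_pipeline(range(5), lambda x: x * x, max_pending=2))
```

With a single consumer the results keep the input order. A production pipeline would typically use several workers and a sink instead of an in-memory list.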

The project also includes:

  • cli/ - Command-line interface tools for structure analysis and database creation.
  • interop/python - Python integration
  • core/tests/ - Comprehensive C++ test suite using Google Test framework.
  • core/examples/ - Example programs demonstrating C++ core library usage.

Python integration structure:

  • interop/python/src/ - Python bindings implementation (pybind11) with deep (often zero-copy) NumPy integration.
  • interop/python/lahuta/ - Python package providing high-level APIs, utilities, and type-safe interfaces to core functionality.
  • interop/python/examples/ - Example scripts demonstrating Python API usage.
  • interop/python/tests/ - Python test suite (pytest) covering all public APIs and integration scenarios.
  • interop/python/benchmarks/ - Performance benchmarks comparing Lahuta against other tools.

Documentation

See interop/python/examples and interop/python/tests for Python usage examples.

Reporting Issues

Report issues in the issues section.

Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines.

License

This project is licensed under the Apache License 2.0. See LICENSE for details.

Acknowledgments

Lahuta has benefited from and directly uses several popular open-source libraries, including RDKit, molstar, and gemmi. See THIRD-PARTY-NOTICES.md for attribution details.

