C++ bindings for Lahuta

Lahuta ⚡

Structural Biology has entered a data-rich era.

Lahuta is high-performance, open-source software for scalable structural biology analysis. It creates chemically informative topological descriptors from diverse structural inputs and has been optimized for generative-AI model outputs (such as AlphaFold models). On a modern multi-core laptop, Lahuta processes the AlphaFold DB Swiss-Prot subset (~540,000 proteins) in minutes. In benchmarks, Lahuta is orders of magnitude faster than comparable tools.

Lahuta scales in both dataset and system size, handling hundreds of millions of structures as well as assemblies with tens of millions of atoms, and offers native support for molecular-dynamics simulations.

Key Features

  • Scalability: Processes hundreds of millions of structures and assemblies with tens of millions of atoms
  • Performance: Sub- to low-millisecond computation times
  • Chemistry-aware topology: Detailed topological representation with bond connectivity, bond orders, hybridization states, and protonation states
  • Format support: PDB, PDBx/mmCIF, BinaryCIF, MMTF for structures; XTC and GRO for MD trajectories
  • MD simulation support: Native, first-class support for analyzing molecular dynamics trajectories
  • Optimized for AI models: Custom high-performance topology perception for AlphaFold models
  • Expressive selection language: Query system with logical and arithmetic operators for precise substructure identification
  • Structural analysis: Distance calculations, neighbor searching, native contact analysis
  • Extensive Contact Analysis: Supports contact definitions modeled on popular analysis tools (Arpeggio, MolStar, GetContacts)
  • Deep Python Integration: Python is fully supported as a first-class interface and extensibility layer
  • No HPC required: Runs efficiently on standard laptops
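
Native contact analysis, listed among the features above, is often summarized as the fraction of reference contacts that persist in a later frame. The sketch below illustrates that idea in plain Python. It does not use Lahuta's API: the function names `contact_pairs` and `native_contact_fraction`, the 4.5 Å cutoff, and the toy coordinates are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def contact_pairs(coords, cutoff):
    """Return the set of index pairs (i, j), i < j, closer than `cutoff`."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if dist(a, b) < cutoff
    }


def native_contact_fraction(reference, frame, cutoff=4.5):
    """Fraction of contacts in `reference` that are still formed in `frame`."""
    native = contact_pairs(reference, cutoff)
    if not native:
        return 0.0  # no reference contacts, so nothing can be retained
    kept = sum(1 for i, j in native if dist(frame[i], frame[j]) < cutoff)
    return kept / len(native)


# Toy three-point system (hypothetical coordinates, in angstroms).
reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (9.0, 0.0, 0.0)]

q = native_contact_fraction(reference, frame)
print(q)  # 0.5: one of the two reference contacts is retained
```

The all-pairs loop is quadratic in the number of points, which is acceptable for a toy example but is exactly the cost that spatial indexing is meant to avoid on large systems.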

Design Principles

Lahuta development is guided by the following three principles:

  1. High Performance: All algorithms and data structures are (reasonably) optimized for speed.
  2. Scalability: Designed to handle ultra large-scale datasets and systems with high efficiency.
  3. Foundational core library: A composable, stable, high‑performance base that enables advanced structural analyses.

Motivation

Recent advances in structure prediction have resulted in an explosion of available structural data. The AlphaFold Database contains over 200 million models, and metagenomic predictions from ESMFold add over 600 million more, making these collections 3-4 orders of magnitude larger than experimental archives. Meanwhile, new generators like BioEmu can synthesize vast conformational ensembles within hours. Existing tools in structural biology were not designed for this volume or heterogeneity, making chemistry-aware, ultra large-scale analyses either infeasible or dependent on prohibitively large compute resources.

Lahuta enables screening of millions of models and long MD ensembles on standard hardware. This makes it possible to discover novel conformational states, folds, and fold families, to map interfaces and ligandable pockets systematically, and to run ensemble-level comparisons that were previously infeasible without dedicated compute resources.

Installation

We recommend using conda for installation. Lahuta requires Python 3.10 or higher. Note that this currently works only on macOS; on Linux you currently need to build from source. We will update this as soon as we can!

conda install bisejdiu::lahuta

Or pip:

pip install lahuta
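
After installing, it can be useful to confirm that the interpreter meets the stated Python 3.10 minimum and that the package is visible to it. The following is a minimal sketch using only the standard library. The helper names `python_is_supported` and `installed_version` are hypothetical, and the distribution name `lahuta` is assumed to match the pip command above.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)  # minimum version stated in the installation notes


def python_is_supported(version_info=sys.version_info):
    """True if the given interpreter version meets the stated minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package="lahuta"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("lahuta version:", installed_version() or "not installed")
```

Querying the package metadata rather than importing the package keeps the check cheap and avoids loading the compiled extension just to read a version string.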

Build, Test, and Install (C++ Core, CLI, and Python bindings)

  • Configure, build, and install from the repository root using Ninja:

    cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DLAHUTA_BUILD_PYTHON=ON -DLAHUTA_GENERATE_PY_STUBS=ON -DBUILD_TESTING=ON -DLAHUTA_BUILD_CLI=ON -DLAHUTA_BUILD_EXAMPLES=ON
    cmake --build build -j 8 && cmake --install build
    
  • Run tests (via CTest):

    cd build && ctest --output-on-failure
    
  • Notes and options:

    • Compiler requirement: GCC 9.1+
    • LAHUTA_BUILD_PYTHON=ON builds the Python shared libraries and installs the Python package into the CMake install prefix. It does not install Python-level dependencies; use pip or conda to install those (see list in interop/python/pyproject.toml).
    • Switch between shared and static linkage for lahuta_core with:
      cmake -S . -B build -DLAHUTA_BUILD_SHARED_CORE=OFF
      
      The default (ON) produces a shared library for reuse by the Python bindings. Set to OFF for a static CLI.
    • Library-only builds (used by Python packaging) can disable the CLI:
      cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLAHUTA_BUILD_CLI=OFF
      
    • Optional test configurations:
      • -DENABLE_ASAN=ON - Enable AddressSanitizer and UndefinedBehaviorSanitizer
      • -DENABLE_TSAN=ON - Enable ThreadSanitizer
    • Run specific tests:
      cd build && ctest -R test_name_pattern --output-on-failure
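
For scripted or CI builds, it can help to keep the configure options in one place and generate the command line from them. The sketch below assembles a `cmake` configure command from a dictionary. It uses only option names that appear in this README. The helper `cmake_configure_command` and the particular combination of options are illustrative assumptions, not part of the project.

```python
import shlex


def cmake_configure_command(options, source_dir=".", build_dir="build", generator="Ninja"):
    """Build the argument list for a CMake configure step.

    Boolean option values are rendered as ON/OFF; other values are used as-is.
    """
    cmd = ["cmake", "-S", source_dir, "-B", build_dir, "-G", generator]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        cmd.append(f"-D{name}={value}")
    return cmd


# Hypothetical library-only configuration: Python bindings on, CLI off.
library_only = {
    "CMAKE_BUILD_TYPE": "Release",
    "LAHUTA_BUILD_PYTHON": True,
    "LAHUTA_BUILD_CLI": False,
    "LAHUTA_BUILD_SHARED_CORE": True,
}

command = cmake_configure_command(library_only)
print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root to run the configure step.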
      

Project Structure Overview

The core of Lahuta consists of these modules (see under core/src/):

  • analysis - Structural analysis routines including contact analysis, system-level properties, and topology perception.
  • bonds - Bond perception algorithms, bond order assignment, and connectivity validation.
  • chemistry - Chemical property calculations (formal charges, hydrophobicity, atom typing routines).
  • compute - Computation abstraction with dependency management, pipeline execution, and result caching.
  • contacts - Contact detection implementations from multiple methods (Arpeggio, MolStar, GetContacts).
  • db - Database interface for fast storage and retrieval of structural data and zero-copy reads.
  • distances - Distance calculations, neighbor search algorithms, and pairwise distance matrices.
  • entities - Core entity representations (atoms, residues, contacts) and their associated views and iterators.
  • entities/search - Entity query and retrieval with hit buffering.
  • md - Molecular dynamics trajectory parsing (XTC, GRO formats) and frame-by-frame analysis support.
  • models - Optimized system-level properties and topology perception for AI (currently AF2) models.
  • pipeline - High-level pipeline framework for processing, parallel execution, backpressure system, and progress tracking.
  • selections - Expressive selection language for querying atoms, residues, and substructures based on geometric, chemical, or topological criteria (WIP)
  • serialization - Data serialization and deserialization.
  • sinks - Data output handlers that write pipeline results to files, databases, or memory.
  • spatial - Spatial indexing structures (cell lists, KD-trees) for scalable neighbor queries and contact searches.
  • topology - Topology construction engine.
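
To illustrate why the spatial module's cell lists make neighbor queries scale, here is a small, generic cell-list neighbor search in pure Python. It is a sketch of the general technique, not Lahuta's implementation. The function names `build_cell_list` and `neighbor_pairs` and the sample points are assumptions, and the sketch ignores periodic boundaries.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def build_cell_list(coords, cell_size):
    """Bin point indices into cubic cells of edge length `cell_size`."""
    cells = defaultdict(list)
    for index, point in enumerate(coords):
        key = tuple(floor(c / cell_size) for c in point)
        cells[key].append(index)
    return cells


def neighbor_pairs(coords, cutoff):
    """Return sorted index pairs (i, j), i < j, within `cutoff` of each other.

    With cells of edge `cutoff`, any two points within the cutoff lie in the
    same cell or in adjacent cells, so only 27 cells are checked per cell.
    """
    cells = build_cell_list(coords, cutoff)
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get avoids creating empty cells while iterating
            others = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in others:
                    if i < j and dist(coords[i], coords[j]) <= cutoff:
                        pairs.add((i, j))
    return sorted(pairs)


points = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (5.0, 5.0, 5.0),
    (5.5, 5.0, 5.0),
    (-0.4, 0.0, 0.0),
]
print(neighbor_pairs(points, cutoff=1.5))
# [(0, 1), (0, 4), (1, 4), (2, 3)]
```

For roughly uniform point densities the work grows linearly with the number of points, because each point is compared only against the occupants of nearby cells rather than against every other point.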

The project also includes:

  • cli/ - Command-line interface tools for structure analysis and database creation.
  • interop/python - Python integration
  • core/tests/ - Comprehensive C++ test suite using Google Test framework.
  • core/examples/ - Example programs demonstrating C++ core library usage.

Python integration structure:

  • interop/python/src/ - Python bindings implementation (pybind11) with deep (often zero-copy) NumPy integration.
  • interop/python/lahuta/ - Python package providing high-level APIs, utilities, and type-safe interfaces to core functionality.
  • interop/python/examples/ - Example scripts demonstrating Python API usage.
  • interop/python/tests/ - Python test suite (pytest) covering all public APIs and integration scenarios.
  • interop/python/benchmarks/ - Performance benchmarks comparing Lahuta against other tools.

Documentation

See interop/python/examples and interop/python/tests for Python usage examples.

Reporting Issues

Report problems in the repository's issues section.

Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines.

License

This project is licensed under the Apache License 2.0. See LICENSE for details.

Acknowledgments

Lahuta has benefited from and directly uses several popular open source libraries, including RDKit, molstar, and gemmi. See THIRD-PARTY-NOTICES.md for detailed attribution.

