Skip to main content

Add your description here

Project description

🧬 Helix: Modular Components for Bioinformatics Workflows with Modal

PyPI version




Helix provides a set of modular, Lego-like components for constructing bioinformatics workflows. By abstracting away infrastructure complexities, Helix allows researchers to focus on biological problems rather than computational logistics. Built on Modal, it offers efficient cloud-based execution for large-scale computational tasks. Leveraging Modal's features, Helix provides flexible environments, seamless integrations with various services, efficient data management, advanced job scheduling, and built-in debugging tools, all while enabling easy deployment of web services.

🧩 Core Philosophy

Helix provides modular components for building scalable bioinformatics pipelines. We've abstracted away infrastructure complexities, allowing researchers to construct workflows using a clean Python API. By leveraging Modal's cloud capabilities, Helix offers powerful distributed computing without the typical overhead. Our design emphasizes programmatic interfaces over CLIs, enabling seamless integration into existing codebases. The goal is to empower bioinformaticians to focus on algorithm development and data analysis, rather than resource management and deployment logistics.

⚙️ Getting Started

  1. Create an account at modal.com for cloud execution access.
  2. Install Helix: pip install helixbio (Python 3.10+ required)
  3. Set up Modal: modal token new

🧬 Examples

Here are some examples of how to run various functions using the Modal app context:

Compute Protein Embeddings with ESM

from helix.core import app
from helix.functions.embedding import get_protein_embeddings


with app.run():
    sequences = [
        "MALLWMRLLPLLALLALWGPD",
        "MKTVRQERLKSIVRILERSKEPVSGAQ"
    ]
    result = get_protein_embeddings.remote(
        sequences,
        model_name="facebook/esm2_t33_650M_UR50D",
        embedding_strategy="cls"
    )

Predict Protein Structures with Chai and ESMFold

from helix.core import app
from helix.functions import chai, esmfold

# Example for Chai
with app.run():
    sequences = [
        "MALLWMRLLPLLALLALWGPD",
        "MKTVRQERLKSIVRILERSKEPVSGAQ"
    ]

    inference_params = {
        "num_recycles": 3,
        "num_samples": 1
    }
    chai_results = chai.predict_structures.remote(sequences, inference_params)
    print(f"Chai predicted {len(chai_results)} structures")

# Example for ESMFold
with app.run():
    sequences = [
        "MALLWMRLLPLLALLALWGPD",
        "MKTVRQERLKSIVRILERSKEPVSGAQ"
    ]
    esmfold_results = esmfold.predict_structures.remote(sequences, batch_size=2)
    print(f"ESMFold predicted {len(esmfold_results)} structures")

Score Mutation using Protein Language Models

Mutation scoring uses pre-trained language models to evaluate the impact of amino acid substitutions in protein sequences. This implementation is based on the methods developed by Brian Hie and colleagues, as described in their Nature Biotechnology paper (Hie et al., 2024). The function supports different scoring methods:

  1. "wildtype_marginal": Computes the difference in log probability between the mutant and wild-type amino acids without masking.
  2. "masked_marginal": Masks each position before scoring.
  3. "pppl": (Pseudo-perplexity) Calculates the change in model perplexity caused by the mutation.

These methods have been shown to be effective in guiding the evolution of human antibodies and other proteins.

Here's an example of how to use the mutation scoring function:

from helix.core import app
from helix.functions.scoring.protein import score_mutations

with app.run():
    # Define the sequence and mutations
    sequence = "MALLWMRLLPLLALLALWGPD"
    mutations = ["M1A", "L2A", "W5A"]

    # Define model and metric
    model_name = "facebook/esm2_t33_650M_UR50D"
    metric = "wildtype_marginal"

    # Score mutations
    scores = score_mutations.remote(
        model_name=model_name,
        sequence=sequence,
        mutations=mutations,
        metric=metric
    )
    print(f"Scores for {model_name} using {metric}:")
    print(scores)

Reference: Hie, B.L., Shanker, V.R., Xu, D. et al. Efficient evolution of human antibodies from general protein language models. Nat Biotechnol 42, 275–283 (2024). https://doi.org/10.1038/s41587-023-01763-2

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

helixbio-0.3.0.tar.gz (55.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

helixbio-0.3.0-py3-none-any.whl (64.0 kB view details)

Uploaded Python 3

File details

Details for the file helixbio-0.3.0.tar.gz.

File metadata

  • Download URL: helixbio-0.3.0.tar.gz
  • Upload date:
  • Size: 55.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.29

File hashes

Hashes for helixbio-0.3.0.tar.gz
Algorithm Hash digest
SHA256 8365231fbd108c0285f0221c5a01d2ca046aaabadbae012b9a0f4178b9ad940d
MD5 f4c7d440b49dd71aeac259919826b934
BLAKE2b-256 bd800ae1f62dc1de35090f8a5b0c71af39d5f61f88760e414350a4387f2fc415

See more details on using hashes here.

File details

Details for the file helixbio-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: helixbio-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 64.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.29

File hashes

Hashes for helixbio-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fe107b908f7e5a961a79a7f3386e460285a00767f79a5c7707cc354075530b68
MD5 35d0f6c0d6917b7e21216f66fee40a28
BLAKE2b-256 532b4c0ea1997999270b69a78e1d11af62d66ff765dae79192694c25b0cfe8ba

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page