helixbio

🧬 Helix: Modular Components for Bioinformatics Workflows with Modal

Helix provides modular, Lego-like components for building bioinformatics workflows. It runs on Modal, so large-scale computational tasks execute in the cloud without researchers having to manage the underlying infrastructure. Through Modal, Helix offers flexible environments, integrations with external services, data management, job scheduling, built-in debugging tools, and straightforward deployment of web services.

🧩 Core Philosophy

Helix's components are designed for building scalable bioinformatics pipelines through a clean Python API. Infrastructure details are abstracted away, and Modal's cloud capabilities supply distributed computing without the usual setup overhead. The design favors programmatic interfaces over CLIs so that Helix can be integrated into existing codebases. The goal is to let bioinformaticians focus on algorithm development and data analysis rather than resource management and deployment logistics.

⚙️ Getting Started

1. Create an account at modal.com for cloud execution access.
2. Install Helix (Python 3.10+ required): pip install helixbio
3. Set up Modal: modal token new
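
Before installing, it can help to confirm that the interpreter meets the Python 3.10+ requirement. The snippet below is a minimal sketch of such a check; `python_is_supported` is a hypothetical helper written for this illustration and is not part of Helix or Modal.

```python
import sys

# Minimum interpreter version stated in the install step.
MIN_PYTHON = (3, 10)


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (major, minor, ...) is at least `minimum`.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info
    # Compare only (major, minor) so patch releases do not matter.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if python_is_supported():
        print("Python version is new enough for helixbio.")
    else:
        print("helixbio requires Python 3.10 or newer.")
```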

🧬 Examples

The examples below run Helix functions inside the Modal app context:

Compute Protein Embeddings with ESM

from helix.core import app
from helix.functions.embedding import get_protein_embeddings


with app.run():
    sequences = [
        "MALLWMRLLPLLALLALWGPD",
        "MKTVRQERLKSIVRILERSKEPVSGAQ"
    ]
    # .remote() executes the function on Modal and returns once the result is ready
    result = get_protein_embeddings.remote(
        sequences,
        model_name="facebook/esm2_t33_650M_UR50D",
        embedding_strategy="cls"
    )
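
To make the `embedding_strategy="cls"` argument more concrete, the sketch below shows what a "cls" reduction typically means for a per-token embedding matrix: keeping the vector of the special start token that ESM-2 tokenizers prepend. This is an illustration under assumptions, not Helix's implementation. The function `pool_embeddings`, the `"mean"` option, and the array layout are all hypothetical, and the format actually returned by `get_protein_embeddings` is not documented here.

```python
import numpy as np


def pool_embeddings(token_embeddings: np.ndarray, strategy: str = "cls") -> np.ndarray:
    """Reduce a (num_tokens, hidden_dim) array to one (hidden_dim,) vector.

    Hypothetical helper: assumes row 0 holds the special start (CLS) token.
    """
    if token_embeddings.ndim != 2:
        raise ValueError("expected an array of shape (num_tokens, hidden_dim)")
    if strategy == "cls":
        # Use the vector of the first (start) token as the sequence summary.
        return token_embeddings[0]
    if strategy == "mean":
        # Average every row that was passed in; drop special-token rows
        # beforehand if they should not contribute to the average.
        return token_embeddings.mean(axis=0)
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    toy = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    print(pool_embeddings(toy, "cls"))
    print(pool_embeddings(toy, "mean"))
```

A start-token vector is cheap to extract, while mean pooling uses information from every position; which one works better depends on the downstream task.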

Predict Protein Structures with Chai and ESMFold

from helix.core import app
from helix.functions import chai, esmfold

# Example for Chai
with app.run():
    sequences = [
        "MALLWMRLLPLLALLALWGPD",
        "MKTVRQERLKSIVRILERSKEPVSGAQ"
    ]

    inference_params = {
        "num_recycles": 3,
        "num_samples": 1
    }
    chai_results = chai.predict_structures.remote(sequences, inference_params)
    print(f"Chai predicted {len(chai_results)} structures")

# Example for ESMFold
with app.run():
    sequences = [
        "MALLWMRLLPLLALLALWGPD",
        "MKTVRQERLKSIVRILERSKEPVSGAQ"
    ]
    esmfold_results = esmfold.predict_structures.remote(sequences, batch_size=2)
    print(f"ESMFold predicted {len(esmfold_results)} structures")
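
Because each prediction call runs remotely, it can be worth checking the input sequences locally before submitting them. The following is a minimal sketch of such a pre-check, assuming that only the 20 standard amino acid letters are acceptable. Both `STANDARD_AMINO_ACIDS` and `find_invalid_residues` are hypothetical names introduced here, and Helix may accept or reject other characters.

```python
# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_residues(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, character) for every non-standard character.

    Hypothetical helper for illustration; an empty list means the sequence
    contains only the 20 standard one-letter codes.
    """
    return [
        (position, residue)
        for position, residue in enumerate(sequence, start=1)
        if residue not in STANDARD_AMINO_ACIDS
    ]


if __name__ == "__main__":
    for seq in ["MALLWMRLLPLLALLALWGPD", "MKXV"]:
        problems = find_invalid_residues(seq)
        print(seq, "OK" if not problems else f"invalid residues: {problems}")
```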

Score Mutations Using Protein Language Models

Mutation scoring uses pre-trained protein language models to evaluate the impact of amino acid substitutions in a protein sequence. The implementation is based on the methods of Brian Hie and colleagues, described in their Nature Biotechnology paper (Hie et al., 2024). The function supports three scoring methods:

"wildtype_marginal": Computes the difference in log probability between the mutant and wild-type amino acids at each mutated position, without masking the input.
"masked_marginal": Computes the same log-probability difference, but masks each position before scoring it.
"pppl" (pseudo-perplexity): Calculates the change in the model's pseudo-perplexity caused by the mutation.

These methods have been shown to be effective in guiding the evolution of human antibodies and other proteins.
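
The arithmetic behind these metrics can be sketched with plain NumPy. The example below is an illustration under assumptions, not Helix's code. It assumes a hypothetical `(sequence_length, 20)` matrix of per-position log probabilities in the column order given by `AMINO_ACIDS`; for a wild-type marginal score the matrix would come from one pass over the unmasked sequence, and for a masked marginal score each row would come from a pass with that position masked. The names `marginal_score` and `pseudo_perplexity` are invented for this sketch, and the sign and aggregation conventions Helix uses may differ.

```python
import numpy as np

# Hypothetical column order for the log-probability matrix.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def marginal_score(log_probs: np.ndarray, position: int, wt: str, mt: str) -> float:
    """Return log p(mutant) - log p(wild type) at a 1-based position."""
    row = log_probs[position - 1]
    return float(row[AA_INDEX[mt]] - row[AA_INDEX[wt]])


def pseudo_perplexity(true_residue_log_probs) -> float:
    """Return exp(-mean log p(x_i | rest)) over all positions.

    Each entry is the log probability of the true residue at one position,
    predicted with that position masked.
    """
    return float(np.exp(-np.mean(true_residue_log_probs)))


if __name__ == "__main__":
    # Toy matrix: uniform predictions, except position 1 favors A over M.
    log_probs = np.log(np.full((3, 20), 0.05))
    log_probs[0, AA_INDEX["A"]] = np.log(0.2)
    log_probs[0, AA_INDEX["M"]] = np.log(0.1)
    print(marginal_score(log_probs, 1, "M", "A"))   # log(0.2 / 0.1)
    print(pseudo_perplexity(np.log([0.05] * 4)))    # uniform over 20 letters
```

Under this convention a positive marginal score means the model assigns the mutant residue a higher probability than the wild-type residue at that position.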

Here's an example of how to use the mutation scoring function:

from helix.core import app
from helix.functions.scoring.protein import score_mutations

with app.run():
    # Define the sequence and mutations, written as
    # <wild-type residue><position><mutant residue> (assuming 1-based positions)
    sequence = "MALLWMRLLPLLALLALWGPD"
    mutations = ["M1A", "L3A", "W5A"]

    # Define model and metric
    model_name = "facebook/esm2_t33_650M_UR50D"
    metric = "wildtype_marginal"

    # Score mutations
    scores = score_mutations.remote(
        model_name=model_name,
        sequence=sequence,
        mutations=mutations,
        metric=metric
    )
    print(f"Scores for {model_name} using {metric}:")
    print(scores)
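
Mutation strings follow the pattern wild-type residue, position, mutant residue. The sketch below parses that notation and checks it against the sequence before any remote call is made. It assumes conventional 1-based positions, and `parse_mutation` and `check_mutation` are hypothetical helpers, not part of Helix. For the sequence used above, a string such as "L2A" would be rejected because position 2 holds A rather than L.

```python
import re

# One uppercase letter, a position number, one uppercase letter, e.g. "W5A".
_MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(mutation: str) -> tuple[str, int, str]:
    """Split a string like 'W5A' into ('W', 5, 'A')."""
    match = _MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"not a valid mutation string: {mutation!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def check_mutation(sequence: str, mutation: str) -> tuple[str, int, str]:
    """Parse a mutation and confirm it matches the sequence (1-based positions)."""
    wild_type, position, mutant = parse_mutation(mutation)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"{mutation}: position {position} is outside the sequence")
    found = sequence[position - 1]
    if found != wild_type:
        raise ValueError(
            f"{mutation}: expected {wild_type} at position {position}, found {found}"
        )
    return wild_type, position, mutant


if __name__ == "__main__":
    sequence = "MALLWMRLLPLLALLALWGPD"
    for mutation in ["M1A", "W5A", "L2A"]:
        try:
            print(mutation, "->", check_mutation(sequence, mutation))
        except ValueError as error:
            print("rejected:", error)
```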

Reference: Hie, B.L., Shanker, V.R., Xu, D. et al. Efficient evolution of human antibodies from general protein language models. Nat Biotechnol 42, 275–283 (2024). https://doi.org/10.1038/s41587-023-01763-2

Project details

Release history

Version	Release date
0.3.0 (this version)	Nov 2, 2024
0.2.0	Apr 12, 2024
0.1.8	Jan 10, 2024
0.1.7	Jan 5, 2024
0.1.6	Dec 22, 2023
0.1.5	Oct 19, 2023
0.1.4	Oct 18, 2023
0.1.3	Oct 18, 2023
0.1.2	Oct 18, 2023
0.1.1	Oct 9, 2023
0.1.0	Sep 24, 2023

Download files

Source Distribution

helixbio-0.3.0.tar.gz (55.5 kB)

Uploaded Nov 2, 2024 Source

Built Distribution

helixbio-0.3.0-py3-none-any.whl (64.0 kB)

Uploaded Nov 2, 2024 Python 3

File details

Details for the file helixbio-0.3.0.tar.gz.

File metadata

Download URL: helixbio-0.3.0.tar.gz
Upload date: Nov 2, 2024
Size: 55.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.4.29

File hashes

Hashes for helixbio-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`8365231fbd108c0285f0221c5a01d2ca046aaabadbae012b9a0f4178b9ad940d`
MD5	`f4c7d440b49dd71aeac259919826b934`
BLAKE2b-256	`bd800ae1f62dc1de35090f8a5b0c71af39d5f61f88760e414350a4387f2fc415`

File details

Details for the file helixbio-0.3.0-py3-none-any.whl.

File metadata

Download URL: helixbio-0.3.0-py3-none-any.whl
Upload date: Nov 2, 2024
Size: 64.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.4.29

File hashes

Hashes for helixbio-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fe107b908f7e5a961a79a7f3386e460285a00767f79a5c7707cc354075530b68`
MD5	`35d0f6c0d6917b7e21216f66fee40a28`
BLAKE2b-256	`532b4c0ea1997999270b69a78e1d11af62d66ff765dae79192694c25b0cfe8ba`
