
Helical Python SDK


What is Helical?

Helical builds the Virtual AI Lab for Biological Discovery. This open framework provides access to state-of-the-art Bio Foundation Models across genomics, transcriptomics, and single-cell data modalities.

Helical simplifies the entire lifecycle of applying Bio Foundation Models, from model access to fine-tuning and in-silico experimentation. With Helical's open-source framework, you can:

  • Leverage the latest Bio Foundation Models through a simple Python interface
  • Run example notebooks for key downstream tasks
  • Customize models and workflows for your own datasets and experiments

This repository is continuously updated with new models, benchmarks, and utilities. Join us in shaping the next generation of AI-powered biology.

Let’s build the most exciting AI-for-Bio community together!


What's new?

Cell2Sentence-Scale

We have integrated the new Cell2Sentence-Scale models which use cell sentences as input and are based on the Gemma language model architecture (2B and 27B models available in quantised versions too). You can use this model for embeddings and perturbation prediction. Follow our notebook tutorial here.
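To make the idea of a "cell sentence" concrete, here is a minimal sketch. It assumes the commonly described Cell2Sentence recipe, in which a cell's expressed genes are written out as gene names ordered from highest to lowest expression. The function name `cell_to_sentence`, the `top_k` parameter, and the toy counts are illustrative assumptions; this is not Helical's own preprocessing code.

```python
from typing import Mapping


def cell_to_sentence(expression: Mapping[str, float], top_k: int = 100) -> str:
    """Turn one cell's expression profile into a space-separated 'cell sentence'.

    Genes with zero expression are dropped; the rest are sorted by
    descending expression (ties broken alphabetically for determinism).
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    ranked = sorted(expressed, key=lambda item: (-item[1], item[0]))
    return " ".join(gene for gene, _ in ranked[:top_k])


# Toy example with made-up counts for a single cell.
toy_cell = {"CD3E": 12.0, "GAPDH": 40.0, "MS4A1": 0.0, "ACTB": 40.0, "IL7R": 3.0}
print(cell_to_sentence(toy_cell))
```

In this toy cell, the unexpressed gene is left out and the remaining gene names appear in rank order, which is the kind of text input a language-model-based architecture can consume.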

New Larger Geneformer Models

We have integrated the new Geneformer models, which are larger and have been trained on more data. Find out which models have been integrated into the Geneformer suite in the model card. Check out our notebook on drug perturbation prediction using different Geneformer scalings here.

TranscriptFormer

We have integrated TranscriptFormer into our helical package and have made a model card for it in our Transcriptformer model folder. If you would like to test the model, take a look at our example notebook!

🧬 Introducing Helix-mRNA-v0: Unlocking new frontiers & use cases in mRNA therapy 🧬

We’re thrilled to announce the release of our first-ever mRNA Bio Foundation Model, designed to:

  1. Be Efficient, handling long sequence lengths effortlessly
  2. Balance Diversity & Specificity, leveraging a 2-step pre-training approach
  3. Deliver High Resolution, using single nucleotides as the unit of resolution
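
As an illustration of what single-nucleotide resolution means in practice, the sketch below tokenizes an mRNA sequence one nucleotide at a time and contrasts it with non-overlapping codon (3-mer) chunks. The vocabulary and function names are assumptions made for this example; they are not taken from the Helix-mRNA tokenizer.

```python
# Hypothetical vocabulary for illustration only: four RNA bases plus "N" for unknown.
NUCLEOTIDE_VOCAB = {"A": 0, "C": 1, "G": 2, "U": 3, "N": 4}


def tokenize_single_nucleotide(sequence: str) -> list[int]:
    """Map every nucleotide to its own token id (one token per base)."""
    unknown = NUCLEOTIDE_VOCAB["N"]
    return [NUCLEOTIDE_VOCAB.get(base, unknown) for base in sequence.upper()]


def chunk_codons(sequence: str) -> list[str]:
    """Split a sequence into non-overlapping 3-mers (a coarser resolution)."""
    seq = sequence.upper()
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


mrna = "AUGGCCUAA"
print(tokenize_single_nucleotide(mrna))  # one token per nucleotide
print(chunk_codons(mrna))                # one token per codon
```

The per-nucleotide view produces three times as many tokens as the codon view for the same sequence, which is why long-sequence efficiency matters at this resolution.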

Check out our blog post to learn more about our approach and read the model card to get started.

Installation

We recommend installing Helical within a conda environment (this step is optional). Run the commands below in your terminal:

conda create --name helical-package python=3.11.13
conda activate helical-package

To install the latest pip release of our Helical package, you can run the command below:

pip install helical

To install the latest development version of the Helical package directly from GitHub, you can run the command below:

pip install --upgrade git+https://github.com/helicalAI/helical.git

Alternatively, clone the repo and install it:

git clone https://github.com/helicalAI/helical.git
cd helical
pip install .
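
After installing, you may want to confirm which version ended up in your environment. The following is a minimal sketch using only the Python standard library. It assumes the distribution name is `helical` (as in the pip command above); the helper name `installed_version` is ours.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("helical")
if found is None:
    print("helical is not installed in this environment")
else:
    print(f"helical {found} is installed")
```

Run this inside the activated conda environment so that it inspects the same interpreter pip installed into.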

[Optional] To install mamba-ssm and causal-conv1d use the command below:

pip install helical[mamba-ssm]

or, if you are installing from a local clone of the Helical repo:

pip install .[mamba-ssm]

Notes on the installation:

  • Make sure your machine has one or more GPUs and CUDA installed. This is currently a requirement for the mamba-ssm and causal-conv1d packages.
  • The causal-conv1d package requires torch to be installed already. Installing helical on its own first (without [mamba-ssm]) installs torch for you; a second installation (with [mamba-ssm]) then installs the packages correctly.
  • If you have problems installing mamba-ssm, you can install it from the .whl files provided on their release page here. Choose the file that matches your CUDA, torch and Python versions:
pip install https://github.com/state-spaces/mamba/releases/download/v2.2.4/mamba_ssm-2.2.4+cu12torch2.3cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
  • Then continue with pip install .[mamba-ssm] to install the remaining causal-conv1d package.
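
To help pick the right wheel, the sketch below decodes the build tags in a mamba-ssm wheel file name such as the one above, and prints the CPython tag of the running interpreter for comparison. The regular expression is an assumption inferred from that single file name and may not cover every release asset.

```python
import re
import sys

# Pattern inferred from the example file name; treat it as an assumption.
WHEEL_PATTERN = re.compile(
    r"mamba_ssm-(?P<version>[\d.]+)"
    r"\+cu(?P<cuda>\d+)"
    r"torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<cxx11abi>TRUE|FALSE)"
    r"-(?P<python>cp\d+)-cp\d+-(?P<platform>.+)\.whl$"
)


def describe_wheel(filename: str) -> dict[str, str]:
    """Extract version, CUDA, torch, ABI, Python and platform tags from a wheel name."""
    match = WHEEL_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"Unrecognised wheel name: {filename}")
    return match.groupdict()


example = "mamba_ssm-2.2.4+cu12torch2.3cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
print(describe_wheel(example))

# CPython tag of the interpreter you are running, e.g. "cp311" for Python 3.11.
print(f"cp{sys.version_info.major}{sys.version_info.minor}")
```

Compare the printed `python` tag with your interpreter's tag, and the `cuda` and `torch` fields with the versions installed on your machine, before downloading a wheel.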

Singularity (Optional)

If you want to run your code in a Singularity container, you can use the singularity.def file to build an Apptainer sandbox:

apptainer build --sandbox singularity/helical singularity.def

and then shell into the sandbox container (use the --nv flag if you have a GPU available):

apptainer shell --nv --fakeroot singularity/helical/

RNA models:

DNA models:

Demo & Use Cases

To run the examples, make sure the Helical package is installed (see Installation) and up to date.

You can look directly into the example folder above and download the script of your choice, look into our documentation for step-by-step guides or directly clone the repository using:

git clone https://github.com/helicalAI/helical.git

Within the examples/notebooks folder, open the notebook of your choice. We recommend starting with Quick-Start-Tutorial.ipynb.

Current Examples:

  • Quick-Start-Tutorial.ipynb: A tutorial to quickly get used to the helical package and environment.
  • Helix-mRNA.ipynb: An example of how to use the Helix-mRNA model.
  • Geneformer-vs-TranscriptFormer.ipynb: Zero-shot reference mapping with Geneformer and TranscriptFormer, with a comparison of the outcomes.
  • Hyena-DNA-Inference.ipynb: An example of how to do probing with HyenaDNA by training a neural network on 18 downstream classification tasks.
  • Cell-Type-Annotation.ipynb: An example of how to do probing with scGPT by training a neural network to predict cell type annotations.
  • Cell-Type-Classification-Fine-Tuning.ipynb: An example of how to fine-tune different models on classification tasks.
  • HyenaDNA-Fine-Tuning.ipynb: An example of how to fine-tune the HyenaDNA model on downstream benchmarks.
  • Cell-Gene-Cls-embedding-generation.ipynb: A notebook explaining the different embedding modes of single-cell RNA models.
  • Geneformer-Series-Comparison.ipynb: A zero-shot comparison of Geneformer model scalings on drug perturbation prediction.
  • Cell2Sen-Tutorial.ipynb: An example tutorial of how to use cell2sen models for embeddings and perturbation predictions.

Stuck somewhere? Other ideas?

We are eager to help you and interact with you:

  • Join our Slack channel where you can discuss applications of bio foundation models.
  • You can also open GitHub issues here.

Why should I use Helical & what to expect in the future?

If you are working (or plan to work) with bio foundation models such as Geneformer or UCE on RNA and DNA data, Helical will be your best buddy! We provide and improve on:

  • Up-to-date model library
  • A unified API for all models
  • User-facing abstractions tailored to computational biologists, researchers & AI developers
  • Innovative use case and application examples and ideas
  • Efficient data processing & code-base

We will continuously upload the latest models, publish benchmarks and make our code more efficient.

Contributing

We welcome all kinds of contributions, including code, documentation, bug reports, and feature suggestions. Please read our Contributing Guidelines to help us keep the project organized and collaborative.

Acknowledgements

A lot of our models have been published by talented authors developing these exciting technologies. We sincerely thank the authors of the following open-source projects:

Licenses

You can find the Licenses for each model implementation in the model repositories:

Citation

Please use this BibTeX to cite this repository in your publications:

@software{allard_2024_13135902,
  author       = {Helical Team},
  title        = {helicalAI/helical: v1.1.0},
  month        = nov,
  year         = 2024,
  publisher    = {Zenodo},
  version      = {1.1.0},
  doi          = {10.5281/zenodo.13135902},
  url          = {https://doi.org/10.5281/zenodo.13135902}
}

