helixerlite: simplified genome annotation with Helixer

Helixerlite: Simplified Gene Prediction using Helixer and HelixerPost

This is a lightweight "predict-only" version of Helixer and HelixerPost. Helixer is written in Python and includes many model-training utilities that end users who only want to predict genes in a genome do not need. For smaller eukaryotic genomes, prediction does not require a GPU. For an average ascomycete fungal genome (~30 Mb), helixerlite should take less than 20 minutes to run.

HelixerPost is written in Rust and lives in a separate repository, which makes it cumbersome to install the two as a single tool. Using maturin and PyO3, we wrap the Rust code in a Python package so that everything runs as a single command-line tool.

Features

  • Convert FASTA files to HDF5 format for Helixer
  • Run gene prediction using a pre-trained Helixer model
  • Convert predictions to GFF3 format
  • Lightweight and easy to install
  • No GPU required for smaller genomes
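To judge whether a genome is in the "smaller" range where a GPU is not needed, it can help to total the sequence length of the FASTA file first. The following is a minimal sketch using only the Python standard library; the function name `genome_size` is hypothetical and not part of helixerlite, and no size threshold is implied beyond the ~30 Mb example given in the description.

```python
def genome_size(fasta_path):
    """Return the total number of sequence characters in a FASTA file."""
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            # Header lines start with '>'; every other line is sequence.
            if not line.startswith(">"):
                total += len(line.strip())
    return total
```

For example, `genome_size("genome.fasta") / 1e6` gives the assembly size in Mb, which can be compared against the ~30 Mb fungal genome mentioned in the description.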

Installation

Installation can be done with pip or other tools able to install from PyPI, such as uv:

python -m pip install helixerlite

Usage

Command-line Interface

HelixerLite provides a simple command-line interface:

# Convert FASTA to HDF5
helixerlite fasta2hdf5 -i genome.fasta -o genome.h5

# Run prediction
helixerlite predict -m path/to/model -i genome.h5 -o predictions.h5

# Convert predictions to GFF3
helixerlite preds2gff3 -g genome.h5 -p predictions.h5 -o output.gff3
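The three commands above depend on each other's output files, so it can be convenient to chain them from a script. The sketch below builds the same command lines shown above and runs them in order with `subprocess`; the function names and the default intermediate file names are illustrative assumptions, and it assumes `helixerlite` is on the `PATH`.

```python
import subprocess


def build_commands(fasta, model, gff3,
                   genome_h5="genome.h5", preds_h5="predictions.h5"):
    """Return the three helixerlite invocations as argument lists, in run order."""
    return [
        ["helixerlite", "fasta2hdf5", "-i", fasta, "-o", genome_h5],
        ["helixerlite", "predict", "-m", model, "-i", genome_h5, "-o", preds_h5],
        ["helixerlite", "preds2gff3", "-g", genome_h5, "-p", preds_h5, "-o", gff3],
    ]


def run_pipeline(fasta, model, gff3):
    """Run each step in turn; check=True raises on the first failing step."""
    for cmd in build_commands(fasta, model, gff3):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("genome.fasta", "path/to/model", "output.gff3")` would then be equivalent to typing the three commands by hand, stopping early if any step exits with an error.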

Python API

You can also use HelixerLite as a Python library:

from helixerlite import fasta2hdf5, preds2gff3
from helixerlite.hybrid_model import HybridModel

# Convert FASTA to HDF5
fasta2hdf5("genome.fasta", "genome.h5")

# Run prediction
model = HybridModel(["--load-model-path", "path/to/model",
                     "--test-data", "genome.h5",
                     "--prediction-output-path", "predictions.h5"])
model.run()

# Convert predictions to GFF3
preds2gff3("genome.h5", "predictions.h5", "output.gff3")
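If only the final GFF3 file is of interest, the same three calls can be wrapped so that the HDF5 intermediates are written to a temporary directory and discarded afterwards. This is a sketch under the assumption that the API behaves exactly as in the example above; `annotate` and `intermediate_paths` are hypothetical helper names, not part of helixerlite.

```python
import tempfile
from pathlib import Path


def intermediate_paths(workdir):
    """Return (genome_h5, predictions_h5) paths inside workdir."""
    workdir = Path(workdir)
    return str(workdir / "genome.h5"), str(workdir / "predictions.h5")


def annotate(fasta, model_path, gff3_out):
    """Run the three documented steps, discarding the HDF5 intermediates."""
    # Imported lazily so this module loads even where helixerlite is absent.
    from helixerlite import fasta2hdf5, preds2gff3
    from helixerlite.hybrid_model import HybridModel

    with tempfile.TemporaryDirectory() as tmp:
        genome_h5, preds_h5 = intermediate_paths(tmp)
        fasta2hdf5(fasta, genome_h5)
        model = HybridModel(["--load-model-path", model_path,
                             "--test-data", genome_h5,
                             "--prediction-output-path", preds_h5])
        model.run()
        preds2gff3(genome_h5, preds_h5, gff3_out)
```

The trade-off is that the prediction step has to be repeated if the GFF3 conversion needs to be rerun, so keeping the intermediates may be preferable for larger genomes.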

Requirements

  • Python 3.8 or higher
  • TensorFlow 2.10 or higher
  • h5py
  • pyfastx
  • gfftk
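A quick way to see whether an environment appears to meet these requirements is to check the interpreter version and whether each package can be located. The sketch below uses only the standard library; it assumes the import names match the package names listed above, and it does not verify the TensorFlow version.

```python
import sys
from importlib.util import find_spec

# Import names are assumed to match the package names listed above.
REQUIRED_MODULES = ["tensorflow", "h5py", "pyfastx", "gfftk"]


def check_requirements():
    """Return a dict mapping each requirement to True if it appears satisfied."""
    report = {"python>=3.8": sys.version_info >= (3, 8)}
    for name in REQUIRED_MODULES:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = find_spec(name) is not None
    return report
```

Printing `check_requirements()` shows at a glance which entries are `False` and still need to be installed.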

Development

Setting up a development environment

# Clone the repository
git clone https://github.com/nextgenusfs/helixerlite.git
cd helixerlite

# Create a conda environment
conda create -n helixerlite python=3.10
conda activate helixerlite

# Install development dependencies
pip install -e ".[dev]"

Running tests

python -m pytest

Citation

If you use this repository, please cite the original Helixer authors, manuscript, and code:

Felix Holst, Anthony Bolger, Christopher Günther, Janina Maß, Sebastian Triesch, Felicitas Kindel, Niklas Kiel, Nima Saadat, Oliver Ebenhöh, Björn Usadel, Rainer Schwacke, Marie Bolger, Andreas P.M. Weber, Alisandra K. Denton. Helixer—de novo Prediction of Primary Eukaryotic Gene Models Combining Deep Learning and a Hidden Markov Model. bioRxiv 2023.02.06.527280; doi: https://doi.org/10.1101/2023.02.06.527280
