
PyO3 bindings and Python interface to FragGeneScanRs, a gene prediction model for short and error-prone reads.

🔗🐍⏭️ pyfgs

🗺️ Overview

🔬 The Biological Edge: The Metagenomic Short-Read Specialist

  • Reads Through Sequencing Errors: Unlike Prodigal and Pyrodigal (which are designed for pristine, assembled contigs), pyfgs uses a Hidden Markov Model trained specifically on sequencing error profiles (Illumina, 454, Sanger).

  • Native Frameshift Correction: If a raw read contains an indel, the reading frame shifts at that position, so standard gene callers typically truncate or miss the gene. pyfgs models the likely sequencing error, corrects the reading frame, and translates the protein across the indel.

  • Granular Indel Tracking: Every predicted gene exposes native Python lists of the positions where insertions and deletions were detected, enabling rigorous downstream quality control.
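The per-gene indel lists lend themselves to simple density-based filtering. The sketch below is illustrative only: `PredictedGene` is a hypothetical stand-in for a pyfgs gene record (the real attribute names may differ, so check the API documentation), and the 5 indels/kb cut-off is an arbitrary example threshold, not a recommendation.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a pyfgs gene record; not the library's actual class.
@dataclass
class PredictedGene:
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    strand: str
    insertions: list[int] = field(default_factory=list)
    deletions: list[int] = field(default_factory=list)

    @property
    def n_indels(self) -> int:
        return len(self.insertions) + len(self.deletions)


def indels_per_kb(gene: PredictedGene) -> float:
    """Indel density over the gene span, in indels per 1,000 bp."""
    length = gene.end - gene.start
    return 1000 * gene.n_indels / length if length > 0 else 0.0


def passes_qc(gene: PredictedGene, max_per_kb: float = 5.0) -> bool:
    """Keep genes whose corrected-indel density is at or below the (arbitrary) threshold."""
    return indels_per_kb(gene) <= max_per_kb


genes = [
    PredictedGene(0, 300, "+"),
    PredictedGene(10, 310, "-", insertions=[55], deletions=[120, 201]),
]
kept = [g for g in genes if passes_qc(g)]
```

Here the second gene carries 3 indels over 300 bp (10 per kb) and is dropped, while the indel-free gene is kept.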

⚡️ The Engineering Edge: Bare-Metal Rust in Python

  • GIL-Free Multithreading: The Rust engine releases the Python Global Interpreter Lock (GIL) during model inference, so large FASTQ files can be processed in parallel across all available CPU cores.

  • True Zero-Copy Memory: pyfgs does not copy Python strings into Rust memory. The Rust backend borrows raw byte slices (&[u8]) directly from the Python interpreter's heap, which avoids per-sequence copies and keeps memory overhead low.

  • Lazy Byte Evaluation: Avoids the "UTF-8 tax" (the cost of decoding bytes into strings) common in bioinformatics wrappers. Translated amino acid sequences and corrected DNA are evaluated lazily, so the string conversion only happens if and when you explicitly request it.

  • No Subprocess Tax: Instead of writing large .faa files to disk and parsing them back into Python, the HMM runs entirely in memory and yields native Python objects ready for immediate downstream analysis.
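The lazy-evaluation idea can be illustrated in plain Python. This is not how pyfgs implements it (that happens in Rust); `LazySequence` is a hypothetical class that holds raw ASCII bytes and decodes them to `str` at most once, on first access, using `functools.cached_property`.

```python
from functools import cached_property


class LazySequence:
    """Illustrative sketch: keep raw bytes, decode to str only on demand."""

    decode_calls = 0  # class-level counter, for demonstration only

    def __init__(self, raw: bytes) -> None:
        self._raw = raw

    def __len__(self) -> int:
        # Length is available without decoding anything.
        return len(self._raw)

    @cached_property
    def text(self) -> str:
        # Runs once per instance; the result is then cached on the instance.
        type(self).decode_calls += 1
        return self._raw.decode("ascii")


seqs = [LazySequence(b"MKV*"), LazySequence(b"MAL*")]
total = sum(len(s) for s in seqs)  # no decoding happens here
first = seqs[0].text               # decodes seqs[0] only
again = seqs[0].text               # served from the cache, no second decode
```

Code that only needs lengths or counts never pays for decoding, and sequences that are never inspected are never converted.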

🐍 Pythonic Quality of Life

  • 0-Based BED Coordinates: Say goodbye to wrestling with 1-based, fully closed GFF3 coordinates. pyfgs natively outputs standard 0-based, half-open intervals ([start, end)), so coordinates can be used directly to slice Python strings and arrays.

  • Drop-in CLI Replacement: Includes a fast, multithreaded command-line interface that mirrors the original FragGeneScan tool but runs in a fraction of the compute time.
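The relationship between the two coordinate conventions is a one-line conversion. This generic sketch (the helper names are illustrative, not part of pyfgs) converts a GFF3 interval to a BED interval and slices a sequence with it.

```python
def gff_to_bed(start_1based: int, end_1based: int) -> tuple[int, int]:
    """Convert a 1-based, fully closed GFF3 interval to 0-based, half-open."""
    return start_1based - 1, end_1based


def bed_to_gff(start: int, end: int) -> tuple[int, int]:
    """Convert a 0-based, half-open interval back to 1-based, fully closed."""
    return start + 1, end


seq = "ATGAAACCCGGGTAA"
# Bases 4..9 in GFF3 terms become [3, 9) in BED terms.
start, end = gff_to_bed(4, 9)
orf = seq[start:end]   # Python slicing uses half-open intervals natively
length = end - start   # no "+ 1" correction needed
```

With half-open intervals, the slice and the length fall out directly, which is why BED-style coordinates pair naturally with Python indexing.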

🔧 Installing

This project is supported on Python 3.10 and later.

pyfgs can be installed directly from PyPI:

pip install pyfgs

💻 CLI Usage

For API usage, please refer to the documentation. For CLI usage, run pyfgs --help:

usage: pyfgs <seq> [options]

🔗🐍⏭️	PyO3 bindings and Python interface to FragGeneScanRs,
	a gene prediction model for short and error-prone reads.

Input options 💽:

  seq             Sequence file (or '-' for stdin)
  -m, --model     Sequence error model (default: complete):
                   - short1: Illumina sequencing reads with about 0.1% error rate
                   - short5: Illumina sequencing reads with about 0.5% error rate
                   - short10: Illumina sequencing reads with about 1% error rate
                   - sanger5: Sanger sequencing reads with about 0.5% error rate
                   - sanger10: Sanger sequencing reads with about 1% error rate
                   - pyro5: 454 pyrosequencing reads with about 0.5% error rate
                   - pyro10: 454 pyrosequencing reads with about 1% error rate
                   - pyro30: 454 pyrosequencing reads with about 3% error rate
                   - complete: Complete genomic sequences or short sequence reads without sequencing error
  -r, --reads     Force FASTQ parsing (default: False)

Output options ⚙️:

  -o, --out       Output file (default: stdout)
  -f, --format    Output format (default: faa):
                   - faa (protein fasta)
                   - ffn (nucleotide fasta)
                   - bed (BED6 format)

Other options 🚧:

  -t, --threads   Number of threads (default: 8)
  -v, --version   Print version and exit
  -h, --help      Print help and exit
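For pipelines that already shell out to the CLI, a thin wrapper can assemble the flags listed above and parse BED6 output. This is a minimal sketch under stated assumptions: pyfgs is on PATH, -r is a boolean flag taking no argument, and BED output goes to stdout when -o is omitted. The helper names (`build_command`, `parse_bed6`, `run_pyfgs`) are hypothetical, and the contents of the name and score columns are not documented here, so they are kept as plain strings.

```python
import subprocess
from dataclasses import dataclass

# Model and format names as listed in the pyfgs --help output.
VALID_MODELS = {
    "short1", "short5", "short10",
    "sanger5", "sanger10",
    "pyro5", "pyro10", "pyro30",
    "complete",
}
VALID_FORMATS = {"faa", "ffn", "bed"}


def build_command(seq_path: str, model: str = "complete", fmt: str = "bed",
                  threads: int = 8, force_fastq: bool = False) -> list[str]:
    """Assemble a pyfgs command line (hypothetical helper)."""
    if model not in VALID_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    cmd = ["pyfgs", seq_path, "-m", model, "-f", fmt, "-t", str(threads)]
    if force_fastq:
        cmd.append("-r")  # assumed to be a flag without a value
    return cmd


@dataclass
class BedRecord:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    name: str
    score: str   # kept as text; meaning not documented here
    strand: str


def parse_bed6(text: str) -> list[BedRecord]:
    """Parse tab-separated BED6 lines, skipping blanks and header lines."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, score, strand = line.split("\t")[:6]
        records.append(BedRecord(chrom, int(start), int(end), name, score, strand))
    return records


def run_pyfgs(seq_path: str, **kwargs) -> list[BedRecord]:
    """Run the CLI (requires pyfgs on PATH) and parse its BED6 stdout."""
    cmd = build_command(seq_path, fmt="bed", **kwargs)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_bed6(result.stdout)
```

For example, run_pyfgs("reads.fastq", model="short1", force_fastq=True, threads=4) would run the Illumina 0.1% error model on a FASTQ file. When pyfgs is used from Python directly, the in-memory API avoids this subprocess round-trip altogether.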

🔖 Citation

For now, please cite the original FragGeneScanRs paper:

Van der Jeugt, F., Dawyndt, P. & Mesuere, B. FragGeneScanRs: faster gene prediction for short reads. BMC Bioinformatics 23, 198 (2022). https://doi.org/10.1186/s12859-022-04736-5

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to reproduce it in a simple, self-contained example.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the GNU General Public License v3.0. The FragGeneScanRs code was written by Peter Dawyndt, Bart Mesuere and Felix Van der Jeugt and is distributed under the terms of the GPLv3 as well. See https://github.com/FragGeneScanRs/LICENSE for more information.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original FragGeneScanRs authors Peter Dawyndt, Bart Mesuere and Felix Van der Jeugt. It was developed by Tom Stanton during his postdoctoral project at Monash University in the Wyres Lab.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyfgs-0.0.1a2.tar.gz (88.1 kB)

Uploaded Source

Built Distributions

pyfgs-0.0.1a2-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (4.1 MB)

Uploaded: CPython 3.14, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64

pyfgs-0.0.1a2-cp312-cp312-win_amd64.whl (2.0 MB)

Uploaded: CPython 3.12, Windows x86-64

pyfgs-0.0.1a2-cp312-cp312-manylinux_2_34_x86_64.whl (2.1 MB)

Uploaded: CPython 3.12, manylinux: glibc 2.34+ x86-64

File details

Details for the file pyfgs-0.0.1a2.tar.gz.

File metadata

  • Download URL: pyfgs-0.0.1a2.tar.gz
  • Upload date:
  • Size: 88.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyfgs-0.0.1a2.tar.gz
Algorithm Hash digest
SHA256 661bcfec06c1db685c251baa909cbac5773100eb763885967ffe0a91756a501e
MD5 fe474f6bd41da7f57688f6fb4d04d6ec
BLAKE2b-256 2f8f22c45cefac4c3ea76f6b11a8f643f8fbf2910eee826f27d160ed6f414bf7

See more details on using hashes here.

Provenance

The following attestation bundles were made for pyfgs-0.0.1a2.tar.gz:

Publisher: publish.yml on tomdstanton/pyfgs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pyfgs-0.0.1a2-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for pyfgs-0.0.1a2-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 f86a5e05c6e3b2365ea6bcd430d27d6380a20fdfd6a42c8ce429cce7525236ce
MD5 17dcd85b0541edb5f8ad4357798a84d1
BLAKE2b-256 ad1beb403c95c78052701951741c4a4ca97c6ef334acc79f00e632c17fa7638a

Provenance

The following attestation bundles were made for pyfgs-0.0.1a2-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: publish.yml on tomdstanton/pyfgs

File details

Details for the file pyfgs-0.0.1a2-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: pyfgs-0.0.1a2-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 2.0 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyfgs-0.0.1a2-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 9c78a42bc15d581cf6de56a310e25baddab57874c9526a9072a7fee09b612656
MD5 1f7b0e88e19de681d472c5bff8d7b25d
BLAKE2b-256 0ad3aefc2a0019844853dd29ae67dd37de6a72e8cbe83b13175aebe37ec2f334

Provenance

The following attestation bundles were made for pyfgs-0.0.1a2-cp312-cp312-win_amd64.whl:

Publisher: publish.yml on tomdstanton/pyfgs

File details

Details for the file pyfgs-0.0.1a2-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for pyfgs-0.0.1a2-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 bbc3371cde83306b428346c5d519b04839de572ac27760d68b5655e11489a4a3
MD5 3a09cb3862f170e19f3c2501fbb52aae
BLAKE2b-256 22dda1797f8d599bb90080a4eb2321daf7b44e5a29f465437f71f8ed83f8172d

Provenance

The following attestation bundles were made for pyfgs-0.0.1a2-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: publish.yml on tomdstanton/pyfgs