PyO3 bindings and Python interface to FragGeneScanRs, a gene prediction model for short and error-prone reads.
🔗🐍⏭️ pyfgs 
🗺️ Overview
🔬 The Biological Edge: The Metagenomic Short-Read Specialist
- Reads Through Sequencing Errors: Unlike Prodigal and Pyrodigal (which are designed for pristine, assembled contigs), pyfgs uses a Hidden Markov Model trained specifically on sequencing error profiles (Illumina, 454, Sanger).
- Native Frameshift Correction: If a raw read contains an indel, standard tools instantly break the open reading frame and lose the gene. pyfgs detects the error, dynamically corrects the reading frame, and translates the protein seamlessly.
- Granular Indel Tracking: Every predicted gene exposes native Python lists of exactly where insertions and deletions were detected, allowing for rigorous downstream quality control.
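As a rough illustration of the kind of quality control that per-gene indel lists make possible, the sketch below filters predictions by indel density. The `PredictedGene` class, its field names, and the threshold are hypothetical stand-ins invented for this example; they are not the pyfgs API, so check the documentation for the real attribute names.

```python
from dataclasses import dataclass, field


@dataclass
class PredictedGene:
    """Hypothetical stand-in for a predicted gene; NOT the pyfgs class."""
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"
    insertions: list[int] = field(default_factory=list)  # positions of detected insertions
    deletions: list[int] = field(default_factory=list)   # positions of detected deletions

    @property
    def indel_count(self) -> int:
        return len(self.insertions) + len(self.deletions)

    @property
    def indels_per_kb(self) -> float:
        length = self.end - self.start
        return 1000.0 * self.indel_count / length if length else 0.0


def passes_qc(gene: PredictedGene, max_indels_per_kb: float = 10.0) -> bool:
    """Keep genes whose indel density is at or below the (arbitrary) threshold."""
    return gene.indels_per_kb <= max_indels_per_kb


genes = [
    PredictedGene(0, 300, "+"),
    PredictedGene(10, 160, "-", insertions=[42], deletions=[97, 120]),
]
kept = [g for g in genes if passes_qc(g)]
```

Here the second gene carries three indels over 150 bases, so it exceeds the example threshold and is dropped. A sensible cutoff depends on the sequencing platform and the error model in use.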
⚡️ The Engineering Edge: Bare-Metal Rust in Python
- GIL-Free Multithreading: The Rust engine releases the Python Global Interpreter Lock (GIL) during model inference, so large FASTQ files can be processed in parallel across every core on your machine.
- True Zero-Copy Memory: pyfgs doesn't waste time copying Python strings into Rust memory. The Rust backend borrows raw byte slices (&[u8]) directly from the Python interpreter's heap, so input sequences add almost nothing to the memory footprint.
- Lazy Byte Evaluation: Bypasses the "UTF-8 Tax" of standard bioinformatics wrappers. Translated amino acid sequences and corrected DNA are evaluated lazily, meaning the heavy string work only happens if and when you explicitly request it.
- No FFI Subprocess Tax: Instead of dumping massive .faa files to your hard drive and parsing them back into Python, the HMM runs purely in memory and yields native Python objects ready for immediate downstream analysis.
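The zero-copy and lazy-evaluation ideas can be sketched in pure Python. This is only an analogy for the concepts described above, not how the Rust backend is implemented: the `LazyGene` class is hypothetical, `memoryview` stands in for a borrowed `&[u8]` slice, and `functools.cached_property` stands in for on-demand translation.

```python
from functools import cached_property

# Standard genetic code, with codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


class LazyGene:
    """Hypothetical illustration of lazy evaluation; NOT the pyfgs gene class."""

    def __init__(self, read: bytes, start: int, end: int):
        # Slicing a memoryview references the original buffer instead of copying it.
        self._dna = memoryview(read)[start:end]

    @cached_property
    def protein(self) -> str:
        # Decoding and translation run on first access only; the result is then cached.
        dna = self._dna.tobytes().decode("ascii").upper()
        return "".join(
            CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
        )


read = b"CCATGGCTAAATGACC"
gene = LazyGene(read, 2, 14)  # covers ATGGCTAAATGA; nothing is translated yet
```

Accessing `gene.protein` triggers the translation once; callers that only need coordinates never pay for the string conversion.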
🐍 Pythonic Quality of Life
- 0-Based BED Coordinates: Say goodbye to wrestling with 1-based, fully closed GFF3 coordinates. pyfgs natively outputs standard 0-based, half-open intervals ([start, end)), allowing you to slice standard sequence arrays immediately.
- Drop-in CLI Replacement: Includes a hyper-fast, multithreaded command-line interface that flawlessly mimics the original FragGeneScan tool but operates at a fraction of the compute time.
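To make the coordinate convention concrete, the sketch below converts between 1-based, fully closed (GFF3-style) and 0-based, half-open (BED-style) intervals and slices a sequence directly. The helper names and the example sequence are made up for illustration and are not part of pyfgs.

```python
def gff_to_bed(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based, fully closed interval to 0-based, half-open."""
    return start - 1, end


def bed_to_gff(start: int, end: int) -> tuple[int, int]:
    """Convert a 0-based, half-open interval to 1-based, fully closed."""
    return start + 1, end


seq = b"CCATGGCTAAATGACC"

# A feature annotated as bases 3..14 in GFF3 terms...
bed_start, bed_end = gff_to_bed(3, 14)

# ...maps straight onto a Python slice, and its length is simply end - start.
orf = seq[bed_start:bed_end]
```

With half-open intervals there is no `+ 1` or `- 1` at the point of use, which is why they pair naturally with Python slicing.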
🔧 Installing
This project is supported on Python 3.10 and later.
pyfgs can be installed directly from PyPI:
pip install pyfgs
💻 CLI Usage
For API usage, please refer to the documentation.
For CLI usage, type pyfgs --help
usage: pyfgs <seq> [options]

🔗🐍⏭️ PyO3 bindings and Python interface to FragGeneScanRs,
a gene prediction model for short and error-prone reads.

Input options 💽:
  seq              Sequence file (or '-' for stdin)
  -m, --model      Sequence error model (default: complete):
                     - short1: Illumina sequencing reads with about 0.1% error rate
                     - short5: Illumina sequencing reads with about 0.5% error rate
                     - short10: Illumina sequencing reads with about 1% error rate
                     - sanger5: Sanger sequencing reads with about 0.5% error rate
                     - sanger10: Sanger sequencing reads with about 1% error rate
                     - pyro5: 454 pyrosequencing reads with about 0.5% error rate
                     - pyro10: 454 pyrosequencing reads with about 1% error rate
                     - pyro30: 454 pyrosequencing reads with about 3% error rate
                     - complete: Complete genomic sequences or short sequence reads without sequencing error
  -r, --reads      Force FASTQ parsing (default: False)

Output options ⚙️:
  -o, --out        Output file (default: stdout)
  -f, --format     Output format (default: faa):
                     - faa (protein fasta)
                     - ffn (nucleotide fasta)
                     - bed (BED6 format)

Other options 🚧:
  -t, --threads    Number of threads (default: 8)
  -v, --version    Print version and exit
  -h, --help       Print help and exit
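For scripted pipelines, the command line can be assembled from Python using only the options listed in the help text above. This is a minimal sketch: the `build_pyfgs_command` helper and the file names `reads.fastq` and `genes.bed` are placeholders invented for this example, and the command is only executed if `pyfgs` is on the PATH and the input file exists.

```python
import os
import shlex
import shutil
import subprocess


def build_pyfgs_command(seq, model="complete", fmt="faa", out=None, threads=8):
    """Assemble a pyfgs argument list from the documented CLI options."""
    cmd = ["pyfgs", seq, "--model", model, "--format", fmt, "--threads", str(threads)]
    if out is not None:
        cmd += ["--out", out]
    return cmd


# Placeholder file names; substitute your own input and output paths.
cmd = build_pyfgs_command(
    "reads.fastq", model="short5", fmt="bed", out="genes.bed", threads=4
)
print(shlex.join(cmd))

# Run only when the tool is installed and the placeholder input actually exists.
if shutil.which("pyfgs") and os.path.exists("reads.fastq"):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.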
🔖 Citation
For now, please cite the original FragGeneScanRs paper:
Van der Jeugt, F., Dawyndt, P. & Mesuere, B. FragGeneScanRs: faster gene prediction for short reads. BMC Bioinformatics 23, 198 (2022). https://doi.org/10.1186/s12859-022-04736-5
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
🏗️ Contributing
Contributions are more than welcome! See CONTRIBUTING.md for more details.
📋 Changelog
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
⚖️ License
This library is provided under the GNU General Public License v3.0.
The FragGeneScanRs code was written by Peter Dawyndt, Bart Mesuere and Felix Van der Jeugt and is distributed under the terms of the GPLv3 as well. See https://github.com/FragGeneScanRs/LICENSE for more information.
This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original FragGeneScanRs authors Peter Dawyndt, Bart Mesuere and Felix Van der Jeugt. It was developed by Tom Stanton during his postdoctoral project at Monash University in the Wryes Lab.