Gene Annotation Across Diverse Fungal Species Using Deep Learning

These details have been verified by PyPI

Project links

Repository

Maintainers

jxu10 lisavader

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

geneML

geneML is a deep learning–based tool for fungal gene prediction.

Installation

The only requirement is Python 3.9 or higher.

Using virtualenv

Start with a fresh python virtual environment:

python -m venv geneml
. geneml/bin/activate
# Now install the latest release from PyPI:
pip install geneml

Using conda

Or use a conda environment:

conda create -n geneml -c conda-forge python=3.13 pip
conda activate geneml
# Now install the latest release from PyPI:
pip install geneml

Directly from the repo

Or install directly from this repo (to get access to the latest changes):

git clone https://github.com/hexagonbio/geneML.git
pip install ./geneML

Frequent options

Basic command:

geneml genome.fasta

To enable verbose mode:

geneml genome.fasta -v

To change the output path:

geneml genome.fasta -o genome_output.gff3

To run only on selected contigs:

geneml genome.fasta --contigs-filter NC_092406.1,NC_092407.1

To write nucleotide and protein sequences of the predicted genes (one sequence per transcript):

geneml genome.fasta -g genes.fna
geneml genome.fasta -p proteins.faa

By default, geneML outputs multiple transcripts per locus when there are several high-scoring options (up to 5 per gene).
You can change the maximum number of transcripts, for example to report only the best transcript:

geneml genome.fasta --max-transcripts 1
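When geneML is called from a pipeline, it can help to assemble the command line in one place. The following is a minimal sketch, not part of geneML itself: the helper name build_geneml_command and its parameters are hypothetical, and only flags shown in this README are used.

```python
import shlex
from typing import List, Optional, Sequence


def build_geneml_command(
    sequence: str,
    output: Optional[str] = None,
    contigs: Optional[Sequence[str]] = None,
    max_transcripts: Optional[int] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble a geneml argument list (hypothetical helper, not a geneML API)."""
    cmd = ["geneml", sequence]
    if output is not None:
        cmd += ["-o", output]
    if contigs:
        # --contigs-filter expects a single comma-separated string
        cmd += ["--contigs-filter", ",".join(contigs)]
    if max_transcripts is not None:
        cmd += ["--max-transcripts", str(max_transcripts)]
    if verbose:
        cmd.append("-v")
    return cmd


cmd = build_geneml_command(
    "genome.fasta",
    output="genome_output.gff3",
    contigs=["NC_092406.1", "NC_092407.1"],
    max_transcripts=1,
)
print(shlex.join(cmd))
```

To actually run the command (this requires geneml to be installed and on the PATH), the list can be passed to subprocess.run(cmd, check=True).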

With enough input data (at least 100,000 bp in total), geneML dynamically determines the minimum score threshold for reporting genes and transcripts.
You can override this threshold manually with a fixed value, for example:

geneml genome.fasta --min-gene-score 0.5
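The --min-gene-score help text states that dynamic mode requires at least 100,000 bp of total input. The sketch below counts the bases in a FASTA file to check this before a run. It is an illustration only: the function names are hypothetical, it handles FASTA but not GenBank or EMBL input, and it assumes the limit applies to the summed length of all sequences in the file, which the README does not spell out.

```python
import io
from typing import TextIO

DYNAMIC_MIN_BP = 100_000  # limit quoted in the --min-gene-score help text


def total_bases(fasta: TextIO) -> int:
    """Sum the sequence length over all records of a FASTA stream."""
    total = 0
    for line in fasta:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def dynamic_threshold_available(fasta: TextIO) -> bool:
    """True if the input is long enough for dynamic mode (assumed whole-file total)."""
    return total_bases(fasta) >= DYNAMIC_MIN_BP


# Tiny in-memory example; a real call would pass open("genome.fasta").
example = io.StringIO(">contig1\nACGT\nACGT\n>contig2\nAC\n")
print(total_bases(example))
```

For short inputs such as a single contig, passing an explicit value to --min-gene-score avoids depending on dynamic mode.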

Output

geneML writes gene annotations in GFF3 format.

Fields

For each predicted gene, transcript, exon and CDS feature, the GFF3 includes:

contig_name  source  feature_type  start  end  feature_score  strand  phase  identifiers

Note: Because geneML does not include untranslated regions in its predictions, CDS features have the same coordinates as exon features; the only difference is that CDS features also carry a phase value.
For more information on the GFF3 format, see: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
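The nine columns listed above can be read with a few lines of Python. This is a minimal sketch of a generic GFF3 line parser, not code shipped with geneML: the Feature class and parse_gff3_line are hypothetical names, and the example line (including the source value "geneML", the coordinates and the score) is made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import unquote


@dataclass
class Feature:
    contig_name: str
    source: str
    feature_type: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int
    feature_score: Optional[float]
    strand: str
    phase: Optional[int]
    identifiers: Dict[str, str]


def parse_gff3_line(line: str) -> Optional[Feature]:
    """Parse one GFF3 line; return None for blank lines, comments and directives."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    identifiers = {}
    for pair in cols[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            # GFF3 percent-encodes reserved characters in column 9
            identifiers[unquote(key)] = unquote(value)
    return Feature(
        contig_name=unquote(cols[0]),
        source=cols[1],
        feature_type=cols[2],
        start=int(cols[3]),
        end=int(cols[4]),
        feature_score=None if cols[5] == "." else float(cols[5]),
        strand=cols[6],
        phase=None if cols[7] == "." else int(cols[7]),
        identifiers=identifiers,
    )


# Hypothetical line; the coordinates, score and source value are made up.
example = "NC_092406.1\tgeneML\tCDS\t101\t400\t0.93\t+\t0\tID=GML000001_mRNA1_CDS1;Parent=GML000001_mRNA1"
feature = parse_gff3_line(example)
print(feature.feature_type, feature.start, feature.end, feature.identifiers["Parent"])
```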

Identifiers

Each feature has a unique ID and, for child features, a Parent attribute specifying the parent ID.

Feature	Example ID
Gene	GML000001
Transcript	GML000001_mRNA1
Exon	GML000001_mRNA1_exon1
CDS	GML000001_mRNA1_CDS1
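The ID and Parent attributes are enough to rebuild the gene, transcript, exon/CDS hierarchy. The sketch below is illustrative only: the function names are hypothetical, and it assumes that exon and CDS features name their transcript as Parent, as is usual in GFF3. It follows the Parent links rather than parsing the ID strings, so it does not depend on the naming pattern in the table.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_children(records: List[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Map each parent ID to the IDs of its direct children, in input order."""
    children: Dict[str, List[str]] = defaultdict(list)
    for feature_id, parent_id in records:
        if parent_id is not None:
            children[parent_id].append(feature_id)
    return dict(children)


def descendants(children: Dict[str, List[str]], root: str) -> List[str]:
    """Return all features below root, depth-first, parents before children."""
    found: List[str] = []
    stack = list(reversed(children.get(root, [])))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(reversed(children.get(node, [])))
    return found


# (ID, Parent) pairs using the example IDs from the table above.
records = [
    ("GML000001", None),
    ("GML000001_mRNA1", "GML000001"),
    ("GML000001_mRNA1_exon1", "GML000001_mRNA1"),
    ("GML000001_mRNA1_CDS1", "GML000001_mRNA1"),
]
children = build_children(records)
print(descendants(children, "GML000001"))
```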

Scores

The feature score ranges from 0 to 1 and measures how well the prediction agrees with the raw probabilities output by the geneML CNN.
A higher score indicates a higher prediction confidence. The directive ##geneml-mean-gene-score (found at the top of the GFF file) stores the average score of predicted genes.
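The scores can be used for downstream filtering. The sketch below reads the mean-score directive and the per-gene scores from a GFF3 stream. Several details are assumptions, since the README does not show a sample file: that the directive line carries its value after the directive name (the code takes the first number found on that line), that gene features use the type "gene" in column 3, and all example coordinates and scores. The function name read_gene_scores is hypothetical.

```python
import io
import re
from typing import List, Optional, TextIO, Tuple

DIRECTIVE = "##geneml-mean-gene-score"
NUMBER = re.compile(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")


def read_gene_scores(gff: TextIO) -> Tuple[Optional[float], List[Tuple[str, float]]]:
    """Return (mean score from the directive, [(gene ID, score), ...])."""
    mean_score: Optional[float] = None
    genes: List[Tuple[str, float]] = []
    for line in gff:
        line = line.rstrip("\r\n")
        if line.startswith(DIRECTIVE):
            # Assumed layout: the value follows the directive name on the same line.
            match = NUMBER.search(line[len(DIRECTIVE):])
            if match:
                mean_score = float(match.group())
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Assumed feature type name "gene"; "." means no score was recorded.
        if len(cols) == 9 and cols[2] == "gene" and cols[5] != ".":
            attrs = dict(p.partition("=")[::2] for p in cols[8].split(";") if p)
            genes.append((attrs.get("ID", ""), float(cols[5])))
    return mean_score, genes


# Made-up example content for illustration.
example = io.StringIO(
    "##gff-version 3\n"
    "##geneml-mean-gene-score 0.80\n"
    "c1\tgeneML\tgene\t1\t900\t0.95\t+\t.\tID=GML000001\n"
    "c1\tgeneML\tgene\t1500\t2400\t0.65\t-\t.\tID=GML000002\n"
)
mean_score, genes = read_gene_scores(example)
confident = [gene_id for gene_id, score in genes if score >= 0.9]
print(mean_score, confident)
```

Filtering after the fact like this leaves the GFF3 file untouched; to change what geneML reports in the first place, use --min-gene-score.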

Full Usage

geneml --help
usage: geneml [-h] [--version] [-o OUTPUT] [-g GENES] [-p PROTEINS] [--gene-id-prefix GENE_ID_PREFIX] [-m MODEL] [-cl CONTEXT_LENGTH] [-c CORES] [-v] [-d]
              [--cpu-only] [--strand {forward,reverse,both}] [--contigs-filter CONTIGS_FILTER] [--write-raw-scores]
              [--max-transcripts MAX_TRANSCRIPTS] [--allow-opposite-strand-overlaps {true,false}] [--min-gene-score MIN_GENE_SCORE]
              [--min-exon-size MIN_EXON_SIZE] [--max-exon-size MAX_EXON_SIZE] [--min-intron-size MIN_INTRON_SIZE]
              [--max-intron-size MAX_INTRON_SIZE] [--cds-start-min-score CDS_START_MIN_SCORE]
              [--cds-end-min-score CDS_END_MIN_SCORE] [--exon-start-min-score EXON_START_MIN_SCORE]
              [--exon-end-min-score EXON_END_MIN_SCORE] [--gene-candidates GENE_CANDIDATES]
              sequence

geneML 1.0.0

positional arguments:
  sequence              Sequence file in FASTA/GenBank/EMBL format.

options:
  -h, --help            Show this help message and exit.
  --version             Show version number and exit.
  -o OUTPUT, --output OUTPUT
                        Gene annotations output path (default: based on input filename).
  -g GENES, --genes GENES
                        Gene sequences output path (default: None).
  -p PROTEINS, --proteins PROTEINS
                        Protein sequences output path (default: None).
  --gene-id-prefix GENE_ID_PREFIX
                        Prefix for gene IDs in output (default: None).
  -m MODEL, --model MODEL
                        Path to model file (default: models/geneML_default.keras).
  -cl CONTEXT_LENGTH, --context-length CONTEXT_LENGTH
                        Context length of the model.
  -c CORES, --cores CORES
                        Number of cores to use for processing (default: all available).

advanced options:
  -v, --verbose         Enable verbose mode.
  -d, --debug           Enable debug mode.
  --cpu-only            Use CPU only for inference, disable GPU usage.
  --strand {forward,reverse,both}
                        On which strand to predict genes (default: both).
  --contigs-filter CONTIGS_FILTER
                        Run only on selected contigs (comma separated string).
  --write-raw-scores    Instead of running gene calling, output the raw model scores as a .seg file.
  --max-transcripts MAX_TRANSCRIPTS
                        Maximum number of transcripts per gene (default: 5).
  --allow-opposite-strand-overlaps {true,false}
                        Predict overlapping genes on opposite strands (default: true).
  --min-gene-score MIN_GENE_SCORE
                        Minimum gene score for gene reporting. Can be a float value or 'dynamic' (default: dynamic). Dynamic mode
                        requires >=100,000 bp total input.
  --min-exon-size MIN_EXON_SIZE
                        Minimum exon size (default: 1).
  --max-exon-size MAX_EXON_SIZE
                        Maximum exon size (default: 30000).
  --min-intron-size MIN_INTRON_SIZE
                        Minimum intron size (default: 10).
  --max-intron-size MAX_INTRON_SIZE
                        Maximum intron size (default: 400).
  --cds-start-min-score CDS_START_MIN_SCORE
                        Minimum model score for considering a CDS start (default: 0.01).
  --cds-end-min-score CDS_END_MIN_SCORE
                        Minimum model score for considering a CDS end (default: 0.01).
  --exon-start-min-score EXON_START_MIN_SCORE
                        Minimum model score for considering an exon start (default: 0.01).
  --exon-end-min-score EXON_END_MIN_SCORE
                        Minimum model score for considering an exon end (default: 0.01).
  --gene-candidates GENE_CANDIDATES
                        Maximum number of gene candidates to consider per locus (default: 5000).

Release history

1.1.0 (Apr 3, 2026)
1.0.0 (Apr 2, 2026), this version

Download files

Source Distribution

geneml-1.0.0.tar.gz (2.8 MB)

Uploaded Apr 2, 2026 Source

Built Distribution

geneml-1.0.0-py3-none-any.whl (2.7 MB)

Uploaded Apr 2, 2026 Python 3

File details

Details for the file geneml-1.0.0.tar.gz.

File metadata

Download URL: geneml-1.0.0.tar.gz
Upload date: Apr 2, 2026
Size: 2.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for geneml-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`53a17eddddb65e53d0130a27d1ee05173b95bb299a9ed15c6b509ca9680080f3`
MD5	`1c1f79596976fb001453233b0b39202a`
BLAKE2b-256	`e37a333ea72e3f1b99dd79c5476b2cf98b8dab8709332ca8ce9d1ae8dcf98f72`
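A downloaded archive can be checked against the SHA256 digest listed above. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the file is assumed to sit in the current directory under its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for geneml-1.0.0.tar.gz
EXPECTED_SHA256 = "53a17eddddb65e53d0130a27d1ee05173b95bb299a9ed15c6b509ca9680080f3"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex string."""
    return sha256_of(path) == expected.strip().lower()


archive = Path("geneml-1.0.0.tar.gz")  # assumed download location
if archive.exists():
    print("digest OK" if matches_expected(archive) else "digest MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

pip can perform the same check at install time when the requirement is pinned with a hash in a requirements file and installed with --require-hashes.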

Provenance

The following attestation bundles were made for geneml-1.0.0.tar.gz:

Publisher: ci.yml on hexagonbio/geneML

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: geneml-1.0.0.tar.gz
- Subject digest: 53a17eddddb65e53d0130a27d1ee05173b95bb299a9ed15c6b509ca9680080f3
- Sigstore transparency entry: 1217092947
- Sigstore integration time: Apr 2, 2026
Source repository:
- Permalink: hexagonbio/geneML@c59fa66a53e90d090980365a075a3f411e613518
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/hexagonbio
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@c59fa66a53e90d090980365a075a3f411e613518
- Trigger Event: push

File details

Details for the file geneml-1.0.0-py3-none-any.whl.

File metadata

Download URL: geneml-1.0.0-py3-none-any.whl
Upload date: Apr 2, 2026
Size: 2.7 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for geneml-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`361d6482419cfb3870658cee7a1cbc31ab7b9f9d4e13c56f381837eabe54bf29`
MD5	`ca2dfbe128a1d05e09fa8ee4ee1915a3`
BLAKE2b-256	`778b164c57701e0ac8fe6e1c2ff828faaacd408f4dc58a60dcb9007897337bb3`

Provenance

The following attestation bundles were made for geneml-1.0.0-py3-none-any.whl:

Publisher: ci.yml on hexagonbio/geneML

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: geneml-1.0.0-py3-none-any.whl
- Subject digest: 361d6482419cfb3870658cee7a1cbc31ab7b9f9d4e13c56f381837eabe54bf29
- Sigstore transparency entry: 1217092986
- Sigstore integration time: Apr 2, 2026
Source repository:
- Permalink: hexagonbio/geneML@c59fa66a53e90d090980365a075a3f411e613518
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/hexagonbio
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@c59fa66a53e90d090980365a075a3f411e613518
- Trigger Event: push
