GFFtk: genome annotation GFF3 tool kit

GFFtk is a toolkit for working with genome annotation files in GFF3, GTF, and TBL formats. It converts between formats, filters annotations, and manipulates the features in annotation files.

Features

  • Format Conversion: Convert between GFF3, GTF, TBL, and GenBank formats
  • Combined GFF3+FASTA: Support for combined files containing both annotations and sequences
  • Sequence Extraction: Extract protein and transcript sequences from annotations
  • Advanced Filtering: Filter annotations using flexible regex patterns
  • Consensus Models: Generate consensus gene models from multiple sources
  • Non-Standard Features: Support for intron, noncoding_exon, five_prime_UTR_intron, and pseudogenic_exon features
  • File Manipulation: Sort, sanitize, and rename features in annotation files

Installation

To install a release version, use the pip package manager:

python -m pip install gfftk

To install the latest development code from the master branch, run:

python -m pip install git+https://github.com/nextgenusfs/gfftk.git
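
After either install, one way to confirm which version ended up in the environment is to ask Python's standard library. The sketch below uses only `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `gfftk` is assumed to match the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "gfftk"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"gfftk version: {found}" if found else "gfftk is not installed")
```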

Quick Start

Basic Format Conversion

# Convert GFF3 to GTF
gfftk convert -i input.gff3 -f genome.fasta -o output.gtf

# Extract protein sequences
gfftk convert -i input.gff3 -f genome.fasta -o proteins.faa --output-format proteins

Combined GFF3+FASTA Format

# Create a combined file from separate GFF3 and FASTA files
gfftk convert -i input.gff3 -f genome.fasta -o combined.gff --output-format combined

# Read a combined file (no separate FASTA file needed)
gfftk convert -i combined.gff -o output.gff3 --output-format gff3
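
To illustrate what a combined file looks like, here is a minimal sketch that splits one into its annotation lines and its sequences. It assumes the combined format follows the GFF3 specification's `##FASTA` directive, with annotations before the directive and FASTA records after it; the README does not state this explicitly. The function `split_combined_gff3` is hypothetical and is not part of GFFtk.

```python
from typing import Dict, List, Tuple


def split_combined_gff3(text: str) -> Tuple[List[str], Dict[str, str]]:
    """Split combined GFF3+FASTA text into annotation lines and sequences.

    Assumption: sequences follow a line containing only '##FASTA'.
    """
    annotation_lines: List[str] = []
    chunks: Dict[str, List[str]] = {}
    in_fasta = False
    current_id = None
    for line in text.splitlines():
        if not in_fasta:
            if line.strip() == "##FASTA":
                in_fasta = True  # everything after this line is FASTA
            else:
                annotation_lines.append(line)
            continue
        if line.startswith(">"):
            # The sequence ID is the first whitespace-delimited token.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            chunks[current_id] = []
        elif current_id is not None and line.strip():
            chunks[current_id].append(line.strip())
    sequences = {seq_id: "".join(parts) for seq_id, parts in chunks.items()}
    return annotation_lines, sequences


# Tiny made-up example, for illustration only.
example = "\n".join(
    [
        "##gff-version 3",
        "contig1\tsrc\tgene\t1\t9\t.\t+\t.\tID=gene1",
        "##FASTA",
        ">contig1 demo",
        "ATGAAA",
        "TAA",
    ]
)
gff_lines, seqs = split_combined_gff3(example)
```

Because the sequences travel with the annotations, a reader of such a file does not need a separate `-f genome.fasta` argument, which is what the second command above relies on.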

Advanced Filtering

# Keep only kinase genes
gfftk convert -i input.gff3 -f genome.fasta -o kinases.gff3 --grep product:kinase

# Remove augustus predictions
gfftk convert -i input.gff3 -f genome.fasta -o filtered.gff3 --grepv source:augustus

# Case-insensitive filtering with regex
gfftk convert -i input.gff3 -f genome.fasta -o results.gff3 --grep product:KINASE:i

# Combined filtering
gfftk convert -i input.gff3 -f genome.fasta -o filtered.gff3 \
    --grep product:kinase --grepv source:augustus

Filter Pattern Syntax

  • key:pattern - Basic string matching
  • key:pattern:i - Case-insensitive matching
  • key:regex - Regular expression patterns
  • Multiple --grep or --grepv flags for complex filtering

Common filter keys: product, source, name, note, contig, strand, type, db_xref, go_terms
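
The pattern syntax above can be pictured with a small sketch. This is not GFFtk's implementation: `parse_filter` and `keep_record` are hypothetical helpers, records are modeled as plain dictionaries, patterns are applied with `re.search`, and multiple `--grep` patterns are assumed to combine with AND. How GFFtk itself combines repeated flags, or handles a regex that contains a colon, is not specified here.

```python
import re
from typing import Dict, Sequence, Tuple


def parse_filter(spec: str) -> Tuple[str, "re.Pattern"]:
    """Parse 'key:pattern' or 'key:pattern:i' into (key, compiled regex)."""
    key, sep, rest = spec.partition(":")
    if not sep or not key or not rest:
        raise ValueError(f"expected key:pattern, got {spec!r}")
    flags = 0
    # Assumption: a trailing ':i' is the case-insensitive marker.
    if rest.endswith(":i") and len(rest) > 2:
        rest = rest[:-2]
        flags = re.IGNORECASE
    return key, re.compile(rest, flags)


def keep_record(
    record: Dict[str, str],
    grep: Sequence[str] = (),
    grepv: Sequence[str] = (),
) -> bool:
    """Keep a record if it matches every grep spec and no grepv spec."""

    def hit(spec: str) -> bool:
        key, pattern = parse_filter(spec)
        return pattern.search(str(record.get(key, ""))) is not None

    return all(hit(s) for s in grep) and not any(hit(s) for s in grepv)


# Made-up records, for illustration only.
records = [
    {"product": "Protein Kinase A", "source": "augustus"},
    {"product": "serine kinase", "source": "genemark"},
    {"product": "hypothetical protein", "source": "genemark"},
]
kept = [
    r
    for r in records
    if keep_record(r, grep=["product:kinase"], grepv=["source:augustus"])
]
```

In this sketch, `product:kinase` is case-sensitive and so does not match "Protein Kinase A", while `product:KINASE:i` would match both kinase records.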

For more examples and detailed documentation, see the tutorial.

Development

Code Formatting

This project uses pre-commit to ensure code quality and consistency. The pre-commit hooks run Black (code formatter), isort (import sorter), and flake8 (linter).

To set up pre-commit:

  1. Install pre-commit:
pip install pre-commit
  2. Install the git hooks:
pre-commit install
  3. (Optional) Run against all files:
pre-commit run --all-files

After installation, the pre-commit hooks will run automatically on each commit to ensure your code follows the project's style guidelines.
