
Region-aware GFF annotation integration toolkit

Project description

gffkit

gffkit is a lightweight toolkit for region-aware GFF/GTF annotation integration. It combines three utilities:

  1. detect-bridge: detect suspicious merged-gene artifacts caused by bridge transcripts.
  2. complement: complement/merge annotations, with optional region-swap mode.
  3. add-utr: reconstruct five_prime_UTR and three_prime_UTR features from exon/CDS coordinates.
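
This page does not describe how detect-bridge decides that a gene is suspicious. As a rough illustration of the idea only, the sketch below assumes that a "bridge transcript" is a transcript whose removal splits a gene's remaining transcripts into two or more non-overlapping clusters. The function name `find_bridge_transcripts` and the dictionary input are hypothetical and are not part of gffkit's API. Transcript spans stand in for full exon structures to keep the example short.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GFF


def count_clusters(spans: List[Interval]) -> int:
    """Count groups of mutually overlapping intervals."""
    clusters = 0
    current_end = None
    for start, end in sorted(spans):
        if current_end is None or start > current_end:
            clusters += 1          # gap found: a new cluster begins
            current_end = end
        else:
            current_end = max(current_end, end)
    return clusters


def find_bridge_transcripts(transcripts: Dict[str, Interval]) -> List[str]:
    """Return IDs of transcripts that alone hold one gene locus together.

    Assumption: a transcript is a "bridge" if all transcripts form a single
    cluster, but removing it leaves two or more separate clusters.
    """
    if count_clusters(list(transcripts.values())) != 1:
        return []
    bridges = []
    for tid in transcripts:
        others = [span for other, span in transcripts.items() if other != tid]
        if count_clusters(others) > 1:
            bridges.append(tid)
    return bridges


# Hypothetical gene with two transcript groups joined by one long transcript.
gene = {
    "t1": (100, 500),
    "t2": (150, 480),
    "t3": (900, 1500),
    "bridge": (450, 950),
}
print(find_bridge_transcripts(gene))
```

A gene flagged this way is only a candidate: a long transcript can also be a genuine isoform, which is presumably why the tool reports regions as "suspicious" rather than removing them outright.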

Installation

pip install gffkit

Quick start

Full integration pipeline

gffkit integrate \
  --annotation-a EviAnn.gff3 \
  --annotation-b ANNEVO.gff3 \
  --outdir gffkit_out \
  --prefix sample

Outputs:

  • gffkit_out/sample.suspicious.tsv
  • gffkit_out/sample.merged.gff3
  • gffkit_out/sample.final.withUTR.gff3

Step-by-step usage

# 1. Detect suspicious merged genes in Annotation A
gffkit detect-bridge -i EviAnn.gff3 -o suspicious.tsv

# 2. Use A as the global reference, but switch to B in suspicious regions
gffkit complement \
  --ref EviAnn.gff3 \
  --add ANNEVO.gff3 \
  --swap_region_tsv suspicious.tsv \
  --swap_region_flank 100 \
  --output merged.gff3

# 3. Add UTR features
gffkit add-utr -i merged.gff3 -o final.annotation.withUTR.gff3
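
To make step 2 more concrete, here is a minimal sketch of what "use A globally, switch to B in suspicious regions" could look like once each region is widened by a flank of 100 bp. It is an assumption about the behaviour, not gffkit's implementation. The column layout of suspicious.tsv is not documented on this page, so regions are passed as in-memory tuples; `Gene`, `expand`, `overlaps`, and `swap_regions` are hypothetical names. The sketch covers only the swap itself, not any other merging that complement may perform.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    gene_id: str
    seqid: str
    start: int  # 1-based, inclusive
    end: int


Region = Tuple[str, int, int]  # (seqid, start, end)


def expand(region: Region, flank: int) -> Region:
    """Widen a region by `flank` bases on both sides, clamped at position 1."""
    seqid, start, end = region
    return (seqid, max(1, start - flank), end + flank)


def overlaps(gene: Gene, region: Region) -> bool:
    """True if the gene shares at least one base with the region."""
    seqid, start, end = region
    return gene.seqid == seqid and gene.start <= end and gene.end >= start


def swap_regions(ref: List[Gene], add: List[Gene],
                 regions: List[Region], flank: int = 100) -> List[Gene]:
    """Keep `ref` genes outside the flanked regions and `add` genes inside them."""
    flanked = [expand(r, flank) for r in regions]

    def in_region(gene: Gene) -> bool:
        return any(overlaps(gene, r) for r in flanked)

    kept_ref = [g for g in ref if not in_region(g)]
    taken_add = [g for g in add if in_region(g)]
    return sorted(kept_ref + taken_add, key=lambda g: (g.seqid, g.start, g.end))


# Hypothetical data: A1 is a suspicious merged gene that B splits in two.
ref = [Gene("A1", "chr1", 100, 900), Gene("A2", "chr1", 5000, 6000)]
add = [Gene("B1", "chr1", 120, 400), Gene("B2", "chr1", 600, 880),
       Gene("B3", "chr1", 5100, 5900)]
merged = swap_regions(ref, add, regions=[("chr1", 100, 900)], flank=100)
print([g.gene_id for g in merged])
```

The flank matters because gene boundaries in the two annotations rarely coincide exactly; widening the region lets a B gene that starts slightly outside the suspicious interval still be picked up.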

Command overview

gffkit --help
gffkit detect-bridge --help
gffkit complement --help
gffkit add-utr --help
gffkit integrate --help

Annotation integration strategy

  • Annotation A (for example, an EviAnn or other RNA-seq-supported GFF) is the global primary reference.
  • Annotation B (for example, an ANNEVO or other deep-learning GFF) replaces A as the primary reference only within suspicious merged-gene regions.
  • UTR features are reconstructed after merging with an exon-minus-CDS strategy: the exon segments that lie outside the CDS become five_prime_UTR or three_prime_UTR features.
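
The exon-minus-CDS idea in the last point can be sketched as plain interval subtraction. This is an illustrative sketch under stated assumptions, not gffkit's code: `subtract` and `reconstruct_utrs` are hypothetical helpers, coordinates are 1-based and inclusive as in GFF, and each transcript is assumed to have a single contiguous coding span, so leftover exon pieces lie either upstream or downstream of it.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def subtract(exon: Interval, cds: List[Interval]) -> List[Interval]:
    """Return the parts of `exon` not covered by any CDS interval."""
    pieces = []
    cursor, exon_end = exon
    for c_start, c_end in sorted(cds):
        if c_end < cursor or c_start > exon_end:
            continue  # this CDS segment does not touch the remaining exon
        if c_start > cursor:
            pieces.append((cursor, c_start - 1))
        cursor = max(cursor, c_end + 1)
    if cursor <= exon_end:
        pieces.append((cursor, exon_end))
    return pieces


def reconstruct_utrs(exons: List[Interval], cds: List[Interval],
                     strand: str) -> Dict[str, List[Interval]]:
    """Classify exon-minus-CDS pieces as 5' or 3' UTR according to strand."""
    utrs: Dict[str, List[Interval]] = {"five_prime_UTR": [], "three_prime_UTR": []}
    if not cds:
        return utrs  # non-coding transcript: nothing to classify
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    for exon in sorted(exons):
        for piece in subtract(exon, cds):
            upstream = piece[1] < cds_start    # lower coordinates than the CDS
            downstream = piece[0] > cds_end    # higher coordinates than the CDS
            if not (upstream or downstream):
                continue  # gap inside the coding span, not treated as UTR here
            # On "+" the 5' UTR has lower coordinates; on "-" it is reversed.
            is_five = upstream if strand == "+" else downstream
            key = "five_prime_UTR" if is_five else "three_prime_UTR"
            utrs[key].append(piece)
    return utrs


# Hypothetical two-exon transcript.
exons = [(100, 300), (500, 800)]
cds = [(200, 300), (500, 650)]
print(reconstruct_utrs(exons, cds, "+"))
print(reconstruct_utrs(exons, cds, "-"))
```

Running the reconstruction after merging, as the strategy states, means UTRs are derived from the final exon and CDS coordinates regardless of which annotation each gene came from.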

License

MIT License.
