Python programs for processing GFF3 files

GFF3toolkit - Python programs for processing GFF3 files

Background

The GFF3 format (Generic Feature Format Version 3) is one of the standard formats for describing and representing genomic features. It is a highly flexible, nine-column format that biologists can easily manipulate. This flexibility, however, also makes the format very easy to break. We developed GFF3toolkit to help identify common problems in GFF3 files; fix 30 of these common problems; sort GFF3 files (which can aid downstream processing programs and custom parsing); merge two GFF3 files into a single, non-redundant GFF3 file; and generate FASTA files from a GFF3 file for many use cases (e.g. feature types beyond mRNA).
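
To make the nine-column layout concrete, here is a minimal sketch of how a single feature line could be split into its columns. It is not part of GFF3toolkit; the names Feature and parse_gff3_line are hypothetical, and the column names follow the GFF3 specification rather than anything stated above.

```python
from typing import NamedTuple
from urllib.parse import unquote


class Feature(NamedTuple):
    """The nine GFF3 columns (hypothetical container, not a GFF3toolkit class)."""
    seqid: str
    source: str
    feature_type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_gff3_line(line: str) -> Feature:
    """Split one tab-delimited GFF3 feature line into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, found {len(cols)}")
    # Column 9 holds semicolon-separated key=value pairs with URL-style escapes.
    attributes = {}
    for pair in cols[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = unquote(value)
    return Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                   cols[5], cols[6], cols[7], attributes)


example = "scaffold1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=demo"
feature = parse_gff3_line(example)
print(feature.feature_type, feature.start, feature.end, feature.attributes["ID"])
```

A line with a missing or extra tab fails the column-count check, which is one of the simplest ways a hand-edited file can break.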

Frequently Asked Questions/FAQ

Prerequisite

  • Python 3.x
    • wheel (included with most Python distributions; if it is missing, install it with pip install wheel)
  • Perl

Installation

Stable release on PyPI

pip install gff3tool

Latest version

pip install git+https://github.com/NAL-i5K/GFF3toolkit.git

Current Functions

Usage

Detect GFF3 format errors

Correct GFF3 format errors

Merge two GFF3 files

  • gff3_merge - Merge two GFF3 files
    • gff3_merge readme
    • gff3_merge full documentation
    • Quick start:
      • Merge the two files with auto-assignment of replace tags (default): gff3_merge -g1 example_file/new_models.gff3 -g2 example_file/reference.gff3 -f example_file/reference.fa -og merged.gff -r merged_report.txt
      • If your GFF3 files already have proper replace tags assigned in column 9 (format: replace=[Transcript ID]), you can merge the two files without auto-assignment of tags: gff3_merge -g1 example_file/new_models_w_replace.gff3 -g2 example_file/reference.gff3 -f example_file/reference.fa -og merged.gff -r merged_report.txt -noAuto
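
As an illustration of the replace=[Transcript ID] format mentioned above, the following sketch reads the replace tag out of a column 9 string. The function name get_replace_targets and the example IDs are hypothetical, and treating a comma as a separator between several IDs is an assumption borrowed from the general GFF3 convention for multi-valued attributes; this is not gff3_merge's own code.

```python
def get_replace_targets(attribute_column: str) -> list:
    """Return the transcript IDs named by a replace= attribute, or [] if absent."""
    for pair in attribute_column.strip().split(";"):
        key, _, value = pair.strip().partition("=")
        if key == "replace":
            # Assumption: several IDs, if present, are comma-separated.
            return [v for v in value.split(",") if v]
    return []


# Made-up attribute strings for illustration only.
tagged = "ID=new_mRNA1;Parent=new_gene1;replace=ref_mRNA7"
untagged = "ID=new_mRNA2;Parent=new_gene2"
print(get_replace_targets(tagged))
print(get_replace_targets(untagged))
```

A check like this can help confirm that every new model carries a tag before running the merge with -noAuto.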

Sort a GFF3 file

  • gff3_sort - Sort a GFF3 file according to the order of Scaffold, coordinates on a Scaffold, and parent-child feature relationships
    • gff3_sort readme
    • Quick start: gff3_sort -g example_file/example.gff3 -og example-sorted.gff3
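
The ordering described above (scaffold, then coordinates, then parent-child relationships) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not gff3_sort's actual algorithm: features are plain dicts with hypothetical keys, each feature has at most one parent, and scaffold names are compared as plain strings.

```python
from collections import defaultdict


def sort_features(features):
    """Order features by scaffold and start, emitting each parent before its children.

    Each feature is a dict with 'seqid', 'start', 'id', and 'parent'
    (None for top-level features). These keys are illustrative only.
    """
    children = defaultdict(list)
    for f in features:
        children[f["parent"]].append(f)

    ordered = []

    def emit(feature):
        ordered.append(feature)
        if feature["id"] is None:
            return  # a feature without an ID cannot have children
        for child in sorted(children[feature["id"]], key=lambda c: c["start"]):
            emit(child)

    # Top-level features are ordered by scaffold name, then start coordinate.
    for top in sorted(children[None], key=lambda f: (f["seqid"], f["start"])):
        emit(top)
    return ordered


features = [
    {"seqid": "scaffold2", "start": 50, "id": "geneB", "parent": None},
    {"seqid": "scaffold1", "start": 300, "id": "exon2", "parent": "mRNA1"},
    {"seqid": "scaffold1", "start": 100, "id": "mRNA1", "parent": "geneA"},
    {"seqid": "scaffold1", "start": 100, "id": "exon1", "parent": "mRNA1"},
    {"seqid": "scaffold1", "start": 100, "id": "geneA", "parent": None},
]
sorted_ids = [f["id"] for f in sort_features(features)]
print(sorted_ids)
```

Keeping children directly under their parent is what lets downstream parsers read a gene model in one pass.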

Generate biological sequences from a GFF3 file

  • bin/gff3_to_fasta.py - Extract biological sequences (such as spliced transcripts, CDS, or peptides) from specific regions of a genome based on a GFF3 file
    • gff3_to_fasta readme
    • Quick start: gff3_to_fasta -g example_file/example.gff3 -f example_file/reference.fa -st all -d simple -o test_sequences
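
To show what extracting a spliced transcript involves, here is a minimal sketch that joins exon intervals from a scaffold sequence and reverse-complements the result for minus-strand features. It assumes GFF3's 1-based, inclusive coordinates; the function name spliced_sequence and the toy sequence are hypothetical, and this is not the gff3_to_fasta implementation.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def spliced_sequence(scaffold_seq, exons, strand):
    """Join exon intervals (1-based, inclusive) into one transcript sequence."""
    ordered = sorted(exons)
    # Convert each 1-based inclusive interval to a 0-based Python slice.
    seq = "".join(scaffold_seq[start - 1:end] for start, end in ordered)
    if strand == "-":
        # Minus-strand transcripts are the reverse complement of the joined exons.
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


scaffold = "AAACCCGGGTTT"
plus_transcript = spliced_sequence(scaffold, [(1, 3), (7, 9)], "+")
minus_transcript = spliced_sequence(scaffold, [(1, 3), (7, 9)], "-")
print(plus_transcript, minus_transcript)
```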

Example Files

Internal Dependencies
