Genome annotation enhancement pipeline

ReAnnota

Introduction

ReAnnota is a Python pipeline, initially developed for cyanobacteria, that enhances genome annotations by merging results from multiple annotation tools (Bakta, EggNOG, InterPro, antiSMASH, Pseudofinder) into a final GFF file.

Usage

Setup

To run ReAnnota, the required dependencies must first be installed. We recommend using uv:

uv venv
source .venv/bin/activate
uv pip install -e .

Input files

Tool / Input File    Description
Bakta                GBFF file (.gbff) generated by annotation tool (required)
Output               Output GBFF file path (required)
EggNOG               Annotation file in TSV format (.tsv)
InterProScan         Annotation file in GFF3 format (.gff3)
antiSMASH            Regions file in JS format (.js)
Pseudofinder         Pseudofinder file in GFF format (.gff)
Gecco                csv containing all Gecco produced cluster files in GenBank format (.gbk)
GFF comparison       GFF file (.gff) produced by annotation tool

Running ReAnnota

Running ReAnnota with the --help option will display the help message:

 Genome annotation enhancement pipeline                           
                                                                                
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --version  -v        Print the current tool version and exit.                │
│ --help     -h        Show this message and exit.                             │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ annotate   Run the genome annotation enhancement pipeline.                   │
╰──────────────────────────────────────────────────────────────────────────────╯

Running the annotate command with the --help option will display its help message:

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ *  --gbff-input         -gbi      FILE                  Input GBFF file      │
│                                                         generated by Bakta.  │
│                                                         [required]           │
│ *  --output             -o        FILE                  Output GBFF file     │
│                                                         path.                │
│                                                         [required]           │
│    --egg-input          -ei       FILE                  Input EggNOG         │
│                                                         annotation file      │
│                                                         (.tsv).              │
│    --ipr-input          -ii       FILE                  Input InterPro       │
│                                                         annotation file      │
│                                                         (.gff3).             │
│    --antismash-input    -ai       FILE                  Input antiSMASH      │
│                                                         annotation file      │
│                                                         (.json).             │
│    --pseudofinder-inp…  -pi       FILE                  Input Pseudofinder   │
│                                                         annotation file      │
│                                                         (.gff).              │
│    --gecco-input        -pi       FILE                  Input Gecco          │
│                                                         annotation file      │
│                                                         (.gbk).              │
│    --gff-input          -gfi      FILE                  Input GFF file for   │
│                                                         comparison.          │
│    --log-file           -l        FILE                  Optional log file    │
│                                                         path. If not         │
│                                                         provided, will be    │
│                                                         created in output    │
│                                                         directory.           │
│    --log-level                    [debug|info|warning|  Set the desired log  │
│                                   error|critical]       level.               │
│                                                         [default: INFO]      │
│    --circos                                             Generate Circos      │
│                                                         visualization.       │
│    --compare                                            Generate gff         │
│                                                         comparison file      │
│    --help               -h                              Show this message    │
│                                                         and exit.            │
╰──────────────────────────────────────────────────────────────────────────────╯

Now you can run the pipeline using:

reannota annotate \
    --egg-input eggnog_input \
    --ipr-input interpro_input \
    --gbff-input input_annotation.gbff \
    --gff-input input_annotation.gff \
    --output <OUTDIR> \
    --antismash-input antismash_input \
    --pseudofinder-input pseudofinder_input \
    --gecco-input gecco_input \
    --compare \
    --circos
Outputs

The output folder structure will look as follows:

└─<OUTDIR>
   ├─results
   │  ├─Enhanced.gff3
   │  └─Enhanced.gbff
   ├─tool_hits
   │  ├─Interpro_hits.gff3
   │  └─eggNOG_hits.gff3
   ├─bgcs
   │  └─combined_gecco_clusters.gbk
   ├─compare (optional)
   │  └─gff_comparison.csv
   ├─visualisation (optional)
   │  ├─Circos_plot_starter.png
   │  └─Circos_plot_enhanced.png
   └─pipeline.log

Merged files

The two main output files for each genome are located in <OUTDIR>/results/:

  • enhanced.gbff: annotation file produced after integrating the tool outputs
  • enhanced.gff3: the GFF3 version of the produced .gbff file
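
After a run, it can be useful to confirm that both merged files exist. The following is a minimal sketch under stated assumptions: missing_outputs is a hypothetical helper, not part of ReAnnota, and the comparison ignores letter case because this page spells the file names both as Enhanced.* and enhanced.*.

```python
import tempfile
from pathlib import Path

# Merged files described above, relative to <OUTDIR>.
EXPECTED = ["results/enhanced.gbff", "results/enhanced.gff3"]


def missing_outputs(outdir, expected=EXPECTED):
    """Return the expected files not found under outdir, ignoring letter case."""
    outdir = Path(outdir)
    present = {
        path.relative_to(outdir).as_posix().lower()
        for path in outdir.rglob("*")
        if path.is_file()
    }
    return [name for name in expected if name.lower() not in present]


# Demonstration on a throwaway directory that contains only the .gbff file.
with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "results"
    results.mkdir()
    (results / "Enhanced.gbff").touch()
    missing = missing_outputs(tmp)

print(missing)
```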

Comparison

If the --compare option is added, ReAnnota will produce a .csv comparison file inside the compare folder, which includes the following entries: GO_entries, InterPro_entries, PFAM_entries, KEGG_entries, Pseudogene_candidates, BGCs, Hypotheticals.

Visualisation

If the --circos option is added, ReAnnota will produce two Circos plots in the visualisation folder: one for the starting annotation file used as input and one for the enhanced.gbff.

Workflow

Product name determination

The following logic is used in ReAnnota to fill the product field in the 9th column of the GFF file, along with the integration of db_entries and notes:

Integration of Pseudogenes

ReAnnota currently supports Pseudofinder's .gff output.

  • The pseudogene labels of the initial annotation file will be preserved
  • Additional pseudogene entries will be integrated in one of two ways:
    1. If the product is "hypothetical protein", the label "Pseudogene" will be added, along with any attributes from Pseudofinder's GFF file in the Note section
    2. If the product is not "hypothetical protein", only the attributes are added in the Note section
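
The two rules above can be sketched as follows. This is a minimal illustration, not ReAnnota's implementation: features are modelled as plain dictionaries, integrate_pseudogene is a hypothetical name, storing the label under a separate "label" key is an assumption (the text does not say which field receives it), and the attribute values are made up.

```python
def integrate_pseudogene(feature, pseudo_attrs):
    """Apply the two integration rules to one feature (hypothetical helper).

    feature      -- dict with at least a "product" key; existing keys are kept,
                    so labels already present in the input are preserved
    pseudo_attrs -- dict of attributes taken from the pseudogene GFF record
    """
    updated = dict(feature)
    notes = list(updated.get("Note", []))

    # Rule 1: only hypothetical proteins receive the "Pseudogene" label.
    # Assumption: the label is stored under its own key.
    if updated.get("product") == "hypothetical protein":
        updated["label"] = "Pseudogene"

    # Rules 1 and 2: the attributes always go to the Note section.
    notes.extend(f"{key}={value}" for key, value in pseudo_attrs.items())
    updated["Note"] = notes
    return updated


# Made-up inputs for illustration only.
hypo = integrate_pseudogene({"product": "hypothetical protein"}, {"reason": "example"})
known = integrate_pseudogene({"product": "DNA gyrase subunit A"}, {"reason": "example"})
print(hypo)
print(known)
```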

Integration of antismash and GECCO BGCs

If antiSMASH or GECCO output files are provided, ReAnnota will also append to the end of the enhanced.gff3 file a section starting with "##Antismash.." or "##GECCO..", respectively, containing all BGCs in GFF3 format.
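
A downstream script may want to separate these BGC sections from the main annotation. The following is a minimal sketch that assumes only what is stated above, namely that each section begins with a line starting with "##Antismash" or "##GECCO"; the full marker text is not given here, so only the prefix is matched. split_bgc_sections and the sample records are hypothetical.

```python
# Known marker prefixes mapped to section names chosen for this sketch.
MARKERS = {"##Antismash": "antismash", "##GECCO": "gecco"}


def split_bgc_sections(lines):
    """Group GFF3 lines into the main annotation and the BGC sections."""
    sections = {"annotation": []}
    current = "annotation"
    for raw in lines:
        line = raw.rstrip("\n")
        marker = next(
            (name for prefix, name in MARKERS.items() if line.startswith(prefix)),
            None,
        )
        if marker is not None:
            # A marker line opens a new section and is not stored itself.
            current = marker
            sections.setdefault(current, [])
            continue
        sections[current].append(line)
    return sections


# Made-up records for illustration only.
sample = [
    "##gff-version 3",
    "contig_1\tBakta\tCDS\t1\t300\t.\t+\t0\tID=cds_1",
    "##Antismash",
    "contig_1\tantiSMASH\tregion\t100\t900\t.\t+\t.\tID=region_1",
    "##GECCO",
    "contig_1\tGECCO\tregion\t2000\t5000\t.\t+\t.\tID=cluster_1",
]
sections = split_bgc_sections(sample)
for name, records in sections.items():
    print(name, len(records))
```

With open("enhanced.gff3") as handle, the same function can be called as split_bgc_sections(handle), since a file object iterates over its lines.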

