StORF-Reporter - A tool that takes as input an annotated genome and returns missed CDS genes from the unannotated regions.
Preprint: https://www.biorxiv.org/content/10.1101/2022.03.31.486628v1
Please use `pip3 install StORF-Reporter` to install the tool.
This also installs 'ORForise' (https://github.com/NickJD/ORForise), which provides additional functionality.
StORF-Reporter.py
This script extracts Unannotated Regions from PROKKA genome annotations, finds Stop - Open Reading Frames in them, and reports these in a new PROKKA-formatted GFF file in the PROKKA output directory.
This tool is currently in BETA but can be run as:
python3 -m StORF-Reporter.StORF_Reporter -anno PROKKA -pd ../PROKKA_04062022/
UR_Extractor.py
Python3 script to extract Unannotated Regions from DNA sequences, using FASTA and GFF files as input.
For Help: python3 UR_Extractor.py -h
Example: python3 UR_Extractor.py -f genomes/E-coli.fasta.gz -gff genomes/E-coli.gff -o genomes/E-coli_UR -gz True
usage: UR_Extractor.py [-h] -f FASTA -gff GFF [-ident IDENT] [-min_len MINLEN]
                       [-max_len MAXLEN] [-ex_len EXLEN]
                       [-gene_ident GENE_IDENT] -o OUT_PREFIX
                       [-gz {True,False}]

optional arguments:
  -h, --help            show this help message and exit
  -f FASTA, --fasta_seq FASTA
                        FASTA file for Unannotated Region seq extraction
  -gff GFF              GFF annotation file for the FASTA
  -ident IDENT          Identifier given for Unannotated Region output
                        sequences: Default "Input"_UR
  -min_len MINLEN       Minimum UR Length: Default 30
  -max_len MAXLEN       Maximum UR Length: Default 100,000
  -ex_len EXLEN         UR Extension Length: Default 50
  -gene_ident GENE_IDENT
                        Identifier used for extraction of "genic" regions
                        "CDS,rRNA,tRNA": Default for Ensembl_Bacteria =
                        "ID=gene"
  -o OUT_PREFIX, --output_prefix OUT_PREFIX
                        Output file prefix - Without filetype
  -gz {True,False}      Default - False: Output as .gz
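The idea behind the -min_len, -max_len and -ex_len options can be illustrated with a minimal sketch. This is not the UR_Extractor.py implementation: the function name `unannotated_regions` is hypothetical, features are assumed to be given as 1-based inclusive (start, end) pairs as in GFF, and the length filter is assumed to apply before the extension is added.

```python
def unannotated_regions(seq_len, features, min_len=30, max_len=100_000, ex_len=50):
    """Return 0-based, half-open (start, end) intervals for the gaps between
    annotated features, each extended by ex_len on both sides.

    Hypothetical helper for illustration only; defaults mirror the help text.
    """
    gaps = []
    prev_end = 0  # end of the last annotated base seen so far (0-based, exclusive)
    for start, end in sorted(features):
        gap_start, gap_end = prev_end, start - 1  # convert 1-based start to 0-based
        if gap_end > gap_start:
            gaps.append((gap_start, gap_end))
        prev_end = max(prev_end, end)  # tolerate overlapping or nested features
    if seq_len > prev_end:
        gaps.append((prev_end, seq_len))  # trailing gap after the last feature

    regions = []
    for start, end in gaps:
        # Assumption: the length limits apply to the gap before it is extended.
        if min_len <= end - start <= max_len:
            regions.append((max(0, start - ex_len), min(seq_len, end + ex_len)))
    return regions


if __name__ == "__main__":
    # Two features on a 1,000 nt contig leave three unannotated gaps.
    print(unannotated_regions(1000, [(101, 200), (400, 600)]))
```

The extension lets a region reach into the flanking annotated genes, so a downstream ORF search can pick up candidates that partly overlap an existing annotation.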
StORF_Finder.py
Python3 script to extract Stop - Stop Codon (St)ORFs from FASTA sequences.
For Help: python3 StORF_Finder.py -h
Example: python3 StORF_Finder.py -f genomes/E-coli_UR.fasta.gz -o genomes/E-coli_UR_StORF -gz True
usage: StORF_Finder.py [-h] -f FASTA [-ua {True,False}] [-wc {True,False}]
                       [-ps {True,False}] [-filt [{none,soft,hard}]]
                       [-aa {True,False}] [-con_storfs {True,False}]
                       [-aa_only {True,False}] [-con_only {True,False}]
                       [-stop_ident {True,False}] [-minorf MIN_ORF]
                       [-maxorf MAX_ORF] [-codons STOP_CODONS]
                       [-olap OVERLAP_NT] [-gff {True,False}] [-o OUT_PREFIX]
                       [-lw {True,False}] [-gz {True,False}] [-v {True,False}]

StORF Run Parameters.

optional arguments:
  -h, --help            show this help message and exit
  -f FASTA              Input FASTA File
  -ua {True,False}      Default - Treat input as Unannotated: Use "-ua False"
                        for standard fasta
  -wc {True,False}      Default - False: StORFs reported across entire
                        sequence
  -ps {True,False}      Default - False: Partial StORFs reported
  -filt [{none,soft,hard}]
                        Default - Hard: Filtering level none is not
                        recommended, soft for single strand filtering and hard
                        for both-strand longest-first tiling
  -aa {True,False}      Default - False: Report StORFs as amino acid sequences
  -con_storfs {True,False}
                        Default - False: Output Consecutive StORFs
  -aa_only {True,False}
                        Default - False: Only output Amino Acid Fasta
  -con_only {True,False}
                        Default - False: Only output Consecutive StORFs
  -stop_ident {True,False}
                        Default - True: Identify Stop Codon positions with '*'
  -minorf MIN_ORF       Default - 100: Minimum StORF size in nt
  -maxorf MAX_ORF       Default - 50kb: Maximum StORF size in nt
  -codons STOP_CODONS   Default - ('TAG,TGA,TAA'): List Stop Codons to use
  -olap OVERLAP_NT      Default - 50: Maximum number of nt of a StORF which
                        can overlap another StORF.
  -gff {True,False}     Default - True: StORF Output a GFF file
  -o OUT_PREFIX         Default - False/Same as input name with '_StORF-R':
                        Output filename prefix - Without filetype
  -lw {True,False}      Default - False: Line wrap FASTA sequence output at 60
                        chars
  -gz {True,False}      Default - False: Output as .gz
  -v {True,False}       Default - False: Print out runtime status
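To make the stop-to-stop idea and the -minorf, -maxorf, -codons and -olap options concrete, here is a minimal sketch. It is not the StORF_Finder.py implementation: the function names `find_storfs` and `tile_longest_first` are hypothetical, only the forward strand is scanned, a StORF's length is assumed to include both flanking stop codons, and the tiling step is a simple greedy approximation of the "longest-first" filtering mentioned in the help text.

```python
def find_storfs(seq, stop_codons=("TAG", "TGA", "TAA"), min_orf=100, max_orf=50_000):
    """Return (start, end, frame) for every in-frame stop-to-stop region on the
    forward strand, as 0-based half-open coordinates.

    Hypothetical helper for illustration; defaults mirror the help text.
    """
    seq = seq.upper()
    storfs = []
    for frame in range(3):
        prev_stop = None  # position of the previous stop codon in this frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in stop_codons:
                if prev_stop is not None:
                    start, end = prev_stop, i + 3  # assumption: both stops included
                    if min_orf <= end - start <= max_orf:
                        storfs.append((start, end, frame))
                prev_stop = i
    return sorted(storfs)


def tile_longest_first(storfs, max_overlap=50):
    """Greedily keep the longest StORFs, dropping any candidate that overlaps an
    already kept StORF by more than max_overlap nt."""
    kept = []
    for start, end, frame in sorted(storfs, key=lambda s: s[1] - s[0], reverse=True):
        if all(min(end, k_end) - max(start, k_start) <= max_overlap
               for k_start, k_end, _ in kept):
            kept.append((start, end, frame))
    return sorted(kept)


if __name__ == "__main__":
    demo = "TAA" + "GCA" * 40 + "TAG"  # one 126 nt stop-to-stop region in frame 0
    print(find_storfs(demo))
    print(tile_longest_first([(0, 300, 0), (280, 500, 1), (100, 250, 2)]))
```

Scanning the reverse strand would follow the same pattern on the reverse complement of the sequence, with coordinates mapped back to the forward strand.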