StORF-Reporter - A tool that takes as input an annotated genome and returns missed CDS genes from the unannotated regions.

Project description

2022.03.31.486628v1

StORF-Reporter, a toolkit that takes as input an annotated genome and returns missed CDS genes from the Unannotated Regions (URs).

Please use `pip3 install StORF-Reporter` to install the tool.

This will also install numpy and the 'ORForise' package from https://github.com/NickJD/ORForise to allow for additional functionality.

StORF-Reporter (BETA) - Please raise issues at https://github.com/NickJD/StORF-Reporter/issues

This tool extracts Unannotated Regions (URs) from PROKKA genome annotations, finds Stop - Open Reading Frames (StORFs) within them, and reports the StORFs in a new PROKKA-formatted GFF file in the PROKKA output directory.

This tool is currently in BETA but can be run as:

python3 -m StORF-Reporter.StORF_Reporter -anno PROKKA -pd ../PROKKA_04062022/

Menu - (python3 -m StORF-Reporter.StORF_Reporter -h):

usage: StORF_Reporter.py [-h] -anno [{PROKKA,Ensembl,CDS}] [-pd PROKKA_DIR] [-min_len MINLEN]
[-max_len MAXLEN] [-ex_len EXLEN] [-olap OVERLAP_NT] [-type [{StORF,CDS,ORF}]] [-ao ALLOWED_OVERLAP] 
[-gz {True,False}] [-v {True,False}]

optional arguments:
  -h, --help            show this help message and exit
  -anno [{PROKKA,Ensembl,CDS}]
                        Default - PROKKA: Annotation type to be StORF-Reported:Options: PROKKA = 
                        "misc_RNA,gene,mRNA,CDS,tRNA,tmRNA,CRISPR";Ensembl = "ID=gene" ;CDS = "CDS"
  -pd PROKKA_DIR, --PROKKA_dir PROKKA_DIR
                        PROKKA output directory to be used if PROKKA chosen
  -min_len MINLEN       Default - 30: Minimum UR Length
  -max_len MAXLEN       Default - 100,000: Maximum UR Length
  -ex_len EXLEN         Default - 50: UR Extension Length
  -olap OVERLAP_NT      Default - 50: Maximum number of nt of a StORF which can overlap another StORF.
  -type [{StORF,CDS,ORF}]
                        Default - "StORF": Which GFF type for StORFs to be reported as in GFF (StORF,CDS,ORF)
  -ao ALLOWED_OVERLAP   Default - 50 nt: Maximum overlap between a StORF and an original gene.
  -gz {True,False}      Default - False: Output as .gz
  -v {True,False}       Default - False: Print out runtime status
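
When StORF-Reporter is called from a pipeline, the command line can be assembled in Python first. The sketch below is only an illustration: `build_storf_reporter_cmd` is a hypothetical helper, not part of the package, and it uses only flags that appear in the help text above. It builds the argument list but does not run the tool.

```python
def build_storf_reporter_cmd(prokka_dir, anno="PROKKA", min_len=None,
                             overlap_nt=None, gz=False, verbose=False):
    """Return an argv list for StORF-Reporter (hypothetical helper).

    Only flags listed in the help menu are emitted; options left at
    their defaults are omitted so the tool's own defaults apply.
    """
    cmd = ["python3", "-m", "StORF-Reporter.StORF_Reporter",
           "-anno", anno, "-pd", prokka_dir]
    if min_len is not None:
        cmd += ["-min_len", str(min_len)]
    if overlap_nt is not None:
        cmd += ["-olap", str(overlap_nt)]
    if gz:
        cmd += ["-gz", "True"]
    if verbose:
        cmd += ["-v", "True"]
    return cmd


if __name__ == "__main__":
    # Prints the same command as the example invocation shown earlier.
    print(" ".join(build_storf_reporter_cmd("../PROKKA_04062022/")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the package is installed.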

UR_Extractor

Python3 script to extract Unannotated Regions from DNA sequences. It uses FASTA and GFF files as input.

Menu - (python3 -m StORF-Reporter.UR_Extractor -h):

python3 -m StORF-Reporter.UR_Extractor -f genomes/E-coli.fasta.gz -gff genomes/E-coli.gff -o genomes/E-coli_UR -gz True

usage: UR_Extractor.py [-h] -f FASTA -gff GFF [-ident IDENT] [-min_len MINLEN]
                       [-max_len MAXLEN] [-ex_len EXLEN]
                       [-gene_ident GENE_IDENT] -o OUT_PREFIX
                       [-gz {True,False}]

optional arguments:
  -h, --help            show this help message and exit
  -f FASTA, --fasta_seq FASTA
                        FASTA file for Unannotated Region seq extraction
  -gff GFF              GFF annotation file for the FASTA
  -ident IDENT          Identifier given for Unannotated Region output
                        sequences: Default "Input"_UR
  -min_len MINLEN       Minimum UR Length: Default 30
  -max_len MAXLEN       Maximum UR Length: Default 100,000
  -ex_len EXLEN         UR Extension Length: Default 50
  -gene_ident GENE_IDENT
                        Identifier used for extraction of "genic" regions
                        "CDS,rRNA,tRNA": Default for Ensembl_Bacteria =
                        "ID=gene"
  -o OUT_PREFIX, --output_prefix OUT_PREFIX
                        Output file prefix - Without filetype
  -gz {True,False}      Default - False: Output as .gz
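
To make the roles of `-min_len`, `-max_len` and `-ex_len` concrete, here is a minimal sketch of how unannotated regions could be derived from gene coordinates. It is not the UR_Extractor implementation: the function name is hypothetical, coordinates are assumed to be 1-based and inclusive, and the length filter is assumed to apply before the extension is added.

```python
def unannotated_regions(seq_len, genes, min_len=30, max_len=100_000, ex_len=50):
    """Return extended gaps between annotated genes (illustrative only).

    genes: iterable of (start, end) pairs, 1-based inclusive.
    Assumption: a gap is length-filtered first, then extended by ex_len
    on both sides and clipped to the sequence boundaries.
    """
    gaps = []
    prev_end = 0
    for start, end in sorted(genes):
        if start - 1 >= prev_end + 1:          # non-empty gap before this gene
            gaps.append((prev_end + 1, start - 1))
        prev_end = max(prev_end, end)          # handles overlapping genes
    if prev_end < seq_len:                     # tail after the last gene
        gaps.append((prev_end + 1, seq_len))

    regions = []
    for start, end in gaps:
        length = end - start + 1
        if min_len <= length <= max_len:
            regions.append((max(1, start - ex_len), min(seq_len, end + ex_len)))
    return regions


if __name__ == "__main__":
    genes = [(101, 300), (320, 600), (900, 1000)]
    print(unannotated_regions(1000, genes))
```

In this toy input the 19 nt gap between the first two genes is shorter than the default minimum of 30 and is dropped, while the other two gaps are kept and extended by 50 nt where the sequence allows.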

StORF-Finder

Python3 script to extract Stop - Stop Codon (St)ORFs from FASTA sequences.

Menu - (python3 -m StORF-Reporter.StORF_Finder -h):

python3 -m StORF-Reporter.StORF_Finder -f genomes/E-coli_UR.fasta.gz -o genomes/E-coli_UR_StORF -gz True

usage: StORF_Finder.py [-h] -f FASTA [-ua {True,False}] [-wc {True,False}]
                       [-ps {True,False}] [-filt [{none,soft,hard}]]
                       [-aa {True,False}] [-con_storfs {True,False}]
                       [-aa_only {True,False}] [-con_only {True,False}]
                       [-stop_ident {True,False}] [-minorf MIN_ORF]
                       [-maxorf MAX_ORF] [-codons STOP_CODONS]
                       [-olap OVERLAP_NT] [-gff {True,False}] [-o OUT_PREFIX]
                       [-lw {True,False}] [-gz {True,False}] [-v {True,False}]

StORF Run Parameters.

optional arguments:
  -h, --help            show this help message and exit
  -f FASTA              Input FASTA File
  -ua {True,False}      Default - Treat input as Unannotated: Use "-ua False"
                        for standard fasta
  -wc {True,False}      Default - False: StORFs reported across entire
                        sequence
  -ps {True,False}      Default - False: Partial StORFs reported
  -filt [{none,soft,hard}]
                        Default - Hard: Filtering level none is not
                        recommended, soft for single strand filtering and hard
                        for both-strand longest-first tiling
  -aa {True,False}      Default - False: Report StORFs as amino acid sequences
  -con_storfs {True,False}
                        Default - False: Output Consecutive StORFs
  -aa_only {True,False}
                        Default - False: Only output Amino Acid Fasta
  -con_only {True,False}
                        Default - False: Only output Consecutive StORFs
  -stop_ident {True,False}
                        Default - True: Identify Stop Codon positions with '*'
  -minorf MIN_ORF       Default - 100: Minimum StORF size in nt
  -maxorf MAX_ORF       Default - 50kb: Maximum StORF size in nt
  -codons STOP_CODONS   Default - ('TAG,TGA,TAA'): List Stop Codons to use
  -olap OVERLAP_NT      Default - 50: Maximum number of nt of a StORF which
                        can overlap another StORF.
  -gff {True,False}     Default - True: StORF Output a GFF file
  -o OUT_PREFIX         Default - False/Same as input name with '_StORF-R':
                        Output filename prefix - Without filetype
  -lw {True,False}      Default - False: Line wrap FASTA sequence output at 60
                        chars
  -gz {True,False}      Default - False: Output as .gz
  -v {True,False}       Default - False: Print out runtime status
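
The idea of a stop-to-stop ORF can be illustrated with a short sketch. This is not the StORF_Finder algorithm: it scans the forward strand only, applies no overlap filtering, and assumes the reported length includes both flanking stop codons. The function name is hypothetical; the stop codon list and the size limits mirror the `-codons`, `-minorf` and `-maxorf` defaults above.

```python
def find_storfs(seq, stop_codons=("TAG", "TGA", "TAA"),
                min_orf=100, max_orf=50_000):
    """Return (start, end, frame) for stop-to-stop regions (illustrative).

    Coordinates are 0-based and end-exclusive. Assumption: each region
    spans from the first base of one in-frame stop codon to the last
    base of the next in-frame stop codon.
    """
    seq = seq.upper()
    storfs = []
    for frame in range(3):
        prev_stop = None
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in stop_codons:
                if prev_stop is not None:
                    start, end = prev_stop, pos + 3
                    if min_orf <= end - start <= max_orf:
                        storfs.append((start, end, frame))
                prev_stop = pos            # this stop opens the next region
    return sorted(storfs)


if __name__ == "__main__":
    toy = "TAA" + "GCA" * 4 + "TAG" + "C"
    for start, end, frame in find_storfs(toy, min_orf=9):
        print(frame, start, end, toy[start:end])
```

A fuller version would also scan the reverse complement and resolve overlapping candidates, which is what the `-filt` and `-olap` options control in the real tool.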

Release history

1.4.1 - Sep 25, 2024
1.4.0 - Sep 22, 2024
1.3.4 - Feb 26, 2024
1.3.3 - Feb 8, 2024
1.3.2 - Feb 7, 2024
1.3.1 - Jan 21, 2024
1.3.0 - Dec 20, 2023
1.2.0 - Dec 18, 2023
1.1.4 - Dec 11, 2023
1.1.3 - Dec 11, 2023
1.1.2 - Dec 9, 2023
1.1.1 - Nov 4, 2023
1.1.0 - Aug 21, 2023
1.0.3 - Jul 10, 2023
1.0.2 - Jul 10, 2023
1.0.1 - Jun 6, 2023
1.0.0 - May 28, 2023
0.7.6 - May 28, 2023
0.7.5 - Apr 21, 2023
0.7.4 - Mar 10, 2023
0.7.3 - Feb 22, 2023
0.7.2 - Feb 3, 2023
0.7.1 - Jan 9, 2023
0.7.0 - Jan 6, 2023
0.6.1 - Nov 29, 2022
0.6.0 - Nov 22, 2022
0.5.57 - Oct 27, 2022
0.5.56 - Oct 5, 2022
0.5.55 - Sep 29, 2022
0.5.54 - Sep 20, 2022
0.5.53 - Sep 20, 2022
0.5.52 - Sep 20, 2022
0.5.51 - Sep 20, 2022
0.5.5 - Sep 20, 2022
0.5.4 - Sep 6, 2022
0.5.3 - Aug 9, 2022
0.5.2 - Jun 17, 2022
0.5.1 - Jun 13, 2022
0.5.0 - Jun 8, 2022
0.4.2 - Jun 2, 2022 (this version)
0.4.1 - May 11, 2022
0.4.0 - May 11, 2022
0.3.1 - May 6, 2022
0.3.0 - May 6, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

StORF-Reporter-0.4.2.tar.gz (29.8 kB)

Uploaded Jun 2, 2022 Source

Built Distribution

StORF_Reporter-0.4.2-py3-none-any.whl (30.6 kB)

Uploaded Jun 2, 2022 Python 3

Hashes for StORF-Reporter-0.4.2.tar.gz

Algorithm	Hash digest
SHA256	`6c59479f57ee2de181dad05228d01a503332287b51516af834ddbc6ed6ed4fce`
MD5	`803af51a52a4888e50fcdea1edf2fcce`
BLAKE2b-256	`99fda6638a0ea017e21bc6e5a0dc7908e4f39d8607533b35cd2ca561f8aedc7b`

Hashes for StORF_Reporter-0.4.2-py3-none-any.whl

Algorithm	Hash digest
SHA256	`e2415b4a9b090486f6f808d66276823595c3f79ff8620bb0d0d5f7e9e527a0b7`
MD5	`23a73a91effb9b256d8bc59d60cc7e55`
BLAKE2b-256	`b09a1d1213f377b9f46a48251b9073c66c1826521e1ef6e89f94dec254eec5bb`
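
A downloaded archive can be checked against the SHA256 digests listed above using only the Python standard library. The helper names below are hypothetical, and the file path in the usage note is whatever location the file was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha256(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_sha256("StORF-Reporter-0.4.2.tar.gz", "6c59479f57ee2de181dad05228d01a503332287b51516af834ddbc6ed6ed4fce")` should return `True` for an intact copy of the source distribution.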