StORF-Reporter - A tool that takes an annotated genome and returns missing CDS genes (Stop-to-Stop) from unannotated regions.

License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

StORF-Reporter, a toolkit that returns missed CDS genes from the Unannotated Regions (URs) of prokaryotic genomes.

Please use `pip install StORF-Reporter` to install StORF-Reporter.

This will also install the dependencies NumPy (>=1.22.0,<1.24.0) and Pyrodigal (https://github.com/althonos/pyrodigal).

Consider using '--no-cache-dir' with pip to ensure the download of the newest version of StORF-Reporter.

The directory "Test_Datasets" is provided to confirm functionality of StORF-Reporter.

#############################################################

StORF-Reporter:

Most common use cases -

Supplement a current annotation from a tool such as Prokka or Bakta. A new GFF file will be created compatible with downstream pangenome analysis tools such as Roary and Panaroo.

For use on a single Prokka/Bakta output directory - Will also create a new fasta file with Prokka/Bakta genes and StORF sequences.

StORF-Reporter -anno Prokka Out_Dir -p .../Test_Datasets/Prokka_E-coli/

For use on multiple Prokka/Bakta output directories - Will also create a new fasta file with Prokka/Bakta genes and StORF sequences.

StORF-Reporter -anno Prokka Multiple_Out_Dirs -p ../Test_Datasets/Multi_Prokka_Outs

For use on a directory containing multiple Prokka/Bakta output gffs - Only produces new GFF files.

StORF-Reporter -anno Prokka Multiple_GFFs -p .../Test_Datasets/Prokka_Outputs/

For use on a GFF file from a CDS prediction tool such as Prodigal - Provide a GFF file and StORF-Reporter will find the matching .fa/.fasta/.fna (must have the same name).

StORF-Reporter -anno Feature_Types Single_Genome -p .../Test_Datasets/Matching_GFF_FASTA/Myco.gff
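
The "same name" rule described above can be pictured with a minimal Python sketch. This is not StORF-Reporter's own code: the function name `find_matching_fasta` and the order in which extensions are tried are assumptions made for illustration only.

```python
from pathlib import Path
from typing import Optional

# Extensions named in the text above; the search order is an assumption.
FASTA_EXTENSIONS = (".fa", ".fasta", ".fna")


def find_matching_fasta(gff_path: str) -> Optional[Path]:
    """Return the FASTA file that shares the GFF's name, or None if absent.

    Hypothetical helper: it only swaps the GFF suffix for each FASTA suffix
    and checks whether that file exists next to the GFF.
    """
    gff = Path(gff_path)
    for ext in FASTA_EXTENSIONS:
        candidate = gff.with_suffix(ext)
        if candidate.is_file():
            return candidate
    return None
```

Under these assumptions, a file such as `Myco.gff` would be paired with `Myco.fa`, `Myco.fasta` or `Myco.fna` in the same directory, whichever is found first.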

For use on a directory containing multiple GFF files from a CDS prediction tool such as Prodigal - StORF-Reporter will find the matching .fa/.fasta/.fna (must have the same name).

StORF-Reporter -anno Feature_Types Multiple_Genomes -p .../Test_Datasets/Matching_GFF_FASTA/

For use on a directory containing multiple GFF files with embedded FASTA.

StORF-Reporter -anno Feature_Types Multiple_Combined_GFFs -p .../Test_Datasets/Combined_GFFs/

To perform a fresh end-to-end annotation of a genome without an annotation, StORF-Reporter will use Pyrodigal to predict CDS genes and then supplement with StORFs.

StORF-Reporter -anno Pyrodigal Single_FASTA -p .../Test_Datasets/Pyrodigal/E-coli.fa

Menu - (StORF-Reporter -h):

StORF-Reporter -anno Ensembl Single_Genome -p .../Test_Datasets/Matching_GFF_FASTA/E-coli.gff

StORF-Reporter v1.4.7: StORF-Reporter Run Parameters.

Required Options:
  -anno [{Prokka,Bakta,Out_Dir,Multiple_Out_Dirs,Single_GFF,Multiple_GFFs,Ensembl,Feature_Types,Single_Genome,Multiple_Genomes,Single_Combined_GFF,Multiple_Combined_GFFs,Pyrodigal,Single_FASTA,Multiple_FASTA} ...]
                        Select Annotation and Input options for one of the 3 options listed below
                        ### Prokka/Bakta Annotation Option 1: 
                        	Prokka = Report StORFs for a Prokka annotation; 
                        	Bakta = Report StORFs for a Bakta annotation; 
                        --- Prokka/Bakta Input Options: 
                        	Out_Dir = To provide the output directory of either a Prokka or Bakta run (will produce a new GFF and FASTA file containing original and extended annotations); 
                        	Multiple_Out_Dirs = To provide a directory containing multiple Prokka/Bakta standard output directories - Will run on each sequentially; 
                        	Single_GFF = To provide a single Prokka or Bakta GFF - searches for accompanying ".fna" file (will provide a new extended GFF); 
                        	Multiple_GFFs = To provide a directory containing multiple Prokka or Bakta GFF files - searches for accompanying ".fna" files (will provide a new extended GFF); 
                        
                        ### Standard GFF Annotation Option 2: 
                        	Ensembl = Report StORFs for an Ensembl Bacteria annotation (ID=gene); 
                        	Feature_Types = Used in conjunction with -gene_ident to define features such as CDS,rRNA,tRNA for UR extraction (default CDS); 
                        --- Standard GFF Input Options: 
                        	Single_Genome = To provide a single Genome - accompanying FASTA must share same name as given gff file (can be .fna, .fa or .fasta); 
                        	Multiple_Genomes = To provide a directory containing multiple accompanying GFF and FASTA files - files must share the same name (fasta can be .fna, .fa or .fasta); 
                        	Single_Combined_GFF = To provide a GFF file with embedded FASTA at the bottom; 
                        	Multiple_Combined_GFFs = To provide a directory containing multiple GFF files with embedded FASTA at the bottom; 
                        
                        ### Complete Annotation Option 3: 
                        	Pyrodigal = Run Pyrodigal then Report StORFs (provide path to single FASTA or directory of multiple FASTA files);
                        --- Complete Annotation Input Options: 
                        	Single_FASTA = To provide a single FASTA file; 
                        	Multiple_FASTA = To provide a directory containing multiple FASTA files (will detect .fna,.fa,.fasta); 
                        
  -p PATH               Provide input file or directory path

StORF-Reporter Options:
  -af ALT_FILENAME      Default - Prokka/Bakta output directory share the same prefix with their gff/fna files - Use this option when Prokka/Bakta output
                        directory name is different from the gff/fna files within and StORF-Reporter will search for the gff/fna with the given prefix
                        (MyProkkaDir/"altname".gff) - Does not work with "Multiple_Out_Dirs" option
  -oname O_NAME         Default - Appends '_StORF-Reporter_Extended' to end of input filename - Takes the directory name of Prokka/Bakta output if given as
                        input or the input for -af if given - Multiple_* runs will be numbered
  -odir O_DIR           Default - Same directory as input
  -sout {True,False}    Default - False: Print out StORF sequences separately from Prokka/Bakta annotations
  -lw {True,False}      Default - True: Line wrap FASTA sequence output at 60 chars
  -aa                   Default - False: Report StORFs as amino acid sequences
  -gz {True,False}      Default - False: Output as .gz

Pyrodigal Options:
  -py_train [{longest,individual,meta}]
                        Default - longest: Type of model training to be done for Pyrodigal CDS prediction: Options: longest = Trains on longest contig;
                        individual = Trains on each contig separately - runs in meta mode if contig is < 20KB; meta = Runs in meta mode for all sequences
  -py_fasta {True,False}
                        Default - False: Output Pyrodigal+StORF predictions in FASTA format
  -py_unstorfed {True,False}
                        Default - False: Provide GFF containing original Pyrodigal predictions

UR-Extractor Options:
  -gene_ident GENE_IDENT
                        Default: "CDS". Specifies feature types to exclude from Unannotated Region extraction. Provide a comma-separated list of feature
                        types, e.g., "misc_RNA,gene,mRNA,CDS,rRNA,tRNA,tmRNA,CRISPR,ncRNA,regulatory_region,oriC,pseudo", to identify annotated regions. -
                        To be used with "-anno Feature_Types" - "-gene_ident Prokka" will select "most" features present in Prokka/Bakta annotations -
                        Providing "ID=gene" will check the attribute column for features assigned as genes (compatible with Ensembl annotations). All
                        regions without these feature types will be extracted as unannotated.
  -min_len MINLEN       Default - 30: Minimum UR Length
  -max_len MAXLEN       Default - 100,000: Maximum UR Length
  -ex_len EXLEN         Default - 50: UR Extension Length

StORF-Finder Options:
  -spos {True,False}    Default - False: Output StORF sequences and GFF positions inclusive of first stop codon -This can break some downstream tools if
                        changed to True.
  -rs {True,False}      Default - True: Remove stop "*" from StORF amino acid sequences
  -con_storfs {True,False}
                        Default - False: Output Consecutive StORFs
  -con_only {True,False}
                        Default - False: Only output Consecutive StORFs
  -ps {True,False}      Default - False: Partial StORFs reported
  -wc {True,False}      Default - False: StORFs reported across entire sequence
  -short_storfs {False,Nolap,Olap}
                        Default - False: Run StORF-Finder in "Short-StORF" mode. Will only return StORFs between 30 and 120 nt that do not overlap longer
                        StORFs - Only works with StORFs for now. "Nolap" will filter Short-StORFs which are overlapped by StORFs and Olap will report
                        Short-StORFs which do overlap StORFs. Overlap is defined by "-olap".
  -short_storfs_only {True,False}
                        Default - True. Only report Short-StORFs?
  -minorf MIN_ORF       Default - 99: Minimum StORF size in nt
  -maxorf MAX_ORF       Default - 60kb: Maximum StORF size in nt
  -codons STOP_CODONS   Default - ('TAG,TGA,TAA'): List Stop Codons to use
  -olap_filt [{none,single-strand,both-strand}]
                        Default - "both-strand": Filtering level "none" is not recommended, "single-strand" for single strand filtering and both-strand for
                        both-strand longest-first tiling
  -start_filt {True,False}
                        Default - False: Filter out StORFs without at least one of the 3 common start codons (best used for short-storfs).
  -so [{start_pos,strand}]
                        Default - Start Position: How should StORFs be ordered when >1 reported in a single UR.
  -f_type [{StORF,CDS,ORF}]
                        Default - "CDS": Which GFF feature type for StORFs to be reported as in GFF - "CDS" is probably needed for use in tools such as
                        Roary and Panaroo
  -non_standard NON_STANDARD
                        Default - 0.20: Reject StORFs with >=20% non-standard nucleotides (A,T,G,C) - Provide % as decimal
  -olap OVERLAP_NT      Default - 50: Maximum number of nt of a StORF which can overlap another StORF.
  -ao ALLOWED_OVERLAP   Default - 50 nt: Maximum overlap between a StORF and an original gene.

Misc:
  -overwrite, --overwrite
                        Default - False: Overwrite StORF-Reporter output if already present
  -verbose, --verbose   Default - False: Print out runtime messages
  -v, --version         Print out version number and exit
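
As an illustration of the `-non_standard` option listed above, the following Python sketch rejects a sequence when the fraction of characters other than A, T, G and C reaches the threshold. It is a sketch under stated assumptions, not the tool's implementation: the function name `passes_non_standard_filter` is hypothetical, and treating an empty sequence as rejected is an assumption.

```python
def passes_non_standard_filter(seq: str, max_fraction: float = 0.20) -> bool:
    """Return True if the sequence would be kept.

    A sequence is rejected when the share of non-A/T/G/C characters is
    greater than or equal to max_fraction (0.20 mirrors the documented default).
    """
    if not seq:
        return False  # assumption: an empty sequence is never kept
    non_standard = sum(1 for base in seq.upper() if base not in "ATGC")
    return non_standard / len(seq) < max_fraction
```

With the default threshold, a 20 nt sequence containing 4 `N` characters is rejected, while one containing 3 is kept.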

###################################

UR-Extractor:

Subpackage to extract Unannotated Regions from DNA sequences using FASTA and GFF files as input.

Menu - (UR-Extractor -h):

UR-Extractor -f .../Test_Datasets/Matching_GFF_FASTA/E-coli.fa -gff .../Test_Datasets/Matching_GFF_FASTA/E-coli.gff

usage: UR_Extractor.py [-h] -gff GFF [-f FASTA] [-ident IDENT]
                       [-min_len MINLEN] [-max_len MAXLEN] [-ex_len EXLEN]
                       [-gene_ident GENE_IDENT] [-oname O_NAME] [-odir O_DIR]
                       [-gz {True,False}] [-verbose {True,False}] [-v]

StORF-Reporter v1.4.7: UR-Extractor Run Parameters.

Required Arguments:
  -gff GFF              GFF file containing genome annotation

Optional Arguments:
  -f FASTA              Accompanying FASTA file if GFF file does not contain
                        sequence data
  -ident IDENT          Identifier given for Unannotated Region output
                        sequences - Do not modify if output is to be used by
                        StORF-Finder: Default "Sequence-ID"_UR
  -min_len MINLEN       Minimum UR Length: Default 30
  -max_len MAXLEN       Maximum UR Length: Default 100,000
  -ex_len EXLEN         UR Extension Length on 5' and 3': Default 50
  -gene_ident GENE_IDENT
                        Default: "CDS". Specifies feature types to exclude
                        from Unannotated Region extraction. Provide a
                        comma-separated list of feature types, e.g.,
                        "misc_RNA,gene,mRNA,CDS,rRNA,tRNA,tmRNA,CRISPR,ncRNA,regulatory_region,oriC,pseudo",
                        to identify annotated regions.
                        "-gene_ident Prokka" will select "most" features
                        present in Prokka/Bakta annotations - Providing
                        "ID=gene" will check the attribute column for features
                        assigned as genes (compatible with Ensembl
                        annotations). All regions without these feature types
                        will be extracted as unannotated.

Output:
  -oname O_NAME         Default - Appends '_UR' to end of input GFF filename
  -odir O_DIR           Default - Same directory as input GFF
  -gz {True,False}      Default - False: Output as .gz

Misc:
  -verbose {True,False}
                        Default - False: Print out runtime messages
  -v                    Default - False: Print out version number and exit
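
The idea behind unannotated-region extraction can be sketched in Python as follows. This is a simplified illustration, not UR-Extractor's code: the function name `extract_urs` is hypothetical, coordinates are assumed to be 1-based and inclusive as in GFF, and applying the length filter before the extension is an assumption.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def extract_urs(seq_len: int, features: List[Interval], min_len: int = 30,
                max_len: int = 100_000, ex_len: int = 50) -> List[Interval]:
    """Return the gaps between annotated features, extended on both sides."""
    gaps = []
    prev_end = 0  # last annotated position seen so far
    for start, end in sorted(features):
        if start - 1 >= prev_end + 1:
            gaps.append((prev_end + 1, start - 1))
        prev_end = max(prev_end, end)  # handles overlapping features
    if prev_end < seq_len:
        gaps.append((prev_end + 1, seq_len))

    urs = []
    for start, end in gaps:
        length = end - start + 1
        if min_len <= length <= max_len:
            # Extend into the flanking genes, clipped to the sequence ends.
            urs.append((max(1, start - ex_len), min(seq_len, end + ex_len)))
    return urs


if __name__ == "__main__":
    # Three features on a 1,000 nt contig; the 19 nt gap is below min_len.
    print(extract_urs(1000, [(101, 300), (280, 500), (520, 900)]))
    # prints [(1, 150), (851, 1000)]
```

The defaults mirror the documented `-min_len`, `-max_len` and `-ex_len` values.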

StORF-Finder:

Subpackage to extract StORFs from Fasta sequences - Works directly with the output of UR-Extractor.

Menu - (StORF-Finder -h):

StORF-Finder -f .../Test_Datasets/Matching_GFF_FASTA/E-coli_UR.fa

usage: StORF_Finder.py [-h] -f FASTA [-ua {True,False}] [-wc {True,False}]
                       [-ps {True,False}]
                       [-olap_filt [{none,single-strand,both-strand}]]
                       [-start_filt {True,False}] [-con_storfs {True,False}]
                       [-con_only {True,False}]
                       [-short_storfs {False,Nolap,Olap}]
                       [-short_storfs_only {True,False}]
                       [-f_type [{StORF,CDS,ORF}]] [-minorf MIN_ORF]
                       [-maxorf MAX_ORF] [-codons STOP_CODONS]
                       [-non_standard NON_STANDARD] [-olap OVERLAP_NT]
                       [-s SUFFIX] [-so [{start_pos,strand}]] [-oname O_NAME]
                       [-odir O_DIR] [-gff {True,False}] [-aa {True,False}]
                       [-aa_only {True,False}] [-lw {True,False}]
                       [-spos {True,False}] [-stop_ident {True,False}]
                       [-gff_fasta {True,False}] [-gz {True,False}]
                       [-verbose {True,False}] [-v]

StORF-Reporter v1.4.7: StORF-Finder Run Parameters.

Required Arguments:
  -f FASTA              Input FASTA File - (UR_Extractor output)

Optional Arguments:
  -ua {True,False}      Default - Treat input as Unannotated: Use "-ua False"
                        for standard fasta
  -wc {True,False}      Default - False: StORFs reported across entire
                        sequence
  -ps {True,False}      Default - False: Partial StORFs reported
  -olap_filt [{none,single-strand,both-strand}]
                        Default - "both-strand": Filtering level "none" is not
                        recommended, "single-strand" for single strand
                        filtering and both-strand for both-strand longest-
                        first tiling
  -start_filt {True,False}
                        Default - False: Filter out StORFs without at least
                        one of the 3 common start codons (best used for short-
                        storfs).
  -con_storfs {True,False}
                        Default - False: Output Consecutive StORFs
  -con_only {True,False}
                        Default - False: Only output Consecutive StORFs
  -short_storfs {False,Nolap,Olap}
                        Default - False: Run StORF-Finder in "Short-StORF"
                        mode. Will only return StORFs between 30 and 120 nt
                        that do not overlap longer StORFs - Only works with
                        StORFs for now. "Nolap" will filter Short-StORFs which
                        are overlapped by StORFs and Olap will report
                        Short-StORFs which do overlap StORFs. Overlap is
                        defined by "-olap".
  -short_storfs_only {True,False}
                        Default - True. Only report Short-StORFs?
  -f_type [{StORF,CDS,ORF}]
                        Default - "StORF": Which GFF feature type for StORFs
                        to be reported as in GFF
  -minorf MIN_ORF       Default - 99: Minimum StORF size in nt
  -maxorf MAX_ORF       Default - 60kb: Maximum StORF size in nt
  -codons STOP_CODONS   Default - ('TAG,TGA,TAA'): List Stop Codons to use
  -non_standard NON_STANDARD
                        Default - 0.20: Reject StORFs with >=20% non-standard
                        nucleotides (A,T,G,C) - Provide % as decimal
  -olap OVERLAP_NT      Default - 50: Maximum number of nt of a StORF which
                        can overlap another StORF.
  -s SUFFIX             Default - Do not append suffix to genome ID
  -so [{start_pos,strand}]
                        Default - Start Position: How should StORFs be ordered
                        when >1 reported in a single UR.

Output:
  -oname O_NAME         Default - Appends '_StORF-Finder' to end of input
                        FASTA filename
  -odir O_DIR           Default - Same directory as input FASTA
  -gff {True,False}     Default - True: Output a GFF file
  -aa {True,False}      Default - False: Report StORFs as amino acid sequences
  -aa_only {True,False}
                        Default - False: Only output Amino Acid Fasta
  -lw {True,False}      Default - True: Line wrap FASTA sequence output at 60
                        chars
  -spos {True,False}    Default - False: Output StORF sequences and GFF
                        positions inclusive of first stop codon -This can
                        break some downstream tools if changed to True.
  -stop_ident {True,False}
                        Default - True: Identify Stop Codon positions with '*'
  -gff_fasta {True,False}
                        Default - False: Report all gene sequences (nt) at the
                        bottom of GFF files in Prokka output mode
  -gz {True,False}      Default - False: Output as .gz

Misc:
  -verbose {True,False}
                        Default - False: Print out runtime messages
  -v                    Default - False: Print out version number and exit
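
To make the Stop-to-Stop idea concrete, here is a minimal Python sketch that scans the three forward frames of a sequence for consecutive in-frame stop codons. It is an illustration only and omits most of what StORF-Finder does (reverse strand, overlap filtering, consecutive StORFs). The function name `find_storfs` is hypothetical, and measuring length from the base after the first stop up to and including the second stop is an assumption loosely modelled on the `-spos False` default.

```python
from typing import List, Tuple

STOP_CODONS = ("TAG", "TGA", "TAA")  # documented default for -codons


def find_storfs(seq: str, min_orf: int = 99, max_orf: int = 60_000,
                stops: Tuple[str, ...] = STOP_CODONS) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) tuples using 0-based, end-exclusive coordinates."""
    seq = seq.upper()
    storfs = []
    for frame in range(3):
        prev_stop = None  # index of the previous in-frame stop codon
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in stops:
                if prev_stop is not None:
                    start = prev_stop + 3  # first base after the opening stop
                    end = i + 3            # closing stop codon is included
                    if min_orf <= end - start <= max_orf:
                        storfs.append((start, end, frame))
                prev_stop = i
    return sorted(storfs)


if __name__ == "__main__":
    demo = "TAA" + "GCT" * 40 + "TGA" + "CC"
    print(find_storfs(demo))  # prints [(3, 126, 0)]
```

The `min_orf` and `max_orf` defaults mirror the documented `-minorf` and `-maxorf` values.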

StORF-Extractor

Subpackage to extract sequences reported by StORF-Reporter from a genome annotation.

Menu - (StORF-Extractor -h):

StORF-Extractor -storf_input Combined -p .../Test_Datasets/Combined_GFFs/E-coli_Combined_StORF-Reporter_Extended.gff

usage: StORF_Extractor.py [-h] [-storf_input {Combined,Separate}] [-p PATH] [-gff_out {True,False}] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}] [-verbose {True,False}] [-v]

StORF-Reporter v1.4.7: StORF-Extractor Run Parameters.

Required Arguments:
  -storf_input {Combined,Separate}
                        Are StORFs to be extracted from Combined GFF/FASTA or Separate GFF/FASTA files?
  -p PATH               Provide input file or directory path

Output:
  -gff_out {True,False}
                        Default - False: Output StORFs in GFF format
  -oname O_NAME         Default - Appends '_Extracted_StORFs' to end of input GFF filename
  -odir O_DIR           Default - Same directory as input FASTA
  -gz {True,False}      Default - False: Output as .gz

Misc:
  -verbose {True,False}
                        Default - False: Print out runtime messages
  -v                    Default - False: Print out version number and exit

StORF-Remover

Subpackage to remove sequences reported by StORF-Reporter without a Blast/Diamond hit (any alignment in BLAST 6 format).

Menu - (StORF-Remover -h):

StORF-Remover -gff .../Test_Datasets/StORF_Extractor_And_Remover/Myco_UR_StORF-R.gff -blast .../Test_Datasets/StORF_Extractor_And_Remover/Myco_URs_StORFs_aa_Swiss.tab

usage: StORF_Remover.py [-h] [-gff GFF] [-blast BLAST] [-min_score MINSCORE] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}]
                        [-verbose {True,False}] [-v]

StORF-Reporter v1.4.1: UR-Remover Run Parameters.

Required Arguments:
  -gff GFF              GFF annotation file for the FASTA
  -blast BLAST          BLAST format 6 annotation file

Optional Arguments:
  -min_score MINSCORE   Minimum BitScore to keep StORF: Default 30

Output:
  -oname O_NAME         Default - Appends '_UR' to end of input GFF filename
  -odir O_DIR           Default - Same directory as input GFF
  -gz {True,False}      Default - False: Output as .gz

Misc:
  -verbose {True,False}
                        Default - False: Print out runtime messages
  -v                    Default - False: Print out version number and exit
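
The filtering step can be pictured with a short Python sketch that reads BLAST tabular (format 6) lines and collects the query IDs with a sufficient bit score. In the default format 6 column layout the bit score is the twelfth column. The function name `ids_with_hit` is hypothetical, and using `>=` for the comparison against `-min_score` is an assumption, not confirmed behaviour of StORF-Remover.

```python
from typing import Iterable, Set


def ids_with_hit(blast_lines: Iterable[str], min_score: float = 30.0) -> Set[str]:
    """Return query IDs having at least one alignment with bitscore >= min_score."""
    keep = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete format 6 record
        if float(fields[11]) >= min_score:
            keep.add(fields[0])  # column 1 is the query sequence ID
    return keep
```

StORF entries in the GFF whose IDs are absent from the returned set would then be the ones to drop.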

Test Datasets:

The directory 'Test_Datasets' contains GFF and FASTA files to test the installation and use of StORF-Reporter - Example output files are also provided for comparison.

Release history

- 1.4.7: Jan 15, 2026 (this version)
- 1.4.6: Jan 12, 2026
- 1.4.5: Jan 12, 2026
- 1.4.4: Mar 13, 2025
- 1.4.3: Nov 14, 2024
- 1.4.2: Oct 23, 2024
- 1.4.1: Sep 25, 2024
- 1.4.0: Sep 22, 2024
- 1.3.4: Feb 26, 2024
- 1.3.3: Feb 8, 2024
- 1.3.2: Feb 7, 2024
- 1.3.1: Jan 21, 2024
- 1.3.0: Dec 20, 2023
- 1.2.0: Dec 18, 2023
- 1.1.4: Dec 11, 2023
- 1.1.3: Dec 11, 2023
- 1.1.2: Dec 9, 2023
- 1.1.1: Nov 4, 2023
- 1.1.0: Aug 21, 2023
- 1.0.3: Jul 10, 2023
- 1.0.2: Jul 10, 2023
- 1.0.1: Jun 6, 2023
- 1.0.0: May 28, 2023
- 0.7.6: May 28, 2023
- 0.7.5: Apr 21, 2023
- 0.7.4: Mar 10, 2023
- 0.7.3: Feb 22, 2023
- 0.7.2: Feb 3, 2023
- 0.7.1: Jan 9, 2023
- 0.7.0: Jan 6, 2023
- 0.6.1: Nov 29, 2022
- 0.6.0: Nov 22, 2022
- 0.5.57: Oct 27, 2022
- 0.5.56: Oct 5, 2022
- 0.5.55: Sep 29, 2022
- 0.5.54: Sep 20, 2022
- 0.5.53: Sep 20, 2022
- 0.5.52: Sep 20, 2022
- 0.5.51: Sep 20, 2022
- 0.5.5: Sep 20, 2022
- 0.5.4: Sep 6, 2022
- 0.5.3: Aug 9, 2022
- 0.5.2: Jun 17, 2022
- 0.5.1: Jun 13, 2022
- 0.5.0: Jun 8, 2022
- 0.4.2: Jun 2, 2022
- 0.4.1: May 11, 2022
- 0.4.0: May 11, 2022
- 0.3.1: May 6, 2022
- 0.3.0: May 6, 2022

Download files

Download the file for your platform.

Source Distribution

storf_reporter-1.4.7.tar.gz (57.0 kB)

Uploaded Jan 15, 2026 Source

Built Distribution

storf_reporter-1.4.7-py3-none-any.whl (58.4 kB)

Uploaded Jan 15, 2026 Python 3

File details

Details for the file storf_reporter-1.4.7.tar.gz.

File metadata

Download URL: storf_reporter-1.4.7.tar.gz
Upload date: Jan 15, 2026
Size: 57.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.5

File hashes

Hashes for storf_reporter-1.4.7.tar.gz
Algorithm	Hash digest
SHA256	`3bbfc801c4386681381b70396870d6862e5cc23abb3bf8eddad27bde6e723cd8`
MD5	`73bb4f2eb7cd215749feb21197861848`
BLAKE2b-256	`bcc4905cc00b13b1e98fa92402712e9383416da1635be955d8fd76771df5f25a`

File details

Details for the file storf_reporter-1.4.7-py3-none-any.whl.

File metadata

Download URL: storf_reporter-1.4.7-py3-none-any.whl
Upload date: Jan 15, 2026
Size: 58.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.5

File hashes

Hashes for storf_reporter-1.4.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4b6846b897f0a8b2cace7ac83526be4ad9e420d2ae8d16c2f0f632daf9ce56d2`
MD5	`ac0476dcefb9a6d4248bcad047f8bebf`
BLAKE2b-256	`784806d4190ae00c170cd58530ff2161b1a8d66a992207979013e9360ce15f04`
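
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. The sketch below is a generic example rather than anything shipped with StORF-Reporter; the function name `sha256_of` and the chunk size are illustrative choices.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `sha256_of("storf_reporter-1.4.7.tar.gz")` with the SHA256 value listed for that file confirms that the download is intact.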

StORF-Reporter has now been published in NAR: https://doi.org/10.1093/nar/gkad814
