decouphage - A tool to annotate phage genomes.
Decouphage: the art of decorating a Phage genome by gluing feature cutouts into it.
As the name suggests, decouphage is a tool designed to annotate phage genomes. Its only external dependency is ncbi-blast+; everything else is optional.
Relevant branches
- main branch: stable version available on PyPI and Docker Hub.
- dev branch: development branch with new features and bugs.
Highlights
- Can be easily installed on Linux or Mac computers. The only requirement is ncbi-blast+.
- Uses phanotate for ORF calling by default and can optionally use prodigal instead.
- Decouphage is fast: on a MacBook, most phage genomes can be annotated in less than a minute.
- Uses the NCBI NR database, which contains non-identical sequences from GenBank CDS translations, PDB, Swiss-Prot, PIR, and PRF.
- Allows manual curation using the web interface.
Validation
Decouphage was validated against RAST (Rapid Annotation using Subsystem Technology), a tool that is often praised for its prokaryotic annotation capabilities.
Decouphage outperforms RAST when calling some of the most relevant product categories:
The CDS annotation agreement between Decouphage and RAST is high, reaching up to 94% for some products:
Enzyme | Agreement rate with RAST |
---|---|
endonuclease | 94% |
exonuclease | 58% |
helicase | 70% |
hydrolase | 73% |
kinase | 86% |
ligase | 94% |
methyltransferase | 65% |
polymerase | 76% |
primase | 78% |
protease | 85% |
recombinase | 28% |
reductase | 90% |
synthase | 84% |
terminase | 94% |
transferase | 60% |
A precise comparison of product-to-position is difficult given differences in spelling, typos, synonyms, and interchangeable names, but the table above can give a good idea of the similarities.
To corroborate the surplus of annotations that decouphage achieves, the number of "hypothetical protein" and "phage protein" products was also checked:
Product | Decouphage | RAST | Agreement rate with RAST |
---|---|---|---|
hypothetical protein | 3945 | 6302 | 53% |
phage protein | 0 | 1626 | N/A [1] |
Total products | 9692 | 9692 | N/A [2] |
- [1] Decouphage does not include products containing "phage protein", as they are usually a source of noise.
- [2] The GenBank file generated by RAST was used as input for decouphage to ensure no difference in the number of CDS.
This table shows that Decouphage potentially assigns 2x more meaningful products than RAST when annotating a phage genome.
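The source does not say how the "2x" figure was derived. As a rough check, the sketch below recomputes the number of informative products from the table under two counting assumptions: treating only "hypothetical protein" as uninformative, or treating "phage protein" as uninformative too. The function name `informative` and both counting rules are illustrative assumptions, not the author's stated method.

```python
# Counts copied from the table above (9692 CDS in both annotations).
TOTAL = 9692
HYPOTHETICAL = {"decouphage": 3945, "rast": 6302}
PHAGE_PROTEIN = {"decouphage": 0, "rast": 1626}


def informative(tool: str, exclude_phage_protein: bool) -> int:
    """Count products that are not placeholders for the given tool.

    Assumption: a product is a placeholder if it is "hypothetical protein"
    and, optionally, if it is "phage protein".
    """
    count = TOTAL - HYPOTHETICAL[tool]
    if exclude_phage_protein:
        count -= PHAGE_PROTEIN[tool]
    return count


for exclude in (False, True):
    dec = informative("decouphage", exclude)
    rast = informative("rast", exclude)
    print(f"exclude 'phage protein' = {exclude}: "
          f"{dec} vs {rast} ({dec / rast:.2f}x)")
```

Under the first assumption the ratio is about 1.7x (5747 vs 3390), and under the second about 3.3x (5747 vs 1764), so the "2x" statement sits between the two readings.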
How can I use decouphage
Options
Usage: decouphage [OPTIONS] INPUT_FILE
Options:
--prodigal Use prodigal for orf calling instead of phanotate.
-d, --db PATH
-o, --output TEXT
-t, --threads INTEGER [default: 1]
--tmpdir TEXT Folder for intermediate files.
--no_orf_calling Annotate CDS from genbank file.
--locus_tag TEXT Locus tag prefix.
--download_db Download default database.
-v, --verbose More verbose logging for debugging purpose.
--help Show this message and exit.
I want to discover and annotate a lot of ORFs
decouphage genome.fasta -o genome.gb
I want to use prodigal to find my genes
decouphage genome.fasta -o genome.gb --prodigal
I have a genbank with poor annotation and want more
In this mode decouphage will reuse the genbank ORFs and just run the annotation procedure.
decouphage genome.gbk -o genome.gb --no_orf_calling
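When many genomes need the same treatment, the commands above can be scripted. The following is a minimal sketch that builds the command line from the options listed earlier and runs it once per FASTA file in a folder. The helper names `build_command` and `annotate_folder` and the `*.fasta` naming convention are assumptions made for illustration, and the flag spelling `--no_orf_calling` follows the options listing.

```python
import subprocess
from pathlib import Path


def build_command(input_file, output_file, threads=1,
                  prodigal=False, reuse_orfs=False):
    """Build a decouphage command line from the documented options."""
    cmd = ["decouphage", str(input_file), "-o", str(output_file),
           "-t", str(threads)]
    if prodigal:
        # Use prodigal for ORF calling instead of phanotate.
        cmd.append("--prodigal")
    if reuse_orfs:
        # Annotate the CDS already present in a GenBank input.
        cmd.append("--no_orf_calling")
    return cmd


def annotate_folder(folder, threads=1):
    """Run decouphage on every *.fasta file in `folder` (hypothetical layout)."""
    for fasta in sorted(Path(folder).glob("*.fasta")):
        output = fasta.with_suffix(".gb")
        # check=True raises CalledProcessError if decouphage exits with an error.
        subprocess.run(build_command(fasta, output, threads=threads), check=True)
```

Calling `annotate_folder("genomes", threads=4)` would then write one `.gb` file next to each input, assuming decouphage is installed and on the `PATH`.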
Installation
You have multiple options to install and run decouphage:
Ubuntu
Install decouphage:
pip install decouphage
(Required) Install ncbi-blast+
apt install ncbi-blast+
(Optional) Install dependencies:
apt install prodigal trnascan-se
Docker
Run with Docker (already includes dependencies and databases):
docker run decouphage/decouphage
Databases
The decouphage database is derived from the NCBI NR database, clustered at 90% identity and 90% sequence length.
Downloading database
Download the database to the default location, $HOME/.decouphage/db/:
decouphage --download_db
Making custom databases
Make blast database
makeblastdb -in database.fa -parse_seqids -blastdb_version 5 -dbtype prot
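A custom database would then presumably be passed to decouphage with the `-d`/`--db` option. The sketch below chains the two steps. It assumes that `-d` accepts the FASTA path that `makeblastdb` used as the database name, which the source does not confirm. The helper names are hypothetical.

```python
import subprocess


def makeblastdb_command(fasta):
    """Build the makeblastdb call shown above for a protein FASTA file."""
    return ["makeblastdb", "-in", str(fasta), "-parse_seqids",
            "-blastdb_version", "5", "-dbtype", "prot"]


def annotate_command(genome, output, db):
    """Build a decouphage call that points at a custom database.

    Assumption: -d takes the same path that was given to makeblastdb -in.
    """
    return ["decouphage", str(genome), "-o", str(output), "-d", str(db)]


def build_and_annotate(fasta, genome, output):
    """Create the BLAST database, then annotate `genome` against it."""
    subprocess.run(makeblastdb_command(fasta), check=True)
    subprocess.run(annotate_command(genome, output, db=fasta), check=True)
```

For example, `build_and_annotate("database.fa", "genome.fasta", "genome.gb")` would run both commands in order and stop at the first one that fails.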