Prediction and classification of conopeptides
ConoDictor: A fast and accurate prediction and classification tool for conopeptides
Introduction
Cone snails are among the richest sources of natural peptides with promising pharmacological and therapeutic applications. With the reduced costs of RNAseq, scientists now heavily rely on venom gland transcriptomes for the mining of novel bioactive conopeptides, but the bioinformatic analyses often hamper the discovery process.
ConoDictor 2 is a standalone and user-friendly command-line program. We have updated the program originally published as a web server 10 years ago using novel and updated tools and algorithms and improved our classification models with new and higher quality sequences. ConoDictor 2 is now more accurate, faster, multiplatform, and able to deal with a whole cone snail venom gland transcriptome (raw reads or contigs) in a very short time.
The only input ConoDictor 2 requires is the assembled transcriptome or the raw reads file, as either DNA or amino acid sequences; the alphabet is recognized automatically. ConoDictor 2 runs predictions directly on the protein file (submitted or dynamically generated) and tries to report the longest conopeptide precursor-like sequence.
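To illustrate what automatic alphabet recognition can look like, here is a minimal sketch. It is not ConoDictor's actual implementation: the function name `guess_alphabet`, the letter set, and the 0.9 threshold are all assumptions made for this example.

```python
from typing import Literal

# Letters treated as nucleotides in this sketch (assumption, not ConoDictor's rule).
DNA_LETTERS = set("ACGTUN")


def guess_alphabet(sequence: str, threshold: float = 0.9) -> Literal["DNA", "protein"]:
    """Guess the alphabet from the fraction of nucleotide letters.

    Hypothetical heuristic: a sequence is called DNA when at least
    `threshold` of its letters are in DNA_LETTERS.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    dna_fraction = sum(c in DNA_LETTERS for c in letters) / len(letters)
    return "DNA" if dna_fraction >= threshold else "protein"
```

A heuristic like this can misclassify very short or low-complexity peptides, so a real tool would likely inspect several records before deciding.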
Installation
Install from Pip
You must first install HMMER 3 and Pftools to be able to run conodictor.
pip install conodictor
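Before running conodictor, it can help to confirm that the external dependencies are on your PATH. The sketch below does this from Python. The executable names `hmmsearch` and `pfscanV3` are assumptions about how HMMER 3 and Pftools are installed on your system, and `missing_tools` is a hypothetical helper, not part of conodictor.

```python
import shutil

# Assumed executable names; adjust them to match your HMMER 3 and Pftools installation.
REQUIRED_TOOLS = ("hmmsearch", "pfscanV3")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` returns an empty list when every assumed executable is found.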
Using containers
Docker
Accessible at https://hub.docker.com/u/ebedthan or on BioContainers.
docker pull ebedthan/conodictor:latest
docker run ebedthan/conodictor:latest conodictor -h
Example of a run
docker run --rm=True -v $PWD:/data -u $(id -u):$(id -g) ebedthan/conodictor:latest conodictor --out /data/outdir /data/input.fa.gz
See https://staph-b.github.io/docker-builds/run_containers/ for more information on how to properly run a Docker container.
Singularity
The Singularity container does not need admin privileges, which makes it suitable for university clusters and HPC systems.
singularity build conodictor.sif docker://ebedthan/conodictor:latest
singularity exec conodictor.sif conodictor -h
Install from source
# Download ConoDictor development version
git clone https://github.com/koualab/conodictor.git conodictor
# Navigate to directory
cd conodictor
# Install with poetry: see https://python-poetry.org
poetry install --no-dev
# Enter the Python virtual environment with
poetry shell
# Test conodictor is correctly installed
conodictor -h
If you do not want to enter the virtual environment, run:
poetry run conodictor -h
Test
- Type
conodictor -h
and it should output something like:
conodictor [FLAGS/OPTIONS] <file>
Examples:
conodictor file.fa.gz
conodictor --out outfolder --cpus 4 --mlen 51 file.fa
positional arguments:
file Specifies input file.
optional arguments:
-h, --help show this help message and exit
-o OUT, --out OUT Specify output folder.
--mlen MLEN Set the minimum length of the sequence to be
considered as a match
--ndup NDUP Minimum sequence occurence of a sequence to be
considered
--faa Create a fasta file of matched sequences. Default:
False.
--filter Activate the removal of sequences that matches only
the signal and/or proregions. Default: False.
-a, --all Display sequence without hits in output. Default:
False.
-j CPUS, --cpus CPUS Specify the number of threads. Default: 1.
--force Force re-use output directory. Default: Off.
-q, --quiet Decrease program verbosity
--debug Activate debug mode
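The --mlen and --ndup options describe a minimum length and a minimum number of occurrences. One plausible reading of those two filters is sketched below. This is an illustration only: the function `filter_sequences`, its default values, and the "greater than or equal" semantics are assumptions, not conodictor's internal code.

```python
from collections import Counter


def filter_sequences(sequences, mlen=0, ndup=1):
    """Keep distinct sequences that are long enough and seen often enough.

    Hypothetical semantics: a sequence is kept when its length is at
    least `mlen` and it occurs at least `ndup` times in the input.
    """
    counts = Counter(sequences)  # preserves first-seen order
    return [seq for seq, n in counts.items() if len(seq) >= mlen and n >= ndup]
```

For example, with `mlen=4` and `ndup=2`, only sequences of four or more residues that appear at least twice are retained.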
Invoking conodictor
conodictor file.fa.gz
conodictor --out outfolder --cpus 4 --mlen 51 file.fa
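The same invocations can be scripted from Python, for instance to process several transcriptomes in a loop. The sketch below uses only the flags shown in the help text (--out, --cpus, --mlen). The helper names `build_command` and `run_conodictor` are hypothetical, and conodictor is assumed to be on your PATH.

```python
import subprocess
from typing import List, Optional


def build_command(input_file: str, out: Optional[str] = None,
                  cpus: Optional[int] = None, mlen: Optional[int] = None) -> List[str]:
    """Assemble a conodictor command line from optional settings."""
    cmd = ["conodictor"]
    if out is not None:
        cmd += ["--out", out]
    if cpus is not None:
        cmd += ["--cpus", str(cpus)]
    if mlen is not None:
        cmd += ["--mlen", str(mlen)]
    cmd.append(input_file)  # the input file is the positional argument
    return cmd


def run_conodictor(input_file: str, **options) -> subprocess.CompletedProcess:
    """Run conodictor and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(input_file, **options), check=True)
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.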
Output files
The comma-separated values file summary.csv can be viewed with any office suite or text editor.
sequence,hmm_pred,pssm_pred,definitive_pred
SEQ_ID_1,A,A,A
SEQ_ID_2,B,D,CONFLICT B and D
SEQ_ID_3,O1,O1,O1
...
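For downstream analysis, the summary can also be read programmatically. The sketch below counts sequences per predicted superfamily and lists those flagged as conflicts. It assumes a four-column header (sequence, hmm_pred, pssm_pred, definitive_pred) and that conflicting rows start with "CONFLICT", as in the example rows. The function `summarize` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def summarize(handle):
    """Count definitive predictions and collect the IDs of conflicting sequences."""
    counts = Counter()
    conflicts = []
    for row in csv.DictReader(handle):
        prediction = row["definitive_pred"]
        if prediction.startswith("CONFLICT"):
            conflicts.append(row["sequence"])
        else:
            counts[prediction] += 1
    return counts, conflicts


# In-memory stand-in for summary.csv, using the example rows from this section.
example = io.StringIO(
    "sequence,hmm_pred,pssm_pred,definitive_pred\n"
    "SEQ_ID_1,A,A,A\n"
    "SEQ_ID_2,B,D,CONFLICT B and D\n"
    "SEQ_ID_3,O1,O1,O1\n"
)
counts, conflicts = summarize(example)
```

To read a real run, replace the in-memory example with `open("outdir/summary.csv", newline="")`, where `outdir` is whatever folder you passed to --out.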
Command line options
General:
file Specify input fasta file [required]
Outputs:
-o, --out Specify output folder.
--faa Create a fasta file of matched sequences. Default: False.
-a, --all Display sequence without hits in output. Default: False.
--force Force re-use output directory. Default: Off.
Computation:
-j, --cpus Specify number of threads. Default: 1.
Setup:
-q, --quiet Decrease verbosity
--debug Activate debug mode
Standard meta-options:
--help, -h Print help and exit
Citation
When using ConoDictor 2 in your work, you should cite:
Dominique Koua, Anicet Ebou, Sébastien Dutertre, Improved prediction of conopeptide superfamilies with ConoDictor 2.0, Bioinformatics Advances, Volume 1, Issue 1, 2021, vbab011, https://doi.org/10.1093/bioadv/vbab011.
Bugs
Submit problems or requests to the Issue Tracker.
Dependencies
Mandatory
- HMMER 3
Used for HMM profile prediction.
Eddy SR, Accelerated Profile HMM Searches. PLOS Computational Biology 2011, 10.1371/journal.pcbi.1002195
- Pftools
Used for PSSM prediction.
Schuepbach P et al. pfsearchV3: a code acceleration and heuristic to search PROSITE profiles. Bioinformatics 2013, 10.1093/bioinformatics/btt129
Licence
For commercial uses please contact Dominique Koua at dominique.koua@inphb.ci.
Authors