Replication Cycle Detector for Phages
Replidec: Replication Cycle Detector for Phages
Aim
Use a Bayes classifier combined with homology search to predict the replication cycle of phages.
Install
Method 1: using Conda
conda create -n replidec
conda activate replidec
conda install -c denglab -c conda-forge -c bioconda replidec
Method 2: using Docker
docker pull denglab/replidec
If you want to use Replidec on an HPC, Singularity is recommended. You can create a Singularity image with the following command:
singularity pull replidec.sif docker://denglab/replidec
Method 3: using pip
If you install with pip, make sure that mmseqs, hmmsearch and blastp are on your $PATH, at versions equal to or higher than those listed below:
- MMseqs2 Version: 13.45111
- HMMER 3.3.2 (Nov 2020)
- Protein-Protein BLAST 2.5.0+
pip3 install Replidec
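A quick way to confirm that the three executables are visible after a pip install is shutil.which from the Python standard library. This minimal sketch only checks that each command is on $PATH; it does not check the version requirements above. missing_tools is a hypothetical helper name, not part of Replidec.

```python
import shutil

# Executables that must be on $PATH when Replidec is installed with pip
REQUIRED_TOOLS = ["mmseqs", "hmmsearch", "blastp"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that shutil.which cannot locate on $PATH (presence only, no version check)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("mmseqs, hmmsearch and blastp are all on $PATH")
```

Versions still need to be checked by hand, e.g. with each tool's own version or help output.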
Usage: Overview
Replidec [-h] [--version] -p {multiSeqAsOne,batch,multiSeqEachAsOne}
[-i INPUT_FILE] [-w WORKDIR] [-s SUMMARY] [-t THREADS] [-c HMMER_CRETERIA] [-H HMMER_PARAMETER] [-m MMSEQS_CRETERIA]
[-M MMSEQS_PARAMETER] [-b BLASTP_CRETERIA] [-B BLASTP_PARAMETER] [-d]
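To drive Replidec from a Python script (for example, to loop over many inputs), one option is to assemble the argument list from the flags above and pass it to subprocess.run. This is a minimal sketch: build_replidec_command and run_replidec are hypothetical helpers, not part of Replidec; they assume the Replidec executable is on $PATH and use only the -p, -i, -w, -t and -s flags shown in the usage line.

```python
import subprocess

# The three programs accepted by -p, as listed in the usage overview
VALID_PROGRAMS = {"multiSeqAsOne", "batch", "multiSeqEachAsOne"}


def build_replidec_command(program, input_file, workdir, threads=None, summary=None):
    """Assemble a Replidec argument list (hypothetical helper)."""
    if program not in VALID_PROGRAMS:
        raise ValueError(f"unknown program {program!r}; expected one of {sorted(VALID_PROGRAMS)}")
    cmd = ["Replidec", "-p", program, "-i", input_file, "-w", workdir]
    if threads is not None:
        cmd += ["-t", str(threads)]  # number of threads
    if summary is not None:
        cmd += ["-s", summary]  # name of the summary file
    return cmd


def run_replidec(**kwargs):
    """Run Replidec; subprocess raises CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_replidec_command(**kwargs), check=True)
```

For example, run_replidec(program="batch", input_file="example/example.small.list", workdir="batch") runs the same command as the batch example further down.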
Usage: database
The databases used by Replidec are downloaded automatically into the directory where Replidec is installed.
To re-download the databases, use the -d parameter. The old database is moved to "discarded_db" in the working directory (-w); you can remove this directory manually.
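Because the databases are stored alongside the installed package, it can help to find that directory. A minimal sketch using importlib.util.find_spec; package_dir is a hypothetical helper, and the import name "Replidec" in the usage line is an assumption (the PyPI name is Replidec, but the importable module name may differ).

```python
import importlib.util
import os


def package_dir(module_name):
    """Return the directory an importable top-level package lives in, or None if it is not installed."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None  # not installed, or a namespace package with no single origin
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # Assumption: the package is importable as "Replidec"
    print(package_dir("Replidec"))
```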
Usage: Input (-i) and Program (-p)
The expected input file depends on the program. Replidec contains 3 programs:
- 'multiSeqAsOne'
- 'batch'
- 'multiSeqEachAsOne'
multiSeqAsOne
- Input: a plain-text file with two columns (the separator must be a tab).
- First column: sample name; used as the identifier in the output summary file.
- Second column: path to the genome or contig file of one virus (each file can contain multiple sequences).
- Example: test/example/genome_test.small.index
seq1	example/genome_test/genome.test.fnaaa
seq2	example/genome_test/genome.test.fnaab
seq3	example/genome_test/genome.test.fnaac
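Index files for multiSeqAsOne (and lists for batch, which use the same two-column layout) are easy to break if an editor converts tabs to spaces. A minimal sketch that writes one tab-separated line per sample; format_index and write_index are hypothetical helper names, not part of Replidec.

```python
def format_index(samples):
    """Render {sample_name: path} as tab-separated lines, one sample per line."""
    lines = []
    for name, seq_path in samples.items():
        if "\t" in name or "\t" in str(seq_path):
            raise ValueError(f"tab character inside field for sample {name!r}")
        lines.append(f"{name}\t{seq_path}\n")
    return "".join(lines)


def write_index(samples, index_path):
    """Write the index to disk, ready to pass to Replidec with -i."""
    with open(index_path, "w") as handle:
        handle.write(format_index(samples))
```

For example, write_index({"seq1": "example/genome_test/genome.test.fnaaa"}, "my.index") produces a one-line index in the same shape as the example above.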
multiSeqEachAsOne
- Input: a sequence file; each sequence is treated as coming from a separate virus and receives its own prediction.
- Sequences are processed independently of each other.
- Example: test/example/test.contig.small.fa
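Since multiSeqEachAsOne reports one result per sequence, it can be useful to list the sequence IDs beforehand and catch duplicated IDs that would make summary rows ambiguous. This sketch assumes the ID is the first whitespace-delimited word of each FASTA header, the common convention but not something this README confirms for Replidec; fasta_ids, fasta_ids_from_file and duplicate_ids are hypothetical helpers.

```python
from collections import Counter


def fasta_ids(lines):
    """Return the first word of every FASTA header line ('>' lines), in order."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def fasta_ids_from_file(path):
    with open(path) as handle:
        return fasta_ids(handle)


def duplicate_ids(ids):
    """Return IDs that occur more than once, sorted."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)
```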
batch
- Input: a plain-text file with two columns (the separator must be a tab).
- First column: sample name.
- Second column: path to the protein file of one virus.
- Example: test/example/example.small.list
simulate_art_sample1.10	example/simulate_art_sample1.10.faa
simulate_art_sample1.11	example/simulate_art_sample1.11.faa
simulate_art_sample1.12	example/simulate_art_sample1.12.faa
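Before a long run, a quick check of the list file can catch lines that use spaces instead of a tab, duplicated sample names, or paths that do not exist. A minimal sketch; validate_index is a hypothetical helper, and it assumes relative paths are resolved from the current working directory, so run it from the directory you run Replidec in.

```python
import os


def validate_index(lines, check_exists=True):
    """Return a list of problems found in a two-column, tab-separated Replidec input list."""
    problems = []
    seen = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(f"line {lineno}: expected 2 tab-separated columns, found {len(fields)}")
            continue
        name, path = fields
        if name in seen:
            problems.append(f"line {lineno}: duplicate sample name {name!r}")
        seen.add(name)
        if check_exists and not os.path.isfile(path):
            problems.append(f"line {lineno}: file not found: {path}")
    return problems
```

For example, with open("example/example.small.list") as handle: print(validate_index(handle)) prints an empty list when the file is well formed.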
Usage: Output (-w and -s)
Use -w to set the output directory and -s to set the name of the summary file.
The output directory contains several subdirectories and a summary file:
- BC_Inno: results for detecting inoviruses
- BC_mmseqs: results of mapping genes against our custom database
- BC_pfam: results for detecting integrases and excisionases
- BC_prodigal: CDS predictions from the genome or contig sequences (not generated with -p batch, whose input is already protein)
- BC_predict.summary: summary of the prediction results, with the following columns:
  - sample_name: identifier; either the sequence ID or the first column of the plain-text input file.
  - integrase_number: number of genes mapped to an integrase that meet the criteria set by -c.
  - excisionase_number: number of genes mapped to an excisionase that meet the criteria set by -c.
  - pfam_label: "Temperate" if an integrase or excisionase is found, otherwise "Virulent".
  - bc_temperate: conditional probability P(temperate | genes) from the Bayes classifier.
  - bc_virulent: conditional probability P(virulent | genes) from the Bayes classifier.
  - bc_label: "Temperate" if bc_temperate is greater than bc_virulent, otherwise "Virulent".
  - final_label: "Temperate" if both pfam_label and bc_label are "Temperate"; "Chronic" if an inovirus marker gene is present; otherwise "Virulent".
  - match_gene_number: number of genes mapped to our custom database.
  - path: path of the input .faa file.
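The labelling rules above can be written out as small functions, which may help when re-deriving labels from the summary columns. This is a sketch of the rules as described in this README, not Replidec's own code; in particular, the README does not say which rule wins when a genome satisfies both the Temperate and the Chronic conditions, so the order of the checks in final_label is an assumption.

```python
def pfam_label(integrase_number, excisionase_number):
    """'Temperate' if any integrase or excisionase hit passed the -c criteria, else 'Virulent'."""
    return "Temperate" if integrase_number > 0 or excisionase_number > 0 else "Virulent"


def bc_label(bc_temperate, bc_virulent):
    """'Temperate' only if P(temperate | genes) is strictly greater than P(virulent | genes)."""
    return "Temperate" if bc_temperate > bc_virulent else "Virulent"


def final_label(pfam, bc, has_inovirus_marker):
    # Assumption: rules are checked in the order this README lists them,
    # so Temperate takes precedence over Chronic when both conditions hold.
    if pfam == "Temperate" and bc == "Temperate":
        return "Temperate"
    if has_inovirus_marker:
        return "Chronic"
    return "Virulent"
```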
Example
## test passed - multiSeqAsOne
Replidec -p multiSeqAsOne -i example/genome_test.small.index -w multiSeqAsOne
## test passed - multiSeqEachAsOne
Replidec -p multiSeqEachAsOne -i example/test.contig.small.fa -w multiSeqEachAsOne
## test passed - batch
Replidec -p batch -i example/example.small.list -w batch
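After a run, the summary can be loaded for downstream analysis. A minimal sketch, assuming BC_predict.summary (or the file named with -s) is tab-separated with a header row holding the column names listed above; neither the delimiter nor the header layout is stated in this README. read_summary and count_labels are hypothetical helpers.

```python
import csv
from collections import Counter


def read_summary(lines):
    """Parse summary lines into dicts keyed by column name (assumes a tab-separated header row)."""
    return list(csv.DictReader(lines, delimiter="\t"))


def count_labels(rows, column="final_label"):
    """Count how many samples received each label in the given column."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    # Assumed location after the batch example: workdir "batch", default summary name
    with open("batch/BC_predict.summary") as handle:
        print(count_labels(read_summary(handle)))
```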
File details
Details for the file Replidec-0.3.1.1.tar.gz.
File metadata
- Download URL: Replidec-0.3.1.1.tar.gz
- Upload date:
- Size: 640.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9152f2fa410b8206a23f5af5ac94747c1e876a7b32ac67ec9bea8d9faa5bd5d4
MD5 | 8ae00e1647197d998d81310242354ff6
BLAKE2b-256 | 312aeb6b903693c8bfc74086ca72976d700b73e9990058b24daecc0435f49f41
File details
Details for the file Replidec-0.3.1.1-py2.py3-none-any.whl.
File metadata
- Download URL: Replidec-0.3.1.1-py2.py3-none-any.whl
- Upload date:
- Size: 16.3 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5467b8119aef7f8206cc9d867845e6f85cd685347779aeaa615aea913c142149
MD5 | 9023555eb180363da53c76874d13f943
BLAKE2b-256 | 825aa50b1851c1e5d15b44489cda9a40ce68b61e09f6b0b9d4d96abd7b0d4525