Replication Cycle Decipher for Phages
Replidec: Replication Cycle Decipher for Phages
Aim
Use a Bayes classifier combined with homology search to predict the replication cycle of a virus.
Install
Method 1: using Conda (recommended; install the latest version from bioconda)
conda create -n replidec
conda activate replidec
conda install -c conda-forge -c bioconda replidec
or
conda install -c denglab -c conda-forge -c bioconda replidec
Method 2: using Docker
docker pull quay.io/biocontainers/replidec:0.3.4--pyhdfd78af_0
docker run quay.io/biocontainers/replidec:0.3.4--pyhdfd78af_0 Replidec -h
## Example
docker run -v /your/host/data:/data/ quay.io/biocontainers/replidec:0.3.4--pyhdfd78af_0 Replidec -i data/your_inputfile -p multiSeqEachAsOne -w data
Method 3: using pip
If you install with pip, make sure that mmseqs, hmmsearch and blastp are available on your $PATH. Each tool should be at the version listed below or newer:
- MMseqs2 Version: 13.45111
- HMMER 3.3.2 (Nov 2020)
- Protein-Protein BLAST 2.5.0+
pip3 install Replidec
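Before running a pip-installed Replidec, it can help to confirm that the three external tools are visible on $PATH. The following is a minimal sketch using only the Python standard library; the helper names `find_tools` and `missing_tools` are hypothetical and not part of Replidec. It checks presence only, not the version of each tool.

```python
import shutil

# Executables that the pip install note says must be on $PATH.
REQUIRED_TOOLS = ("mmseqs", "hmmsearch", "blastp")


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that could not be found on PATH."""
    return [tool for tool, path in find_tools(tools).items() if path is None]


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

If a tool is reported as not found, installing through Conda or Docker (Methods 1 and 2) avoids managing these dependencies by hand.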
Usage: Overview
Replidec, Replication cycle prediction tool for prokaryotic viruses
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-p , --program { multi_fasta | genome_table | protein_table }
multi_fasta mode:
input is a fasta file and treat each sequence as one virus
genome_table mode:
input is a tab separated file with two columns
___1st column: sample name
___2nd column: path to the genome sequence file of the virus
protein_table mode:
input is a tab separated file with two columns
___1st column: sample name
___2nd column: path to the protein file of the virus
-i , --input_file The input file, which can be a sequence file or an index table
-w , --work_dir Directory to store intermediate and final results (default = ./Replidec_results)
-n , --file_name Name of final summary file (default = prediction_summary.tsv)
-t , --threads Number of parallel threads (default = 10)
-e , --hmmer_Eval E-value threshold to filter hmmer result (default = 1e-5)
-E , --hmmer_parameters
Parameters used for hmmer (default = --noali --cpu 3)
-m , --mmseq_Eval E-value threshold to filter mmseqs2 result (default = 1e-5)
-M , --mmseq_parameters
Parameter used for mmseqs
(default = -s 7 --max-seqs 1 --alignment-mode 3 --alignment-output-mode 0 --min-aln-len 40 --cov-mode 0 --greedy-best-hits 1 --threads 3)
-b , --blastp_Eval E-value threshold to filter blast result (default =1e-5)
-B , --blastp_parameter
Parameters used for blastp (default = -num_threads 3)
-d, --db_redownload Remove and re-download database
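For scripted runs, the options above can be assembled into a command line programmatically. This is a sketch only: `build_replidec_command` is a hypothetical helper, not part of Replidec, and it uses just the -p, -i, -w, -n and -t options listed in the help text.

```python
import shlex

# The three program modes listed in the help text.
PROGRAMS = {"multi_fasta", "genome_table", "protein_table"}


def build_replidec_command(program, input_file, work_dir=None, file_name=None, threads=None):
    """Build an argument list for Replidec from the documented options."""
    if program not in PROGRAMS:
        raise ValueError(f"program must be one of {sorted(PROGRAMS)}, got {program!r}")
    cmd = ["Replidec", "-p", program, "-i", str(input_file)]
    if work_dir is not None:
        cmd += ["-w", str(work_dir)]
    if file_name is not None:
        cmd += ["-n", str(file_name)]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


cmd = build_replidec_command(
    "multi_fasta",
    "my/path/test_viral_contigs.fasta",
    work_dir="my/path/replidec_test_VC_results",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Replidec is installed. Options left as `None` fall back to the defaults shown in the help text.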
Usage: Download database (-d)
The database used by Replidec is downloaded automatically.
Location: the database is stored in the directory where Replidec is installed.
To re-download the database, use the -d parameter. The old database is moved to "discarded_db" in the work directory (-w); you can remove this directory manually.
Usage: Input (-i) and Program (-p)
The expected input file depends on the program.
Replidec contains 3 programs:
- 'multi_fasta'
- 'genome_table'
- 'protein_table'
multi_fasta mode:
- The input is a FASTA file; each sequence is treated as one virus.
- Example: <your_path>/viral_contigs.fasta
>contig_1
TATCGATCGATCGATCGATCGATCGTACGTACGTACGTACG...
>contig_2
CATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG...
...
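Because multi_fasta mode treats each sequence as one virus, it can be useful to list the record identifiers in the input and check for duplicates before a run. The sketch below uses plain Python; `fasta_ids` and `duplicated_ids` are hypothetical helpers, and taking the first whitespace-delimited token of the header as the identifier is an assumption, not something Replidec documents.

```python
def fasta_ids(path):
    """Return the identifier of every record in a FASTA file, in file order."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                # Assumption: the identifier is the first token of the header line.
                ids.append(header.split()[0] if header else "")
    return ids


def duplicated_ids(ids):
    """Return identifiers that occur more than once, in order of first repeat."""
    seen, duplicates = set(), []
    for record_id in ids:
        if record_id in seen and record_id not in duplicates:
            duplicates.append(record_id)
        seen.add(record_id)
    return duplicates
```

For a file shaped like the example above, `fasta_ids` would return one entry per `>` header line, so its length is the number of viruses submitted in this mode.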
genome_table mode:
- The input is a tab-separated file with two columns.
- 1st column: sample name
- 2nd column: path to the genome sequence file of the virus
- Example: <your_path>/example_genomes.tsv
contig_1	your/file/path/contig_1.fasta
contig_2	your/file/path/contig_2.fasta
contig_3	your/file/path/contig_3.fasta
...
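The two-column index table for genome_table mode can be generated from a directory of genome files. This is a sketch under assumptions: `write_genome_table` is a hypothetical helper, the sample name is taken from the file name without its extension, paths are written as absolute paths, and no header row is written (the example above shows none).

```python
import csv
from pathlib import Path


def write_genome_table(genome_dir, table_path, pattern="*.fasta"):
    """Write a tab-separated table of (sample name, genome file path) rows."""
    genome_dir = Path(genome_dir)
    # Assumption: one genome per file, sample name = file name without extension.
    rows = [(path.stem, str(path.resolve())) for path in sorted(genome_dir.glob(pattern))]
    with open(table_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
    return rows
```

The same helper could produce a protein_table input by pointing it at a directory of protein files and adjusting `pattern`.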
protein_table mode:
- The input is a tab-separated file with two columns.
- 1st column: sample name
- 2nd column: path to the protein file of the virus
- Example: <your_path>/example_proteins.tsv
contig_1_prot	your/file/path/contig_1.fasta
contig_2_prot	your/file/path/contig_2.fasta
contig_3_prot	your/file/path/contig_3.fasta
...
Usage: Output (-w and -n)
The output directory is set with -w, --work_dir; the intermediate files and the final prediction results are stored there.
The name of the final summary file is set with the -n, --file_name argument.
At the end of the analysis, the output directory contains the following:
- BC_Inno: This directory contains the results of the Inovirus detection.
- BC_mmseqs: This directory contains the results of mapping to our custom database.
- BC_pfam: This directory contains the results of the Integrase and Excisionase detection.
- BC_prodigal: This directory contains the results of CDS prediction from the genome or contig sequences. (If -p protein_table is used, this directory is not created.)
- prediction_summary.tsv: This file is the summary of the prediction results. It contains multiple columns:
- sample_name: identifier. This is either the sequence id or the first column of the plain-text input file.
- integrase_number: the number of genes mapped to integrase that meet the criteria (set by -c).
- excisionase_number: the number of genes mapped to excisionase that meet the criteria (set by -c).
- pfam_label: "Temperate" if the sample contains an integrase or excisionase, otherwise "Virulent".
- bc_temperate: conditional probability of temperate|genes.
- bc_virulent: conditional probability of virulent|genes.
- bc_label: "Temperate" if bc_temperate is greater than bc_virulent, otherwise "Virulent".
- final_label: "Temperate" if pfam_label and bc_label are both "Temperate"; "Chronic" if an Inovirus marker gene exists; otherwise "Virulent".
- match_gene_number: the number of genes mapped to our custom database.
- path: path of the input faa file.
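The labelling rules described for pfam_label, bc_label and final_label can be written out as a small decision function. This is an illustrative sketch, not Replidec's actual code: the function names are hypothetical, and the order of the checks in `final_label` simply follows the order in which the rules are listed above, which is an assumption about precedence.

```python
def pfam_label(integrase_number, excisionase_number):
    """'Temperate' if any integrase or excisionase hit exists, else 'Virulent'."""
    if integrase_number > 0 or excisionase_number > 0:
        return "Temperate"
    return "Virulent"


def bc_label(bc_temperate, bc_virulent):
    """'Temperate' if the temperate probability is strictly greater, else 'Virulent'."""
    return "Temperate" if bc_temperate > bc_virulent else "Virulent"


def final_label(pfam, bc, has_inovirus_marker):
    """Combine both labels and the Inovirus marker flag into the final label."""
    # Assumption: checks are applied in the order the rules are listed.
    if pfam == "Temperate" and bc == "Temperate":
        return "Temperate"
    if has_inovirus_marker:
        return "Chronic"
    return "Virulent"
```

Under these rules, a sample with an integrase hit but a bc_label of "Virulent" ends up "Virulent" unless an Inovirus marker gene is present.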
Example
## test passed - multi_fasta mode
Replidec -p multi_fasta -i my/path/test_viral_contigs.fasta -w my/path/replidec_test_VC_results
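After a run like the one above, the summary file can be tallied by final_label. The sketch below uses the standard `csv` module; `count_final_labels` is a hypothetical helper, and it assumes the summary is tab-separated with a header row whose column names match those listed in the output section.

```python
import csv
from collections import Counter


def count_final_labels(summary_path):
    """Count how many samples received each final_label in the summary file."""
    with open(summary_path, newline="") as handle:
        # Assumption: tab-separated file with a header row naming the columns.
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["final_label"] for row in reader)


# Usage, with the default summary file name inside the work directory:
# counts = count_final_labels("my/path/replidec_test_VC_results/prediction_summary.tsv")
# print(counts.most_common())
```

If the header differs from this assumption, `csv.DictReader` raises a `KeyError` on the missing column, which makes the mismatch easy to spot.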