Replidec: Replication Cycle Decipher for Phages


Aim

Use a Bayes classifier combined with homology search to predict the replication cycle of viruses.

Install

Method 1: using Conda (recommended; install the latest version from bioconda)

conda create -n replidec
conda activate replidec
conda install -c conda-forge -c bioconda replidec
or
conda install -c denglab -c conda-forge -c bioconda replidec

Method 2: using Docker

docker pull quay.io/biocontainers/replidec:0.3.4--pyhdfd78af_0 
docker run quay.io/biocontainers/replidec:0.3.4--pyhdfd78af_0 Replidec -h
## Example
docker run -v /your/host/data:/data/ quay.io/biocontainers/replidec:0.3.4--pyhdfd78af_0 Replidec -i data/your_inputfile -p multiSeqEachAsOne -w data

Method 3: using pip

If you install with pip, make sure that mmseqs, hmmsearch, and blastp are on your $PATH. Their versions should be equal to or higher than those listed below:

  • MMseqs2 Version: 13.45111

  • HMMER 3.3.2 (Nov 2020)

  • Protein-Protein BLAST 2.5.0+

pip3 install Replidec
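
Before running a pip-installed Replidec, it may help to confirm that the three external tools can be found. The following is a minimal sketch that uses only the Python standard library. The helper name `missing_tools` is hypothetical and not part of Replidec, and the check covers only presence on $PATH, not the versions listed above.

```python
import shutil

# Executables that a pip installation expects to find on $PATH.
REQUIRED_TOOLS = ["mmseqs", "hmmsearch", "blastp"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on $PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on $PATH.")
```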

Usage: Overview

Replidec, Replication cycle prediction tool for prokaryotic viruses

options:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -p , --program        { multi_fasta | genome_table | protein_table }
                        
                        multi_fasta mode:
                        input is a fasta file and treat each sequence as one virus
                        
                        genome_table mode:
                        input is a tab separated file with two columns
                        ___1st column: sample name
                        ___2nd column: path to the genome sequence file of the virus
                        
                        protein_table mode:
                        input is a tab separated file with two columns
                        ___1st column: sample name
                        ___2nd column: path to the protein file of the virus
                        
  -i , --input_file     The input file, which can be a sequence file or an index table
  -w , --work_dir       Directory to store intermediate and final results (default = ./Replidec_results)
  -n , --file_name      Name of final summary file (default = prediction_summary.tsv)
  -t , --threads        Number of parallel threads (default = 10)
  -e , --hmmer_Eval     E-value threshold to filter hmmer result (default = 1e-5)
  -E , --hmmer_parameters 
                        Parameters used for hmmer (default = --noali --cpu 3)
  -m , --mmseq_Eval     E-value threshold to filter mmseqs2 result (default = 1e-5)
  -M , --mmseq_parameters 
                        Parameter used for mmseqs
                        (default = -s 7 --max-seqs 1 --alignment-mode 3 --alignment-output-mode 0 --min-aln-len 40 --cov-mode 0 --greedy-best-hits 1 --threads 3)
  -b , --blastp_Eval    E-value threshold to filter blast result (default =1e-5)
  -B , --blastp_parameter 
                        Parameters used for blastp (default = -num_threads 3)
  -d, --db_redownload   Remove and re-download database
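
When Replidec is called from a pipeline, the options above can be assembled programmatically. The sketch below builds an argument list from the -p, -i, -w, -n, and -t options shown in the help text. The function `build_replidec_command` is a hypothetical helper, not part of Replidec, and it only constructs the command; it does not run it.

```python
import shlex


def build_replidec_command(program, input_file, work_dir=None,
                           file_name=None, threads=None):
    """Assemble a Replidec argument list from options listed in the help text."""
    allowed = ("multi_fasta", "genome_table", "protein_table")
    if program not in allowed:
        raise ValueError(f"program must be one of {allowed}, got {program!r}")
    cmd = ["Replidec", "-p", program, "-i", str(input_file)]
    # Optional arguments are added only when given, so Replidec's own
    # defaults apply otherwise.
    if work_dir is not None:
        cmd += ["-w", str(work_dir)]
    if file_name is not None:
        cmd += ["-n", str(file_name)]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


if __name__ == "__main__":
    # Hypothetical file and directory names, for illustration only.
    cmd = build_replidec_command("multi_fasta", "viral_contigs.fasta",
                                 work_dir="results", threads=4)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once Replidec is installed.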

Usage: Download database (-d)

The database used by Replidec is downloaded automatically.

Location: the database is downloaded to the directory where Replidec is installed.

To re-download the database, use the -d parameter. The old database is moved to "discarded_db" in the working directory (-w); you can remove this directory manually.

Usage: Input (-i) and Program (-p)

The expected input file depends on the selected program.

Replidec contains three programs:

  1. 'multi_fasta'
  2. 'genome_table'
  3. 'protein_table'

multi_fasta mode:

  • The input is a FASTA file; each sequence is treated as one virus.
    • Example: <your_path>/viral_contigs.fasta

      >contig_1
      TATCGATCGATCGATCGATCGATCGTACGTACGTACGTACG...
      >contig_2
      CATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG...
      ...
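
Because each sequence in multi_fasta mode is treated as a separate virus, it can be useful to list the record identifiers first and check that none is repeated. The sketch below takes the first whitespace-delimited word of each header as the identifier. This is an assumption: how Replidec itself derives the ID from the header is not stated here. Both function names are hypothetical.

```python
from collections import Counter


def fasta_ids(path):
    """Return the first word of every '>' header line in a FASTA file."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                # An empty header yields an empty identifier.
                ids.append(header.split()[0] if header else "")
    return ids


def duplicated_ids(path):
    """Return the identifiers that occur more than once, sorted."""
    counts = Counter(fasta_ids(path))
    return sorted(name for name, n in counts.items() if n > 1)
```

For the example file above, `fasta_ids` would return one entry per contig, which is also the number of viruses the run treats as input.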
      

genome_table mode:

  • The input is a tab-separated file with two columns.

    • 1st column: sample name
    • 2nd column: path to the genome sequence file of the virus
    • Example: <your_path>/example_genomes.tsv
    contig_1	your/file/path/contig_1.fasta
    contig_2	your/file/path/contig_2.fasta
    contig_3	your/file/path/contig_3.fasta
    ...
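
A table in this two-column layout can be generated from a directory of FASTA files. The following sketch is one way to do it, under stated assumptions: the file name without its extension serves as the sample name, absolute paths are written, and no header line is added (the example above shows none). The function name `write_genome_table` and the `*.fasta` pattern are illustrative choices, not part of Replidec.

```python
from pathlib import Path


def write_genome_table(fasta_dir, table_path, pattern="*.fasta"):
    """Write one 'sample_name<TAB>path' line per matching file in fasta_dir.

    Returns the number of lines written.
    """
    fasta_files = sorted(Path(fasta_dir).glob(pattern))
    with open(table_path, "w") as out:
        for fasta in fasta_files:
            # Assumption: the file stem is an acceptable sample name.
            out.write(f"{fasta.stem}\t{fasta.resolve()}\n")
    return len(fasta_files)
```

The same helper could produce a protein_table input by pointing it at a directory of protein files and adjusting `pattern`.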
    

protein_table mode:

  • The input is a tab-separated file with two columns.

    • 1st column: sample name
    • 2nd column: path to the protein file of the virus
    • Example: <your_path>/example_proteins.tsv
    contig_1_prot	your/file/path/contig_1.fasta
    contig_2_prot	your/file/path/contig_2.fasta
    contig_3_prot	your/file/path/contig_3.fasta
    ...
    

Usage: Output (-w and -n)

The output directory, where the intermediate files and the final prediction results are stored, is set with -w, --work_dir. The name of the final summary file is set with -n, --file_name.

At the end of the analysis, the output directory contains the following:

  • BC_Inno: This directory contains the result files for detecting inoviruses
  • BC_mmseqs: This directory contains the results of mapping to our custom database
  • BC_pfam: This directory contains the result files for detecting integrases and excisionases
  • BC_prodigal: This directory contains the results of CDS prediction from genome or contig sequences (if -p protein_table is used, this directory is not created)
  • prediction_summary.tsv: This file summarizes the prediction results. It contains the following columns:
    • sample_name: identifier; either the sequence ID or the first column of the tab-separated input file.

    • integrase_number: the number of genes mapped to integrase that meet the criteria (set by -c).

    • excisionase_number: the number of genes mapped to excisionase that meet the criteria (set by -c).

    • pfam_label: "Temperate" if the genome contains an integrase or excisionase; otherwise "Virulent".

    • bc_temperate: conditional probability of temperate given the genes, P(temperate | genes).

    • bc_virulent: conditional probability of virulent given the genes, P(virulent | genes).

    • bc_label: "Temperate" if bc_temperate is greater than bc_virulent; otherwise "Virulent".

    • final_label: "Temperate" if pfam_label and bc_label are both "Temperate"; "Chronic" if an inovirus marker gene is present; otherwise "Virulent".

    • match_gene_number: the number of genes mapped to our custom database.

    • path: path to the input protein (faa) file
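
The labelling rules described for the summary columns can be written out as a short sketch. This is an illustration of the rules as worded above, not Replidec's actual implementation. In particular, the description does not say which rule wins when a genome satisfies both the "Temperate" and the "Chronic" condition; the sketch applies the rules in the order they are listed, which is an assumption. The function names and the `has_inovirus_marker` parameter are hypothetical.

```python
def pfam_label(integrase_number, excisionase_number):
    """'Temperate' if any integrase or excisionase was found, else 'Virulent'."""
    if integrase_number > 0 or excisionase_number > 0:
        return "Temperate"
    return "Virulent"


def bc_label(bc_temperate, bc_virulent):
    """'Temperate' only if bc_temperate is strictly greater than bc_virulent."""
    return "Temperate" if bc_temperate > bc_virulent else "Virulent"


def final_label(pfam, bc, has_inovirus_marker):
    """Combine both labels and the inovirus marker check.

    ASSUMPTION: rules are applied in the order the documentation lists them,
    so the 'Temperate' rule is checked before the 'Chronic' rule.
    """
    if pfam == "Temperate" and bc == "Temperate":
        return "Temperate"
    if has_inovirus_marker:
        return "Chronic"
    return "Virulent"
```

For instance, a genome with one integrase hit but bc_temperate below bc_virulent and no inovirus marker would end up "Virulent" under these rules, because both labels must agree for a "Temperate" call.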

Example


## test passed - multi_fasta mode
Replidec -p multi_fasta -i my/path/test_viral_contigs.fasta -w my/path/replidec_test_VC_results
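
After a run like the one above, the summary file can be inspected with a few lines of Python. The sketch below tallies the final_label column. It assumes that the summary is tab-separated and begins with a header row using the column names listed in the output section; the function name `count_final_labels` is hypothetical.

```python
import csv
from collections import Counter


def count_final_labels(summary_path):
    """Count how many samples received each final_label in the summary file."""
    with open(summary_path, newline="") as handle:
        # Assumption: first row is a header that includes 'final_label'.
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["final_label"] for row in reader)
```

With the default settings, the file to pass would be prediction_summary.tsv inside the directory given with -w.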
