Replication Cycle Detector for Phages

Replidec: Replication Cycle Detector for Phages

Aim

Replidec combines a Bayes classifier with homology search to predict the replication cycle of viruses.

Install

Method 1: using Conda

conda create -n replidec
conda activate replidec
conda install -c denglab -c conda-forge -c bioconda replidec

Method 2: using Docker

docker pull denglab/replidec

If you want to use Replidec on an HPC system, Singularity is recommended. You can create a Singularity image with the following command:

singularity pull replidec.sif docker://denglab/replidec

Method 3: using pip

If you install with pip, make sure that mmseqs, hmmsearch and blastp are on your $PATH. Their versions should be equal to or higher than those listed below:

  • MMseqs2 Version: 13.45111

  • HMMER 3.3.2 (Nov 2020)

  • Protein-Protein BLAST 2.5.0+

pip3 install Replidec
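
For a pip installation, a quick check that the three external tools are reachable can save a failed run. The snippet below is a minimal sketch, not part of Replidec: the helper name `missing_tools` is hypothetical, and it only tests whether the executables are on $PATH. It does not compare versions.

```python
import shutil

# Executable names taken from the requirement list above.
REQUIRED_TOOLS = ("mmseqs", "hmmsearch", "blastp")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that `which` cannot locate on $PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on $PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on $PATH.")
```

The `which` argument exists only so the lookup can be replaced in tests; in normal use, call `missing_tools()` without arguments.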

Usage: Overview

Replidec [-h] [--version] -p {multiSeqAsOne,batch,multiSeqEachAsOne}
         [-i INPUT_FILE] [-w WORKDIR] [-s SUMMARY] [-t THREADS] [-c HMMER_CRETERIA] [-H HMMER_PARAMETER] [-m MMSEQS_CRETERIA]
         [-M MMSEQS_PARAMETER] [-b BLASTP_CRETERIA] [-B BLASTP_PARAMETER] [-d]
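
When Replidec is called from a Python pipeline, the command line can be assembled as an argument list. This is a minimal sketch that uses only the -p, -i, -w, -s and -t options from the usage text above; the function names `build_command` and `run_replidec` are hypothetical and not part of Replidec.

```python
import subprocess

# Program names as listed in the usage text.
PROGRAMS = ("multiSeqAsOne", "batch", "multiSeqEachAsOne")


def build_command(program, input_file, workdir, summary=None, threads=None):
    """Assemble a Replidec argument list from the options in the usage text."""
    if program not in PROGRAMS:
        raise ValueError("program must be one of: " + ", ".join(PROGRAMS))
    cmd = ["Replidec", "-p", program, "-i", str(input_file), "-w", str(workdir)]
    if summary is not None:
        cmd += ["-s", str(summary)]
    if threads is not None:
        cmd += ["-t", str(int(threads))]
    return cmd


def run_replidec(*args, **kwargs):
    """Run Replidec and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(*args, **kwargs), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when paths contain spaces.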

Usage: database

The database used by Replidec is downloaded automatically.

Location: the database is downloaded to the directory where Replidec is installed.

To download the database again, use the -d parameter. The old database is moved to "discarded_db" in the workdir (-w); you can remove this directory manually.

Usage: Input (-i) and Program (-p)

The expected input file depends on the program.

Replidec contains 3 different programs:

  1. 'multiSeqAsOne'
  2. 'batch'
  3. 'multiSeqEachAsOne'

multiSeqAsOne

  • multiSeqAsOne mode: the input is a plain text file containing two columns (the separator must be a tab)

    • first column: sample name; this is used as the identifier in the output summary file

    • second column: path of the genome or contig file from one virus (each file can contain multiple sequences)

    • Example: test/example/genome_test.small.index

    seq1    example/genome_test/genome.test.fnaaa
    seq2    example/genome_test/genome.test.fnaab
    seq3    example/genome_test/genome.test.fnaac
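
An index file like the one above can be generated rather than typed by hand. The following is a minimal sketch: `write_index` is a hypothetical helper, and using the file name as the sample name is only an illustrative choice, since any unique name should work as the first column.

```python
from pathlib import Path


def write_index(paths, index_file):
    """Write one 'sample<TAB>path' line per input file and return the line count."""
    lines = []
    seen = set()
    for path in map(Path, paths):
        sample = path.name  # hypothetical naming choice: file name as sample name
        if sample in seen:
            raise ValueError(f"duplicate sample name: {sample}")
        seen.add(sample)
        # The separator must be a tab, as required by the input format.
        lines.append(f"{sample}\t{path.as_posix()}\n")
    Path(index_file).write_text("".join(lines))
    return len(lines)
```

For example, `write_index(sorted(Path("example/genome_test").glob("*.fna*")), "genome.index")` would list every matching file in that directory.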
    

multiSeqEachAsOne

  • multiSeqEachAsOne mode: the input is a sequence file; each sequence is treated as coming from one virus and receives its own prediction;

    • This mode will treat each sequence independently

    • Example: test/example/test.contig.small.fa

batch

  • batch mode: the input is a plain text file containing two columns (the separator must be a tab);

    • first column: sample name;

    • second column: path of the protein file from one virus;

    • Example: test/example/example.small.list

    simulate_art_sample1.10 example/simulate_art_sample1.10.faa
    simulate_art_sample1.11 example/simulate_art_sample1.11.faa
    simulate_art_sample1.12 example/simulate_art_sample1.12.faa
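
Because the separator must be a tab, a list file that was edited by hand is worth checking before a run. This is a minimal sketch with a hypothetical helper, `check_index_lines`. Skipping blank lines and flagging repeated sample names are choices made here, not documented Replidec behaviour.

```python
def check_index_lines(lines):
    """Return (line_number, message) pairs for lines that are not 'name<TAB>path'."""
    problems = []
    names = set()
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are ignored by this checker
        fields = line.split("\t")
        if len(fields) != 2 or not all(field.strip() for field in fields):
            problems.append((number, "expected exactly two tab-separated columns"))
            continue
        if fields[0] in names:
            problems.append((number, "repeated sample name: " + fields[0]))
        names.add(fields[0])
    return problems
```

A typical call is `check_index_lines(open("example/example.small.list"))`; an empty list means every non-blank line has two tab-separated columns.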
    

Usage: Output (-w and -s)

Use -w to set the output directory name and -s to set the name of the summary file. Several directories and a summary file are generated under the output directory:

  • BC_Inno: this directory contains the result files for detecting Inoviruses
  • BC_mmseqs: this directory contains the results of mapping against our custom database
  • BC_pfam: this directory contains the result files for detecting integrase and excisionase
  • BC_prodigal: this directory contains the result files of CDS prediction from genome or contig sequences (-p batch does not generate this directory)
  • BC_predict.summary: this file summarizes the prediction results. It contains multiple columns.
    • sample_name: identifier. Either the sequence ID or the first column of the plain text input file.

    • integrase_number: the number of genes mapped to integrase that meet the criteria (set by -c).

    • excisionase_number: the number of genes mapped to excisionase that meet the criteria (set by -c).

    • pfam_label: "Temperate" if the sample contains an integrase or excisionase, otherwise "Virulent".

    • bc_temperate: conditional probability of temperate|genes.

    • bc_virulent: conditional probability of virulent|genes.

    • bc_label: "Temperate" if bc_temperate is greater than bc_virulent, otherwise "Virulent".

    • final_label: "Temperate" if pfam_label and bc_label are both "Temperate"; "Chronic" if an Inovirus marker gene exists; otherwise "Virulent".

    • match_gene_number: the number of genes mapped to our custom database.

    • path: path of the input faa file
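
The label columns described above follow simple rules, which can be written out as code. This is a minimal sketch of the rules as this section words them, not Replidec's actual implementation. The function names are hypothetical, and the precedence between "Temperate" and "Chronic" when both conditions hold is an assumption: the rules are checked in the order listed.

```python
def pfam_label(integrase_number, excisionase_number):
    """'Temperate' if any integrase or excisionase hit was counted."""
    if integrase_number > 0 or excisionase_number > 0:
        return "Temperate"
    return "Virulent"


def bc_label(bc_temperate, bc_virulent):
    """'Temperate' only if bc_temperate is strictly greater than bc_virulent."""
    return "Temperate" if bc_temperate > bc_virulent else "Virulent"


def final_label(pfam, bc, has_inovirus_marker):
    """Combine both labels with the marker-gene check."""
    # Assumption: conditions are tested in the order the description lists them.
    if pfam == "Temperate" and bc == "Temperate":
        return "Temperate"
    if has_inovirus_marker:
        return "Chronic"
    return "Virulent"
```

Under these rules a tie between bc_temperate and bc_virulent gives "Virulent", and a sample whose two labels disagree is also reported as "Virulent" unless a marker gene is present.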

Example

## test passed - multiSeqAsOne
Replidec -p multiSeqAsOne -i example/genome_test.small.index -w multiSeqAsOne

## test passed - multiSeqEachAsOne
Replidec -p multiSeqEachAsOne -i example/test.contig.small.fa -w multiSeqEachAsOne

## test passed - batch
Replidec -p batch -i example/example.small.list -w batch
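
After runs like the ones above, the summary file can be tallied by final_label. This is a minimal sketch that assumes the summary is a delimited text file with a header row carrying the column names from the Output section. The tab delimiter is an assumption, so it is exposed as a parameter; `count_final_labels` is a hypothetical helper.

```python
import csv
from collections import Counter


def count_final_labels(summary_path, delimiter="\t"):
    """Count how often each final_label value occurs in a summary file."""
    # Assumption: the first row is a header containing a 'final_label' column.
    with open(summary_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        return Counter(row["final_label"] for row in reader)
```

With the default summary name, the first example run would be tallied with `count_final_labels("multiSeqAsOne/BC_predict.summary")`. If the file has a different layout, a `KeyError` on `final_label` signals that the assumption does not hold.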
