A package for identifying functional association networks by phylogenetic profiling of prokaryotic genomes.

PPNet

Introduction

  • What is PPNet? PPNet uses genome information and phylogenetic profile analysis, based on binary similarity and distance measures, to derive large-scale association networks for a single bacterial species.
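
To make the idea of a binary similarity measure concrete, here is a minimal sketch that compares two presence/absence profiles with the Jaccard index. The Jaccard index is used only as a well-known example of such a measure; which numbered algorithm in PPNet (if any) corresponds to it is not stated here, and the profiles `gene_x` and `gene_y` are made up for illustration.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity between two equal-length 0/1 presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same strains")
    # Strains in which both genes are present.
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    # Strains in which at least one of the genes is present.
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles: one entry per strain, 1 = gene present, 0 = gene absent.
gene_x = [1, 1, 0, 1, 0, 0]
gene_y = [1, 1, 0, 0, 0, 1]
similarity = jaccard_similarity(gene_x, gene_y)
print(similarity)
```

Genes whose profiles score highly across many strains tend to be gained and lost together, which is the signal a phylogenetic profiling network is built on.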

Installation

PPNet has the following dependencies:

  • prokka
  • roary
  • Python (version >= 3.7)
  • Python modules:
    • biopython
    • pyvis
    • numpy
    • scipy
    • statsmodels
    • kneed
    • pyani
  • Install from source
    • Download the source code:
      git clone https://github.com/liyangjie/PPNet.git
      
    • Rename the main program and add its directory to the PATH environment variable:
      # Rename ppnet.py to ppnet
      mv PPNet/bin/ppnet.py PPNet/bin/ppnet
      # Give the scripts executable permission
      chmod +x PPNet/bin/*
      # Add the directory to PATH (single quotes keep $PATH unexpanded until the shell starts)
      echo 'export PATH="/Path/to/PPNet/bin:$PATH"' >> ~/.bashrc
      source ~/.bashrc
      
    • Install the Python dependencies:
      pip install biopython pyvis numpy scipy statsmodels kneed pyani
      
    • Install the external dependencies, either from source or with your package manager:
      prokka roary
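
After installation, a quick check along the following lines can confirm that everything listed above is reachable. This is a sketch, not part of PPNet: the function name `missing_dependencies` is hypothetical, and the module list uses import names, which can differ from the pip package names (biopython is imported as `Bio`).

```python
import importlib.util
import shutil

# Import names of the Python modules listed above (biopython -> Bio).
PYTHON_MODULES = ["Bio", "pyvis", "numpy", "scipy", "statsmodels", "kneed", "pyani"]
# External command-line tools that must be on PATH.
EXTERNAL_TOOLS = ["prokka", "roary"]


def missing_dependencies(modules=PYTHON_MODULES, tools=EXTERNAL_TOOLS):
    """Return (missing Python modules, missing command-line tools)."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [t for t in tools if shutil.which(t) is None]
    return missing_modules, missing_tools


missing_modules, missing_tools = missing_dependencies()
print("Missing Python modules:", missing_modules or "none")
print("Missing external tools:", missing_tools or "none")
```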
      

Usage

ppnet [Options]
Options:
      [-h] Show this help message and exit
      [-i1] [Required] Path to the directory of input genomes
      [-i2] [Required] Path to the file giving the phenotype (e.g., pathogenic or non-pathogenic) of all strains
      [-o] Path of the output directory (Default "./PPNet_output")
      [-x] The suffix of the genome files (Default "fasta")
      [-c] Number of CPUs to use
      [-a] [Required] Select the algorithm for calculating the correlation coefficient [1-81], or set 0 to use all algorithms.
      [-pt] Percentage of interactions to visualize (Default "1")

Algorithm

See Algorithm.docx

Examples

ppnet -i1 PATH/to/your/genomes/ -i2 group.csv -x fasta -c 4 -a 1
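
When PPNet is called from a pipeline script, the same command can be assembled in Python. The sketch below only builds the argument list from the options documented in Usage; the helper name `build_ppnet_command` is hypothetical and the range check on `-a` simply mirrors the documented values (0 or 1 to 81).

```python
import shlex


def build_ppnet_command(genome_dir, phenotype_file, algorithm,
                        output_dir=None, suffix=None, cpus=None, percent=None):
    """Assemble a ppnet argument list from the documented options."""
    if not 0 <= int(algorithm) <= 81:
        raise ValueError("algorithm must be 0 (all) or a number from 1 to 81")
    # Required options first.
    cmd = ["ppnet", "-i1", str(genome_dir), "-i2", str(phenotype_file),
           "-a", str(algorithm)]
    # Optional flags are added only when a value is given, so ppnet's own
    # defaults apply otherwise.
    optional = [("-o", output_dir), ("-x", suffix), ("-c", cpus), ("-pt", percent)]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_ppnet_command("PATH/to/your/genomes/", "group.csv",
                          algorithm=1, suffix="fasta", cpus=4)
command_line = " ".join(shlex.quote(part) for part in cmd)
print(command_line)
# To actually run it: subprocess.run(cmd, check=True)
```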

Input

The genome files should be in FASTA format and placed in the same directory. The group.csv file (passed with -i2) gives the phenotype of each strain.
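
Before a run, it can help to check what is in the input directory. The sketch below lists the files with a given suffix and computes the N50 of each assembly using the standard definition (the contig length at which the running total of lengths, sorted longest first, reaches half the assembly size). It is independent of PPNet: the function names are hypothetical, and whether PPNet computes N50 in exactly this way is an assumption.

```python
import tempfile
from pathlib import Path


def genome_files(genome_dir, suffix="fasta"):
    """Sorted list of files in genome_dir ending in .<suffix>."""
    return sorted(Path(genome_dir).glob(f"*.{suffix}"))


def contig_lengths(fasta_path):
    """Length of every sequence record in a FASTA file."""
    lengths = []
    current = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A header starts a new record; store the previous one.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Contig length at which the cumulative sum reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Tiny made-up assembly, written to a temporary directory for demonstration.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "strain1.fasta").write_text(">contig1\nACGTACGT\nACGT\n>contig2\nACG\n")
    files = genome_files(tmp)
    demo_lengths = contig_lengths(files[0])
    demo_n50 = n50(demo_lengths)
    print(files[0].name, demo_lengths, demo_n50)
```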

Output

  • PPNet_output/HQ_data/*: High-quality genomes with N50 > 10000.
  • PPNet_output/NR_data/*: Non-redundant genome sets after deduplication.
  • PPNet_output/Prokka_result/*: Result files generated by Prokka.
  • PPNet_output/Gff_file/*: GFF files extracted from the Prokka_result folder, used as input for Roary.
  • PPNet_output/Roary_result/*: Result files generated by Roary.
  • PPNet_output/Roary_result/Statistical_test_result.csv: Results of Fisher's exact test on the distribution of each gene. By default, PPNet reports all genes with an adjusted p-value < 0.05.
  • PPNet_output/Roary_result/filted_phylogenetic_profile.csv: The phylogenetic profile of orthologs with significantly different distributions.
  • PPNet_output/Roary_result/netwrok_result_method_x.csv: The association coefficient calculated by algorithm x for each pair of genes.
  • PPNet_output/Gene_net_x.html: A network plot inferred by algorithm x, which can be opened in a web browser (Google Chrome, Microsoft Edge, etc.). By default, only the first 1% of interactions are visualized.
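
The statistical filtering step can be illustrated with a small standard-library sketch: a two-sided Fisher's exact test on a gene's presence/absence against the phenotype groups, followed by a Benjamini-Hochberg adjustment. The choice of Benjamini-Hochberg is an assumption (the text above only says "adjusted p-value"), the example data are made up, and in practice `scipy.stats.fisher_exact` and `statsmodels.stats.multitest.multipletests` provide the same calculations. `math.comb` requires Python 3.8 or later.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Number of tables with x in the top-left cell, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts per gene:
# (present & pathogenic, present & non-pathogenic,
#  absent & pathogenic,  absent & non-pathogenic)
gene_tables = {
    "geneA": (8, 2, 1, 5),
    "geneB": (5, 5, 4, 2),
}
p_values = [fisher_exact_two_sided(*table) for table in gene_tables.values()]
adjusted = benjamini_hochberg(p_values)
for gene, p, q in zip(gene_tables, p_values, adjusted):
    print(gene, round(p, 4), round(q, 4))
```

Genes whose adjusted p-value falls below the chosen threshold would then be kept for the filtered phylogenetic profile.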

License

PPNet is free software under a GPLv3 license.

Built Distribution

ppnet-1.0.2-py3-none-any.whl (25.0 kB)