
PGAP2: a comprehensive pan-genome analysis pipeline for prokaryotic genomes

Citation

Please cite the following paper if PGAP2 helped you in any way:

Bu, C., Zhang, H., Zhang, F. et al. PGAP2: A comprehensive toolkit for prokaryotic pan-genome analysis based on fine-grained feature networks. Nat Commun 16, 9865 (2025). https://doi.org/10.1038/s41467-025-64846-5

In Brief

PGAP2 (Pan-Genome Analysis Pipeline 2) is an ultra-fast and comprehensive toolkit for prokaryotic pan-genome analysis. Powered by a Fine-Grained Feature Network, PGAP2 can construct a pan-genome map from 1,000 genomes within 20 minutes while ensuring high accuracy. In addition, it offers a rich set of upstream quality control modules and downstream analysis tools to support common pan-genome analyses.

Quick start

Basic usage

The input directory contains all the genome and annotation files.

PGAP2 supports multiple input formats: GFF files in the same format as those output by Prokka, GFF files with their corresponding genome FASTA files in separate files, GenBank flat files (GBFF), or just genome FASTA files (with --annot required).

Different formats of input files can be mixed in one input directory. PGAP2 will recognize and process them based on their prefixes and suffixes.

pgap2 main -i inputdir/ -o outputdir/
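Because mixed inputs are matched by prefix and suffix, it can help to list which files share a prefix before running the command above. The following is a minimal sketch, not part of PGAP2: the function name group_by_prefix is hypothetical, and the file extensions mentioned in the comments are illustrative assumptions, since the exact suffixes PGAP2 recognizes are not stated here.

```python
from collections import defaultdict
from pathlib import Path


def group_by_prefix(input_dir):
    """Map each file prefix (name without its final suffix) to the suffixes found.

    Hypothetical helper for inspecting an input directory; it does not
    reproduce PGAP2's own file-recognition logic.
    """
    groups = defaultdict(list)
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file():
            # e.g. "strainA.gff" and "strainA.fna" both land under "strainA"
            groups[path.stem].append(path.suffix.lower())
    return {prefix: sorted(suffixes) for prefix, suffixes in groups.items()}


# Usage (assumes a directory named inputdir/ exists):
# for prefix, suffixes in group_by_prefix("inputdir/").items():
#     print(prefix, " ".join(suffixes), sep="\t")
```

A prefix that shows only an annotation suffix, or only a sequence suffix, may point to a missing companion file for the "GFF plus separate FASTA" input format.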

Preprocessing

Quality checks and visualization are conducted by PGAP2 during the preprocessing step. PGAP2 generates an interactive HTML file and corresponding vector figures to help users understand their input data. The input data and pre-alignment results are stored as a pickle file for quick restarting of the same calculation step.

pgap2 prep -i inputdir/ -o outputdir/

Postprocessing

PGAP2 also provides a postprocessing pipeline. The postprocessing module integrates various submodules, such as statistical analysis, single-copy tree building, population clustering, and Tajima's D test. Whichever submodule you want to use, you can run it as follows:

pgap2 post [submodule] [options] -i inputdir/ -o outputdir/

Here, inputdir is the output directory of the main module.

PGAP2 also supports statistical analysis using a PAV file independently:

pgap2 post profile --pav your_pav_file -o outputdir/
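To give a sense of the kind of statistic a presence/absence table supports, here is a minimal sketch that counts core and accessory clusters. It is not PGAP2's implementation, and the table layout is an assumption: a tab-separated file with a header row, the cluster ID in the first column, and one 0/1 column per genome. The actual PAV file written by PGAP2 may differ, so check it before adapting this.

```python
import csv
import io


def summarize_pav(handle, delimiter="\t"):
    """Count core and accessory clusters in a presence/absence table.

    Assumed layout (hypothetical): header row, cluster ID in column 1,
    then one 0/1 column per genome.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    n_genomes = len(header) - 1
    core = accessory = 0
    for row in reader:
        if not row:
            continue
        # A cell counts as "present" unless it is empty or "0"
        present = sum(1 for cell in row[1:] if cell.strip() not in ("", "0"))
        if present == n_genomes:
            core += 1  # found in every genome
        elif present > 0:
            accessory += 1  # found in some, but not all, genomes
    return {"genomes": n_genomes, "core": core, "accessory": accessory}


# Tiny made-up table, for illustration only
example = "cluster\tg1\tg2\tg3\nc1\t1\t1\t1\nc2\t1\t0\t0\nc3\t0\t1\t1\n"
print(summarize_pav(io.StringIO(example)))
# {'genomes': 3, 'core': 1, 'accessory': 2}
```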

Installation

The best way to install the full version of the PGAP2 package is with conda:

conda create -n pgap2 -c bioconda pgap2

Alternatively, it is often faster to use the mamba solver (recommended):

conda create -n pgap2 mamba
conda activate pgap2
mamba install -c bioconda pgap2

If you only need a specific function, such as partitioning, and do not want to install all the extra software required by the full version of PGAP2, you can install just the PGAP2 package with pip:

pip install pgap2

Or install from source to get the latest version:

git clone https://github.com/bucongfan/PGAP2
pip install -e PGAP2/

Then install, by yourself, only the extra software needed for the specific function you want to use.

The dependencies of PGAP2 are listed below. PGAP2 checks whether each one is available in the environment path or in the pgap2/dependencies folder.

Preprocessing

Main

Postprocessing
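For any dependency in the groups above, the two-location lookup (environment path, then the pgap2/dependencies folder) can be approximated with the sketch below. This is an assumption-laden illustration rather than PGAP2's own check: the function name find_tool is hypothetical, the search order is a guess, and the dependencies folder is treated as a flat directory of executables.

```python
import shutil


def find_tool(name, dependencies_dir="pgap2/dependencies"):
    """Return the path of an executable on PATH or in dependencies_dir, else None.

    Hypothetical helper; PGAP2's real lookup may use a different order
    or a different folder layout.
    """
    on_path = shutil.which(name)
    if on_path:
        return on_path
    # With an explicit path argument, shutil.which searches only that directory
    return shutil.which(name, path=str(dependencies_dir))


# Rscript is the one external program named in this document
print(find_tool("Rscript"))
```

A result of None means the tool was found in neither location and needs to be installed before the corresponding function is used.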

Visualization in Preprocessing and Postprocessing modules

PGAP2 calls the Rscript executable found through your environment variables (PATH). The following R packages should be installed:

  • ggpubr
  • ggrepel
  • dplyr
  • tidyr
  • patchwork
  • optparse
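One way to confirm that these packages are available to the Rscript on your PATH is sketched below. It is a convenience check under stated assumptions, not something PGAP2 ships: the function names build_r_check and missing_r_packages are hypothetical, and the check relies only on R's standard requireNamespace function.

```python
import shutil
import subprocess

# Package names taken from the list above
R_PACKAGES = ["ggpubr", "ggrepel", "dplyr", "tidyr", "patchwork", "optparse"]


def build_r_check(packages):
    """Build an R expression that prints every package whose namespace fails to load."""
    quoted = ", ".join(f'"{p}"' for p in packages)
    return (
        f"pkgs <- c({quoted}); "
        "missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]; "
        "cat(missing, sep = '\\n')"
    )


def missing_r_packages(packages=R_PACKAGES):
    """Return the packages Rscript cannot load, or None if Rscript is not on PATH."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    result = subprocess.run(
        [rscript, "-e", build_r_check(packages)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


print(missing_r_packages())
```

An empty list means every package loaded; None means Rscript itself was not found.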

Detailed documentation

Please refer to the documentation on the wiki.


