PGAP2: a comprehensive pan-genome analysis pipeline for prokaryotic genomes
Citation
Please cite the following paper if PGAP2 helped you in any way:
Bu, C., Zhang, H., Zhang, F. et al. PGAP2: A comprehensive toolkit for prokaryotic pan-genome analysis based on fine-grained feature networks. Nat Commun 16, 9865 (2025). https://doi.org/10.1038/s41467-025-64846-5
In Brief
PGAP2 (Pan-Genome Analysis Pipeline 2) is an ultra-fast and comprehensive toolkit for prokaryotic pan-genome analysis. Powered by a Fine-Grained Feature Network, PGAP2 can construct a pan-genome map from 1,000 genomes within 20 minutes while ensuring high accuracy. In addition, it offers a rich set of upstream quality control modules and downstream analysis tools to support common pan-genome analyses.
Quick start
Basic usage
The input directory contains all the genome and annotation files.
PGAP2 supports multiple input formats: GFF files in the same format as those output by Prokka, GFF files with their corresponding genome FASTA files in separate files, GenBank flat files (GBFF), or just genome FASTA files (with --annot required).
Different formats of input files can be mixed in one input directory. PGAP2 will recognize and process them based on their prefixes and suffixes.
pgap2 main -i inputdir/ -o outputdir/
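Before running the main module, it can help to preview how the files in an input directory pair up by prefix and suffix. The following is a minimal sketch of that idea, not PGAP2's actual recognition logic: the suffix sets and the function name classify_inputs are assumptions made for illustration.

```python
from pathlib import PurePath

# Assumed suffix groups; PGAP2's real recognition rules may differ.
ANNOTATION_SUFFIXES = {".gff", ".gff3"}
GENBANK_SUFFIXES = {".gbff", ".gbk", ".gb"}
FASTA_SUFFIXES = {".fa", ".fna", ".fasta"}


def classify_inputs(filenames):
    """Group file names by prefix (stem) and guess the input kind of each group."""
    by_prefix = {}
    for name in filenames:
        path = PurePath(name)
        by_prefix.setdefault(path.stem, set()).add(path.suffix.lower())

    kinds = {}
    for prefix, suffixes in sorted(by_prefix.items()):
        if suffixes & GENBANK_SUFFIXES:
            kinds[prefix] = "genbank"
        elif (suffixes & ANNOTATION_SUFFIXES) and (suffixes & FASTA_SUFFIXES):
            kinds[prefix] = "gff+fasta"      # annotation and genome in separate files
        elif suffixes & ANNOTATION_SUFFIXES:
            kinds[prefix] = "gff"            # e.g. Prokka-style GFF with embedded sequence
        elif suffixes & FASTA_SUFFIXES:
            kinds[prefix] = "fasta-only"     # would need --annot
        else:
            kinds[prefix] = "unknown"
    return kinds


print(classify_inputs(["strainA.gff", "strainA.fna", "strainB.gbff", "strainC.fasta"]))
```

A prefix reported as "fasta-only" corresponds to the case described above where --annot is required.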
Preprocessing
Quality checks and visualization are conducted by PGAP2 during the preprocessing step. PGAP2 generates an interactive HTML file and corresponding vector figures to help users understand their input data. The input data and pre-alignment results are stored as a pickle file for quick restarting of the same calculation step.
pgap2 prep -i inputdir/ -o outputdir/
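The restart behaviour described above follows a common checkpoint pattern: compute once, pickle the result, and reload it on the next run. The sketch below illustrates that pattern in general terms. The function name load_or_compute and the file name prep.pkl are hypothetical, and the layout of PGAP2's own pickle file is not described here.

```python
import pickle
from pathlib import Path


def load_or_compute(checkpoint, compute):
    """Return the pickled result if the checkpoint exists, else compute and store it."""
    checkpoint = Path(checkpoint)
    if checkpoint.exists():
        # Restart path: skip the expensive step and reuse the stored result.
        with checkpoint.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    with checkpoint.open("wb") as fh:
        pickle.dump(result, fh)
    return result


# Hypothetical usage: the lambda stands in for an expensive preprocessing step.
# data = load_or_compute("outputdir/prep.pkl", lambda: {"genomes": 3})
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.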
Postprocessing
PGAP2 integrates several submodules into its postprocessing module, such as statistical analysis, single-copy tree building, population clustering, and Tajima's D test. Whichever submodule you want to use, you can always run it as follows:
pgap2 post [submodule] [options] -i inputdir/ -o outputdir/
Here, inputdir is the output directory of the main module.
PGAP2 also supports statistical analysis using a PAV file independently:
pgap2 post profile --pav your_pav_file -o outputdir/
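To make concrete what a presence/absence (PAV) summary contains, here is a minimal sketch that counts core, accessory, and unique gene clusters. It assumes a tab-separated table with one row per gene cluster, one column per genome, and 0 or an empty cell for absence. That layout and the function name summarize_pav are assumptions for illustration; the PAV format PGAP2 expects may differ.

```python
import csv
import io


def summarize_pav(handle):
    """Count core / accessory / unique clusters in a tab-separated PAV table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    n_genomes = len(header) - 1          # first column holds the cluster name
    counts = {"core": 0, "accessory": 0, "unique": 0}
    for row in reader:
        # A cell counts as "present" unless it is empty or "0".
        present = sum(1 for cell in row[1:] if cell.strip() not in ("", "0"))
        if present == n_genomes:
            counts["core"] += 1          # found in every genome
        elif present == 1:
            counts["unique"] += 1        # found in exactly one genome
        elif present > 1:
            counts["accessory"] += 1     # found in some, but not all, genomes
    return n_genomes, counts


example = "cluster\tg1\tg2\tg3\nc1\t1\t1\t1\nc2\t1\t0\t0\nc3\t0\t1\t1\n"
print(summarize_pav(io.StringIO(example)))
# (3, {'core': 1, 'accessory': 1, 'unique': 1})
```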
Installation
The best way to install the full version of the PGAP2 package is with conda:
conda create -n pgap2 -c bioconda pgap2
Alternatively, it is often faster to use the mamba solver (recommended):
conda create -n pgap2 mamba
conda activate pgap2
mamba install -c bioconda pgap2
Sometimes you only want to carry out a specific function, such as partitioning, and don't want to install all the extra software required by the full version of PGAP2. In that case, you can install just the PGAP2 package:
pip install pgap2
Or install from source to get the latest version:
git clone https://github.com/bucongfan/PGAP2
pip install -e PGAP2/
Then install, by yourself, only the extra software needed for the specific function you want.
The dependencies of PGAP2 are listed below. PGAP2 checks whether each one is available on your environment PATH or in the pgap2/dependencies folder.
Preprocessing
Main
- One of clustering software
- mcl
- One of alignment software
- Using --retrieve to retrieve missing gene loci
- Using --reannot to re-annotate your genome
Postprocessing
- One of MSA software
- ClipKIT
- One of phylogenetic tree construction software
- ClonalFrameML
- maskrc-svg
- fastbaps
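If you installed PGAP2 with pip and are adding tools by hand, a quick lookup can show which executables are already reachable. The sketch below mirrors the two locations mentioned above (the environment PATH, then a dependencies folder) using the standard library's shutil.which. The function name find_tool is hypothetical, and PGAP2's own check may search differently, for example in nested subfolders.

```python
import shutil


def find_tool(name, dependencies_dir=None):
    """Look for an executable on PATH first, then in an optional dependencies folder."""
    found = shutil.which(name)
    if found:
        return found
    if dependencies_dir is not None:
        # shutil.which accepts a custom search path; only the top level is searched.
        return shutil.which(name, path=str(dependencies_dir))
    return None


# Example: mcl is one of the tools listed for the main module.
location = find_tool("mcl", dependencies_dir="pgap2/dependencies")
print("mcl:", location if location else "not found")
```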
Visualization in Preprocessing and Postprocessing modules
PGAP2 calls the Rscript found on your environment PATH. The following R libraries should be installed:
- ggpubr
- ggrepel
- dplyr
- tidyr
- patchwork
- optparse
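One way to confirm that these libraries are visible to the Rscript on your PATH is to ask R which of them are missing. The following is a minimal sketch: the helper names build_check_command and missing_r_packages are hypothetical, and the check relies only on base R's installed.packages(), not on anything PGAP2 provides.

```python
import shutil
import subprocess

R_PACKAGES = ["ggpubr", "ggrepel", "dplyr", "tidyr", "patchwork", "optparse"]


def build_check_command(packages):
    """Build an Rscript command that prints the packages that are NOT installed."""
    quoted = ", ".join(f'"{pkg}"' for pkg in packages)
    expr = f'cat(setdiff(c({quoted}), rownames(installed.packages())), sep="\\n")'
    return ["Rscript", "-e", expr]


def missing_r_packages(packages=R_PACKAGES):
    """Run the check and return the list of missing package names."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript was not found on PATH")
    result = subprocess.run(
        build_check_command(packages), capture_output=True, text=True, check=True
    )
    return result.stdout.split()


# Usage (requires R to be installed):
# print(missing_r_packages())
```

An empty list means all six libraries were found.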
Detailed documentation
Please refer to the documentation in the wiki.