Integration of soil metagenomic data for correlation of microbial markers with plant biochemical indicators

Maintainers

vivian.mello

Project description

PGPTracker: A Bioinformatics Pipeline for Functional Prediction and Analysis

PGPTracker is a command-line interface (CLI) tool designed to automate the complete workflow from 16S rRNA sequencing data to in-depth functional and statistical analysis.

It connects Amplicon Sequence Variants (ASVs) to predicted functions (KEGG Orthologs) and maps them to Plant Growth-Promoting Traits (PGPTs).
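The chain from ASVs to KOs to PGPTs can be pictured as two lookups followed by a sum. The sketch below is illustrative only: the copy numbers, the KO-to-trait mapping, and all names are hypothetical, and PGPTracker's real mapping tables and internals may differ.

```python
from collections import defaultdict

# Hypothetical inputs, for illustration only.
# ASV counts observed in one sample.
asv_counts = {"ASV_1": 10, "ASV_2": 5}
# Predicted KO copy numbers per ASV (the kind of table a predictor such as PICRUSt2 yields).
asv_to_ko = {
    "ASV_1": {"K00001": 2, "K00002": 1},
    "ASV_2": {"K00001": 1},
}
# Made-up KO -> trait labels; these are not taken from PLaBAse.
ko_to_pgpt = {"K00001": "TRAIT_A", "K00002": "TRAIT_B"}


def pgpt_abundance(asv_counts, asv_to_ko, ko_to_pgpt):
    """Sum (ASV count x KO copy number) into each mapped trait."""
    totals = defaultdict(float)
    for asv, count in asv_counts.items():
        for ko, copies in asv_to_ko.get(asv, {}).items():
            trait = ko_to_pgpt.get(ko)
            if trait is not None:  # KOs without a trait mapping are skipped
                totals[trait] += count * copies
    return dict(totals)


result = pgpt_abundance(asv_counts, asv_to_ko, ko_to_pgpt)
print(result)
```

Keeping the taxon label alongside each contribution, instead of summing it away, would give the stratified (Taxon x Function x Sample) view.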

Core Workflow

The pipeline is split into two main stages:

Stage 1 (process): Handles data processing (QIIME 2, PICRUSt2) to generate unstratified (Function x Sample) and stratified (Taxon x Function x Sample) abundance tables.
Stage 2 (analysis): Takes the tables from Stage 1, applies CLR normalization, runs statistical tests (Kruskal-Wallis, PERMANOVA) and machine learning models (Random Forest, Lasso), and generates publication-quality visualizations (PCA, Heatmaps, Volcano Plots).
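For readers unfamiliar with the centered log-ratio (CLR) transform used in Stage 2, here is a minimal sketch of the idea: each value is log-transformed and then centered on the mean log value of its sample. The pseudocount used to handle zeros is an assumption of this sketch; how PGPTracker treats zeros is not stated here.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's abundance vector.

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


sample = [0, 9, 99]
transformed = clr(sample)
print([round(v, 3) for v in transformed])
```

CLR values within a sample always sum to zero, which is a quick sanity check for a normalized table.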

Installation

PGPTracker is a pip-installable package that requires Conda to manage its bioinformatics dependencies (QIIME 2 and PICRUSt2).

Step 1: Create and Activate Base Environment

Create and activate a clean Conda environment (Python 3.10+ recommended).

conda create -n pgptracker python=3.13
conda activate pgptracker

Step 2: Install PGPTracker

Install the package and its core dependencies from PyPI.

pip install pgptracker

Step 3: Run Internal Setup (Mandatory)

This command is mandatory. It automatically creates and configures the separate qiime2 and picrust2 Conda environments that PGPTracker needs to run external tools.

pgptracker setup
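After setup, it can be useful to confirm that the two extra environments exist. The sketch below parses the JSON printed by `conda env list --json` and reports which required environments are missing. It assumes the environments are named exactly `qiime2` and `picrust2`, which is inferred from the description above and may not match the names that setup actually creates.

```python
import json
import os


def missing_envs(conda_json_text, required=("qiime2", "picrust2")):
    """Return the required env names absent from `conda env list --json` output."""
    env_paths = json.loads(conda_json_text).get("envs", [])
    # The environment name is taken to be the last path component.
    found = {os.path.basename(p.rstrip("/\\")) for p in env_paths}
    return [name for name in required if name not in found]


# Canned output for demonstration. In practice the text could come from:
#   subprocess.run(["conda", "env", "list", "--json"],
#                  capture_output=True, text=True, check=True).stdout
sample_output = json.dumps(
    {"envs": ["/opt/conda", "/opt/conda/envs/pgptracker", "/opt/conda/envs/qiime2"]}
)
print(missing_envs(sample_output))
```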

Quick Start: A Full Example

This example runs Stage 1 (process) on a feature table and then Stage 2 (analysis) on the resulting output.

Note: You can also run pgptracker -i to enter interactive mode, a guided, menu-driven alternative to the commands below.

Step 1: Run Stage 1 (`process`)

Process your representative sequences and feature table (.qza, .fna, or .biom) into PGPT abundance tables. This example generates a table stratified by Genus.

pgptracker process \
    --rep-seqs path/to/dna-sequences.fasta \
    --feature-table path/to/feature-table.biom \
    -o my_project_output \
    --stratified \
    --tax-level Genus

This command will create the file my_project_output/genus_stratified_pgpt.tsv.

Step 2: Run Stage 2 (`analysis`)

Analyze the stratified output against your metadata to find which Genus/Function pairs differ by Treatment.

pgptracker analysis \
    -i my_project_output/genus_stratified_pgpt.tsv \
    -m path/to/my_metadata.tsv \
    -o my_project_output/analysis_by_treatment \
    --input-format long \
    --group-col Treatment \
    --target-col Treatment \
    --ml-type classification

This will create the analysis_by_treatment directory containing plots and machine learning results.
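Since --group-col and --target-col refer to columns in the metadata file, a quick pre-flight check can catch a misspelled column name before a long run. This is a minimal sketch that is not part of PGPTracker; the `SampleID` column and the example rows are hypothetical.

```python
import csv
import io


def check_metadata(tsv_text, required_columns):
    """Return (missing_columns, n_samples) for a tab-separated metadata table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in required_columns if c not in header]
    n_samples = sum(1 for _ in reader)  # data rows after the header
    return missing, n_samples


# Hypothetical metadata content; a real file would be read with open(path).read().
example = "SampleID\tTreatment\tpH\nS1\tcontrol\t6.5\nS2\tdrought\t7.1\n"
print(check_metadata(example, ["Treatment"]))
```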

Command Reference

Main Commands

Command	Description
`pgptracker process`	(Stage 1) Runs the full bioinformatics pipeline (QIIME2, PICRUSt2, PGPTs).
`pgptracker analysis`	(Stage 2) Runs statistical tests, ML, and plotting on a Stage 1 output table.
`pgptracker setup`	Installs and configures internal Conda environments. Must be run once after install.
`pgptracker -i`	Runs the tool in a guided, interactive menu-driven mode.

`pgptracker process` (Stage 1) Arguments

Argument	Description
`--rep-seqs`	Path to representative sequences (`.qza` or `.fna`).
`--feature-table`	Path to feature table (`.qza` or `.biom`).
`-o, --output`	Output directory to store results.
`--stratified`	Flag to generate stratified (Taxon x Function x Sample) output.
`--tax-level`	Taxonomic level for stratification (default: `Genus`).
`--max-nsti`	NSTI threshold for PICRUSt2 filtering (default: `1.7`).
`-t, --threads`	Number of threads to use (default: auto-detect).
`--classifier-qza`	Path to a custom QIIME 2 classifier (default: Greengenes2 2024.09).

`pgptracker analysis` (Stage 2) Arguments

Argument	Description
`-i, --input-table`	Path to the input table (output from `process`).
`-m, --metadata`	Path to the sample metadata file (TSV format).
`-o, --output-dir`	Directory to save analysis results.
`--group-col`	Metadata column for grouping in plots and statistics (e.g., `'Treatment'`).
`--target-col`	Metadata column to predict in machine learning (e.g., `'pH'` or `'Treatment'`).
`--ml-type`	Type of ML task: `classification` or `regression`.
`--input-format`	Format of the input table: `wide` (unstratified) or `long` (stratified).
`--no-stats`	Flag to skip statistical tests (Kruskal-Wallis/Mann-Whitney).
`--no-ml`	Flag to skip machine learning models.
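The difference between the `long` (stratified) and `wide` (unstratified) layouts can be sketched as follows: summing a long table over taxa collapses it to a Function x Sample table. The row layout and names below are hypothetical and are not the actual column names of PGPTracker's output files.

```python
from collections import defaultdict

# Hypothetical long-format rows: (taxon, function, sample, abundance).
long_rows = [
    ("Bacillus", "TRAIT_A", "S1", 4.0),
    ("Pseudomonas", "TRAIT_A", "S1", 6.0),
    ("Bacillus", "TRAIT_B", "S2", 3.0),
]


def long_to_wide(rows):
    """Collapse stratified rows to {function: {sample: total abundance}}."""
    wide = defaultdict(lambda: defaultdict(float))
    for _taxon, function, sample, abundance in rows:
        wide[function][sample] += abundance
    return {function: dict(samples) for function, samples in wide.items()}


wide = long_to_wide(long_rows)
print(wide)
```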

Example Workflows (Stage 2 Analysis Cookbook)

A. Classification: Predict Environmental Biome

Question: "Can the functional profile distinguish between biomes (e.g., forest vs. desert)?"

pgptracker analysis \
    -i path/to/unstratified_pgpt_Lv3_abundances.tsv \
    -m path/to/emp_metadata.tsv \
    -o results/analysis_biome \
    --feature-col-name Lv3 \
    --group-col env_biome \
    --target-col env_biome \
    --ml-type classification

B. Regression: Correlate with Chemistry (pH)

Question: "Which bacterial functions (PGPTs) are most associated with soil pH?"

pgptracker analysis \
    -i path/to/unstratified_pgpt_Lv3_abundances.tsv \
    -m path/to/emp_metadata.tsv \
    -o results/analysis_ph \
    --feature-col-name Lv3 \
    --group-col env_feature \
    --target-col ph \
    --ml-type regression
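When running several analyses in a row, the command line can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of PGPTracker; it only uses flags that appear in the examples above and leaves the actual execution commented out.

```python
import shlex


def build_analysis_command(input_table, metadata, output_dir,
                           group_col, target_col, ml_type,
                           feature_col_name=None):
    """Assemble the argument list for `pgptracker analysis`."""
    if ml_type not in ("classification", "regression"):
        raise ValueError("ml_type must be 'classification' or 'regression'")
    cmd = ["pgptracker", "analysis",
           "-i", input_table, "-m", metadata, "-o", output_dir]
    if feature_col_name is not None:
        cmd += ["--feature-col-name", feature_col_name]
    cmd += ["--group-col", group_col,
            "--target-col", target_col,
            "--ml-type", ml_type]
    return cmd


cmd = build_analysis_command(
    "path/to/unstratified_pgpt_Lv3_abundances.tsv",
    "path/to/emp_metadata.tsv",
    "results/analysis_ph",
    group_col="env_feature",
    target_col="ph",
    ml_type="regression",
    feature_col_name="Lv3",
)
print(shlex.join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```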

Outputs

PGPTracker generates publication-ready outputs in your results folder:

Directory	Content
`normalization/`	Raw and CLR-normalized abundance tables.
`diversity/`	Alpha Diversity plots (Shannon, Simpson), Beta Diversity plots (PCA, t-SNE), and PERMANOVA results.
`statistics/`	Differential Abundance results (Kruskal-Wallis), Volcano Plots, and Clustered Heatmaps.
`machine_learning/`	Feature Importance bar plots (Random Forest / Lasso) and Boruta selection results.
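As a reminder of what the alpha diversity plots summarize, here is a minimal sketch of the Shannon and Simpson indices for one sample. Simpson is computed here in its Gini-Simpson form (1 minus the sum of squared proportions); which variant and which log base PGPTracker uses is not stated, so treat both as assumptions.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over non-zero proportions."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def simpson(counts):
    """Gini-Simpson index 1 - sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


even_sample = [5, 5, 5, 5]      # four equally abundant features
skewed_sample = [17, 1, 1, 1]   # one dominant feature
print(shannon(even_sample), simpson(even_sample))
print(shannon(skewed_sample), simpson(skewed_sample))
```

Both indices are higher for the even sample than for the skewed one, which is the pattern the diversity plots make visible across groups.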

Citing

PGPTracker is built upon the work of many others. Please cite the core tools and databases it uses:

PGPTracker & PLaBAse

Patz, S., Rauh, M., Gautam, A., Huson, D.H. mgPGPT: Metagenomic analysis of plant growth-promoting traits. (submitted, 2024, preprint)
Patz, S., Gautam, A., Becker, M., Ruppel, S., Rodríguez-Palenzuela, P., Huson, D.H. PLaBAse: A comprehensive web resource for analyzing the plant growth-promoting potential of plant-associated bacteria. (submitted 2021, preprint)

Core Dependencies

QIIME 2: Bolyen E, Rideout JR, Dillon MR, et al. (2019). Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37: 852–857.
PICRUSt2: Douglas, G.M., Maffei, V.J., Zaneveld, J.R. et al. (2020). PICRUSt2 for prediction of metagenome functions. Nature Biotechnology 38, 685–688.
Greengenes2: McDonald, D., et al. (2024). Greengenes2 unifies microbial data in a single reference tree. Nature Biotechnology.

Release history

Version	Release date
0.1.4	Nov 24, 2025
0.1.3	Nov 19, 2025 (this version)
0.1.2	Nov 18, 2025
0.1.1	Nov 18, 2025
0.1.0	Nov 17, 2025

Download files

Distribution	File	Size	Uploaded
Source	pgptracker-0.1.3.tar.gz	868.5 kB	Nov 19, 2025
Built (Python 3 wheel)	pgptracker-0.1.3-py3-none-any.whl	235.0 kB	Nov 19, 2025

File details

Details for the file pgptracker-0.1.3.tar.gz.

File metadata

Download URL: pgptracker-0.1.3.tar.gz
Upload date: Nov 19, 2025
Size: 868.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pgptracker-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`48d56975de2779bc9092be90945da48fcdb5d5bf78f07b0b9c46e028fae7b18c`
MD5	`a5d3efffc37fc4e0785faeafedc74eeb`
BLAKE2b-256	`68246e97b3aed14fa4a4771078b3aa7ace4254330aaeb0818009e0f0e46b13ab`

Provenance

The following attestation bundles were made for pgptracker-0.1.3.tar.gz:

Publisher: publish.yml on kiuone/PGPTracker

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pgptracker-0.1.3.tar.gz
- Subject digest: 48d56975de2779bc9092be90945da48fcdb5d5bf78f07b0b9c46e028fae7b18c
- Sigstore transparency entry: 708073052
- Sigstore integration time: Nov 19, 2025
Source repository:
- Permalink: kiuone/PGPTracker@e71144e53de0c889afba3e771006c82ac0a3d60e
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/kiuone
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e71144e53de0c889afba3e771006c82ac0a3d60e
- Trigger Event: release

File details

Details for the file pgptracker-0.1.3-py3-none-any.whl.

File metadata

Download URL: pgptracker-0.1.3-py3-none-any.whl
Upload date: Nov 19, 2025
Size: 235.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pgptracker-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`296dd9534021edc9035867981af3536cfec71d84903164d705cc7786a99de5bf`
MD5	`8956e51f9727926b074c0339c888e864`
BLAKE2b-256	`52ec6fa8fe24b0a71142542d8eab107e576f46e398d01e3dfd7062c611eff8af`
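The SHA256 digests listed for the two files can be checked locally after download. The sketch below uses only the standard library; the digests are copied from the tables above, while the helper names and the small self-check at the end are illustrative.

```python
import hashlib
import os
import tempfile

# SHA256 digests as listed above for the 0.1.3 release files.
EXPECTED_SHA256 = {
    "pgptracker-0.1.3.tar.gz":
        "48d56975de2779bc9092be90945da48fcdb5d5bf78f07b0b9c46e028fae7b18c",
    "pgptracker-0.1.3-py3-none-any.whl":
        "296dd9534021edc9035867981af3536cfec71d84903164d705cc7786a99de5bf",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected_hex.lower()


# Self-check on a temporary file, so the sketch runs without a download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_ok = verify(tmp.name, hashlib.sha256(b"abc").hexdigest())
os.remove(tmp.name)
print(demo_ok)
```

For a real download, call `verify(path, EXPECTED_SHA256[os.path.basename(path)])` on the saved file.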

Provenance

The following attestation bundles were made for pgptracker-0.1.3-py3-none-any.whl:

Publisher: publish.yml on kiuone/PGPTracker

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pgptracker-0.1.3-py3-none-any.whl
- Subject digest: 296dd9534021edc9035867981af3536cfec71d84903164d705cc7786a99de5bf
- Sigstore transparency entry: 708073053
- Sigstore integration time: Nov 19, 2025
Source repository:
- Permalink: kiuone/PGPTracker@e71144e53de0c889afba3e771006c82ac0a3d60e
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/kiuone
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e71144e53de0c889afba3e771006c82ac0a3d60e
- Trigger Event: release
