
A Python tool for panresistome analysis


PanR2: Panresistome Analysis Tool

Overview

PanR2 is a Python tool for analyzing panresistome data. It reads the ncbi_clean.csv metadata file produced by FetchM together with Abricate summary files, merges the two, and generates a range of visualizations, including heatmaps, bar plots, boxplots, and interactive HTML plots. It is designed to help researchers analyze the presence, prevalence, and distribution of antibiotic resistance genes across geographic locations and over time.

Prerequisites:

  • ncbi_clean.csv from FetchM
  • Summary files in .tab (preferred) or .csv format from Abricate
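Abricate's summary output is a matrix: a #FILE column, a NUM_FOUND column, then one column per gene, where "." means the gene was not found and a coverage value means it was. The sketch below is not PanR2's own code. It shows, using only the standard library, how such a file can be read into a genome-to-genes mapping. The function name read_abricate_summary is hypothetical, and treating .csv files as comma-separated is an assumption.

```python
import csv
from pathlib import Path
from typing import Dict, Set

# Columns that describe the row rather than a gene.
META_COLUMNS = ("#FILE", "NUM_FOUND")


def read_abricate_summary(path: str) -> Dict[str, Set[str]]:
    """Map each genome (the #FILE column) to the set of genes detected in it.

    Assumes the layout of `abricate --summary`: '.' (or an empty cell)
    means "not found"; any other value (a coverage figure) means "found".
    """
    delimiter = "\t" if Path(path).suffix == ".tab" else ","  # assumption for .csv
    presence: Dict[str, Set[str]] = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            presence[row["#FILE"]] = {
                gene
                for gene, value in row.items()
                if gene is not None
                and gene not in META_COLUMNS
                and value not in (".", "", None)
            }
    return presence
```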

Key Features:

  • Merges and analyzes NCBI and Abricate outputs
  • Calculates gene presence/absence across multiple categories (continent, location, subcontinent, collection date)
  • Performs comprehensive statistical tests and correlation analyses
  • Generates multiple visualization types: heatmaps, bar plots, boxplots, lollipop plots, and correlation plots
  • Creates interactive HTML visualizations for enhanced data exploration
  • Generates an interactive HTML index for easy navigation of all results
  • Provides detailed statistical analysis outputs
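The sketch below makes the per-category presence calculation concrete. For each group (for example, each continent), it computes the percentage of genomes that carry each gene. It assumes you already have a genome-to-genes mapping and a genome-to-group mapping. This page does not describe how PanR2 joins the NCBI metadata to the Abricate rows, so the join key and the function name percent_presence_by_group are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Set


def percent_presence_by_group(
    genes_by_genome: Dict[str, Set[str]],
    group_by_genome: Dict[str, str],
) -> Dict[str, Dict[str, float]]:
    """Return {group: {gene: % of genomes in that group carrying the gene}}."""
    genomes_per_group: Counter = Counter()
    gene_hits: Dict[str, Counter] = defaultdict(Counter)
    for genome, genes in genes_by_genome.items():
        group = group_by_genome.get(genome)
        if group is None:  # no metadata for this category: leave the genome out
            continue
        genomes_per_group[group] += 1
        gene_hits[group].update(genes)
    return {
        group: {
            gene: 100.0 * hits / genomes_per_group[group]
            for gene, hits in gene_hits[group].items()
        }
        for group in genomes_per_group
    }
```

Running it once per category (continent, location, subcontinent, collection date) yields the kind of group-by-gene table that a presence heatmap summarizes.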

Installation

Method 1: Using pip with conda (Recommended)

conda create -n panr_env python=3.9
conda activate panr_env
pip install panR2

Method 2: Direct installation from GitHub

conda create -n panr_env python=3.8
conda activate panr_env
pip install git+https://github.com/Tasnimul-Arabi-Anik/PanR2.git

Method 3: Manual Installation from Source

git clone https://github.com/Tasnimul-Arabi-Anik/PanR2.git
cd PanR2
pip install -r requirements.txt
pip install .

Confirm Installation

panr --help
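You can also confirm the installation from Python with the standard-library importlib.metadata module, which reports the installed distribution's version. This sketch assumes the distribution name is panr2, as in the wheel file names on this page. The helper name is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_panr2_version() -> Optional[str]:
    """Return the installed PanR2 version string, or None if it is not installed."""
    try:
        return version("panr2")  # distribution name assumed from the wheel file name
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_panr2_version() or "PanR2 is not installed in this environment")
```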

Usage

Command-Line Interface

panr --ncbi-dir <NCBI_DIRECTORY> --abricate-dir <ABRICATE_DIRECTORY> --output-dir <OUTPUT_DIRECTORY> [OPTIONS]

Required Arguments

Argument          Description
--ncbi-dir        Path to the ncbi_clean.csv file from FetchM (a file path, despite the flag name)
--abricate-dir    Directory containing the Abricate summary .tab or .csv files
--output-dir      Directory in which to store merged results and visualizations

Optional Arguments

Argument      Type    Default   Description
--genep       float   -         Minimum gene presence (%) required for a gene to appear in the heatmaps
--nseq        int     -         Minimum number of sequences required per group in the heatmaps
--format      str     png       Output format for figures: tiff, svg, png, or pdf
--version     -       -         Show the program's version number and exit
-h, --help    -       -         Show the help message and exit
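The two heatmap thresholds are easiest to understand as filters applied before a heatmap is drawn. --nseq drops groups that have too few genomes, and --genep drops genes whose presence never reaches the cutoff. The sketch below is one plausible reading of that behavior, not PanR2's implementation. Several details are assumptions: that the cutoffs are inclusive, that a gene is kept if it reaches --genep in at least one remaining group, and that a "-" default means no filtering.

```python
from typing import Dict, Optional

Table = Dict[str, Dict[str, float]]  # {group: {gene: percent presence}}


def filter_heatmap_table(
    table: Table,
    group_sizes: Dict[str, int],
    genep: Optional[float] = None,
    nseq: Optional[int] = None,
) -> Table:
    """Apply --nseq / --genep style cutoffs (assumed inclusive) to a presence table."""
    # Keep only groups with at least `nseq` genomes (no filter when nseq is None).
    kept = {
        group: row
        for group, row in table.items()
        if nseq is None or group_sizes.get(group, 0) >= nseq
    }
    if genep is None:
        return kept
    # Keep a gene if it reaches `genep` percent in at least one remaining group (assumption).
    keep_genes = {gene for row in kept.values() for gene, pct in row.items() if pct >= genep}
    return {
        group: {gene: pct for gene, pct in row.items() if gene in keep_genes}
        for group, row in kept.items()
    }
```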

Example Usage

# Basic usage
panr --ncbi-dir ./data/ncbi_clean.csv --abricate-dir ./data/abricate --output-dir ./output

# With optional parameters
panr --ncbi-dir ./data/ncbi_clean.csv --abricate-dir ./data/abricate --output-dir ./output --format pdf --genep 10 --nseq 5
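If you run PanR2 on several datasets or in several figure formats, it can be convenient to drive the CLI from Python. The sketch below uses only the flags documented above and assumes panr is on your PATH. The helper name panr_command and the batch loop in the comment are illustrative.

```python
import subprocess
from typing import List, Optional


def panr_command(
    ncbi_csv: str,
    abricate_dir: str,
    output_dir: str,
    fmt: str = "png",
    genep: Optional[float] = None,
    nseq: Optional[int] = None,
) -> List[str]:
    """Build a panr argument list using only the documented options."""
    cmd = [
        "panr",
        "--ncbi-dir", ncbi_csv,
        "--abricate-dir", abricate_dir,
        "--output-dir", output_dir,
        "--format", fmt,
    ]
    if genep is not None:
        cmd += ["--genep", str(genep)]
    if nseq is not None:
        cmd += ["--nseq", str(nseq)]
    return cmd


def run_panr(cmd: List[str]) -> None:
    """Run PanR2 and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


# Example batch (one output directory per format):
# for fmt in ("png", "pdf"):
#     run_panr(panr_command("./data/ncbi_clean.csv", "./data/abricate",
#                           f"./output_{fmt}", fmt=fmt, genep=10, nseq=5))
```

Passing an argument list instead of a single shell string avoids quoting problems with paths that contain spaces.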

Output Structure

PanR2 generates a comprehensive set of outputs organized in the following directory structure:

output/
├── figures/
│   ├── heatmap/                          # Geographic heatmaps
│   ├── html_files/                       # Interactive HTML plots
│   ├── mean_ARG/                         # Mean antibiotic resistance gene plots
│   ├── Stat_analysis/                    # Statistical analysis results
│   ├── index.html                        # Main navigation page
│   └── [Various static plots]
└── [Processed CSV files]
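After a run, a quick way to see what was produced is to group the files under figures/ by subdirectory. This sketch only walks the directory tree shown above and does not depend on PanR2 itself. The function name is illustrative.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def list_outputs(output_dir: str) -> Dict[str, List[str]]:
    """Group files under <output_dir>/figures by their immediate subdirectory."""
    figures = Path(output_dir) / "figures"
    grouped: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(figures.rglob("*")):
        if path.is_file():
            rel = path.relative_to(figures)
            # Files directly in figures/ (static plots, index.html) are grouped under ".".
            key = rel.parts[0] if len(rel.parts) > 1 else "."
            grouped[key].append(rel.name)
    return dict(grouped)
```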

Output Files Description

1. Static Visualizations

  • Resistance_gene_presence.{format} - Bar plot showing gene presence across samples
  • Resistance_gene_percentage.{format} - Lollipop plot showing gene percentage distribution
  • Resistance_gene_identity_boxplot.{format} - Boxplot of resistance gene identity scores
  • Resistance_percentage_by_Antibiotics.{format} - Bar plot of resistance by antibiotic classes

2. Heatmaps (heatmap/ directory)

  • ncbi_ncbi_Continent_heatmap.{format} - Resistance gene distribution by continent
  • ncbi_ncbi_Geographic_Location_heatmap.{format} - Distribution by geographic location
  • ncbi_ncbi_Subcontinent_heatmap.{format} - Distribution by subcontinent
  • ncbi_ncbi_Collection_Date_heatmap.{format} - Temporal distribution patterns

3. Mean ARG Analysis (mean_ARG/ directory)

  • Mean_ARG_by_Continent.{format} - Average antibiotic resistance genes by continent
  • Mean_ARG_by_Geographic Location.{format} - Average ARGs by geographic location
  • Mean_ARG_by_Subcontinent.{format} - Average ARGs by subcontinent
  • Mean_ARG_by_Collection Date.{format} - Temporal trends in ARG abundance
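"Mean ARG" can be read as the average number of resistance genes detected per genome within each group. The sketch below shows that calculation under two assumptions: you have a per-genome ARG count (for example, the size of each genome's gene set), and you have a genome-to-group mapping. PanR2 may count genes differently, for example by filtering or deduplicating hits. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def mean_arg_by_group(arg_counts: Dict[str, int], group_by_genome: Dict[str, str]) -> Dict[str, float]:
    """Average number of ARGs per genome for each group; genomes without a group are ignored."""
    counts_per_group: Dict[str, List[int]] = defaultdict(list)
    for genome, n_args in arg_counts.items():
        group = group_by_genome.get(genome)
        if group is not None:
            counts_per_group[group].append(n_args)
    return {group: sum(values) / len(values) for group, values in counts_per_group.items()}
```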

4. Interactive HTML Visualizations (html_files/ directory)

  • Resistance_gene_distribution_heatmap.html - Interactive heatmap of gene distribution
  • Resistance_gene_geographic_distribution.html - Geographic distribution map
  • Resistance_gene_frequency_boxplot.html - Interactive frequency analysis
  • Resistance_gene_identity_boxplot.html - Interactive identity score analysis
  • Resistance_gene_presence.html - Interactive presence/absence visualization
  • Resistance_gene_percentage.html - Interactive percentage analysis
  • Resistance_percentage_by_Antibiotics.html - Interactive antibiotic class analysis
  • Mean_Frequency_Antibiotic_Resistance_genes.html - Mean frequency analysis
  • Continent_correlation_plot.html - Continental correlation analysis
  • Geographic_Location_correlation_plot.html - Location-based correlations
  • Subcontinent_correlation_plot.html - Subcontinental correlation patterns

5. Statistical Analysis (Stat_analysis/ directory)

  • combined_geographic_correlation_summary.csv - Geographic correlation statistics
  • combined_overall_tests.csv - Overall statistical test results
  • combined_pairwise_comparisons.csv - Pairwise comparison results
  • combined_summary_statistics.csv - Comprehensive summary statistics
  • ncbi_gene_presence_count_percentage.csv - Gene presence counts and percentages

6. Navigation

  • index.html - Interactive HTML index page for easy navigation of all generated visualizations

Example Visualizations

Static Plots

  • Resistance Gene Presence: bar plot showing the presence of resistance genes across samples
  • Resistance Gene Percentage: lollipop plot displaying the gene percentage distribution
  • Geographic Heatmap: heatmap showing resistance gene distribution across continents

Interactive Features

The tool generates interactive HTML visualizations that allow for:

  • Zooming and panning
  • Hover tooltips with detailed information
  • Dynamic filtering and selection
  • Exportable high-quality images
  • Responsive design for different screen sizes

Access all interactive plots through the generated index.html file (figures/index.html in your output directory).
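The directory tree above places index.html under figures/. The sketch below opens that page in your default browser using the standard-library webbrowser module. Both helper names are illustrative, and opening a browser may do nothing on a headless server.

```python
import webbrowser
from pathlib import Path


def index_uri(output_dir: str) -> str:
    """Return a file:// URI for the navigation page inside the output directory."""
    return (Path(output_dir) / "figures" / "index.html").resolve().as_uri()


def open_index(output_dir: str) -> bool:
    """Open index.html in the default browser; returns False if no browser could be launched."""
    return webbrowser.open(index_uri(output_dir))


# Example: open_index("./output")
```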


Statistical Analysis Features

PanR2 provides comprehensive statistical analysis including:

  • Correlation Analysis: Geographic and temporal correlations
  • Comparative Statistics: Between-group comparisons
  • Summary Statistics: Descriptive statistics for all variables
  • Pairwise Comparisons: Detailed pairwise statistical tests
  • Geographic Patterns: Spatial distribution analysis
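As an illustration of a temporal correlation, the sketch below computes a Spearman rank correlation between collection year and per-genome ARG count, in pure Python, with tied values given their average rank. This page does not say which correlation method PanR2 uses, so Spearman is a swapped-in example, not the tool's method. In practice, scipy.stats.spearmanr computes the same coefficient and also returns a p-value.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (sx * sy)
```

A value of spearman(years, arg_counts) near +1 suggests that ARG counts rise with collection year in that sample set. The coefficient alone says nothing about statistical significance.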

Requirements

  • Python 3.8+
  • Required Python packages (automatically installed):
    • pandas
    • numpy
    • matplotlib
    • seaborn
    • plotly
    • scipy
    • Other dependencies listed in requirements.txt
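To confirm that the packages listed above can be imported in the active environment, you can use importlib.util.find_spec, which locates a module without importing it. The package tuple mirrors the bullets above and is not the full contents of requirements.txt. The function name is illustrative.

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages listed above (requirements.txt may list more).
CORE_PACKAGES = ("pandas", "numpy", "matplotlib", "seaborn", "plotly", "scipy")


def missing_packages(names: Iterable[str] = CORE_PACKAGES) -> List[str]:
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```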

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an Issue for bugs, feature requests, or improvements.


License

This tool is provided under the MIT License. See the LICENSE file for details.


Citation

If you use PanR2 in your research, please cite: DOI: 10.1101/2025.04.08.647722

PanR2: A comprehensive tool for panresistome analysis and visualization
Author: Tasnimul Arabi Anik
GitHub: https://github.com/Tasnimul-Arabi-Anik/PanR2

Support

For questions, issues, or feature requests, please:

  1. Check the existing issues on the GitHub repository
  2. Create a new issue with detailed information
  3. Contact the author: Tasnimul Arabi Anik

Changelog

Latest Updates

  • Added interactive HTML visualizations
  • Enhanced statistical analysis capabilities
  • Improved output organization and navigation
  • Added support for multiple figure formats
  • Enhanced correlation analysis features



Download files


Source Distribution

panr2-0.1.1.tar.gz (25.8 kB)

Uploaded Source

Built Distribution


panr2-0.1.1-py3-none-any.whl (25.8 kB)

Uploaded Python 3

File details

Details for the file panr2-0.1.1.tar.gz.

File metadata

  • Download URL: panr2-0.1.1.tar.gz
  • Size: 25.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for panr2-0.1.1.tar.gz
Algorithm Hash digest
SHA256 6332b05dea643974f01c267c14366a9e8656dbd7a17a5c6b78ba042224b12c9f
MD5 e4526f8b3b1ffda687c1c62b58380f8b
BLAKE2b-256 e04f50d79314f71b715809cb47bc920ef4583c0d91c53960a7b2aff6273be4ff


File details

Details for the file panr2-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: panr2-0.1.1-py3-none-any.whl
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for panr2-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 377a5112687047a859e461ade9f6208ed4e98442475a232fd4a4824cd781b1ee
MD5 28d1cfe72da2e86137f5bce3306b8ac3
BLAKE2b-256 cadc3d815b61c5e184e0eada7a30cd02ed55c93f1f12d0dc58fbf44cc85826df
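To check that a downloaded file matches the SHA256 digests listed above, compute its hash with the standard library and compare the two values. This is a generic integrity check, not a PanR2 feature. The helper names are illustrative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_sha256: str) -> bool:
    """True if the file's SHA256 equals the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Example, after downloading the wheel into the current directory:
# matches("panr2-0.1.1-py3-none-any.whl",
#         "377a5112687047a859e461ade9f6208ed4e98442475a232fd4a4824cd781b1ee")
```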

