A comprehensive LC-MS metabolomics data quality control module.

`pi-metaboqc`: $\pi$-Metabolomics-Quality Control

pi-metaboqc is a high-performance, fully automated data quality control pipeline designed specifically for large-scale, multi-batch clinical metabolomics.

✨ Core Capabilities

Pure Python Ecosystem & Native Pandas Integration: The core data structure, MetaboInt, natively inherits from pandas.DataFrame. All underlying calculations are strictly implemented using industry-standard libraries like SciPy and scikit-learn. Furthermore, classical methods that traditionally relied on R (such as Quantile Normalization and VSN) have been completely reconstructed in Python, achieving statistically equivalent results and breaking down language barriers.
Intelligent Missing Value Management: Built-in heuristic algorithms automatically identify and distinguish between MAR (Missing at Random) and MNAR (Missing Not at Random) metabolite features. By evaluating statistical metrics like NRMSE (Normalized Root Mean Square Error), the pipeline auto-tunes and selects the most appropriate filtering and imputation strategies for your specific dataset.
Dual-Engine High-Performance Computing: Combines joblib for multi-core parallelization with Numba for Just-In-Time (JIT) compilation. This architecture accelerates computationally intensive tasks, such as baseline modeling and cross-validation, to near-C speeds, reducing turnaround times for large clinical cohorts.
End-to-End Quality Assessment (QA): Provides comprehensive data evaluation functions spanning the entire pipeline. From raw data import and missing value handling to signal drift correction and normalization, the distribution and quality of your data are clearly monitored and controllable at every single step.
Dual-Tier Automated Reporting & Publication-Ready Visualizations: The pipeline silently captures critical retention metrics and statistical parameters across all stages, offering users the flexibility to generate either Brief (executive summary) or Comprehensive (deep-dive audit) PDF/Markdown reports with a single click. Furthermore, all diagnostic plots are natively exported in lossless SVG format, ensuring they are instantly ready for high-fidelity editing in Adobe Illustrator or Microsoft PowerPoint for journal submission.
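
To make the "inherits from pandas.DataFrame" point concrete, here is a minimal sketch of how a DataFrame subclass can survive ordinary pandas operations. The class name `IntensityFrame` and the `batch_label` attribute are hypothetical and chosen only for illustration; this is not the actual MetaboInt implementation, which may be structured differently.

```python
import pandas as pd


class IntensityFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass (illustration only, not MetaboInt)."""

    # Attribute names listed here are copied onto results of pandas operations.
    _metadata = ["batch_label"]

    @property
    def _constructor(self):
        # Makes slicing and similar operations return IntensityFrame objects.
        return IntensityFrame


intensity = IntensityFrame(
    {"S1": [100.0, 250.0], "S2": [110.0, 240.0]},
    index=["feat_1", "feat_2"],
)
intensity.batch_label = "batch_01"

# Ordinary pandas indexing keeps both the subclass and the custom attribute.
subset = intensity[["S1"]]
print(type(subset).__name__, subset.batch_label)
```

Because the object is still a DataFrame, any pandas, SciPy, or scikit-learn call that accepts a DataFrame should accept it without conversion.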

📦 Installation

We strongly recommend installing pi-metaboqc within a Conda virtual environment using Miniforge (preferred), Miniconda, or Anaconda.

Generating high-fidelity HTML and PDF reports requires advanced graphical engines (pandoc, weasyprint, and librsvg). These tools depend on complex, system-level C libraries (e.g., GTK3, Pango) that are notoriously difficult to compile and configure via standard pip, particularly on Windows.

Conda effortlessly resolves these low-level dependencies. To guarantee maximum stability across all operating systems, please follow the Standard Installation guide below.

⚠️ Note: While we have integrated an automatic fallback download feature for missing dependencies, it has not been exhaustively tested across all edge cases. Proceeding with the Conda installation remains the most robust and officially supported approach.

Step 1: Create and Activate Conda Environment

```bash
conda create -n metaboqc python=3.13 pip -y
conda activate metaboqc
```

Step 2: Pre-install Graphical Engines (Recommended)

Install pandoc, weasyprint and librsvg via conda-forge to ensure all necessary system graphical libraries are correctly linked before installing the Python package:

```bash
conda install -c conda-forge pandoc weasyprint librsvg -y
```
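
After this step, it can be useful to confirm that the engines are visible from the activated environment. The sketch below uses only the standard library. The executable names are assumptions: `pandoc` and `weasyprint` are the usual command names, and `rsvg-convert` is the command-line tool normally shipped with librsvg.

```python
import shutil


def find_engines(names=("pandoc", "weasyprint", "rsvg-convert")):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_engines().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

A "not found" entry usually means the Conda environment is not activated in the current shell.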

Step 3: Install `pi-metaboqc`

For standard users: Install the stable release directly from PyPI:

```bash
pip install pi-metaboqc
```

Alternatively, install the latest development version directly from GitHub:

```bash
pip install git+https://github.com/KaikunXu/pi-metaboqc.git
```

For developers (Editable mode): If you plan to modify the source code or contribute to the project:

```bash
git clone https://github.com/KaikunXu/pi-metaboqc.git
cd pi-metaboqc
pip install -e .
```
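
Whichever install route you choose, a quick way to confirm the result is to query the installed distribution metadata. This is a generic standard-library check rather than a feature of pi-metaboqc; the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(dist_name="pi-metaboqc"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("pi-metaboqc version:", installed_version() or "not installed")
```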

🚀 Quickstart & Tutorials

pi-metaboqc is designed for zero-friction deployment. You only need three files to trigger the fully automated pipeline: a sample metadata table, a raw intensity matrix, and a TOML configuration file.
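
Before launching the pipeline, it can help to check that the metadata table and the intensity matrix refer to the same samples. The sketch below does this with plain pandas on tiny in-memory tables. The column names (`sample_id`, `sample_type`, `batch`, `inject_order`, `feature_id`) and the features-in-rows layout are assumptions for illustration; the bundled demo files define the layout the package actually expects.

```python
import io

import pandas as pd

# Hypothetical file contents; real files would be read from disk instead.
meta_csv = io.StringIO(
    "sample_id,sample_type,batch,inject_order\n"
    "S1,QC,1,1\n"
    "S2,Sample,1,2\n"
    "S3,Sample,1,3\n"
)
intensity_csv = io.StringIO(
    "feature_id,S1,S2,S4\n"
    "feat_1,100,120,90\n"
    "feat_2,250,,260\n"
)

meta = pd.read_csv(meta_csv, index_col="sample_id")
intensity = pd.read_csv(intensity_csv, index_col="feature_id")

# Samples present in both tables, and samples present in only one of them.
shared = meta.index.intersection(intensity.columns)
meta_only = meta.index.difference(intensity.columns)
intensity_only = intensity.columns.difference(meta.index)

aligned_meta = meta.loc[shared]
aligned_intensity = intensity[shared]

print("shared:", list(shared))
print("only in metadata:", list(meta_only))
print("only in intensity:", list(intensity_only))
```

Mismatched sample identifiers are easier to fix at this stage than after a full pipeline run.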

The examples/ directory provides two execution modalities for different use cases. For first-time users, we strongly recommend starting with the Interactive Notebook.

1. Interactive Notebook (Recommended for Onboarding)

Interactive Tutorial (interactive_tutorial.ipynb): An end-to-end Jupyter Notebook. This is the optimal way to experience pi-metaboqc. It allows you to step through the pipeline, visually inspect intermediate QA diagnostic dashboards (including model_overview plots with Q2 metrics, natively rendered as high-fidelity SVGs), and intuitively grasp the core algorithmic logic.

Choose the access method that best suits your network environment:

Static Viewer (nbviewer): Delivers fast, static rendering. Recommended for users in mainland China to ensure all inline SVG plots are displayed reliably without execution overhead or connectivity issues.
Google Colab: A cloud-executable environment. Best for global users who wish to run the pipeline dynamically with zero local configuration.

2. Headless CLI Execution (For Production & Batch Processing)

For deployment on HPC clusters or integration into larger bioinformatics workflows, utilize our robust command-line interface script (run_pimqc.py).

```bash
# Navigate to the examples directory
cd examples

# Option A: Run out-of-the-box with bundled demo data
python run_pimqc.py

# Option B: Run with your own custom clinical cohort
python run_pimqc.py \
    --meta /path/to/your_meta.csv \
    --intensity /path/to/your_intensity.csv \
    --config /path/to/custom_params.toml \
    --outdir /path/to/output_directory

# Option C: Run in silent mode
python run_pimqc.py -q
```
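
For batch processing, the same command can be assembled from Python and handed to `subprocess`. The sketch below only uses the flags shown above (`--meta`, `--intensity`, `--config`, `--outdir`, `-q`). Combining `-q` with the other flags is an assumption, and the helper names `build_command` and `run_command` are hypothetical.

```python
import subprocess
import sys


def build_command(meta, intensity, config, outdir, quiet=False):
    """Assemble the argument list for one run_pimqc.py invocation."""
    cmd = [
        sys.executable, "run_pimqc.py",
        "--meta", str(meta),
        "--intensity", str(intensity),
        "--config", str(config),
        "--outdir", str(outdir),
    ]
    if quiet:
        cmd.append("-q")  # assumption: -q can be combined with the other flags
    return cmd


def run_command(cmd, workdir="examples"):
    """Run the command from the examples directory; raises if it exits non-zero."""
    return subprocess.run(cmd, check=True, cwd=workdir)


# Print the command instead of running it, so this sketch has no side effects.
demo = build_command("meta.csv", "intensity.csv", "params.toml", "out", quiet=True)
print(" ".join(demo))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.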

⚠️ Troubleshooting Note for VS Code Users: When running the CLI script via the integrated terminal in Visual Studio Code, the IDE may occasionally fail to properly inherit full Conda environment variables. This prevents the PDF rendering engine from locating essential system-level C libraries (e.g., GTK3/Pango), causing the report generation to gracefully degrade and output an HTML report instead.

Resolution: You can bypass this by executing the script from a native system terminal (e.g., Anaconda Prompt, macOS Terminal). Alternatively, to permanently configure VS Code for seamless PDF rendering and resolve PowerShell restrictions, please refer to our VS Code Environment & Troubleshooting Guide.
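
To tell whether a given terminal has inherited the Conda environment, a short diagnostic like the following can be run in it. This is a sketch, not part of pi-metaboqc: it assumes that a missing native library typically surfaces as an `OSError` when WeasyPrint is imported, and it relies on `CONDA_PREFIX`, the variable Conda sets on activation.

```python
import os
import sys


def pdf_engine_status():
    """Describe whether WeasyPrint and its native libraries can be loaded."""
    try:
        import weasyprint
    except ImportError:
        return "weasyprint is not installed in this interpreter"
    except OSError as exc:
        # Typically raised when native libraries such as Pango cannot be found.
        return f"weasyprint is installed, but native libraries failed to load: {exc}"
    return f"weasyprint {weasyprint.__version__} loaded"


print("Interpreter:", sys.executable)
print("CONDA_PREFIX:", os.environ.get("CONDA_PREFIX", "<not set>"))
print("PDF engine:", pdf_engine_status())
```

If `CONDA_PREFIX` is unset or the interpreter path lies outside the metaboqc environment, the terminal has not activated the environment.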

Automated Refinement Protocol (Under the Hood)

Whichever modality you use, the pipeline runs the following refinement steps in order:

Building dataset: Parses TOML or JSON configurations to seamlessly align sample metadata with the raw intensity matrix, instantiating the core MetaboInt data object.
High-missing-value feature filtering: Heuristically classifies missing value mechanisms (MAR vs. MNAR) and removes features whose missing rate exceeds predefined thresholds.
Intra-batch correction: Corrects injection-order-dependent instrument signal drift within individual analytical batches using pooled-QC-based robust regression models (QC-RLSC, QC-RFSC or QC-SVR).
Inter-batch correction: Harmonizes analytical variations across multiple independent batches, mitigating systemic batch effects to ensure global data comparability.
Low-quality feature filtering: Prunes unreliable features based on noise-filtering criteria, including Blank-to-QC intensity ratios and pooled-QC Relative Standard Deviation (RSD).
Missing value imputation: Executes stratified, mechanism-aware imputation on remaining missing values, either auto-tuned via NRMSE simulation benchmarks or using user-defined algorithms.
Normalization: Adjusts for systematic sample-to-sample variations (e.g., biofluid dilution effects) using global scaling techniques such as PQN, Median, TIC, VSN and Quantile.
Quality assessment (Repeated): Runs after every pipeline stage, continuously capturing statistical metrics to generate a comprehensive, publication-ready Markdown/PDF audit report.
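
Several of the quantities named in these steps are simple to state numerically. The sketch below shows textbook NumPy versions of the per-feature missing rate, pooled-QC RSD, PQN, and NRMSE on a toy matrix with features in rows. It is an illustration under stated assumptions and not the package's implementation: the 0.3 thresholds are arbitrary, and NRMSE is scaled here by the standard deviation of the true values, which is only one of several conventions.

```python
import numpy as np


def missing_rate(x):
    """Fraction of NaN values per feature (rows are features)."""
    return np.isnan(x).mean(axis=1)


def qc_rsd(x, is_qc):
    """Relative standard deviation of each feature across pooled-QC columns."""
    qc = x[:, is_qc]
    return np.nanstd(qc, axis=1, ddof=1) / np.nanmean(qc, axis=1)


def pqn(x, reference):
    """Probabilistic quotient normalization against a reference profile."""
    quotients = x / reference[:, None]
    dilution = np.nanmedian(quotients, axis=0)  # one factor per sample
    return x / dilution


def nrmse(truth, imputed, mask):
    """RMSE over masked entries, divided by the std of the true values."""
    err = truth[mask] - imputed[mask]
    return np.sqrt(np.mean(err ** 2)) / np.std(truth[mask])


# Toy data: 4 features x 4 samples; the first two columns are pooled QCs.
x = np.array([
    [100.0, 110.0, 200.0, 50.0],
    [200.0, 190.0, 400.0, 100.0],
    [300.0, 310.0, 600.0, 150.0],
    [10.0, 40.0, np.nan, np.nan],
])
is_qc = np.array([True, True, False, False])

kept = x[missing_rate(x) <= 0.3]          # drop high-missing features
kept = kept[qc_rsd(kept, is_qc) <= 0.3]   # drop features with noisy QCs
reference = np.nanmedian(kept[:, is_qc], axis=1)
normalized = pqn(kept, reference)
print(normalized.round(1))
```

In this toy matrix the third and fourth samples are roughly 2x and 0.5x dilutions of the QC profile, so PQN brings all columns onto a similar scale.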

📂 Project Structure

```
pi-metaboqc/
├── README.md                      # Project documentation and quickstart guide
├── pyproject.toml                 # Modern Python build and dependency config
├── LICENSE                        # MIT license
├── examples/                      # Directory for tutorials and examples
│   ├── interactive_tutorial.ipynb # Interactive Jupyter Notebook for onboarding
│   └── run_pimqc.py               # Production-ready CLI execution script
├── src/                           # Core source code directory
│   └── pimqc/                     # Core pi-metaboqc package
│       ├── __init__.py            # Package initialization file
│       ├── core_classes.py        # Core DataStructure class (MetaboInt)
│       ├── visualizer_classes.py  # Core Visualization class (BaseMetaboVisualizer)
│       ├── dataset_builder.py     # MetaboInt instantiation
│       ├── assessment.py          # Data quality assessment
│       ├── correction.py          # Signal drift & batch correction
│       ├── filtering.py           # High-missing & low-quality features filtering
│       ├── imputation.py          # Missing values imputation
│       ├── normalization.py       # Data normalization
│       ├── pipeline.py            # Automated pipeline orchestrator
│       ├── io_utils.py            # I/O operations
│       ├── plot_utils.py          # Plotting utilities
│       ├── pca_utils.py           # Underlying PCA dimensionality reduction
│       ├── stat_utils.py          # Shared statistical utility functions
│       ├── report_utils.py        # Automated markdown and pdf report rendering
│       ├── config_schema.py       # Configuration schema and parameter validation
│       ├── templates/...          # Template file for generating reports...
│       └── data/                  # Demo data and configuration file directory
│           ├── project_meta.csv          # Demo project metadata file
│           ├── project_intensity.csv     # Demo project intensity file
│           └── pipeline_parameters.toml  # Demo pipeline parameters file
├── tests/...                      # Unit testing and E2E stress testing...
└── ...                            # Other files required by this module...
```

💡 Note on Configuration: The entire analytical logic of pi-metaboqc is centrally governed by pipeline_parameters.toml. Users can fine-tune missing value tolerances, SVR kernel parameters, and normalization strategies exclusively through this file, without modifying any underlying Python code.

🤝 Contributing & License

This project is licensed under the MIT License. Contributions, issues, and feature requests are welcome! Feel free to check the issues page.

This version

1.0.0a1 pre-release

May 19, 2026

Download files

Source Distribution

pi_metaboqc-1.0.0a1.tar.gz (1.1 MB view details)

Uploaded May 19, 2026 Source

Built Distribution

pi_metaboqc-1.0.0a1-py3-none-any.whl (1.1 MB view details)

Uploaded May 19, 2026 Python 3

File details

Details for the file pi_metaboqc-1.0.0a1.tar.gz.

File metadata

Download URL: pi_metaboqc-1.0.0a1.tar.gz
Upload date: May 19, 2026
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for pi_metaboqc-1.0.0a1.tar.gz
Algorithm	Hash digest
SHA256	`c7d52ce6ccf26fade0f096ffcfc783984690ddd2e582abdd5e5dec3e9f9ece80`
MD5	`35160d6f8500590987e39802ac274986`
BLAKE2b-256	`42724fc2046e0783360d80a2b45711e45f87255b926d05be53c4b4896ecec9bb`

File details

Details for the file pi_metaboqc-1.0.0a1-py3-none-any.whl.

File metadata

Download URL: pi_metaboqc-1.0.0a1-py3-none-any.whl
Upload date: May 19, 2026
Size: 1.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for pi_metaboqc-1.0.0a1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3bd30c266aca732b56fdf3d6120145f5018f2f6306504cb5d83f3a5a8bedd28a`
MD5	`fb14ba79288b90c128159dcc7d34ef4f`
BLAKE2b-256	`46d7dbae05a844a414ca86b899b7162d445b24585c34e5a92b4ff764374b9698`
