Skip to main content

Droplet Digital PCR Multiplex Analysis for chromosomal copy number detection

Project description

ddQuint: Digital Droplet PCR Quintuplex Analysis

License: MIT Python 3.10+

A comprehensive pipeline for analyzing digital droplet PCR (ddPCR) data for aneuploidy detection.

Key Features

  • QX Manager: Uses QX Manager Amplitude export files (folder selection)
  • Clustering: HDBSCAN-based clustering for droplet classification (adjust expected centroids)
  • Copy Number Analysis: Relative and absolute copy number calculations with normalization
  • Aneuploidy Detection: Automated detection of chromosomal gains and losses
  • Buffer Zone Detection: Identification of samples with uncertain copy number states
  • Visualization: Individual well plots and composite plate overview images
  • Output format: Results saved as Excel file with sample description, copy numbers and classification
  • Sample Naming: Automatically detects QX Manager template files and names samples accordingly
  • Plate Template Generation: Create QX Manager-compatible plate layout files from a sample list

Installation

Using pip

# Clone the repository
git clone https://github.com/globuzzz2000/ddQuint
cd ddQuint

# Install the package with all dependencies
pip install -e .

Quick Start

Command Line Usage

# Basic analysis
ddquint --dir /path/to/csv/files

Interactive Mode

Simply run ddquint without arguments to launch the interactive mode with GUI file selection.

Project Structure

ddQuint/
├── ddquint/                       # Main package directory
│   ├── __init__.py
│   ├── main.py                    # Main entry point
│   ├── config/                    # Configuration and settings
│   │   ├── __init__.py
│   │   ├── config.py              # Core configuration settings
│   │   ├── exceptions.py          # Error handling
│   │   ├── config_display.py      # Configuration display
│   │   └── template_generator.py  # Configuration template generation
│   ├── core/                      # Core processing modules
│   │   ├── __init__.py
│   │   ├── clustering.py          # HDBSCAN clustering and analysis
│   │   ├── copy_number.py         # Copy number calculations
│   │   ├── file_processor.py      # CSV file processing
│   │   └── list_report.py         # Excel report formatting
│   ├── utils/                     # Utility functions
│   │   ├── __init__.py
│   │   ├── file_io.py             # File input/output utilities
│   │   ├── gui.py                 # GUI file selection
│   │   ├── template_parser.py     # Template CSV parsing
│   │   ├── template_creator.py    # QX template generation
│   │   └── well_utils.py          # Well coordinate utilities
│   └── visualization/             # Visualization modules
│       ├── __init__.py
│       ├── plate_plots.py         # Plate overview plots
│       └── well_plots.py          # Individual well plots
├── pyproject.toml                 # Package configuration and dependencies
└── README.md             

Workflow Overview

  1. File Selection: Choose directory containing CSV files (GUI or command line)
  2. Template Processing: Parse sample names from template files (if available)
  3. Clustering Analysis: Apply HDBSCAN clustering to identify droplet populations
  4. Target Assignment: Match clusters to expected chromosome centroids
  5. Copy Number Calculation: Calculate relative and absolute copy numbers
  6. State Classification: Classify as euploid, aneuploidy, or buffer zone
  7. Visualization: Generate individual well plots and composite plate image
  8. Report Generation: Create Excel report

Configuration

Customize the pipeline behavior with a JSON configuration file:

ddquint --config config.json

Example configuration:

{
"HDBSCAN_MIN_CLUSTER_SIZE": 4,
"HDBSCAN_MIN_SAMPLES": 70,
"HDBSCAN_EPSILON": 0.06,
"HDBSCAN_METRIC": "euclidean",
"HDBSCAN_CLUSTER_SELECTION_METHOD": "eom",
"MIN_POINTS_FOR_CLUSTERING": 50,
"EXPECTED_CENTROIDS": "{'Negative': [1000, 800], 'Chrom1': [1000, 2500], 'Chrom2': [1900, 2300], 'Chrom3': [2700, 1850], 'Chrom4': [3300, 1400], 'Chrom5': [3600, 900]"
}

Parameter Editor

Use the parameter editor for configuring frequently modified settings (HDBSCAN Settings, Expected Centroids, ...):

# Launch parameter editor GUI
ddquint --parameters

Parameter Priority Order:

  1. User parameters file (highest priority)
  2. Config file specified with --config
  3. Default config.py values (lowest priority)

The parameter editor automatically loads on startup and provides tooltips with detailed explanations and optimization tips for each setting.

Configuration Management

# View current configuration
ddquint --config

# Generate a configuration template
ddquint --config template

# Use custom configuration
ddquint --config my_config.json --dir /path/to/csv/files

Poisson Correction

The Mixed-Target Problem

In multiplex ddPCR assays, each droplet can contain DNA from multiple targets (e.g., 5 chromosomes). The detection system has a critical limitation:

  • Can detect: Droplets positive for only one target type (even if multiple copies)
  • Can detect: Empty droplets (no targets)
  • Cannot detect: Droplets containing multiple different target types

This means droplets with mixed targets are undetectable and uncountable. Without proper Poisson correction, this leads to systematic underestimation of true concentrations and incorrect target ratios.

For targets, each with true concentration λᵢ copies per droplet, target copy numbers follow Poisson distributions.

Key Probabilities

Empty Droplets

$P(\text{empty}) = e^{-(\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5)}$

Single-Target Droplets

Droplets containing only target i (but possibly multiple copies of i):

$P(\text{only } i) = \left(1 - e^{-\lambda_i}\right) \cdot \prod_{j \neq i} e^{-\lambda_j}$

Where:

  • $\left(1 - e^{-\lambda_i}\right)$ = probability of ≥1 copy of target i
  • $\prod_{j \neq i} e^{-\lambda_j}$ = probability of 0 copies of all other targets

The Solution

Taking the Ratio

$\frac{P(\text{only } i)}{P(\text{empty})} = \frac{\left(1 - e^{-\lambda_i}\right) \cdot \prod_{j \neq i} e^{-\lambda_j}}{\prod_{j=1}^5 e^{-\lambda_j}}$

Simplification

$\frac{P(\text{only } i)}{P(\text{empty})} = \frac{1 - e^{-\lambda_i}}{e^{-\lambda_i}} = e^{\lambda_i} - 1$

$\lambda_i = \ln\left(1 + \frac{P(\text{only } i)}{P(\text{empty})}\right)$

This allows direct calculation of true target concentrations from observed exclusive counts and empty droplets, accounting for all undetectable mixed-target droplets.

Copy Number Classification and Buffer Zones

The pipeline uses a three-state classification system for copy number analysis:

Classification States

  1. Euploid: Normal copy number

    expected_value ± EUPLOID_TOLERANCE

  2. Aneuploidy: Clear chromosomal gain or loss

    (expected_value + (ANEUPLOIDY_TARGETS - 1.0)) ± ANEUPLOIDY_TOLERANCE

  3. Buffer Zone: Uncertain intermediate values that don't clearly fit euploid or aneuploidy categories, likely technical artifact

Copy Number Normalization

Normalization algorithm:

  1. Calculate median of all chromosome copy numbers
  2. Identify chromosomes close to median (within deviation threshold)
  3. Use mean of close values as baseline for normalization
  4. Apply baseline to calculate relative copy numbers

Additional Utilities

QX Manager Template

Generate QX Manager compatible template file from sample list:

CSV/Excel format: 1 to 4-column table with Sample Descriptions

# Generate Template file
ddprimer --QXtemplate

Automatic Sample Naming

Automatically searches for QX template files to map well positions to sample names:

  • Requires matching name between input folder and template file
  • Searches in parent directories (configurable depth)
  • Extracts sample names from "Sample description" columns

Alternatively provide QX template file location for sample naming:

ddprimer --template /path/to/csv

Troubleshooting

Common issues and solutions:

  • Incorrect target assignment: Adjust EXPECTED_CENTROIDS and BASE_TARGET_TOLERANCE
  • Clustering failures: Adjust MIN_POINTS_FOR_CLUSTERING or HDBSCAN parameters
  • No CSV files found: Ensure files have .csv extension and contain amplitude data
  • Missing sample names: Check template file format and location

For more detailed output, run ddprimer --debug or check the logs in ~/.ddQuint/logs/.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ddquint-0.1.0.tar.gz (72.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ddquint-0.1.0-py3-none-any.whl (81.7 kB view details)

Uploaded Python 3

File details

Details for the file ddquint-0.1.0.tar.gz.

File metadata

  • Download URL: ddquint-0.1.0.tar.gz
  • Upload date:
  • Size: 72.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for ddquint-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b4a9d5e0f5d53edc860a6e0516a23894bd13765b5a7a5150975d97574f029241
MD5 109ff5ef1adf3654af0732f89a225857
BLAKE2b-256 9be9d08d05f8f1264320b5ef8191a076179c83af6e24242325e06e8ec68f5fa2

See more details on using hashes here.

File details

Details for the file ddquint-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ddquint-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 81.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for ddquint-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 35e737bf90d6cc79c0e493c01bb7787527df4095199f65e17500c21c0d0eb2f4
MD5 5b016244efe7e3f316f22d33d2cad9d0
BLAKE2b-256 47fc309c03a061387980430094ec047be0d94eb345ac69ee8007930937ff775e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page