Droplet Digital PCR Multiplex Analysis for chromosomal copy number detection
ddQuint: Digital Droplet PCR Quintuplex Analysis
A comprehensive pipeline for analyzing digital droplet PCR (ddPCR) data for aneuploidy detection.
Key Features
- QX Manager: Uses QX Manager Amplitude export files (folder selection)
- Clustering: HDBSCAN-based clustering for droplet classification (adjust expected centroids)
- Copy Number Analysis: Relative and absolute copy number calculations with normalization
- Aneuploidy Detection: Automated detection of chromosomal gains and losses
- Buffer Zone Detection: Identification of samples with uncertain copy number states
- Visualization: Individual well plots and composite plate overview images
- Output format: Results saved as Excel file with sample description, copy numbers and classification
- Sample Naming: Automatically detects QX Manager template files and names samples accordingly
- Plate Template Generation: Create QX Manager-compatible plate layout files from a sample list
Installation
Using pip
# Clone the repository
git clone https://github.com/globuzzz2000/ddQuint
cd ddQuint
# Install the package with all dependencies
pip install -e .
Quick Start
Command Line Usage
# Basic analysis
ddquint --dir /path/to/csv/files
Interactive Mode
Simply run ddquint without arguments to launch the interactive mode with GUI file selection.
Project Structure
ddQuint/
├── ddquint/  # Main package directory
│   ├── __init__.py
│   ├── main.py  # Main entry point
│   ├── config/  # Configuration and settings
│   │   ├── __init__.py
│   │   ├── config.py  # Core configuration settings
│   │   ├── exceptions.py  # Error handling
│   │   ├── config_display.py  # Configuration display
│   │   └── template_generator.py  # Configuration template generation
│   ├── core/  # Core processing modules
│   │   ├── __init__.py
│   │   ├── clustering.py  # HDBSCAN clustering and analysis
│   │   ├── copy_number.py  # Copy number calculations
│   │   ├── file_processor.py  # CSV file processing
│   │   └── list_report.py  # Excel report formatting
│   ├── utils/  # Utility functions
│   │   ├── __init__.py
│   │   ├── file_io.py  # File input/output utilities
│   │   ├── gui.py  # GUI file selection
│   │   ├── template_parser.py  # Template CSV parsing
│   │   ├── template_creator.py  # QX template generation
│   │   └── well_utils.py  # Well coordinate utilities
│   └── visualization/  # Visualization modules
│       ├── __init__.py
│       ├── plate_plots.py  # Plate overview plots
│       └── well_plots.py  # Individual well plots
├── pyproject.toml  # Package configuration and dependencies
└── README.md
Workflow Overview
- File Selection: Choose directory containing CSV files (GUI or command line)
- Template Processing: Parse sample names from template files (if available)
- Clustering Analysis: Apply HDBSCAN clustering to identify droplet populations
- Target Assignment: Match clusters to expected chromosome centroids
- Copy Number Calculation: Calculate relative and absolute copy numbers
- State Classification: Classify each chromosome as euploid, aneuploid, or buffer zone
- Visualization: Generate individual well plots and composite plate image
- Report Generation: Create Excel report
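The target assignment step can be pictured as a nearest-centroid match. The following is a minimal sketch, not ddQuint's actual implementation: it assumes each cluster is summarized by its mean amplitude pair and is accepted only if it lies within a tolerance radius of an expected centroid. The helper name `assign_targets` and the tolerance of 500 amplitude units are illustrative assumptions; the centroid values come from the example configuration in this README.

```python
import math

# Expected centroids taken from the example configuration in this README.
EXPECTED_CENTROIDS = {
    "Negative": (1000, 800),
    "Chrom1": (1000, 2500),
    "Chrom2": (1900, 2300),
    "Chrom3": (2700, 1850),
    "Chrom4": (3300, 1400),
    "Chrom5": (3600, 900),
}


def assign_targets(cluster_centroids, expected=EXPECTED_CENTROIDS, tolerance=500.0):
    """Map each cluster label to the nearest expected target.

    Hypothetical helper: a cluster farther than `tolerance` (assumed value)
    from every expected centroid is mapped to None, i.e. left unassigned.
    """
    assignments = {}
    for label, centroid in cluster_centroids.items():
        # Find the expected target with the smallest Euclidean distance.
        name, dist = min(
            ((target, math.dist(centroid, point)) for target, point in expected.items()),
            key=lambda pair: pair[1],
        )
        assignments[label] = name if dist <= tolerance else None
    return assignments


# Made-up cluster centroids for illustration.
clusters = {0: (1020, 790), 1: (1880, 2350), 2: (5000, 5000)}
result = assign_targets(clusters)
```

This sketch does not prevent two clusters from mapping to the same target; a real pipeline would need a rule for that case.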
Configuration
Customize the pipeline behavior with a JSON configuration file:
ddquint --config config.json
Example configuration:
{
"HDBSCAN_MIN_CLUSTER_SIZE": 4,
"HDBSCAN_MIN_SAMPLES": 70,
"HDBSCAN_EPSILON": 0.06,
"HDBSCAN_METRIC": "euclidean",
"HDBSCAN_CLUSTER_SELECTION_METHOD": "eom",
"MIN_POINTS_FOR_CLUSTERING": 50,
"EXPECTED_CENTROIDS": "{'Negative': [1000, 800], 'Chrom1': [1000, 2500], 'Chrom2': [1900, 2300], 'Chrom3': [2700, 1850], 'Chrom4': [3300, 1400], 'Chrom5': [3600, 900]}"
}
Parameter Editor
Use the parameter editor for configuring frequently modified settings (HDBSCAN Settings, Expected Centroids, ...):
# Launch parameter editor GUI
ddquint --parameters
Parameter Priority Order:
- User parameters file (highest priority)
- Config file specified with --config
- Default config.py values (lowest priority)
The parameter editor automatically loads on startup and provides tooltips with detailed explanations and optimization tips for each setting.
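The priority order (user parameters over the --config file over config.py defaults) behaves like a layered dictionary merge. The sketch below illustrates that idea only; the function name `resolve_settings` and the sample values are assumptions, not ddQuint's internal API.

```python
def resolve_settings(defaults, config_file=None, user_parameters=None):
    """Merge three settings layers; later layers override earlier ones.

    Hypothetical helper illustrating the documented priority order:
    defaults (lowest) < --config file < user parameters file (highest).
    """
    merged = dict(defaults)
    for layer in (config_file, user_parameters):
        if layer:
            merged.update(layer)
    return merged


# Illustrative values only.
defaults = {"HDBSCAN_MIN_CLUSTER_SIZE": 4, "HDBSCAN_EPSILON": 0.06}
config_file = {"HDBSCAN_EPSILON": 0.10, "MIN_POINTS_FOR_CLUSTERING": 50}
user_parameters = {"HDBSCAN_EPSILON": 0.08}

settings = resolve_settings(defaults, config_file, user_parameters)
```

Here `HDBSCAN_EPSILON` resolves to the user-parameter value, while keys set in only one layer pass through unchanged.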
Configuration Management
# View current configuration
ddquint --config
# Generate a configuration template
ddquint --config template
# Use custom configuration
ddquint --config my_config.json --dir /path/to/csv/files
Poisson Correction
The Mixed-Target Problem
In multiplex ddPCR assays, each droplet can contain DNA from multiple targets (e.g., 5 chromosomes). The detection system has a critical limitation:
- Can detect: Droplets positive for only one target type (even if multiple copies)
- Can detect: Empty droplets (no targets)
- Cannot detect: Droplets containing multiple different target types
This means droplets with mixed targets are undetectable and uncountable. Without proper Poisson correction, this leads to systematic underestimation of true concentrations and incorrect target ratios.
For each target i with true concentration λᵢ copies per droplet, the number of copies per droplet follows a Poisson distribution, independently of the other targets.
Key Probabilities
Empty Droplets
$P(\text{empty}) = e^{-(\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5)}$
Single-Target Droplets
Droplets containing only target i (but possibly multiple copies of i):
$P(\text{only } i) = \left(1 - e^{-\lambda_i}\right) \cdot \prod_{j \neq i} e^{-\lambda_j}$
Where:
- $\left(1 - e^{-\lambda_i}\right)$ = probability of ≥1 copy of target i
- $\prod_{j \neq i} e^{-\lambda_j}$ = probability of 0 copies of all other targets
The Solution
Taking the Ratio
$\frac{P(\text{only } i)}{P(\text{empty})} = \frac{\left(1 - e^{-\lambda_i}\right) \cdot \prod_{j \neq i} e^{-\lambda_j}}{\prod_{j=1}^5 e^{-\lambda_j}}$
Simplification
$\frac{P(\text{only } i)}{P(\text{empty})} = \frac{1 - e^{-\lambda_i}}{e^{-\lambda_i}} = e^{\lambda_i} - 1$
$\lambda_i = \ln\left(1 + \frac{P(\text{only } i)}{P(\text{empty})}\right)$
This allows direct calculation of true target concentrations from observed exclusive counts and empty droplets, accounting for all undetectable mixed-target droplets.
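In practice the probability ratio is estimated by the ratio of droplet counts, because the total droplet number cancels. The sketch below applies the formula to counts and checks it with a round trip from made-up concentrations; the function name `poisson_corrected_lambdas` and all numbers are illustrative assumptions, not ddQuint's code or real assay data.

```python
import math


def poisson_corrected_lambdas(exclusive_counts, empty_count):
    """Estimate copies per droplet for each target.

    exclusive_counts: droplets positive for exactly one target, keyed by target.
    empty_count: droplets negative for every target.
    Implements lambda_i = ln(1 + N_only_i / N_empty).
    """
    if empty_count <= 0:
        raise ValueError("at least one empty droplet is required")
    return {
        target: math.log1p(count / empty_count)
        for target, count in exclusive_counts.items()
    }


# Round-trip check with made-up concentrations (not real assay data).
true_lambdas = {"Chrom1": 0.10, "Chrom2": 0.20}
total_droplets = 20000

# Expected droplet counts under the Poisson model described above.
p_empty = math.exp(-sum(true_lambdas.values()))
expected_empty = total_droplets * p_empty
expected_exclusive = {
    # P(only i) = (1 - e^-lambda_i) * product of e^-lambda_j over j != i
    target: total_droplets * (1 - math.exp(-lam)) * p_empty / math.exp(-lam)
    for target, lam in true_lambdas.items()
}

estimates = poisson_corrected_lambdas(expected_exclusive, expected_empty)
```

The estimates recover the input concentrations even though the mixed-target droplets were never counted.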
Copy Number Classification and Buffer Zones
The pipeline uses a three-state classification system for copy number analysis:
Classification States
- Euploid: Normal copy number, expected_value ± EUPLOID_TOLERANCE
- Aneuploidy: Clear chromosomal gain or loss, (expected_value + (ANEUPLOIDY_TARGETS - 1.0)) ± ANEUPLOIDY_TOLERANCE
- Buffer Zone: Uncertain intermediate values that don't clearly fit the euploid or aneuploidy categories, likely a technical artifact
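The three states can be expressed as a pair of interval checks. This is a minimal sketch under stated assumptions: the tolerance values, and the reading of ANEUPLOIDY_TARGETS as a mapping with one loss target and one gain target, are illustrative guesses and not ddQuint's defaults.

```python
# All values below are assumptions for illustration, not ddQuint defaults.
EUPLOID_TOLERANCE = 0.08
ANEUPLOIDY_TOLERANCE = 0.10
ANEUPLOIDY_TARGETS = {"low": 0.75, "high": 1.25}  # assumed structure


def classify_copy_number(value, expected_value=1.0):
    """Return 'euploid', 'aneuploidy', or 'buffer_zone' for a relative copy number."""
    # Euploid window: expected_value +/- EUPLOID_TOLERANCE
    if abs(value - expected_value) <= EUPLOID_TOLERANCE:
        return "euploid"
    # Aneuploidy windows: (expected_value + (target - 1.0)) +/- ANEUPLOIDY_TOLERANCE
    for target in ANEUPLOIDY_TARGETS.values():
        center = expected_value + (target - 1.0)
        if abs(value - center) <= ANEUPLOIDY_TOLERANCE:
            return "aneuploidy"
    # Anything outside both windows is treated as uncertain.
    return "buffer_zone"


states = {v: classify_copy_number(v) for v in (1.02, 1.27, 0.74, 1.12)}
```

With these assumed tolerances, a value such as 1.12 falls between the euploid and gain windows and lands in the buffer zone.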
Copy Number Normalization
Normalization algorithm:
- Calculate median of all chromosome copy numbers
- Identify chromosomes close to median (within deviation threshold)
- Use mean of close values as baseline for normalization
- Apply baseline to calculate relative copy numbers
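The normalization steps above can be sketched as follows. Two details are assumptions on my part: the deviation threshold is interpreted as a fraction of the median (0.15 here), and "applying the baseline" is interpreted as dividing each value by it. The function name `normalize_copy_numbers` is hypothetical.

```python
from statistics import mean, median


def normalize_copy_numbers(copy_numbers, deviation_threshold=0.15):
    """Normalize per-chromosome copy numbers against a median-based baseline.

    deviation_threshold is read as a fraction of the median (assumption).
    """
    values = list(copy_numbers.values())
    center = median(values)
    # Keep only chromosomes close to the median, so outliers
    # (candidate gains or losses) do not shift the baseline.
    close = [v for v in values if abs(v - center) <= deviation_threshold * center]
    baseline = mean(close) if close else center
    # Relative copy number = raw value divided by the baseline (assumption).
    return {chrom: v / baseline for chrom, v in copy_numbers.items()}


# Made-up raw copy numbers; Chrom4 mimics a gain.
raw = {"Chrom1": 100.0, "Chrom2": 102.0, "Chrom3": 98.0, "Chrom4": 150.0, "Chrom5": 100.0}
relative = normalize_copy_numbers(raw)
```

Excluding the outlier from the baseline keeps the four unaffected chromosomes near 1.0 while the fifth stands out at about 1.5.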
Additional Utilities
QX Manager Template
Generate a QX Manager-compatible template file from a sample list.
Input format: CSV or Excel table with 1 to 4 columns of sample descriptions
# Generate template file
ddquint --QXtemplate
Automatic Sample Naming
Automatically searches for QX template files to map well positions to sample names:
- Requires matching name between input folder and template file
- Searches in parent directories (configurable depth)
- Extracts sample names from "Sample description" columns
Alternatively, provide the QX template file location explicitly for sample naming:
ddquint --template /path/to/csv
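The automatic search can be pictured as walking up from the input folder and looking for a CSV file that shares the folder's name. The sketch below is an assumption about how such a search might work, not ddQuint's implementation: the function name `find_template`, the exact matching rule (`<folder name>.csv`), and the default depth of 2 are all hypothetical.

```python
from pathlib import Path


def find_template(input_dir, max_depth=2):
    """Look for '<input folder name>.csv' in the input folder and its parents.

    Hypothetical helper: checks the input folder itself, then up to
    `max_depth` parent directories, and returns the first match or None.
    """
    input_dir = Path(input_dir).resolve()
    current = input_dir
    for _ in range(max_depth + 1):
        candidate = current / f"{input_dir.name}.csv"
        if candidate.is_file():
            return candidate
        if current.parent == current:
            break  # reached the filesystem root
        current = current.parent
    return None
```

For an input folder such as `plates/run42/`, this sketch would find `plates/run42.csv` one level up, provided the search depth allows it.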
Troubleshooting
Common issues and solutions:
- Incorrect target assignment: Adjust EXPECTED_CENTROIDS and BASE_TARGET_TOLERANCE
- Clustering failures: Adjust MIN_POINTS_FOR_CLUSTERING or HDBSCAN parameters
- No CSV files found: Ensure files have .csv extension and contain amplitude data
- Missing sample names: Check template file format and location
For more detailed output, run ddquint --debug or check the logs in ~/.ddQuint/logs/.
License
This project is licensed under the MIT License. See the LICENSE file for details.