A Python package for handling and processing drug screening data in HDF5 format
DS5
A Python package for drug sensitivity screening data analysis. DS5 handles the full pipeline from raw plate-reader data to drug sensitivity metrics (IC50, EC50, Emax, DSS) with built-in quality control, DMSO normalization, and reporting.
Installation
# From the project root
pip install -e .
# With dev dependencies (pytest, jupyter)
pip install -e ".[dev]"
Requires Python 3.11–3.12.
Quick start
import DS5
# 1. Create a new HDF5 file
DS5.gen_new_HDF5("experiment.h5")
# 2. Load plate-reader data from Excel
DS5.load_excel_to_h5(
    "experiment.h5",
    well_read_file_name="plate_reads.xlsx",
    well_read_sheet_name="Sheet1",
    plate_map_file_name="plate_map.xlsx",
    plate_map_sheet_name="Sheet1",
    patient_id="HCI001",
    test_id="set1",
)
# 3. Preprocess (outlier removal)
DS5.preprocess_data("experiment.h5")
# 4. Analyze a single drug
ic50 = DS5.analyze_drug_ic50("experiment.h5", "HCI001", "set1", "Doxorubicin")
print(f"IC50 = {ic50['ic50']['value']}")
# 5. Summarize all drugs in one table
summary = DS5.summarize_test_results("experiment.h5", "HCI001", "set1")
print(summary)
# 6. Batch process and cache results
DS5.process_ds5("experiment.h5")
# 7. Extract data for custom analysis
df = DS5.get_data("experiment.h5", "HCI001_set1", data_type="normalized")
API overview
Data I/O
| Function | Description |
|---|---|
| gen_new_HDF5(file_name) | Create empty DS5-format HDF5 file |
| load_excel_to_h5(...) | Load plate-reader Excel + plate map into HDF5 |
| export_h5_to_excel(h5, output) | Export HDF5 contents to Excel workbook |
| load_GDSC_to_h5(csv, ...) | Load GDSC-format CSV into HDF5 |
| load_all_GDSC_to_h5(csv, ...) | Batch-load all experiments from GDSC CSV |
| generate_GDSC_screen_list(csv, ...) | List available screens in a GDSC CSV |
| get_data(h5, screen, data_type) | Extract data as DataFrame (intensity, normalized, etc.) |
Preprocessing & QC
| Function | Description |
|---|---|
| preprocess_data(h5, qc_para_file=None) | Apply outlier removal to all screens |
| check_preprocess(h5, patient, test, drug) | Visualize preprocessing effect on a drug |
| QC_visual(h5, screen, qc_para_file) | Generate before/after QC comparison plots |
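The outlier-removal step is described here only by name. As a rough illustration of what a MAD-based filter (see dmso_use_mad under QC configuration) generally does, the sketch below replaces values that sit too far from the median with NaN, mirroring how preprocessed_data marks removed wells. The helper name flag_mad_outliers, the cutoff of 3.0, and the 1.4826 scaling constant are assumptions chosen for illustration; DS5's actual rule may differ.

```python
import math
import statistics

def flag_mad_outliers(values, cutoff=3.0):
    """Return a copy of values with MAD-based outliers replaced by NaN.

    Hypothetical helper: the cutoff and the 1.4826 consistency constant
    are common textbook choices, not values taken from DS5.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # MAD is zero (e.g. most values identical), so there is no scale
        # against which to judge outliers; leave the data untouched.
        return list(values)
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    return [v if abs(v - median) / scale <= cutoff else math.nan for v in values]

# Made-up DMSO control readings with one obviously aberrant well.
dmso_wells = [1000.0, 1020.0, 980.0, 1010.0, 990.0, 2500.0]
cleaned = flag_mad_outliers(dmso_wells)
print(cleaned)  # the last reading becomes nan, the rest are unchanged
```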
Drug analysis
| Function | Description |
|---|---|
| analyze_dmso_controls(h5, patient, test) | DMSO control statistics and boxplot |
| analyze_all_dmso(h5, patient=None) | DMSO analysis across all screens |
| analyze_drug_ic50(h5, patient, test, drug) | IC50 via 4-parameter logistic fit |
| analyze_drug_ec50(h5, patient, test, drug) | EC50 (50% absolute inhibition) |
| analyze_drug_emax(h5, patient, test, drug, mode) | Maximum inhibition (supports multiple Emax modes) |
| calculate_DSS(h5, patient, test, drug) | DSS1, DSS2, DSS3 drug sensitivity scores |
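To make the difference between the IC50 and EC50 rows concrete, here is a minimal sketch of a 4-parameter logistic (4PL) curve and of reading a 50% absolute-inhibition concentration off it. The parameterization below is one common 4PL form and the names four_pl and absolute_ec50 are hypothetical; the exact form DS5 fits is not stated in this README.

```python
def four_pl(conc, bottom, top, ic50, hill):
    """Inhibition (%) predicted by a 4PL curve at concentration conc > 0."""
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** hill)

def absolute_ec50(bottom, top, ic50, hill, target=50.0):
    """Concentration at which the curve reaches `target` % inhibition.

    Returns None when the curve never crosses the target. Assumes hill > 0.
    """
    if not bottom < target < top:
        return None
    ratio = (top - bottom) / (target - bottom) - 1.0
    return ic50 / ratio ** (1.0 / hill)

# Made-up fit parameters: the curve plateaus at 80% inhibition.
params = dict(bottom=0.0, top=80.0, ic50=1.0, hill=1.0)
print(four_pl(1.0, **params))   # 40.0, halfway between bottom and top
print(absolute_ec50(**params))  # concentration giving 50% inhibition
```

With these parameters the curve's midpoint (the fitted ic50 parameter) corresponds to 40% inhibition, while 50% absolute inhibition is only reached at a higher concentration. A curve whose top stays below 50% has no absolute EC50 at all, which is why the helper returns None in that case.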
Emax modes
The mode (or emax_mode) parameter controls how Emax is computed. All functions that compute Emax support these modes:
| Mode | Definition | Requires curve fit |
|---|---|---|
| observed_best (default) | Highest mean inhibition at any tested concentration | No |
| observed_highest_dose | Mean inhibition at the highest tested concentration | No |
| fitted_highest_dose | 4PL model-predicted response at the highest tested concentration | Yes (falls back to observed_best) |
| e_inf | Fitted 4PL asymptote, must be in [-10, 200]% | Yes (falls back to observed_best) |
# Single drug analysis with Emax mode
emax = DS5.analyze_drug_emax("experiment.h5", "HCI001", "set1", "Doxorubicin", mode="e_inf")
# Batch processing with Emax mode
DS5.process_ds5("experiment.h5", emax_mode="fitted_highest_dose")
# Summary and comparison with Emax mode
summary = DS5.summarize_test_results("experiment.h5", "HCI001", "set1", emax_mode="e_inf")
comparison = DS5.compare_metrics("experiment.h5", emax_mode="e_inf")
DSS2 always uses the fitted Emax from the 4PL curve regardless of emax_mode.
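The two observed modes and the e_inf range check from the table above can be sketched in a few lines. This is an illustration of the definitions only, not DS5's code: the function names and the input shape (a dict mapping each concentration to its replicate inhibition values) are assumptions.

```python
from statistics import mean

def observed_emax(inhibition_by_conc, mode="observed_best"):
    """Compute an observed Emax from {concentration: [replicate inhibition %]}."""
    means = {conc: mean(reps) for conc, reps in inhibition_by_conc.items()}
    if mode == "observed_best":
        return max(means.values())   # best mean at any concentration
    if mode == "observed_highest_dose":
        return means[max(means)]     # mean at the largest concentration
    raise ValueError(f"unsupported mode: {mode}")

def e_inf_or_fallback(fitted_asymptote, inhibition_by_conc):
    """Use the fitted asymptote if it lies in [-10, 200] %, else fall back."""
    if fitted_asymptote is not None and -10.0 <= fitted_asymptote <= 200.0:
        return fitted_asymptote
    return observed_emax(inhibition_by_conc, "observed_best")

# Made-up data where inhibition dips at the highest dose.
data = {0.01: [5.0, 7.0], 0.1: [30.0, 34.0], 1.0: [70.0, 74.0], 10.0: [60.0, 64.0]}
print(observed_emax(data, "observed_best"))          # 72.0
print(observed_emax(data, "observed_highest_dose"))  # 62.0
print(e_inf_or_fallback(350.0, data))                # 72.0 (asymptote out of range)
```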
Summary & comparison
| Function | Description |
|---|---|
| summarize_test_results(h5, patient, test, emax_mode) | All metrics for all drugs in one DataFrame |
| process_ds5(input_h5, output_h5=None, emax_mode) | Batch-process and cache summary tables |
| compare_metrics(h5, patient=None, emax_mode) | Cross-screen metric comparison |
| generate_report(h5, test_name) | HTML report with heatmaps and top drug picks |
Drug name standardization
| Function | Description |
|---|---|
| standardize_drug_name(name) | Resolve via RxNorm/PubChem → rx:12345, pc:6789, or raw:name |
| register_metric(name, func) | Register an external metric plugin |
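Downstream code often needs to tell the three identifier forms returned by standardize_drug_name apart. A minimal sketch of splitting such an identifier is shown below; the helper name split_standardized_name is hypothetical, and treating any other prefix as an error is an assumption rather than documented DS5 behavior.

```python
KNOWN_PREFIXES = {"rx", "pc", "raw"}  # the three forms listed in the table above

def split_standardized_name(identifier):
    """Split 'rx:12345' into ('rx', '12345'); reject anything else."""
    prefix, sep, value = identifier.partition(":")
    if not sep or prefix not in KNOWN_PREFIXES or not value:
        raise ValueError(f"not a standardized drug name: {identifier!r}")
    return prefix, value

print(split_standardized_name("rx:12345"))         # ('rx', '12345')
print(split_standardized_name("raw:doxorubicin"))  # ('raw', 'doxorubicin')
```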
HDF5 schema
DS5 stores all data in a single HDF5 file. See docs/HDF5_SCHEMA.md for full details.
/patients/
    /{patient_id}/
        /{test_id}/
            data                 # Raw plate-reader values (byte-string array)
            plate_map            # Well identifiers: "DrugName concentration" or "DMSO"
            preprocessed_data    # (optional) Float array with outliers set to NaN
            summary_table        # (optional) Cached metric summary from process_ds5
/drug_standardization_table      # (optional) Maps raw drug names ↔ rx:/pc: IDs
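For custom tooling that reads the file directly, the layout above translates into predictable dataset paths. The sketch below builds them with the standard library only; the helper name screen_dataset_path is hypothetical and not part of DS5.

```python
import posixpath

# Dataset names taken from the schema above; the last two are optional.
SCREEN_DATASETS = ("data", "plate_map", "preprocessed_data", "summary_table")

def screen_dataset_path(patient_id, test_id, dataset):
    """Build the HDF5 path of one dataset for a patient/test screen."""
    if dataset not in SCREEN_DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    return posixpath.join("/patients", patient_id, test_id, dataset)

print(screen_dataset_path("HCI001", "set1", "data"))
# /patients/HCI001/set1/data
```

A path built this way can be used to index an open h5py.File object. Because preprocessed_data and summary_table are optional, a reader should check that the path exists before accessing it.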
Plate map format
The plate map Excel file should have row labels (A, B, C, ...) and column labels (1, 2, 3, ...) matching standard microplate layout. Each cell contains either:
- DMSO: marks a DMSO control well
- DrugName concentration: e.g., Doxorubicin 0.1 (drug name, space, concentration in µM)
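The two cell formats above are simple enough to validate before loading a plate map. A minimal sketch, assuming the concentration is always the last space-separated token (so multi-word drug names would also parse); the helper name parse_plate_map_cell is hypothetical and DS5's own parser may be stricter or more lenient.

```python
def parse_plate_map_cell(cell):
    """Return ('DMSO', None) for controls, else (drug_name, concentration_uM)."""
    text = cell.strip()
    if text == "DMSO":
        return "DMSO", None
    # Split on the last space so multi-word drug names stay intact.
    name, _, conc = text.rpartition(" ")
    if not name:
        raise ValueError(f"expected 'DrugName concentration', got {cell!r}")
    return name, float(conc)  # float() raises ValueError on a bad number

print(parse_plate_map_cell("DMSO"))             # ('DMSO', None)
print(parse_plate_map_cell("Doxorubicin 0.1"))  # ('Doxorubicin', 0.1)
```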
QC configuration
Preprocessing is controlled by a QC_para.txt file with key=value pairs:
# QC_para.txt example
left_percentile = 1
right_percentile = 99
dmso_use_mad = true
drug_outlier_threshold = 5
| Parameter | Default | Description |
|---|---|---|
| left_percentile | 0 | Lower percentile cutoff for global outlier removal |
| right_percentile | 0 | Upper percentile cutoff for global outlier removal |
| dmso_use_mad | true | Use MAD-based (true) or IQR-based (false) DMSO outlier removal |
| drug_outlier_threshold | 5 | Median-ratio threshold for per-drug outlier removal |
If no QC file is provided, defaults are used (minimal filtering).
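To check a QC file before running preprocess_data, the key=value format can be parsed in a few lines. This sketch applies the defaults from the table above; the function name parse_qc_params, the handling of inline # comments, and the rejection of unknown keys are assumptions for illustration, not DS5's actual parser.

```python
# Defaults copied from the parameter table above.
QC_DEFAULTS = {
    "left_percentile": 0.0,
    "right_percentile": 0.0,
    "dmso_use_mad": True,
    "drug_outlier_threshold": 5.0,
}

def parse_qc_params(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    params = dict(QC_DEFAULTS)
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or key not in QC_DEFAULTS:
            raise ValueError(f"unrecognized QC line: {raw_line!r}")
        if isinstance(QC_DEFAULTS[key], bool):
            params[key] = value.lower() == "true"
        else:
            params[key] = float(value)
    return params

example = """# QC_para.txt example
left_percentile = 1
right_percentile = 99
dmso_use_mad = true
drug_outlier_threshold = 5
"""
print(parse_qc_params(example))
```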
External metrics plugin
You can extend DS5 with custom metrics:
import DS5
from DS5 import register_metric

def compute_my_metric(h5_file_name, patient_id, test_id, drug_name, **kwargs):
    """Must return a dict of {column_name: value}."""
    # ... your computation ...
    return {
        "MY_SCORE": 42.0,
        "__meta__": {"prefer_higher": True},  # optional: controls ranking direction
    }

register_metric("my_metric", compute_my_metric)

# Now use it in summarize_test_results
summary = DS5.summarize_test_results(
    "experiment.h5", "HCI001", "set1",
    use_external_metrics=True,
    external_metrics=["my_metric"],
)
# summary DataFrame will include a MY_SCORE column
See external_metrics/calculate_metric_max_viability.py for a complete example.
Running tests
pytest tests/ -v -m "not network"
Test data lives in tests/fixtures/. The synthetic.h5 fixture contains a 9x6 plate with 3 drugs at 5 concentrations plus DMSO controls, and the expected metric outputs are recorded in golden_values.json. Tests compare against these values with a 10% relative tolerance, so minor algorithm improvements pass but large regressions fail.
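The 10% relative tolerance amounts to a check like the one sketched below. The helper name and the sample numbers are made up for illustration and do not come from golden_values.json; the real tests may express the comparison differently, for example with pytest.approx(expected, rel=0.1).

```python
def within_relative_tolerance(actual, expected, rel_tol=0.10):
    """True if actual deviates from expected by at most rel_tol * |expected|."""
    return abs(actual - expected) <= rel_tol * abs(expected)

golden = {"ic50": 0.50}  # made-up value, not taken from golden_values.json
print(within_relative_tolerance(0.53, golden["ic50"]))  # True: 6% off
print(within_relative_tolerance(0.60, golden["ic50"]))  # False: 20% off
```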
License
MIT