A Python package for handling and processing drug screening data in HDF5 format

DS5

A Python package for drug sensitivity screening data analysis. DS5 handles the full pipeline from raw plate-reader data to drug sensitivity metrics (IC50, EC50, Emax, DSS) with built-in quality control, DMSO normalization, and reporting.

Installation

# From the project root
pip install -e .

# With dev dependencies (pytest, jupyter)
pip install -e ".[dev]"

Requires Python 3.11–3.12.

Quick start

import DS5

# 1. Create a new HDF5 file
DS5.gen_new_HDF5("experiment.h5")

# 2. Load plate-reader data from Excel
DS5.load_excel_to_h5(
    "experiment.h5",
    well_read_file_name="plate_reads.xlsx",
    well_read_sheet_name="Sheet1",
    plate_map_file_name="plate_map.xlsx",
    plate_map_sheet_name="Sheet1",
    patient_id="HCI001",
    test_id="set1",
)

# 3. Preprocess (outlier removal)
DS5.preprocess_data("experiment.h5")

# 4. Analyze a single drug
ic50 = DS5.analyze_drug_ic50("experiment.h5", "HCI001", "set1", "Doxorubicin")
print(f"IC50 = {ic50['ic50']['value']}")

# 5. Summarize all drugs in one table
summary = DS5.summarize_test_results("experiment.h5", "HCI001", "set1")
print(summary)

# 6. Batch process and cache results
DS5.process_ds5("experiment.h5")

# 7. Extract data for custom analysis
df = DS5.get_data("experiment.h5", "HCI001_set1", data_type="normalized")

API overview

Data I/O

Function	Description
`gen_new_HDF5(file_name)`	Create empty DS5-format HDF5 file
`load_excel_to_h5(...)`	Load plate-reader Excel + plate map into HDF5
`export_h5_to_excel(h5, output)`	Export HDF5 contents to Excel workbook
`load_GDSC_to_h5(csv, ...)`	Load GDSC-format CSV into HDF5
`load_all_GDSC_to_h5(csv, ...)`	Batch-load all experiments from GDSC CSV
`generate_GDSC_screen_list(csv, ...)`	List available screens in a GDSC CSV
`get_data(h5, screen, data_type)`	Extract data as DataFrame (`intensity`, `normalized`, etc.)

Preprocessing & QC

Function	Description
`preprocess_data(h5, qc_para_file=None)`	Apply outlier removal to all screens
`check_preprocess(h5, patient, test, drug)`	Visualize preprocessing effect on a drug
`QC_visual(h5, screen, qc_para_file)`	Generate before/after QC comparison plots
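As an illustration of the kind of outlier removal this stage performs, here is a minimal sketch of a MAD-based filter for DMSO control wells. It is not DS5's implementation: the function names, the cutoff of 3 scaled MADs, and the sample values are assumptions chosen for the example. Flagged wells are replaced with NaN, in line with how `preprocessed_data` is described in the HDF5 schema.

```python
import math
import statistics


def mad_outlier_mask(values, threshold=3.0):
    """Return one bool per value: True if flagged by the MAD rule.

    `threshold` (in scaled MADs) is an assumed cutoff, not a DS5 default.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be flagged reliably.
        return [False] * len(values)
    # 1.4826 scales the MAD so it estimates the standard deviation
    # when the data are normally distributed.
    scaled_mad = 1.4826 * mad
    return [abs(v - median) / scaled_mad > threshold for v in values]


def remove_outliers(values, threshold=3.0):
    """Replace flagged values with NaN, keeping the list length unchanged."""
    mask = mad_outlier_mask(values, threshold)
    return [math.nan if flagged else v for v, flagged in zip(values, mask)]


# Hypothetical DMSO control intensities with one contaminated well.
dmso_wells = [1020.0, 980.0, 1005.0, 995.0, 1010.0, 2500.0]
cleaned = remove_outliers(dmso_wells)
print(cleaned)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme well barely moves them, so the extreme well cannot mask itself.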

Drug analysis

Function	Description
`analyze_dmso_controls(h5, patient, test)`	DMSO control statistics and boxplot
`analyze_all_dmso(h5, patient=None)`	DMSO analysis across all screens
`analyze_drug_ic50(h5, patient, test, drug)`	IC50 via 4-parameter logistic fit
`analyze_drug_ec50(h5, patient, test, drug)`	EC50 (50% absolute inhibition)
`analyze_drug_emax(h5, patient, test, drug, mode)`	Maximum inhibition (supports multiple Emax modes)
`calculate_DSS(h5, patient, test, drug)`	DSS1, DSS2, DSS3 drug sensitivity scores
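To make the difference between the fitted IC50 and the absolute EC50 concrete, the sketch below evaluates a 4-parameter logistic (4PL) curve and inverts it at 50% inhibition. The parameterization (`bottom`, `top`, `ic50`, `hill`) and both function names are assumptions for illustration; DS5's internal model may be parameterized differently.

```python
def four_pl(conc, bottom, top, ic50, hill):
    """Percent inhibition predicted by a 4PL curve at concentration conc > 0."""
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** hill)


def absolute_ec50(bottom, top, ic50, hill, level=50.0):
    """Concentration at which the curve reaches `level` percent inhibition.

    Returns None when the curve never crosses `level`.
    """
    if not bottom < level < top:
        return None
    return ic50 * ((level - bottom) / (top - level)) ** (1.0 / hill)


# Hypothetical fit: inhibition rises from 0% to a plateau of 80%.
params = {"bottom": 0.0, "top": 80.0, "ic50": 1.0, "hill": 1.0}

# At the fitted IC50 the curve sits halfway between its asymptotes (40% here).
print(four_pl(params["ic50"], **params))

# The absolute EC50 is the dose that gives 50% inhibition on the raw scale.
print(absolute_ec50(**params))
```

With a plateau below 100%, the two values differ: the fitted IC50 marks the midpoint of the curve, while the absolute EC50 marks where inhibition actually reaches 50%, and it is undefined if the plateau stays below 50%.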

Emax modes

The `mode` parameter of `analyze_drug_emax` (named `emax_mode` in `summarize_test_results`, `process_ds5`, and `compare_metrics`) controls how Emax is computed. All of these functions support the following modes:

Mode	Definition	Requires curve fit
`observed_best` (default)	Highest mean inhibition at any tested concentration	No
`observed_highest_dose`	Mean inhibition at the highest tested concentration	No
`fitted_highest_dose`	4PL model-predicted response at the highest tested concentration	Yes (falls back to `observed_best`)
`e_inf`	Fitted 4PL asymptote, must be in [-10, 200]%	Yes (falls back to `observed_best`)

# Single drug analysis with Emax mode
emax = DS5.analyze_drug_emax("experiment.h5", "HCI001", "set1", "Doxorubicin", mode="e_inf")

# Batch processing with Emax mode
DS5.process_ds5("experiment.h5", emax_mode="fitted_highest_dose")

# Summary and comparison with Emax mode
summary = DS5.summarize_test_results("experiment.h5", "HCI001", "set1", emax_mode="e_inf")
comparison = DS5.compare_metrics("experiment.h5", emax_mode="e_inf")

DSS2 always uses the fitted Emax from the 4PL curve regardless of emax_mode.
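The four modes can be read as a small decision procedure. The following is a sketch of that logic, not DS5's code: the `emax` function, the layout of the `fit` dict, the choice of `top` as the fitted asymptote, and the fallback when the asymptote is out of range are all assumptions made for the example.

```python
def four_pl(conc, bottom, top, ic50, hill):
    """Percent inhibition predicted by a 4PL curve at concentration conc > 0."""
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** hill)


def emax(mean_inhibition, mode="observed_best", fit=None):
    """Pick Emax from per-concentration mean inhibition (percent).

    mean_inhibition: dict mapping concentration -> mean inhibition.
    fit: optional dict with keys bottom, top, ic50, hill (assumed layout).
    """
    observed_best = max(mean_inhibition.values())
    highest_dose = max(mean_inhibition)
    if mode == "observed_best":
        return observed_best
    if mode == "observed_highest_dose":
        return mean_inhibition[highest_dose]
    if mode == "fitted_highest_dose":
        # Without a usable fit, fall back to the observed value.
        return observed_best if fit is None else four_pl(highest_dose, **fit)
    if mode == "e_inf":
        # Assumed: an asymptote outside [-10, 200]% is treated like a failed fit.
        if fit is None or not -10.0 <= fit["top"] <= 200.0:
            return observed_best
        return fit["top"]
    raise ValueError(f"unknown Emax mode: {mode!r}")


# Hypothetical data: inhibition dips slightly at the highest dose.
means = {0.01: 5.0, 0.1: 30.0, 1.0: 72.0, 10.0: 68.0}
fit = {"bottom": 0.0, "top": 75.0, "ic50": 0.15, "hill": 1.0}

for mode in ("observed_best", "observed_highest_dose", "fitted_highest_dose", "e_inf"):
    print(mode, emax(means, mode, fit))
```

The example data show why the choice matters: `observed_best` and `observed_highest_dose` disagree whenever the response is not monotonic, and the two fitted modes smooth over that noise at the cost of depending on a successful curve fit.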

Summary & comparison

Function	Description
`summarize_test_results(h5, patient, test, emax_mode)`	All metrics for all drugs in one DataFrame
`process_ds5(input_h5, output_h5=None, emax_mode)`	Batch-process and cache summary tables
`compare_metrics(h5, patient=None, emax_mode)`	Cross-screen metric comparison
`generate_report(h5, test_name)`	HTML report with heatmaps and top drug picks

Drug name standardization

Function	Description
`standardize_drug_name(name)`	Resolve via RxNorm/PubChem → `rx:12345`, `pc:6789`, or `raw:name`
`register_metric(name, func)`	Register an external metric plugin
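Downstream code often needs to tell the three identifier forms apart. A minimal sketch of splitting a standardized identifier into its source and value is shown below; `parse_drug_id` and the `SOURCES` labels are hypothetical helpers for this example and are not part of the DS5 API.

```python
# Prefixes taken from the table above; the labels are illustrative.
SOURCES = {"rx": "RxNorm", "pc": "PubChem", "raw": "unresolved"}


def parse_drug_id(drug_id):
    """Split 'prefix:value' into (source label, value)."""
    # Split on the first colon only, so raw names may contain colons.
    prefix, sep, value = drug_id.partition(":")
    if not sep or prefix not in SOURCES or not value:
        raise ValueError(f"not a standardized drug identifier: {drug_id!r}")
    return SOURCES[prefix], value


print(parse_drug_id("rx:12345"))
print(parse_drug_id("pc:6789"))
print(parse_drug_id("raw:name"))
```

Checking for the `raw:` prefix is a simple way to list drugs that could not be resolved and may need manual curation.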

HDF5 schema

DS5 stores all data in a single HDF5 file. See docs/HDF5_SCHEMA.md for full details.

/patients/
  /{patient_id}/
    /{test_id}/
      data                 # Raw plate-reader values (byte-string array)
      plate_map            # Well identifiers: "DrugName concentration" or "DMSO"
      preprocessed_data    # (optional) Float array with outliers set to NaN
      summary_table        # (optional) Cached metric summary from process_ds5
/drug_standardization_table  # (optional) Maps raw drug names ↔ rx:/pc: IDs
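When inspecting a file outside DS5, it can help to build dataset paths from the layout above rather than typing them by hand. This is a small sketch under that layout; `dataset_path` and `DATASETS` are hypothetical names and not DS5 functions.

```python
# Dataset names listed in the schema above.
DATASETS = ("data", "plate_map", "preprocessed_data", "summary_table")


def dataset_path(patient_id, test_id, name="data"):
    """Build the HDF5 path of one dataset for a given patient and test."""
    if name not in DATASETS:
        raise ValueError(f"unknown dataset name: {name!r}")
    for part in (patient_id, test_id):
        # A slash inside an ID would silently create an extra group level.
        if not part or "/" in part:
            raise ValueError(f"invalid identifier: {part!r}")
    return f"/patients/{patient_id}/{test_id}/{name}"


print(dataset_path("HCI001", "set1"))
print(dataset_path("HCI001", "set1", "preprocessed_data"))
```

A path built this way can be used to index a file opened with h5py, for example `h5py.File("experiment.h5", "r")[path]`. Keep in mind that `preprocessed_data` and `summary_table` are optional, so check that the path exists before reading it.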

Plate map format

The plate map Excel file should have row labels (A, B, C, ...) and column labels (1, 2, 3, ...) matching the standard microplate layout. Each cell contains one of the following:

- `DMSO`: marks a DMSO control well
- `DrugName concentration`: the drug name, a space, and the concentration in µM, e.g., `Doxorubicin 0.1`
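A plate map can be sanity-checked before loading by parsing each cell according to the two forms above. The sketch below shows one way to do that; `parse_well` is a hypothetical helper, and treating `DMSO` as case-sensitive and allowing spaces inside drug names are assumptions.

```python
def parse_well(cell):
    """Parse one plate-map cell into (drug_name, concentration_uM).

    DMSO control wells are returned as ("DMSO", None).
    """
    text = cell.strip()
    if text == "DMSO":
        return ("DMSO", None)
    # Split on the last space so multi-word drug names stay intact.
    name, sep, conc = text.rpartition(" ")
    if not sep or not name.strip():
        raise ValueError(f"expected 'DrugName concentration', got {cell!r}")
    # float() raises ValueError if the last token is not a number.
    return (name.strip(), float(conc))


print(parse_well("DMSO"))
print(parse_well("Doxorubicin 0.1"))
print(parse_well("Mitomycin C 2.5"))
```

Running a check like this over every cell surfaces typos such as a missing concentration before they reach the analysis stage.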

QC configuration

Preprocessing is controlled by a QC_para.txt file with key=value pairs:

# QC_para.txt example
left_percentile = 1
right_percentile = 99
dmso_use_mad = true
drug_outlier_threshold = 5

Parameter	Default	Description
`left_percentile`	0	Lower percentile cutoff for global outlier removal
`right_percentile`	0	Upper percentile cutoff for global outlier removal
`dmso_use_mad`	true	Use MAD-based (true) or IQR-based (false) DMSO outlier removal
`drug_outlier_threshold`	5	Median-ratio threshold for per-drug outlier removal

If no QC file is provided, defaults are used (minimal filtering).
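For reference, a key=value file of this shape can be read with a few lines of Python. The following is a sketch of such a reader, not DS5's own parser: `parse_qc_params` is a hypothetical name, the defaults are copied from the table above, and treating `#` as a comment marker is inferred from the example file.

```python
# Defaults copied from the parameter table.
DEFAULTS = {
    "left_percentile": 0.0,
    "right_percentile": 0.0,
    "dmso_use_mad": True,
    "drug_outlier_threshold": 5.0,
}


def parse_qc_params(text):
    """Parse key=value lines into a dict, starting from the defaults."""
    params = dict(DEFAULTS)
    for raw_line in text.splitlines():
        # Drop comments and surrounding whitespace; skip empty lines.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = (part.strip() for part in line.partition("="))
        if not sep or key not in DEFAULTS:
            raise ValueError(f"unrecognized QC line: {raw_line!r}")
        if isinstance(DEFAULTS[key], bool):
            if value.lower() not in ("true", "false"):
                raise ValueError(f"{key} must be true or false, got {value!r}")
            params[key] = value.lower() == "true"
        else:
            params[key] = float(value)
    return params


EXAMPLE = """\
# QC_para.txt example
left_percentile = 1
right_percentile = 99
dmso_use_mad = true
drug_outlier_threshold = 5
"""
print(parse_qc_params(EXAMPLE))
```

Rejecting unknown keys, rather than ignoring them, means a misspelled parameter name fails loudly instead of silently leaving the default in place.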

External metrics plugin

You can extend DS5 with custom metrics:

from DS5 import register_metric

def compute_my_metric(h5_file_name, patient_id, test_id, drug_name, **kwargs):
    """Must return a dict of {column_name: value}."""
    # ... your computation ...
    return {
        "MY_SCORE": 42.0,
        "__meta__": {"prefer_higher": True},  # optional: controls ranking direction
    }

register_metric("my_metric", compute_my_metric)

# Now use it in summarize_test_results
summary = DS5.summarize_test_results(
    "experiment.h5", "HCI001", "set1",
    use_external_metrics=True,
    external_metrics=["my_metric"],
)
# summary DataFrame will include a MY_SCORE column

See external_metrics/calculate_metric_max_viability.py for a complete example.
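The return contract of a metric function, with value columns plus an optional `__meta__` entry, can be illustrated as follows. This is only a sketch of how a consumer might interpret such a dict; `split_metric_result` and `rank_drugs` are hypothetical helpers, and treating a missing `prefer_higher` as True is an assumption, since DS5's internal handling is not shown here.

```python
def split_metric_result(result):
    """Separate value columns from the optional '__meta__' entry."""
    columns = {k: v for k, v in result.items() if k != "__meta__"}
    meta = result.get("__meta__", {})
    return columns, meta


def rank_drugs(scores, prefer_higher=True):
    """Order drug names from best to worst for one metric column."""
    return sorted(scores, key=scores.get, reverse=prefer_higher)


# Result shaped like the compute_my_metric example above.
result = {"MY_SCORE": 42.0, "__meta__": {"prefer_higher": True}}
columns, meta = split_metric_result(result)
print(columns, meta)

# Hypothetical MY_SCORE values for three drugs.
scores = {"Doxorubicin": 42.0, "Paclitaxel": 17.5, "Cisplatin": 63.0}
print(rank_drugs(scores, prefer_higher=meta.get("prefer_higher", True)))
```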

Running tests

pytest tests/ -v -m "not network"

Test data lives in tests/fixtures/. The synthetic.h5 fixture contains a 9x6 plate with 3 drugs at 5 concentrations plus DMSO controls, and the expected metric outputs are recorded in golden_values.json. Tests compare computed metrics against these golden values with a 10% relative tolerance, so minor algorithm improvements pass but large regressions fail.
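A relative-tolerance comparison against golden values can be sketched as below. The helper names and the flat `{metric: value}` layout are assumptions for illustration; the actual structure of golden_values.json and the exact comparison used by the test suite are not shown here.

```python
def within_tolerance(actual, expected, rel_tol=0.10, abs_tol=1e-12):
    """True if actual is within rel_tol of expected, relative to expected."""
    # abs_tol keeps the check meaningful when the expected value is zero.
    return abs(actual - expected) <= max(rel_tol * abs(expected), abs_tol)


def find_regressions(actual, golden, rel_tol=0.10):
    """Return the sorted metric names that are missing or out of tolerance."""
    return sorted(
        name
        for name, expected in golden.items()
        if name not in actual or not within_tolerance(actual[name], expected, rel_tol)
    )


# Hypothetical golden and freshly computed values.
golden = {"ic50": 1.00, "emax": 80.0, "dss1": 12.0}
computed = {"ic50": 1.05, "emax": 95.0, "dss1": 12.3}
print(find_regressions(computed, golden))
```

pytest users can get the same expected-relative check with `pytest.approx(expected, rel=0.1)`.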

License

MIT

This version

0.1.0

Jun 23, 2026

Download files

Source Distribution

ds5-0.1.0.tar.gz (98.0 kB view details)

Uploaded Jun 23, 2026 Source

Built Distribution

ds5-0.1.0-py3-none-any.whl (98.6 kB view details)

Uploaded Jun 23, 2026 Python 3

File details

Details for the file ds5-0.1.0.tar.gz.

File metadata

Download URL: ds5-0.1.0.tar.gz
Upload date: Jun 23, 2026
Size: 98.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.19

File hashes

Hashes for ds5-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`fa5d24895d4fcac1393c5d4cb5b34127aaeaeeecaf067dcde38d660b7148d29a`
MD5	`99dba95e2e04b78a18bfe58704fef8e9`
BLAKE2b-256	`cadef7baec594ce332a005e894855ba514c2855f2d812da9caa6809fb6d96833`

File details

Details for the file ds5-0.1.0-py3-none-any.whl.

File metadata

Download URL: ds5-0.1.0-py3-none-any.whl
Upload date: Jun 23, 2026
Size: 98.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.19

File hashes

Hashes for ds5-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fd722081903f94cd43dfc86a1f327c93c4448af25ac44fedeb238128e034bd89`
MD5	`be9e5616e682ee225f4bc6f81e95d332`
BLAKE2b-256	`770306bcf2fbb85c9b4ecc5e3f92d670caba82cd872e86058bcae4f5b953e97a`
