A Python package for handling and processing drug screening data in HDF5 format

DS5

A Python package for drug sensitivity screening data analysis. DS5 handles the full pipeline from raw plate-reader data to drug sensitivity metrics (IC50, EC50, Emax, DSS) with built-in quality control, DMSO normalization, and reporting.

Installation

# From the project root
pip install -e .

# With dev dependencies (pytest, jupyter)
pip install -e ".[dev]"

Requires Python 3.11–3.12.

Quick start

import DS5

# 1. Create a new HDF5 file
DS5.gen_new_HDF5("experiment.h5")

# 2. Load plate-reader data from Excel
DS5.load_excel_to_h5(
    "experiment.h5",
    well_read_file_name="plate_reads.xlsx",
    well_read_sheet_name="Sheet1",
    plate_map_file_name="plate_map.xlsx",
    plate_map_sheet_name="Sheet1",
    patient_id="HCI001",
    test_id="set1",
)

# 3. Preprocess (outlier removal)
DS5.preprocess_data("experiment.h5")

# 4. Analyze a single drug
ic50 = DS5.analyze_drug_ic50("experiment.h5", "HCI001", "set1", "Doxorubicin")
print(f"IC50 = {ic50['ic50']['value']}")

# 5. Summarize all drugs in one table
summary = DS5.summarize_test_results("experiment.h5", "HCI001", "set1")
print(summary)

# 6. Batch process and cache results
DS5.process_ds5("experiment.h5")

# 7. Extract data for custom analysis
df = DS5.get_data("experiment.h5", "HCI001_set1", data_type="normalized")

API overview

Data I/O

Function                              Description
gen_new_HDF5(file_name)               Create empty DS5-format HDF5 file
load_excel_to_h5(...)                 Load plate-reader Excel + plate map into HDF5
export_h5_to_excel(h5, output)        Export HDF5 contents to Excel workbook
load_GDSC_to_h5(csv, ...)             Load GDSC-format CSV into HDF5
load_all_GDSC_to_h5(csv, ...)         Batch-load all experiments from GDSC CSV
generate_GDSC_screen_list(csv, ...)   List available screens in a GDSC CSV
get_data(h5, screen, data_type)       Extract data as DataFrame (intensity, normalized, etc.)

Preprocessing & QC

Function                                   Description
preprocess_data(h5, qc_para_file=None)     Apply outlier removal to all screens
check_preprocess(h5, patient, test, drug)  Visualize preprocessing effect on a drug
QC_visual(h5, screen, qc_para_file)        Generate before/after QC comparison plots

Drug analysis

Function                                          Description
analyze_dmso_controls(h5, patient, test)          DMSO control statistics and boxplot
analyze_all_dmso(h5, patient=None)                DMSO analysis across all screens
analyze_drug_ic50(h5, patient, test, drug)        IC50 via 4-parameter logistic fit
analyze_drug_ec50(h5, patient, test, drug)        EC50 (50% absolute inhibition)
analyze_drug_emax(h5, patient, test, drug, mode)  Maximum inhibition (supports multiple Emax modes)
calculate_DSS(h5, patient, test, drug)            DSS1, DSS2, DSS3 drug sensitivity scores
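
To make the IC50/EC50 distinction in the table concrete, here is a minimal sketch of one common 4-parameter logistic (4PL) parameterization. The function names, the parameter names (bottom, top, ic50, hill), and the exact curve form are assumptions for illustration; DS5's internal fitting code may differ. In this form, ic50 is the curve midpoint (a relative measure), while the absolute 50% crossing is the concentration at which the fitted curve reaches 50% inhibition.

```python
import math


def four_pl(conc, bottom, top, ic50, hill):
    """Percent inhibition at `conc` under an assumed 4PL form."""
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** hill)


def absolute_ec50(bottom, top, ic50, hill, level=50.0):
    """Concentration where the fitted curve crosses `level` percent inhibition.

    Returns NaN when the curve never reaches that level.
    """
    if not (min(bottom, top) < level < max(bottom, top)):
        return math.nan
    # Solve level = bottom + (top - bottom) / (1 + (ic50 / c) ** hill) for c.
    ratio = (top - bottom) / (level - bottom) - 1.0
    return ic50 / ratio ** (1.0 / hill)
```

With bottom=0, top=80, ic50=1.0, hill=1.0, the midpoint concentration gives 40% inhibition, so the absolute 50% crossing lies at a higher concentration than the midpoint. A curve that plateaus below 50% has no absolute crossing at all, which the sketch reports as NaN.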

Emax modes

The mode parameter of analyze_drug_emax, called emax_mode in process_ds5, summarize_test_results, and compare_metrics, controls how Emax is computed. All of these functions support the following modes:

Mode                     Definition                                                        Requires curve fit
observed_best (default)  Highest mean inhibition at any tested concentration               No
observed_highest_dose    Mean inhibition at the highest tested concentration               No
fitted_highest_dose      4PL model-predicted response at the highest tested concentration  Yes (falls back to observed_best)
e_inf                    Fitted 4PL asymptote, must be in [-10, 200]%                      Yes (falls back to observed_best)

# Single drug analysis with Emax mode
emax = DS5.analyze_drug_emax("experiment.h5", "HCI001", "set1", "Doxorubicin", mode="e_inf")

# Batch processing with Emax mode
DS5.process_ds5("experiment.h5", emax_mode="fitted_highest_dose")

# Summary and comparison with Emax mode
summary = DS5.summarize_test_results("experiment.h5", "HCI001", "set1", emax_mode="e_inf")
comparison = DS5.compare_metrics("experiment.h5", emax_mode="e_inf")

DSS2 always uses the fitted Emax from the 4PL curve regardless of emax_mode.
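
The four modes can be read as a small dispatch over observed means and fitted 4PL parameters. The sketch below is a hypothetical re-implementation for illustration only: compute_emax, the dict-based inputs, and the 4PL form are assumptions, and treating an out-of-range asymptote as a trigger for the observed_best fallback is one interpretation of the table, not confirmed DS5 behavior.

```python
def four_pl(conc, bottom, top, ic50, hill):
    """Percent inhibition at `conc` under an assumed 4PL form."""
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** hill)


def compute_emax(mean_inhibition, mode="observed_best", fit=None):
    """Emax under one of the four modes.

    mean_inhibition: {concentration: mean percent inhibition}
    fit: dict with keys bottom, top, ic50, hill, or None if no fit is available
    """
    observed_best = max(mean_inhibition.values())
    if mode == "observed_best":
        return observed_best
    top_dose = max(mean_inhibition)  # highest tested concentration
    if mode == "observed_highest_dose":
        return mean_inhibition[top_dose]
    if mode not in ("fitted_highest_dose", "e_inf"):
        raise ValueError(f"unknown Emax mode: {mode!r}")
    if fit is None:
        return observed_best  # no curve fit: fall back
    if mode == "fitted_highest_dose":
        return four_pl(top_dose, **fit)
    # e_inf: in this parameterization (hill > 0) the high-dose asymptote is `top`.
    e_inf = fit["top"]
    # Assumption: an asymptote outside [-10, 200] also triggers the fallback.
    return e_inf if -10.0 <= e_inf <= 200.0 else observed_best
```

Note how observed_best and observed_highest_dose can disagree when the response dips at the top concentration, which is one reason to choose the mode deliberately.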

Summary & comparison

Function                                              Description
summarize_test_results(h5, patient, test, emax_mode)  All metrics for all drugs in one DataFrame
process_ds5(input_h5, output_h5=None, emax_mode)      Batch-process and cache summary tables
compare_metrics(h5, patient=None, emax_mode)          Cross-screen metric comparison
generate_report(h5, test_name)                        HTML report with heatmaps and top drug picks

Drug name standardization

Function                     Description
standardize_drug_name(name)  Resolve via RxNorm/PubChem → rx:12345, pc:6789, or raw:name
register_metric(name, func)  Register an external metric plugin
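
Downstream code often needs to tell the three identifier forms apart. The helper below is a hypothetical sketch, not part of DS5; it assumes the prefix is everything before the first colon and that rx, pc, and raw are the only prefixes, as the table suggests.

```python
def split_standard_id(std_id):
    """Split an identifier such as 'rx:12345' into (prefix, value)."""
    prefix, sep, value = std_id.partition(":")
    if not sep or prefix not in ("rx", "pc", "raw"):
        raise ValueError(f"unrecognized drug identifier: {std_id!r}")
    return prefix, value
```

Partitioning on the first colon keeps any later colons inside a raw name intact.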

HDF5 schema

DS5 stores all data in a single HDF5 file. See docs/HDF5_SCHEMA.md for full details.

/patients/
  /{patient_id}/
    /{test_id}/
      data                 # Raw plate-reader values (byte-string array)
      plate_map            # Well identifiers: "DrugName concentration" or "DMSO"
      preprocessed_data    # (optional) Float array with outliers set to NaN
      summary_table        # (optional) Cached metric summary from process_ds5
/drug_standardization_table  # (optional) Maps raw drug names ↔ rx:/pc: IDs
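
For custom analysis outside the DS5 API, the layout above translates directly into dataset paths. The sketch below is illustrative only: screen_paths and decode_reads are hypothetical helpers, and the UTF-8 encoding and the NaN treatment of non-numeric cells are assumptions rather than documented DS5 behavior.

```python
import math


def screen_paths(patient_id, test_id):
    """Dataset paths for one screen, following the layout shown above."""
    base = f"/patients/{patient_id}/{test_id}"
    names = ("data", "plate_map", "preprocessed_data", "summary_table")
    return {name: f"{base}/{name}" for name in names}


def decode_reads(raw_values):
    """Convert byte-string plate values to floats; non-numeric cells become NaN."""
    decoded = []
    for value in raw_values:
        text = value.decode("utf-8") if isinstance(value, bytes) else str(value)
        try:
            decoded.append(float(text))
        except ValueError:
            decoded.append(math.nan)
    return decoded
```

With h5py installed, a path from screen_paths can be used to index an open h5py.File object. Remember that preprocessed_data and summary_table are optional and may be absent.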

Plate map format

The plate map Excel file should have row labels (A, B, C, ...) and column labels (1, 2, 3, ...) matching standard microplate layout. Each cell contains either:

  • DMSO — marks a DMSO control well
  • DrugName concentration — e.g., Doxorubicin 0.1 (drug name, space, concentration in µM)
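
A plate-map cell in this format can be split into its parts as sketched below. parse_plate_cell is a hypothetical helper, not a DS5 function; splitting on the last space (so that multi-word drug names survive) is an assumption about how such names should be handled.

```python
def parse_plate_cell(cell):
    """Return ('DMSO', None) for control wells, else (drug_name, concentration_uM)."""
    text = str(cell).strip()
    if text == "DMSO":
        return "DMSO", None
    # Split on the last space so names like "Mitomycin C" stay intact.
    name, sep, conc = text.rpartition(" ")
    if not sep or not name.strip():
        raise ValueError(f"expected 'DrugName concentration', got {text!r}")
    return name.strip(), float(conc)
```

A cell with no concentration, or with a non-numeric one, raises ValueError, which makes malformed plate maps fail early rather than during analysis.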

QC configuration

Preprocessing is controlled by a QC_para.txt file with key=value pairs:

# QC_para.txt example
left_percentile = 1
right_percentile = 99
dmso_use_mad = true
drug_outlier_threshold = 5

Parameter               Default  Description
left_percentile         0        Lower percentile cutoff for global outlier removal
right_percentile        0        Upper percentile cutoff for global outlier removal
dmso_use_mad            true     Use MAD-based (true) or IQR-based (false) DMSO outlier removal
drug_outlier_threshold  5        Median-ratio threshold for per-drug outlier removal

If no QC file is provided, defaults are used (minimal filtering).
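
To show how a key=value file of this shape maps onto the defaults in the table, here is a minimal parsing sketch. parse_qc_text is a hypothetical helper and not DS5's own reader; treating # as a comment marker and rejecting unknown keys are assumptions.

```python
# Defaults copied from the parameter table above.
QC_DEFAULTS = {
    "left_percentile": 0.0,
    "right_percentile": 0.0,
    "dmso_use_mad": True,
    "drug_outlier_threshold": 5.0,
}


def parse_qc_text(text):
    """Parse key=value lines into a parameter dict, starting from the defaults."""
    params = dict(QC_DEFAULTS)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or key not in QC_DEFAULTS:
            raise ValueError(f"bad QC line: {line!r}")
        if isinstance(QC_DEFAULTS[key], bool):
            params[key] = value.lower() == "true"
        else:
            params[key] = float(value)
    return params
```

Parsing an empty string returns the defaults unchanged, mirroring the "no QC file" case.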

External metrics plugin

You can extend DS5 with custom metrics:

import DS5
from DS5 import register_metric

def compute_my_metric(h5_file_name, patient_id, test_id, drug_name, **kwargs):
    """Must return a dict of {column_name: value}."""
    # ... your computation ...
    return {
        "MY_SCORE": 42.0,
        "__meta__": {"prefer_higher": True},  # optional: controls ranking direction
    }

register_metric("my_metric", compute_my_metric)

# Now use it in summarize_test_results
summary = DS5.summarize_test_results(
    "experiment.h5", "HCI001", "set1",
    use_external_metrics=True,
    external_metrics=["my_metric"],
)
# summary DataFrame will include a MY_SCORE column

See external_metrics/calculate_metric_max_viability.py for a complete example.

Running tests

pytest tests/ -v -m "not network"

Test data lives in tests/fixtures/synthetic.h5, which contains a 9x6 plate with 3 drugs at 5 concentrations plus DMSO controls. Expected metric outputs are recorded in golden_values.json. Tests use a 10% relative tolerance, so minor algorithm improvements pass but large regressions fail.
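
As an illustration of what a 10% relative tolerance means in practice, the check below compares a computed metric against a golden value. within_relative_tolerance is a hypothetical helper, and measuring the tolerance relative to the expected value is an assumption; the test suite's actual comparison code is not shown here.

```python
def within_relative_tolerance(actual, expected, rel_tol=0.10):
    """True when `actual` differs from `expected` by at most rel_tol * |expected|."""
    if expected == 0:
        return actual == 0
    return abs(actual - expected) <= rel_tol * abs(expected)
```

In a pytest suite, pytest.approx(expected, rel=0.1) expresses the same idea.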

License

MIT
