SynOmicsBench

SynOmicsBench is a unified framework for benchmarking synthetic data generation (SDG) methods on clinical transcriptomic cancer cohorts.

Sharing clinical transcriptomic datasets for artificial intelligence in precision oncology requires balancing biological utility against patient privacy. SynOmicsBench addresses this by combining standardized preprocessing with multidimensional evaluation: downstream biological validation is prioritized alongside statistical fidelity and attack-based privacy assessment. The framework provides a reproducible decision-support tool for method selection and promotes biologically informed, privacy-aware adoption of synthetic data in precision oncology.


Installation

pip install synomicsbench

Python 3.12+ is required.
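
To confirm the environment before running the Quick Start, a small check along the following lines may help. It is a minimal sketch, not part of SynOmicsBench: the helper names `python_is_supported` and `installed_version` are hypothetical, and it assumes the distribution name matches the `pip install` name above. Only the standard library is used.

```python
import sys
from importlib import metadata

# Minimum interpreter version, taken from the requirement stated above.
MIN_PYTHON = (3, 12)


def python_is_supported(version_info=None):
    """Return True if the (major, minor) version meets MIN_PYTHON.

    Hypothetical helper; defaults to the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name="synomicsbench"):
    """Return the installed version string, or None if not installed.

    Assumes the distribution name equals the name used with pip.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("synomicsbench version:", installed_version())
```

If `installed_version()` returns `None`, the package is not visible to the interpreter that ran the check, which usually means `pip` installed it into a different environment.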


Quick Start

import pandas as pd
from synomicsbench.processing.preprocessing import DataProcessor
from synomicsbench.processing.metadata import MetaData
from synomicsbench.synthesizer.GaussianCopulasynthesizer import GaussianCopulasynthesizer
from synomicsbench.metrics.fidelity.UnivariateSimilarity import UnivariateSimilarity

# 1. Preprocess
data = pd.read_csv("clinical_transcriptomic_data.csv")
data = DataProcessor.remove_unknown_entities(data, id_column="Patient_ID")
data = DataProcessor.remove_duplications(data, axis=0).reset_index(drop=True)
data = DataProcessor.mice_imputation(data, iterations=10, n_estimators=100)

# 2. Metadata
metadata = MetaData.get_metadata(
    data=data,
    ordinal_features=["Mstage", "Tx_Start_ECOG", "numPriorTherapies"],
    threshold_unique_values=10,
)

# 3. Generate synthetic data
synth = GaussianCopulasynthesizer(output_path="./results", metadata=metadata)
synthetic_data = synth.generate(
    data=data,
    seed=42,
    n_samples=data.shape[0],
    output_filename="synthetic_data.csv",
)

# 4. Evaluate
evaluator = UnivariateSimilarity(output_dir="./results/evaluation")
score = evaluator.get_univariate_score(
    original_data=data,
    synthetic_data=synthetic_data,
    metadata=metadata,
    save=True,
)
print(f"Univariate Fidelity Score: {score:.4f}")
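
This README does not describe how the univariate fidelity score is computed. To give a sense of what such a score can measure, the sketch below uses one common choice for numeric columns, the Kolmogorov-Smirnov (KS) complement averaged over columns. This is an assumption for illustration only and may differ from what `UnivariateSimilarity` does; the function names `ks_statistic` and `univariate_score` are hypothetical, and columns are passed as plain dictionaries of lists rather than DataFrames to keep the example dependency-free.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    if not real_sorted or not synth_sorted:
        raise ValueError("both samples must be non-empty")
    n, m = len(real_sorted), len(synth_sorted)
    max_gap = 0.0
    # The gap between two step functions peaks at an observed value,
    # so it is enough to evaluate both CDFs at every observed point.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / n
        cdf_synth = bisect_right(synth_sorted, x) / m
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


def univariate_score(real_columns, synthetic_columns):
    """Mean KS complement (1 - KS) over columns present in both inputs.

    A score of 1.0 means identical marginal distributions; 0.0 means the
    marginals do not overlap at all.
    """
    shared = [name for name in real_columns if name in synthetic_columns]
    if not shared:
        raise ValueError("no shared columns to compare")
    complements = [
        1.0 - ks_statistic(real_columns[name], synthetic_columns[name])
        for name in shared
    ]
    return sum(complements) / len(complements)


if __name__ == "__main__":
    # Toy values, not real patient data.
    real = {"gene_a": [1.0, 2.0, 3.0, 4.0], "age": [50, 60, 70, 80]}
    synth = {"gene_a": [3.0, 4.0, 5.0, 6.0], "age": [50, 60, 70, 80]}
    print(f"Illustrative univariate score: {univariate_score(real, synth):.4f}")
```

A per-column score like this captures only marginal distributions; it says nothing about correlations between genes or clinical variables, which is why a multidimensional evaluation also looks at downstream biological utility and privacy.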

Documentation

Full documentation, API reference, and benchmarking results: https://trinhthechuong.github.io/SynOmicsBench/


Citation

If you use SynOmicsBench in your research, please cite:

Trinh, T. C., Woillard, J. B., Uguzzoni, G., & Battail, C. (2024). A unified benchmark of synthetic data generation for clinical and transcriptomic cancer data. (Manuscript in preparation)


License

MIT License
