SynOmicsBench
SynOmicsBench is a unified benchmarking framework for evaluating synthetic data generation (SDG) methods on clinical and transcriptomic cancer cohorts.
Balancing biological utility against patient privacy is critical for the secure sharing of clinical transcriptomic datasets used to develop artificial intelligence (AI) models in precision oncology. SynOmicsBench addresses this by combining standardized preprocessing with multidimensional evaluation, which prioritizes downstream biological validation alongside statistical fidelity and attack-based privacy assessment. The framework provides a reproducible decision-support tool for selecting SDG methods and promotes the biologically informed, privacy-aware adoption of synthetic data in precision oncology.
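One common form of attack-based privacy assessment is membership inference: an attacker tries to tell whether a given patient record was in the data the generator was fitted on. The sketch below is a generic distance-threshold attack, not the attack suite SynOmicsBench ships. It assumes numeric, comparably scaled features and a fixed threshold (in practice the threshold would be calibrated, for example on a reference split), and the function names `nearest_distances` and `distance_mia_accuracy` are hypothetical.

```python
import numpy as np


def nearest_distances(queries: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Euclidean distance from each query row to its closest row in `reference`."""
    # Broadcast to shape (n_queries, n_reference, n_features), then reduce.
    diffs = queries[:, None, :] - reference[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


def distance_mia_accuracy(train, holdout, synthetic, threshold: float) -> float:
    """Balanced accuracy of a distance-threshold membership-inference attack.

    The attacker labels a record "member" if some synthetic row lies within
    `threshold` of it. 0.5 means no better than chance; 1.0 means full leakage.
    """
    tpr = np.mean(nearest_distances(train, synthetic) < threshold)     # members caught
    tnr = np.mean(nearest_distances(holdout, synthetic) >= threshold)  # non-members cleared
    return float(0.5 * (tpr + tnr))


rng = np.random.default_rng(0)
train = rng.normal(size=(200, 5))    # records the generator was fitted on
holdout = rng.normal(size=(200, 5))  # records it never saw
leaky = train + rng.normal(scale=0.01, size=train.shape)  # near-copies of training rows
fresh = rng.normal(size=(200, 5))    # independent draws from the same distribution

print(distance_mia_accuracy(train, holdout, leaky, threshold=0.1))  # close to 1.0
print(distance_mia_accuracy(train, holdout, fresh, threshold=0.1))  # close to 0.5
```

A generator that memorizes its training rows lets the attacker separate members from non-members almost perfectly, whereas independent draws from the same distribution leave the attack at chance level.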
Installation
```bash
pip install synomicsbench
```
Python 3.12+ is required.
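A minimal guard for the Python requirement, using only the standard library. `check_environment` is a hypothetical helper name; it reports `None` for the package version if synomicsbench is not installed rather than raising.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

MIN_PYTHON = (3, 12)  # minimum interpreter version stated above


def check_environment() -> dict:
    """Report whether the interpreter meets MIN_PYTHON and which synomicsbench is installed."""
    python_ok = sys.version_info[:2] >= MIN_PYTHON
    try:
        installed = version("synomicsbench")
    except PackageNotFoundError:
        installed = None  # package not installed in this environment
    return {
        "python_ok": python_ok,
        "python": sys.version.split()[0],
        "synomicsbench": installed,
    }


print(check_environment())
```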
Quick Start
```python
import pandas as pd

from synomicsbench.processing.preprocessing import DataProcessor
from synomicsbench.processing.metadata import MetaData
from synomicsbench.synthesizer.GaussianCopulasynthesizer import GaussianCopulasynthesizer
from synomicsbench.metrics.fidelity.UnivariateSimilarity import UnivariateSimilarity

# 1. Preprocess
data = pd.read_csv("clinical_transcriptomic_data.csv")
data = DataProcessor.remove_unknown_entities(data, id_column="Patient_ID")
data = DataProcessor.remove_duplications(data, axis=0).reset_index(drop=True)
data = DataProcessor.mice_imputation(data, iterations=10, n_estimators=100)

# 2. Metadata
metadata = MetaData.get_metadata(
    data=data,
    ordinal_features=["Mstage", "Tx_Start_ECOG", "numPriorTherapies"],
    threshold_unique_values=10,
)

# 3. Generate synthetic data
synth = GaussianCopulasynthesizer(output_path="./results", metadata=metadata)
synthetic_data = synth.generate(
    data=data,
    seed=42,
    n_samples=data.shape[0],
    output_filename="synthetic_data.csv",
)

# 4. Evaluate
evaluator = UnivariateSimilarity(output_dir="./results/evaluation")
score = evaluator.get_univariate_score(
    original_data=data,
    synthetic_data=synthetic_data,
    metadata=metadata,
    save=True,
)
print(f"Univariate Fidelity Score: {score:.4f}")
```
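To make steps 3 and 4 more concrete, here is a minimal, self-contained sketch of the general techniques involved: a Gaussian copula with empirical marginals, scored by the mean complement of the per-column two-sample Kolmogorov-Smirnov statistic (the same idea as the `KSComplement` metric in SDMetrics). This is not SynOmicsBench's implementation: it handles numeric columns only, `gaussian_copula_sample` and `univariate_fidelity` are hypothetical names, and the library's `GaussianCopulasynthesizer` and `UnivariateSimilarity` may work differently.

```python
import numpy as np
from scipy import stats


def gaussian_copula_sample(data: np.ndarray, n_samples: int, seed: int = 42) -> np.ndarray:
    """Sample synthetic rows from a Gaussian copula with empirical marginals."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # 1. Map each column to normal scores through its ranks (probability integral transform).
    ranks = stats.rankdata(data, axis=0)      # ranks 1..n within each column
    z = stats.norm.ppf(ranks / (n + 1))       # divide by n + 1 to stay inside (0, 1)
    # 2. Capture dependence as the correlation matrix of the normal scores.
    corr = np.atleast_2d(np.corrcoef(z, rowvar=False))
    # 3. Draw correlated standard normals and map them back to uniforms.
    u = stats.norm.cdf(rng.multivariate_normal(np.zeros(d), corr, size=n_samples))
    # 4. Invert each empirical marginal via quantiles of the original column.
    return np.column_stack([np.quantile(data[:, j], u[:, j]) for j in range(d)])


def univariate_fidelity(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Mean of (1 - KS statistic) over columns; 1.0 means identical marginals."""
    scores = [
        1.0 - stats.ks_2samp(real[:, j], synthetic[:, j]).statistic
        for j in range(real.shape[1])
    ]
    return float(np.mean(scores))


rng = np.random.default_rng(0)
x = rng.normal(size=500)
real = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.5, size=500),  # strongly correlated with column 0
    rng.exponential(size=500),                # skewed, independent column
])
synthetic = gaussian_copula_sample(real, n_samples=real.shape[0], seed=42)
print(f"Univariate Fidelity Score: {univariate_fidelity(real, synthetic):.4f}")
```

Because the marginals are reconstructed from quantiles of the real columns, the univariate score is typically high, while the copula's correlation matrix carries over the strong dependence between the first two columns.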
Documentation
Full documentation, API reference, and benchmarking results: https://trinhthechuong.github.io/SynOmicsBench/
Citation
If you use SynOmicsBench in your research, please cite:
Trinh, T. C., Woillard, J. B., Uguzzoni, G., & Battail, C. (2024). A unified benchmark of synthetic data generation for clinical and transcriptomic cancer data. (Manuscript in preparation)
License
MIT License
Download files
File details
Details for the file synomicsbench-1.0.3.tar.gz.
File metadata
- Download URL: synomicsbench-1.0.3.tar.gz
- Upload date:
- Size: 287.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 64ac5e2281a025f43a4db8c5b7951ac3bb9393d9bbf07c06cdda340041521044 |
| MD5 | 751b69a05bfa47ec2941d5a9a458edf1 |
| BLAKE2b-256 | a0ab1c7882c49dc75b3e59a0e00fa198e73ad4a8c8494ee620bbed3eb5a6511d |
File details
Details for the file synomicsbench-1.0.3-py3-none-any.whl.
File metadata
- Download URL: synomicsbench-1.0.3-py3-none-any.whl
- Upload date:
- Size: 103.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd80c2e8c13ba397f2bd08deb1571def392ff3f61fc519c0868430e3892f61a8 |
| MD5 | ba7a4af27a465cc84fae1a66d79860c8 |
| BLAKE2b-256 | 469a8487f8743134928d3a0bac91d3c0b717106f03802cca006e8b875614ad11 |
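To check a downloaded distribution against the published SHA256 digests, a small standard-library sketch like the following could be used. The helper names `sha256_of` and `verify` are hypothetical; the digests are copied from the file details for version 1.0.3.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for synomicsbench 1.0.3.
EXPECTED_SHA256 = {
    "synomicsbench-1.0.3.tar.gz": "64ac5e2281a025f43a4db8c5b7951ac3bb9393d9bbf07c06cdda340041521044",
    "synomicsbench-1.0.3-py3-none-any.whl": "bd80c2e8c13ba397f2bd08deb1571def392ff3f61fc519c0868430e3892f61a8",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, after `pip download synomicsbench==1.0.3 --no-deps -d dist`, calling `verify(Path("dist/synomicsbench-1.0.3-py3-none-any.whl"))` should return `True`. Alternatively, pip's own hash-checking mode can enforce the digest at install time via a requirements line such as `synomicsbench==1.0.3 --hash=sha256:<digest>` combined with `pip install --require-hashes -r requirements.txt`.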