A package that can generate low-fidelity synthetic CDISC SDTM data based on intelligent sequence generators

Synthetic SDTM (ssdtm)

This library provides a collection of functions for creating synthetic CDISC SDTM data. The data is generated largely by intelligent sequence generators that draw on domain knowledge.
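
To make the idea of a sequence generator concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not ssdtm's actual implementation: the function names `subject_ids` and `demographics`, the identifier format, the age range, and the choice of variables are all hypothetical. The "domain knowledge" here is limited to a plausible adult age range and a restricted set of SEX values.

```python
import random
from typing import Dict, Iterator, List


def subject_ids(study: str, n: int) -> Iterator[str]:
    """Yield n sequential subject identifiers, e.g. 'STUDY01-0001'.

    Hypothetical format, chosen only for this illustration.
    """
    for i in range(1, n + 1):
        yield f"{study}-{i:04d}"


def demographics(study: str, n: int, seed: int = 0) -> List[Dict[str, object]]:
    """Build n low-fidelity demographics-like records.

    A sketch only: the real library may generate different variables
    and use different rules.
    """
    rng = random.Random(seed)  # seeded so that repeated runs give the same data
    rows = []
    for usubjid in subject_ids(study, n):
        rows.append(
            {
                "STUDYID": study,
                "DOMAIN": "DM",
                "USUBJID": usubjid,
                "AGE": rng.randint(18, 85),     # assumed adult age range
                "SEX": rng.choice(["M", "F"]),  # assumed subset of allowed values
            }
        )
    return rows


rows = demographics("STUDY01", 3)
for row in rows:
    print(row)
```

Seeding the generator keeps the output reproducible, which is usually what you want when the synthetic data feeds automated tests.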

Background

Dummy, or low-fidelity synthetic, SDTM data is valuable in many scenarios. A few use cases are listed below:

Testing and Validation of Systems:

System Configuration: Dummy data allows thorough testing and configuration of data management systems before real data is collected, ensuring that the systems are set up correctly and can handle the expected data formats and volumes.
Software Validation: Dummy data is essential for validating the software tools used for data capture, processing, and analysis, ensuring they work correctly under various scenarios and edge cases.

Training and Education:

Staff Training: Dummy data provides a safe and realistic way to train clinical staff, data managers, and statisticians on data entry, management, and analysis processes without risking patient confidentiality or data integrity.
Protocol Familiarization: Dummy data helps the study team become familiar with the study protocols and data collection methods, improving overall preparedness and efficiency.

Protocol Development and Refinement:

CRF and Protocol Testing: Dummy data can be used to test and refine clinical trial protocols and case report forms (CRFs) before actual patient data is collected, identifying potential issues and making necessary adjustments early in the process.
Scenario Simulation: Simulating various scenarios using fake data helps in identifying and mitigating risks, ensuring the protocol is robust and ready for real-world application.

Quality Control:

Error Detection: By using dummy data, potential data entry errors, inconsistencies, and system flaws can be identified and corrected before the actual trial begins, enhancing data quality and reliability.
Process Optimization: It allows for the optimization of data collection and processing workflows, ensuring they are efficient and capable of handling real data smoothly.

Regulatory Compliance:

Compliance Testing: Testing with dummy data first helps ensure that all data handling and processing systems comply with regulatory standards and guidelines, reducing the risk of non-compliance during the actual trial.

Confidentiality and Security:

Safe Testing Environment: Using fake data protects patient confidentiality and adheres to privacy regulations during system testing and staff training, minimizing the risk of data breaches and ethical issues.
Security Assessment: Dummy data can be used to test the security measures of data management systems, ensuring they are robust enough to protect sensitive patient information when real data is collected.

Shorter Study Startup Time:

Pipeline Testing and Validation: Access to realistic dummy data makes it possible to test and validate the data entry and data transfer pipelines before the First-Patient-In milestone of a study. This results in a shorter study startup time.
  • Free software: MIT license

Tutorial


How to install

$ pip install ssdtm

Basic Usage

import ssdtm as sd

# Generate synthetic single-domain (adverse events) data for 5 patients
ae = sd.get_adverse_events(5)

# Generate synthetic single-domain (concomitant medication) data for 5 patients
cm = sd.get_conmeds(5)

# Generate synthetic single-domain (demographics) data for 5 patients
dm = sd.get_demographics(5)

# Generate synthetic single-domain (exposure) data for 5 patients
ex = sd.get_exposure(5)

# Generate lab analytes dataset for 8 patients, where each patient has data for 4 visits.
lb = sd.get_lab_analytes(8,4)

# Generate vital signs dataset for 8 patients, where each patient has data for 4 visits.
vs = sd.get_vital_signs(8,4)

# Generates CDISC SDTM data for 6 domains (ae, cm, dm, ex, lb, and vs)
data = sd.get_sdtm_data(8,4)
# Then you can access individual domain-specific dataframes as follows
data['cm']
data['dm']
data['vs']

# This generates and saves the SDTM data for 6 common SDTM domains in the local directory
sd.save_sdtm_data(8,4)

# Generate response (rs) dataset for 8 patients, assuming 5 visits per patient.
rs = sd.get_response(8)

# Generate tumor identification dataset for 8 patients, where each patient can have 1 to 5 tumors.
tu = sd.get_tumor_identification(8)

# Generate tumor results dataset for 8 patients, where each patient can have 1 to 5 tumors.
tr = sd.get_tumor_results(8)

# Generates CDISC SDTM data for 6 generic domains (ae, cm, dm, ex, lb, and vs) and additional therapeutic area specific domains (e.g. for 'oncology' we would have rs, tu and tr)
data = sd.get_sdtm_data(8,4, 'oncology')
# Then you can access individual domain-specific dataframes as follows
data['cm']
data['dm']
data['vs']
# And TA-specific individual domain dataframes as follows
data['rs']
data['tu']
data['tr']

# This generates and saves the SDTM data for 6 common SDTM domains and 3 therapeutic area specific domains in the local directory
sd.save_sdtm_data(8,4, 'oncology')
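
Once several domains have been generated, a common first pipeline check is that every subject appearing in a domain is also present in demographics. The sketch below shows one way to do that with the standard library only. It rests on assumptions that this README does not confirm: that each domain table carries a `USUBJID` column, and that each dataframe has been converted to a list of row dictionaries (with pandas, for example, via `df.to_dict("records")`). The function name `subjects_missing_from_dm` and the sample records are hypothetical.

```python
from typing import Dict, List, Mapping, Set

Row = Mapping[str, object]


def subjects_missing_from_dm(
    domains: Mapping[str, List[Row]],
    subject_key: str = "USUBJID",  # assumed column name, not confirmed by this README
    reference: str = "dm",
) -> Dict[str, Set[object]]:
    """Return, per domain, the subject IDs that do not appear in the reference domain."""
    known = {row[subject_key] for row in domains[reference]}
    missing: Dict[str, Set[object]] = {}
    for name, rows in domains.items():
        if name == reference:
            continue
        orphans = {row[subject_key] for row in rows} - known
        if orphans:
            missing[name] = orphans
    return missing


# Hand-written stand-in records, not real ssdtm output.
example = {
    "dm": [{"USUBJID": "S-001"}, {"USUBJID": "S-002"}],
    "ae": [{"USUBJID": "S-001", "AETERM": "HEADACHE"}],
    "vs": [{"USUBJID": "S-003", "VSTESTCD": "SYSBP"}],
}

print(subjects_missing_from_dm(example))  # prints {'vs': {'S-003'}}
```

An empty result means every domain is consistent with demographics. With real output, the input could be built as `{name: df.to_dict("records") for name, df in data.items()}`, provided the column-name assumption holds.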
