A python package for using the CDISC TransCelerate USDM, version 4
Project description
USDM4
A Python library for the CDISC TransCelerate Unified Study Data Model (USDM) Version 4.
Overview
USDM4 provides tools for building, assembling, validating, converting, and expanding clinical study definitions using the USDM Version 4 specification. It enables programmatic creation and manipulation of machine-readable study definitions that conform to CDISC standards.
Features
- Build - Create USDM4 study structures programmatically with a fluent builder interface
- Assemble - Orchestrate complete study assembly from structured input data
- Validate - Validate USDM4 JSON files against defined rules
- Load - Load USDM4 data from JSON files or dictionaries
- Convert - Transform USDM data structures between formats
- Expand - Expand schedule timelines for study designs
Installation
pip install usdm4
Requirements
- Python 3.10 or higher
Quick Start
from usdm4 import USDM4
from simple_error_log.errors import Errors
# Initialize
usdm = USDM4()
errors = Errors()
# Create a minimal study
wrapper = usdm.minimum("My Study", "SPONSOR-001", "1.0", errors)
# Access the study
print(wrapper.study.id)
Usage
Loading Studies
Load a study from a JSON file:
errors = Errors()
wrapper = USDM4().load("study.json", errors)
Load from a dictionary:
data = {...}
wrapper = USDM4().loadd(data, errors)
Validating Studies
result = USDM4().validate("study.json")
if result.passed_or_not_implemented():
print("Validation passed")
else:
print("Validation failed")
Building Studies
Use the builder for programmatic study creation with access to controlled terminology:
errors = Errors()
builder = USDM4().builder(errors)
# Get CDISC codes
code = builder.cdisc_code("C207616", "Official Study Title")
# Get ISO codes
country = builder.iso3166_code("USA")
language = builder.iso639_code("en")
# Create organizations
sponsor = builder.sponsor("My Pharma Corp")
# Create any USDM4 class
study_version = builder.create("StudyVersion", {"versionNumber": "1.0"})
Assembling Studies
For structured assembly of complete studies from domain-organized input:
errors = Errors()
assembler = USDM4().assembler(errors)
assembler.execute({
"identification": {...},
"document": {...},
"population": {...},
"study_design": {...},
"amendments": {...},
"study": {...}
})
wrapper = assembler.wrapper("MySystem", "1.0")
Assembler JSON Input Structure
The assembler accepts a single dictionary with the following top-level keys, each processed by a dedicated sub-assembler:
{
"identification": { ... },
"document": { ... },
"population": { ... },
"amendments": { ... },
"study_design": { ... },
"soa": { ... },
"study": { ... }
}
All top-level keys are required except soa, which is optional.
identification
Study identification, titles, identifiers, organizations, and roles.
{
"titles": {
"brief": "string",
"official": "string",
"public": "string",
"scientific": "string",
"acronym": "string"
},
"identifiers": [
{
"identifier": "string",
"scope": {
"standard": "string",
"non_standard": {
"type": "string",
"role": "string | null",
"name": "string",
"description": "string",
"label": "string",
"identifier": "string",
"identifierScheme": "string",
"legalAddress": {
"lines": ["string"],
"city": "string",
"district": "string",
"state": "string",
"postalCode": "string",
"country": "string"
}
}
}
}
],
"roles": {
"co_sponsor": {
"name": "string",
"address": {
"lines": ["string"],
"city": "string",
"district": "string",
"state": "string",
"postalCode": "string",
"country": "string"
}
},
"local_sponsor": { },
"device_manufacturer": { }
},
"other": {
"sponsor_signatory": "string | null",
"medical_expert": "string | null",
"compound_names": "string | null",
"compound_codes": "string | null"
}
}
Notes:
titlesis optional (defaults to empty). Valid title types:brief,official,public,scientific,acronym.identifiersis optional (defaults to empty list). Each identifierscopemust contain eitherstandardornon_standard, not both.- Valid
standardkeys:ct.gov,ema,fda. These resolve to predefined organizations with complete address information. - Valid
non_standardtype values:registry,regulator,healthcare,pharma,lab,cro,gov,academic,medical_device. - Valid
rolevalues:co-sponsor,manufacturer,investigator,pharmacovigilance,project manager,local sponsor,laboratory,study subject,medical expert,statistician,idmc,care provider,principal investigator,outcomes assessor,dec,clinical trial physician,sponsor,adjudication committee,study site,dsmb,regulatory agency,contract research. rolesis optional (defaults to empty). Each role key (co_sponsor,local_sponsor,device_manufacturer) can benullto skip. Theaddressfield within each role is optional.otheris optional. When present, all four sub-fields are read directly.
document
Protocol document metadata and hierarchical content sections.
{
"document": {
"label": "string",
"version": "string",
"status": "string",
"template": "string",
"version_date": "string"
},
"sections": [
{
"section_number": "string",
"section_title": "string",
"text": "string"
}
]
}
Notes:
- All fields in
documentare required. - Valid
statusvalues:APPROVED,DRAFT,DFT,FINAL,OBSOLETE,PENDING,PENDING REVIEW(case-insensitive). version_dateshould be in ISO format (e.g.2024-01-15).- Section hierarchy is determined by
section_numberdepth:"1"= level 1,"1.1"= level 2,"1.1.1"= level 3. textcontent may contain HTML.
population
Population definitions and eligibility criteria.
{
"label": "string",
"inclusion_exclusion": {
"inclusion": ["string"],
"exclusion": ["string"]
}
}
Notes:
- All fields are required.
- Each inclusion and exclusion item is a text string describing the criterion.
- The
labelis used to generate the internal name (uppercased, spaces replaced with hyphens).
amendments
Study amendment information. Can be null or empty to skip amendment processing entirely.
{
"identifier": "string",
"summary": "string",
"reasons": {
"primary": "string",
"secondary": "string"
},
"impact": {
"safety_and_rights": {
"safety": { "substantial": boolean, "reason": "string" },
"rights": { "substantial": boolean, "reason": "string" }
},
"reliability_and_robustness": {
"reliability": { "substantial": boolean, "reason": "string" },
"robustness": { "substantial": boolean, "reason": "string" }
}
},
"enrollment": {
"value": "integer | string",
"unit": "string"
},
"scope": {
"global": boolean,
"countries": ["string"],
"regions": ["string"],
"sites": ["string"],
"unknown": ["string"]
},
"changes": [
{
"section": "string",
"description": "string",
"rationale": "string"
}
]
}
Notes:
reasonsvalues useCODE:DECODEformat (e.g."C207609:New Safety Information Available").- Valid reason codes:
C207612(Regulatory Agency Request),C207608(New Regulatory Guidance),C207605(IRB/IEC Feedback),C207609(New Safety Information),C207606(Manufacturing Change),C207602(IMP Addition),C207601(Change In Strategy),C207600(Change In Standard Of Care),C207607(New Data Available),C207604(Investigator/Site Feedback),C207611(Recruitment Difficulty),C207603(Inconsistency/Error In Protocol),C207610(Protocol Design Error),C17649(Other),C48660(Not Applicable). enrollmentis optional. Thevalueis converted to integer internally.scopeis optional. Items inunknownare resolved to country or region codes via ISO 3166 lookup. Empty strings inunknownare skipped.changessection references use"NUMBER, TITLE"format (e.g."1.5, Safety Considerations"), which are matched against document sections.
study_design
Study design structure and trial phase.
{
"label": "string",
"rationale": "string",
"trial_phase": "string"
}
Notes:
- All fields are required.
- Valid
trial_phasevalues:0,PRE-CLINICAL,1,I,1-2,1/2,1/2/3,1/3,1A,IA,1B,IB,2,II,2-3,II-III,2A,IIA,2B,IIB,3,III,3A,IIIA,3B,IIIB,4,IV,5,V,2/3/4. PrefixesPHASEorTRIALare automatically stripped. - Default intervention model is Parallel Study (CDISC code C82639).
soa (Schedule of Activities)
Timeline data including epochs, visits, timepoints, activities, and conditions. This entire section is optional.
{
"epochs": {
"items": [
{ "text": "string" }
]
},
"visits": {
"items": [
{
"text": "string",
"references": ["string"]
}
]
},
"timepoints": {
"items": [
{
"index": "string | integer",
"text": "string",
"value": "string | integer",
"unit": "string"
}
]
},
"windows": {
"items": [
{
"before": integer,
"after": integer,
"unit": "string"
}
]
},
"activities": {
"items": [
{
"name": "string",
"visits": [
{
"index": integer,
"references": ["string"]
}
],
"children": [
{
"name": "string",
"visits": [
{
"index": integer,
"references": ["string"]
}
],
"actions": {
"bcs": ["string"]
}
}
],
"actions": {
"bcs": ["string"]
}
}
]
},
"conditions": {
"items": [
{
"reference": "string",
"text": "string"
}
]
}
}
Notes:
- Epochs, visits, and timepoints arrays must be parallel (same length, aligned by index).
windowsmust also be parallel with timepoints.- Negative timepoint
valueindicates before the reference anchor. The first non-negative value determines the anchor point. referenceson visits and activities are condition keys that link to entries in theconditionsarray.childrenare sub-activities nested under a parent activity.actions.bcslists Biomedical Concept names. Known concepts are resolved from the CDISC BC library; unknown names create surrogate BiomedicalConcept objects.- Supported time units:
years/yrs/yr,months/mths/mth,weeks/wks/wk,days/dys/dy,hours/hrs/hr,minutes/mins/min,seconds/secs/sec(case-insensitive).
study
Core study information and metadata.
{
"name": {
"identifier": "string",
"acronym": "string",
"compound": "string"
},
"label": "string",
"version": "string",
"rationale": "string",
"description": "string",
"sponsor_approval_date": "string",
"confidentiality": "string",
"original_protocol": "string | boolean"
}
Notes:
nameis required. At least one ofidentifier,acronym, orcompoundmust be non-empty. Priority order:identifier>acronym>compound. The name is auto-generated (uppercased, non-alphanumeric characters removed).versionandrationaleare required.labelis optional; used as fallback if name generation produces an empty string.description,sponsor_approval_date,confidentiality, andoriginal_protocolare all optional.original_protocolis converted to boolean:"true","1","yes","y"map totrue(case-insensitive).sponsor_approval_dateshould be in ISO format (e.g.2024-01-15).- When present,
confidentiality,original_protocol,compound_codes,compound_names,sponsor_signatory, andmedical_expertare stored as extension attributes on the study version.
Converting Studies
converter = USDM4().convert()
# Transform data structures as needed
Expanding Timelines
expander = USDM4().expander(wrapper)
# Process schedule timeline expansion
API Classes
USDM4 includes 73 domain model classes covering:
| Domain | Classes |
|---|---|
| Study Structure | Study, StudyVersion, StudyDesign, StudyArm, StudyEpoch, StudyElement |
| Interventions | StudyIntervention, Activity, Administration, Procedure, Encounter |
| Population | StudyDesignPopulation, AnalysisPopulation, EligibilityCriterion, SubjectEnrollment |
| Documents | StudyDefinitionDocument, StudyDefinitionDocumentVersion, Amendment |
| Coding | Code, AliasCode, BiomedicalConcept, Objective, Endpoint |
| Timelines | ScheduleTimeline, ScheduledActivityInstance, ScheduledDecisionInstance |
| Organization | StudyIdentifier, Organization, StudySite |
Development
Running Tests
pytest
Tests require 100% code coverage.
Code Formatting
ruff format
ruff check
Building the Package
python3 -m build --sdist --wheel
Publishing
twine upload dist/*
Related Projects
- usdm3 - USDM Version 3 support
License
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file usdm4-0.21.1.tar.gz.
File metadata
- Download URL: usdm4-0.21.1.tar.gz
- Upload date:
- Size: 2.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7bc1482b396389fd61ca59804ff59902a361660aacb26c888dfe3bec1c75e734
|
|
| MD5 |
dad6bb6fc4d78fc298ffc1de4e3264db
|
|
| BLAKE2b-256 |
3933e8aa31082b4cc1eaa9600cd3c2204f1a63680d1e6d1355a4388a268a6b6c
|
File details
Details for the file usdm4-0.21.1-py3-none-any.whl.
File metadata
- Download URL: usdm4-0.21.1-py3-none-any.whl
- Upload date:
- Size: 2.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
020188506078e332d92b7fb7d4e4af7cc80cea22d3742bb7768211a58de5cc2c
|
|
| MD5 |
0f304fbe7791ec58ac12b116291de1b8
|
|
| BLAKE2b-256 |
1908d20804e6389e8f3ab4feeccfd24b8c25bec7f914581de1348311baab4acf
|