A python package for using the CDISC TransCelerate USDM, version 4

USDM4

A Python library for the CDISC TransCelerate Unified Study Data Model (USDM) Version 4.

Overview

USDM4 provides tools for building, assembling, validating, converting, and expanding clinical study definitions using the USDM Version 4 specification. It enables programmatic creation and manipulation of machine-readable study definitions that conform to CDISC standards.

Features

Build - Create USDM4 study structures programmatically with a fluent builder interface
Assemble - Orchestrate complete study assembly from structured input data
Validate - Validate USDM4 JSON files via two complementary engines (the bundled d4k Python rule library and the CDISC CORE engine)
Load - Load USDM4 data from JSON files or dictionaries
Convert - Transform USDM data structures between formats
Expand - Expand schedule timelines for study designs

Installation

pip install usdm4

Requirements

Python 3.12 or higher (required by the cdisc-rules-engine dependency)

Quick Start

from usdm4 import USDM4
from simple_error_log.errors import Errors

# Initialize
usdm = USDM4()
errors = Errors()

# Create a minimal study
wrapper = usdm.minimum("My Study", "SPONSOR-001", "1.0", errors)

# Access the study
print(wrapper.study.id)

Usage

Loading Studies

Load a study from a JSON file:

errors = Errors()
wrapper = USDM4().load("study.json", errors)

Load from a dictionary:

data = {...}
wrapper = USDM4().loadd(data, errors)

Validating Studies

USDM4 provides two complementary validation engines, both invoked from the USDM4 facade.

The d4k engine, usdm4's own Python rule library, runs the V4 DDF rule catalogue. It is fast enough for tight feedback loops and has no external dependencies:

result = USDM4().validate("study.json")

if result.passed_or_not_implemented():
    print("Validation passed")
else:
    print("Validation failed")

The CDISC CORE engine, which wraps the cdisc-rules-engine package, runs the same rule catalogue, implemented in JSONata, against authoritative CDISC sources. Use it as an independent cross-check. It needs a CDISC Library API key and downloads rule definitions on first use:

import os
os.environ["CDISC_LIBRARY_API_KEY"] = "your-api-key-here"

result = USDM4().validate_core("study.json")

if result.is_valid:
    print("CORE validation passed")
else:
    print(result.format_text())

To pre-populate the CORE cache (useful at server startup or before running in offline environments):

USDM4().prepare_core()

For running both engines and aligning their per-rule results from the command line, see validate/README.md.

CORE validation API

For more control, use CoreValidator directly without the USDM4 facade:

from usdm4.core import CoreValidator

validator = CoreValidator(
    cache_dir="/path/to/my/cache",
    api_key="my-api-key",
)
result = validator.validate("study.json", version="4-0")

validate_core(file_path, version="4-0", cache_dir=None, api_key=None) parameters:

file_path — Path to the USDM JSON file.
version — "3-0" or "4-0" (default "4-0").
cache_dir — Optional path to the cache directory. Defaults to a platform-appropriate location via platformdirs (see "CORE validation cache" below).
api_key — Optional CDISC Library API key. Falls back to CDISC_LIBRARY_API_KEY or CDISC_API_KEY environment variables.
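
The api_key fallback can be pictured with a small sketch. It assumes the explicit argument wins, then CDISC_LIBRARY_API_KEY, then CDISC_API_KEY; the order between the two environment variables is an assumption, and resolve_api_key is a hypothetical helper, not part of usdm4.

```python
import os
from typing import Mapping, Optional

# Hypothetical helper illustrating the documented fallback; not a usdm4 API.
# Assumption: CDISC_LIBRARY_API_KEY is consulted before CDISC_API_KEY.
ENV_VARS = ("CDISC_LIBRARY_API_KEY", "CDISC_API_KEY")


def resolve_api_key(
    api_key: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the explicit key if given, else the first non-empty env var."""
    if api_key:
        return api_key
    for name in ENV_VARS:
        value = environ.get(name)
        if value:
            return value
    return None  # no key found anywhere


print(resolve_api_key("explicit", {"CDISC_API_KEY": "env"}))  # explicit
print(resolve_api_key(None, {"CDISC_API_KEY": "env"}))  # env
```

Passing environ explicitly keeps the helper testable without touching the real process environment.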

CoreValidationResult properties:

is_valid — True if no validation findings were reported.
finding_count — Total number of individual validation errors across all findings.
execution_error_count — Number of rule execution errors (rules that don't apply to this file).
rules_executed — Total rules that were run.
rules_skipped — Rules skipped due to known engine bugs (see docs/cre_issues.md).
ct_packages_available — Number of CT packages known to CDISC Library.
ct_packages_loaded — List of CT package names loaded for this file.
findings — List of CoreRuleFinding objects.

CoreValidationResult methods:

format_text() — Human-readable text report.
to_dict() — JSON-serialisable dictionary.

CoreRuleFinding — one rule that reported errors:

rule_id — The CORE rule identifier (e.g. "CORE-000996").
description — Human-readable description of what the rule checks.
message — Error message template from the rule.
errors — List of error detail dicts.
error_count — Number of errors for this rule.
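
A short sketch of reporting on a result using only the properties listed above (is_valid, findings, rule_id, description, error_count). The stub classes are stand-ins so the example runs without a CDISC API key; the real objects come from usdm4.core and are not built this way, and summarize_findings is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Stand-ins mirroring the documented attributes (not the real usdm4 classes).
@dataclass
class StubFinding:
    rule_id: str
    description: str
    message: str
    errors: List[Dict[str, Any]] = field(default_factory=list)

    @property
    def error_count(self) -> int:
        return len(self.errors)


@dataclass
class StubResult:
    findings: List[StubFinding]

    @property
    def is_valid(self) -> bool:
        return not self.findings


def summarize_findings(result) -> List[str]:
    """One line per failing rule, most errors first (hypothetical helper)."""
    if result.is_valid:
        return ["CORE validation passed"]
    ordered = sorted(result.findings, key=lambda f: f.error_count, reverse=True)
    return [f"{f.rule_id}: {f.error_count} error(s) - {f.description}" for f in ordered]


result = StubResult([StubFinding("CORE-000996", "Example check", "Example message", [{}, {}])])
for line in summarize_findings(result):
    print(line)  # CORE-000996: 2 error(s) - Example check
```

Because the helper relies only on attribute names, the same function should accept a real CoreValidationResult.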

CoreCacheManager — accessed via validator.cache_manager:

cache_dir — The root cache directory path.
clear() — Remove all cached resources; they will re-download on next use.
ensure_resources() — Download JSONata and XSD schema files if not already cached.

CORE validation cache

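The module uses a three-level caching strategy: a persistent disk cache, an in-memory cache used by the engine within a single process, and a remote download on cache miss. Remote resources come from the CDISC Library API, except the JSONata functions and XSD schemas, which are fetched from GitHub (see the table below).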

Resource	Location	Source on first run
Validation rules	`{cache_dir}/rules/usdm/4-0.json`	CDISC Library API
CT package list	`{cache_dir}/ct/published_packages.json`	CDISC Library API
CT codelist data	`{cache_dir}/ct/data/{package}.json`	CDISC Library API
JSONata functions	`{cache_dir}/resources/jsonata/`	GitHub (cdisc-rules-engine repo)
XSD schemas	`{cache_dir}/resources/schema/xml/`	GitHub (cdisc-rules-engine repo)
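
The table maps directly onto path arithmetic. The sketch below builds the expected locations under a given cache_dir with pathlib and lists which entries are not yet on disk. core_cache_paths and missing_resources are hypothetical helpers, and the sketch assumes the rules file is named after the version string (e.g. 3-0.json for "3-0"), which the table only shows for 4-0.

```python
from pathlib import Path
from typing import Dict, List


def core_cache_paths(cache_dir: str, version: str = "4-0", package: str = "") -> Dict[str, Path]:
    """Expected cache locations from the table above (hypothetical helper)."""
    root = Path(cache_dir)
    paths = {
        # Assumption: the file name follows the version string.
        "rules": root / "rules" / "usdm" / f"{version}.json",
        "ct_packages": root / "ct" / "published_packages.json",
        "jsonata": root / "resources" / "jsonata",
        "xsd": root / "resources" / "schema" / "xml",
    }
    if package:
        # One JSON file per CT package, named after the package.
        paths["ct_data"] = root / "ct" / "data" / f"{package}.json"
    return paths


def missing_resources(cache_dir: str, version: str = "4-0") -> List[str]:
    """Names of cache entries not yet on disk (fetched on first use per the table)."""
    return [name for name, path in core_cache_paths(cache_dir, version).items() if not path.exists()]


paths = core_cache_paths("/tmp/usdm4-cache")
print(paths["rules"].as_posix())  # /tmp/usdm4-cache/rules/usdm/4-0.json
```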

The default cache_dir is platform-appropriate, resolved via platformdirs:

macOS: ~/Library/Caches/usdm4/core/
Windows: %LOCALAPPDATA%/usdm4/Cache/core/
Linux: ~/.cache/usdm4/core/

For web-server deployments, pass an explicit cache_dir to USDM4(cache_dir=...) or CoreValidator(cache_dir=...). To force a fresh download:

from usdm4.core import CoreValidator
CoreValidator().cache_manager.clear()

Troubleshooting CORE validation

"No CDISC API key" — Set CDISC_API_KEY or CDISC_LIBRARY_API_KEY in the environment.
Slow first run — The first validation downloads rules, CT packages, and schema files. Subsequent runs use the cache.
CT validation failures — Check that codeSystemVersion values in your USDM JSON correspond to published CT packages. result.ct_packages_loaded shows which packages were loaded.
Stale cache — If rules or CT packages have been updated upstream, clear the cache with validator.cache_manager.clear().
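
When chasing a CT validation failure, it helps to see which code system versions the file actually references, so they can be compared with result.ct_packages_loaded. This sketch assumes codes appear as JSON objects carrying codeSystem and codeSystemVersion keys, as USDM Code objects do; collect_code_system_versions is a hypothetical helper and the sample is a toy fragment, not a complete USDM document.

```python
import json
from collections import Counter
from typing import Any, Optional


def collect_code_system_versions(node: Any, counts: Optional[Counter] = None) -> Counter:
    """Count (codeSystem, codeSystemVersion) pairs anywhere in a JSON tree."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        if "codeSystemVersion" in node:
            counts[(node.get("codeSystem"), node["codeSystemVersion"])] += 1
        for value in node.values():
            collect_code_system_versions(value, counts)
    elif isinstance(node, list):
        for item in node:
            collect_code_system_versions(item, counts)
    return counts


def code_system_versions_in_file(path: str) -> Counter:
    """Same as above, reading a USDM JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return collect_code_system_versions(json.load(handle))


# Toy fragment for illustration only.
sample = {"study": {"versions": [{"code": {"codeSystem": "http://www.cdisc.org", "codeSystemVersion": "2024-09-27"}}]}}
print(collect_code_system_versions(sample))
```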

Engine bugs and workarounds are catalogued in docs/cre_issues.md.

Building Studies

Use the builder for programmatic study creation with access to controlled terminology:

errors = Errors()
builder = USDM4().builder(errors)

# Get CDISC codes
code = builder.cdisc_code("C207616", "Official Study Title")

# Get ISO codes
country = builder.iso3166_code("USA")
language = builder.iso639_code("en")

# Create organizations
sponsor = builder.sponsor("My Pharma Corp")

# Create any USDM4 class
study_version = builder.create("StudyVersion", {"versionNumber": "1.0"})

Assembling Studies

For structured assembly of complete studies from domain-organized input:

errors = Errors()
assembler = USDM4().assembler(errors)

assembler.execute({
    "identification": {...},
    "document": {...},
    "population": {...},
    "study_design": {...},
    "amendments": {...},
    "study": {...}
})

wrapper = assembler.wrapper("MySystem", "1.0")

Assembler JSON Input Structure

The assembler accepts a single dictionary with the following top-level keys, each processed by a dedicated sub-assembler:

{
  "identification": { ... },
  "document": { ... },
  "population": { ... },
  "amendments": { ... },
  "study_design": { ... },
  "soa": { ... },
  "study": { ... }
}

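All top-level keys are required except soa, which is optional. Note that amendments, although required as a key, may be null or empty to skip amendment processing.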
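
A minimal pre-flight check of the top-level keys before handing the dictionary to assembler.execute(). check_assembler_input is hypothetical; it treats soa as optional and accepts a null amendments value, per the notes in this section. Flagging unknown keys is this sketch's own addition, since the assembler's handling of extra keys is not documented here.

```python
from typing import Any, Dict, List

# Key sets taken from the structure above.
REQUIRED_KEYS = ("identification", "document", "population", "amendments", "study_design", "study")
OPTIONAL_KEYS = ("soa",)


def check_assembler_input(data: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the keys look right."""
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in data]
    known = set(REQUIRED_KEYS) | set(OPTIONAL_KEYS)
    # Likely typos; not necessarily rejected by the real assembler.
    problems += [f"unexpected key: {key}" for key in data if key not in known]
    return problems


minimal = {key: {} for key in REQUIRED_KEYS}
minimal["amendments"] = None  # null skips amendment processing
print(check_assembler_input(minimal))  # []
```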

`identification`

Study identification, titles, identifiers, organizations, and roles.

{
  "titles": {
    "brief": "string",
    "official": "string",
    "public": "string",
    "scientific": "string",
    "acronym": "string"
  },
  "identifiers": [
    {
      "identifier": "string",
      "scope": {
        "standard": "string",
        "non_standard": {
          "type": "string",
          "role": "string | null",
          "name": "string",
          "description": "string",
          "label": "string",
          "identifier": "string",
          "identifierScheme": "string",
          "legalAddress": {
            "lines": ["string"],
            "city": "string",
            "district": "string",
            "state": "string",
            "postalCode": "string",
            "country": "string"
          }
        }
      }
    }
  ],
  "roles": {
    "co_sponsor": {
      "name": "string",
      "address": {
        "lines": ["string"],
        "city": "string",
        "district": "string",
        "state": "string",
        "postalCode": "string",
        "country": "string"
      }
    },
    "local_sponsor": { },
    "device_manufacturer": { }
  },
  "other": {
    "sponsor_signatory": "string | null",
    "medical_expert": "string | null",
    "compound_names": "string | null",
    "compound_codes": "string | null"
  }
}

Notes:

titles is optional (defaults to empty). Valid title types: brief, official, public, scientific, acronym.
identifiers is optional (defaults to empty list). Each identifier scope must contain either standard or non_standard, not both.
Valid standard values: ct.gov, ema, fda. Each resolves to a predefined organization with complete address information.
Valid non_standard type values: registry, regulator, healthcare, pharma, lab, cro, gov, academic, medical_device.
Valid role values: co-sponsor, manufacturer, investigator, pharmacovigilance, project manager, local sponsor, laboratory, study subject, medical expert, statistician, idmc, care provider, principal investigator, outcomes assessor, dec, clinical trial physician, sponsor, adjudication committee, study site, dsmb, regulatory agency, contract research.
roles is optional (defaults to empty). Each role key (co_sponsor, local_sponsor, device_manufacturer) can be null to skip. The address field within each role is optional.
other is optional. When present, all four sub-fields are read directly.
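
The scope rules above can be checked before assembly. In this sketch, check_identifier is a hypothetical helper and the value sets are copied from the notes; exact, case-sensitive matching of standard and type values is an assumption. Role values are not checked here.

```python
from typing import Any, Dict, List

# Value sets copied from the notes above.
STANDARD_SCOPES = {"ct.gov", "ema", "fda"}
NON_STANDARD_TYPES = {
    "registry", "regulator", "healthcare", "pharma", "lab",
    "cro", "gov", "academic", "medical_device",
}


def check_identifier(entry: Dict[str, Any]) -> List[str]:
    """Check one identifiers[] entry against the documented scope rules."""
    scope = entry.get("scope") or {}
    has_standard = bool(scope.get("standard"))
    has_non_standard = bool(scope.get("non_standard"))
    if has_standard == has_non_standard:  # neither, or both
        return ["scope must contain exactly one of standard or non_standard"]
    # Assumption: exact, case-sensitive comparison.
    if has_standard and scope["standard"] not in STANDARD_SCOPES:
        return [f"unknown standard scope: {scope['standard']}"]
    if has_non_standard and scope["non_standard"].get("type") not in NON_STANDARD_TYPES:
        return [f"unknown non_standard type: {scope['non_standard'].get('type')}"]
    return []


# Placeholder identifier value for illustration.
print(check_identifier({"identifier": "NCT00000000", "scope": {"standard": "ct.gov"}}))  # []
```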

`document`

Protocol document metadata and hierarchical content sections.

{
  "document": {
    "label": "string",
    "version": "string",
    "status": "string",
    "template": "string",
    "version_date": "string"
  },
  "sections": [
    {
      "section_number": "string",
      "section_title": "string",
      "text": "string"
    }
  ]
}

Notes:

All fields in document are required.
Valid status values: APPROVED, DRAFT, DFT, FINAL, OBSOLETE, PENDING, PENDING REVIEW (case-insensitive).
version_date should be in ISO format (e.g. 2024-01-15).
Section hierarchy is determined by section_number depth: "1" = level 1, "1.1" = level 2, "1.1.1" = level 3.
text content may contain HTML.
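
Two of the document rules are easy to mirror in code: section depth from the dotted section_number and the case-insensitive status check. Both functions are hypothetical illustrations; ignoring a trailing dot (so "2." counts as level 1) is an assumption, not documented behaviour.

```python
VALID_STATUSES = {"APPROVED", "DRAFT", "DFT", "FINAL", "OBSOLETE", "PENDING", "PENDING REVIEW"}


def section_level(section_number: str) -> int:
    """Depth of a dotted number: "1" -> 1, "1.1" -> 2, "1.1.1" -> 3."""
    # Assumption: empty parts (e.g. from a trailing dot) are ignored.
    parts = [part for part in section_number.strip().split(".") if part]
    return len(parts)


def is_valid_status(status: str) -> bool:
    """Case-insensitive match against the documented status values."""
    return status.strip().upper() in VALID_STATUSES


for number in ("1", "1.1", "1.1.1"):
    print(number, section_level(number))
```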

`population`

Population definitions and eligibility criteria.

{
  "label": "string",
  "inclusion_exclusion": {
    "inclusion": ["string"],
    "exclusion": ["string"]
  }
}

Notes:

All fields are required.
Each inclusion and exclusion item is a text string describing the criterion.
The label is used to generate the internal name (uppercased, spaces replaced with hyphens).
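
A sketch that builds the population section in the shape shown above and mirrors the documented naming rule. Both helpers are hypothetical. Whether the real code trims or collapses repeated spaces is not documented, so this version simply replaces every space.

```python
from typing import Dict, List


def population_input(label: str, inclusion: List[str], exclusion: List[str]) -> Dict:
    """Assemble the population section in the structure shown above."""
    return {
        "label": label,
        "inclusion_exclusion": {"inclusion": list(inclusion), "exclusion": list(exclusion)},
    }


def population_name(label: str) -> str:
    """Documented rule: uppercase, spaces replaced with hyphens."""
    return label.upper().replace(" ", "-")


section = population_input("Adult patients with asthma", ["Age >= 18 years"], ["Current smoker"])
print(population_name(section["label"]))  # ADULT-PATIENTS-WITH-ASTHMA
```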

`amendments`

Study amendment information. Can be null or empty to skip amendment processing entirely.

{
  "identifier": "string",
  "summary": "string",
  "reasons": {
    "primary": "string",
    "secondary": "string"
  },
  "impact": {
    "safety_and_rights": {
      "safety": { "substantial": boolean, "reason": "string" },
      "rights": { "substantial": boolean, "reason": "string" }
    },
    "reliability_and_robustness": {
      "reliability": { "substantial": boolean, "reason": "string" },
      "robustness": { "substantial": boolean, "reason": "string" }
    }
  },
  "enrollment": {
    "value": "integer | string",
    "unit": "string"
  },
  "scope": {
    "global": boolean,
    "countries": ["string"],
    "regions": ["string"],
    "sites": ["string"],
    "unknown": ["string"]
  },
  "changes": [
    {
      "section": "string",
      "description": "string",
      "rationale": "string"
    }
  ]
}

Notes:

reasons values use CODE:DECODE format (e.g. "C207609:New Safety Information Available").
Valid reason codes: C207612 (Regulatory Agency Request), C207608 (New Regulatory Guidance), C207605 (IRB/IEC Feedback), C207609 (New Safety Information), C207606 (Manufacturing Change), C207602 (IMP Addition), C207601 (Change In Strategy), C207600 (Change In Standard Of Care), C207607 (New Data Available), C207604 (Investigator/Site Feedback), C207611 (Recruitment Difficulty), C207603 (Inconsistency/Error In Protocol), C207610 (Protocol Design Error), C17649 (Other), C48660 (Not Applicable).
enrollment is optional. The value is converted to integer internally.
scope is optional. Items in unknown are resolved to country or region codes via ISO 3166 lookup. Empty strings in unknown are skipped.
In changes, each section value uses the "NUMBER, TITLE" format (e.g. "1.5, Safety Considerations") and is matched against the document sections.
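
The string formats in the amendments notes can be parsed as follows. The helpers are hypothetical. Splitting on the first colon or comma is an assumption about how the library parses these values, and rejecting unknown reason codes is this sketch's own addition.

```python
from typing import Tuple, Union

# Reason codes from the notes above.
REASON_CODES = {
    "C207612", "C207608", "C207605", "C207609", "C207606", "C207602", "C207601",
    "C207600", "C207607", "C207604", "C207611", "C207603", "C207610", "C17649", "C48660",
}


def parse_reason(value: str) -> Tuple[str, str]:
    """Split "CODE:DECODE" on the first colon and check the code is listed."""
    code, sep, decode = value.partition(":")
    if not sep or code.strip() not in REASON_CODES:
        raise ValueError(f"not a recognised CODE:DECODE reason: {value!r}")
    return code.strip(), decode.strip()


def parse_change_section(value: str) -> Tuple[str, str]:
    """Split "NUMBER, TITLE" on the first comma."""
    number, sep, title = value.partition(",")
    if not sep:
        raise ValueError(f"expected 'NUMBER, TITLE': {value!r}")
    return number.strip(), title.strip()


def enrollment_value(value: Union[int, str]) -> int:
    """enrollment.value may be an integer or a string; both become int."""
    return int(value)


print(parse_reason("C207609:New Safety Information Available"))
print(parse_change_section("1.5, Safety Considerations"))  # ('1.5', 'Safety Considerations')
```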

`study_design`

Study design structure and trial phase.

{
  "label": "string",
  "rationale": "string",
  "trial_phase": "string"
}

Notes:

All fields are required.
Valid trial_phase values: 0, PRE-CLINICAL, 1, I, 1-2, 1/2, 1/2/3, 1/3, 1A, IA, 1B, IB, 2, II, 2-3, II-III, 2A, IIA, 2B, IIB, 3, III, 3A, IIIA, 3B, IIIB, 4, IV, 5, V, 2/3/4. Prefixes PHASE or TRIAL are automatically stripped.
Default intervention model is Parallel Study (CDISC code C82639).
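
A sketch of trial_phase normalisation: strip any PHASE or TRIAL prefixes, then check the remainder against the documented values. normalize_trial_phase is hypothetical. Upper-casing the input first is an assumption, because the notes do not say whether trial_phase matching is case-insensitive.

```python
VALID_PHASES = {
    "0", "PRE-CLINICAL", "1", "I", "1-2", "1/2", "1/2/3", "1/3", "1A", "IA", "1B", "IB",
    "2", "II", "2-3", "II-III", "2A", "IIA", "2B", "IIB", "3", "III", "3A", "IIIA",
    "3B", "IIIB", "4", "IV", "5", "V", "2/3/4",
}
PREFIXES = ("PHASE", "TRIAL")


def normalize_trial_phase(raw: str) -> str:
    """Upper-case (assumption), strip PHASE/TRIAL prefixes, validate the rest."""
    value = raw.strip().upper()
    stripped = True
    while stripped:  # repeat so "TRIAL PHASE 2" loses both prefixes
        stripped = False
        for prefix in PREFIXES:
            if value.startswith(prefix):
                value = value[len(prefix):].strip()
                stripped = True
    if value not in VALID_PHASES:
        raise ValueError(f"unrecognised trial phase: {raw!r}")
    return value


print(normalize_trial_phase("Trial Phase IIIA"))  # IIIA
```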

`soa` (Schedule of Activities)

Timeline data including epochs, visits, timepoints, activities, and conditions. This entire section is optional.

{
  "epochs": {
    "items": [
      { "text": "string" }
    ]
  },
  "visits": {
    "items": [
      {
        "text": "string",
        "references": ["string"]
      }
    ]
  },
  "timepoints": {
    "items": [
      {
        "index": "string | integer",
        "text": "string",
        "value": "string | integer",
        "unit": "string"
      }
    ]
  },
  "windows": {
    "items": [
      {
        "before": integer,
        "after": integer,
        "unit": "string"
      }
    ]
  },
  "activities": {
    "items": [
      {
        "name": "string",
        "visits": [
          {
            "index": integer,
            "references": ["string"]
          }
        ],
        "children": [
          {
            "name": "string",
            "visits": [
              {
                "index": integer,
                "references": ["string"]
              }
            ],
            "actions": {
              "bcs": ["string"]
            }
          }
        ],
        "actions": {
          "bcs": ["string"]
        }
      }
    ]
  },
  "conditions": {
    "items": [
      {
        "reference": "string",
        "text": "string"
      }
    ]
  }
}

Notes:

The items arrays of epochs, visits, and timepoints must be parallel (same length, aligned by index).
windows must also be parallel with timepoints.
A negative timepoint value indicates a time before the reference anchor. The first non-negative value determines the anchor point.
references on visits and activities are condition keys that link to entries in the conditions array.
children are sub-activities nested under a parent activity.
actions.bcs lists Biomedical Concept names. Known concepts are resolved from the CDISC BC library; unknown names create surrogate BiomedicalConcept objects.
Supported time units: years/yrs/yr, months/mths/mth, weeks/wks/wk, days/dys/dy, hours/hrs/hr, minutes/mins/min, seconds/secs/sec (case-insensitive).
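
Three of the SoA rules are shown below as small functions: the case-insensitive unit aliases, the anchor as the first non-negative timepoint value, and the parallel-array check. All are hypothetical illustrations. Treating windows as optional and converting string timepoint values with int() are assumptions.

```python
from typing import Dict, List, Optional, Union

# Alias table from the notes above; matching is case-insensitive.
UNIT_ALIASES: Dict[str, List[str]] = {
    "years": ["years", "yrs", "yr"],
    "months": ["months", "mths", "mth"],
    "weeks": ["weeks", "wks", "wk"],
    "days": ["days", "dys", "dy"],
    "hours": ["hours", "hrs", "hr"],
    "minutes": ["minutes", "mins", "min"],
    "seconds": ["seconds", "secs", "sec"],
}
_LOOKUP = {alias: unit for unit, aliases in UNIT_ALIASES.items() for alias in aliases}


def canonical_unit(text: str) -> str:
    """Map a unit alias (any case) to its full name."""
    unit = _LOOKUP.get(text.strip().lower())
    if unit is None:
        raise ValueError(f"unsupported time unit: {text!r}")
    return unit


def anchor_index(values: List[Union[int, str]]) -> Optional[int]:
    """Index of the first non-negative timepoint value (the anchor)."""
    for index, value in enumerate(values):
        if int(value) >= 0:  # values may arrive as strings
            return index
    return None


def check_parallel(soa: Dict) -> bool:
    """epochs, visits, timepoints (and windows, if present) share one length."""
    lengths = {len(soa[key]["items"]) for key in ("epochs", "visits", "timepoints")}
    if "windows" in soa:  # assumption: windows may be omitted
        lengths.add(len(soa["windows"]["items"]))
    return len(lengths) == 1


print(anchor_index([-14, -7, 0, 7]))  # 2
```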

`study`

Core study information and metadata.

{
  "name": {
    "identifier": "string",
    "acronym": "string",
    "compound": "string"
  },
  "label": "string",
  "version": "string",
  "rationale": "string",
  "description": "string",
  "sponsor_approval_date": "string",
  "confidentiality": "string",
  "original_protocol": "string | boolean"
}

Notes:

name is required. At least one of identifier, acronym, or compound must be non-empty. Priority order: identifier > acronym > compound. The name is auto-generated (uppercased, non-alphanumeric characters removed).
version and rationale are required.
label is optional; used as fallback if name generation produces an empty string.
description, sponsor_approval_date, confidentiality, and original_protocol are all optional.
original_protocol is converted to boolean: "true", "1", "yes", "y" map to true (case-insensitive).
sponsor_approval_date should be in ISO format (e.g. 2024-01-15).
When present, confidentiality and original_protocol (from study) and compound_codes, compound_names, sponsor_signatory, and medical_expert (from identification.other) are stored as extension attributes on the study version.
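
The name-generation and boolean-conversion rules in these notes can be sketched as follows. Both helpers are hypothetical. Three behaviours are assumptions: only ASCII letters and digits count as alphanumeric, the label fallback is returned unchanged, and a missing original_protocol maps to False.

```python
import re
from typing import Dict, Optional, Union

TRUE_VALUES = {"true", "1", "yes", "y"}


def study_name(name: Dict[str, Optional[str]], label: str = "") -> str:
    """Pick identifier > acronym > compound, uppercase, drop non-alphanumerics."""
    for key in ("identifier", "acronym", "compound"):
        source = (name.get(key) or "").strip()
        if source:
            break
    else:
        raise ValueError("name needs a non-empty identifier, acronym or compound")
    generated = re.sub(r"[^A-Z0-9]", "", source.upper())  # assumption: ASCII only
    return generated or label  # label is the documented fallback


def original_protocol_flag(value: Union[str, bool, None]) -> bool:
    """Booleans pass through; "true"/"1"/"yes"/"y" (any case) map to True."""
    if isinstance(value, bool):
        return value
    if value is None:  # assumption: absent means False
        return False
    return str(value).strip().lower() in TRUE_VALUES


print(study_name({"identifier": "ABC-123/x", "acronym": "ZZ"}))  # ABC123X
print(original_protocol_flag("Yes"))  # True
```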

Converting Studies

converter = USDM4().convert()
# Transform data structures as needed

Expanding Timelines

expander = USDM4().expander(wrapper)
# Process schedule timeline expansion

API Classes

Domain model classes are organised by area:

Domain	Classes
Study Structure	`Study`, `StudyVersion`, `StudyDesign`, `StudyArm`, `StudyEpoch`, `StudyElement`
Interventions	`StudyIntervention`, `Activity`, `Administration`, `Procedure`, `Encounter`
Population	`StudyDesignPopulation`, `AnalysisPopulation`, `EligibilityCriterion`, `SubjectEnrollment`
Documents	`StudyDefinitionDocument`, `StudyDefinitionDocumentVersion`, `Amendment`
Coding	`Code`, `AliasCode`, `BiomedicalConcept`, `Objective`, `Endpoint`
Timelines	`ScheduleTimeline`, `ScheduledActivityInstance`, `ScheduledDecisionInstance`
Organization	`StudyIdentifier`, `Organization`, `StudySite`

Development

Running Tests

pytest

Tests require 100% code coverage.

Code Formatting

ruff format
ruff check

Building the Package

python3 -m build --sdist --wheel

Publishing

twine upload dist/*

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Project details

Intended Audience
- Developers
Operating System
- OS Independent
Release history

Version	Release date
0.25.0	May 4, 2026
0.24.0 (this version)	May 4, 2026
0.23.0	Apr 30, 2026
0.22.0	Apr 24, 2026
0.21.1	Apr 10, 2026
0.21.0	Apr 5, 2026
0.20.0	Mar 31, 2026
0.19.0	Feb 27, 2026
0.18.0	Feb 20, 2026
0.17.0	Feb 8, 2026
0.16.0	Feb 7, 2026
0.15.0	Jan 3, 2026
0.14.0	Dec 14, 2025
0.13.1	Nov 14, 2025
0.13.0	Nov 9, 2025
0.12.0	Oct 23, 2025
0.11.0	Oct 13, 2025
0.10.0	Sep 21, 2025
0.9.1	Sep 1, 2025
0.9.0	Aug 31, 2025
0.8.2	Jul 26, 2025
0.8.1	Jul 26, 2025
0.8.0	Jul 26, 2025
0.7.0	Jun 6, 2025
0.6.0	Apr 7, 2025
0.5.0	Apr 5, 2025
0.4.0	Mar 31, 2025
0.3.1	Mar 18, 2025
0.3.0	Mar 18, 2025
0.2.0	Mar 18, 2025
0.1.0	Mar 18, 2025

Download files

Source Distribution

usdm4-0.24.0.tar.gz (2.4 MB)

Uploaded May 4, 2026 Source

Built Distribution

usdm4-0.24.0-py3-none-any.whl (2.7 MB)

Uploaded May 4, 2026 Python 3

File details

Details for the file usdm4-0.24.0.tar.gz.

File metadata

Download URL: usdm4-0.24.0.tar.gz
Upload date: May 4, 2026
Size: 2.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for usdm4-0.24.0.tar.gz
Algorithm	Hash digest
SHA256	`14e097c01131233446ba52f86c8aa8f211cb9a8c8fddb9c2e506d293037bb3f2`
MD5	`74d5cc6a182e7e0fdc81ceb1ffe7ed41`
BLAKE2b-256	`f5c10f5d27477dcc02585772d6378814eaca40fb90a5a4a2d57d4aa2c909dfea`

File details

Details for the file usdm4-0.24.0-py3-none-any.whl.

File metadata

Download URL: usdm4-0.24.0-py3-none-any.whl
Upload date: May 4, 2026
Size: 2.7 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for usdm4-0.24.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d054ac5144b3dc15649e21d4b1f1d0d339d4792fc2b988c16bf8fb4f4ff34ad1`
MD5	`429397f622d52caa6d0f3cde65918068`
BLAKE2b-256	`b16ad3320cadaedd688f1dbeef0755f3331fd561fffbd047933ffe6e8347eb56`
