Skip to main content

LinkML schemas modelling REDCap structures — the data dictionary and the record-data envelope

Project description

linkml-redcap

CI PyPI version License: MIT Docs

LinkML schemas modelling REDCap structures. This package is the canonical, vendor-neutral source of the REDCap primitives that downstream projects build on — what a valid data dictionary looks like, and what record data looks like in both its flat and structured shapes. You can find more information in the documentation.

It is an umbrella with two submodules:

  • data_dictionary — the meta-schema that formalises a REDCap data dictionary: the 18-column CSV structure, field-naming rules, field/validation types, and matrix groups.
  • record — a reusable envelope for REDCap record data: the structural fields REDCap adds to every export (record_id, redcap_event_name, redcap_repeat_instrument, redcap_repeat_instance, ...), the universal *_complete / yes-no / true-false / checkbox value spaces, the REDCap date/time string types, and the RepeatedElement wrapper — plus the generic flat ⇄ structured grouping step.

It contains no project-specific or RareLink-specific rules. Those conventions (variable naming, instrument naming, annotation profile) live in RareLink and are layered on top of these primitives. Contributions and collaboration are welcome.

Install

pip install linkml-redcap

Usage

# --- the data dictionary meta-schema ---
from linkml_redcap.data_dictionary import schema_view as dd_view
sv = dd_view()
assert "Field" in sv.all_classes()
assert "FieldType" in sv.all_enums()

# --- the record-data envelope ---
from linkml_redcap.record import schema_view as record_view, group_flat_records
rv = record_view()
assert "FlatRecord" in rv.all_classes()
assert "StructuredRecord" in rv.all_classes()
assert "FormCompleteStatus" in rv.all_enums()

# group a flat REDCap export into per-record objects (generic structural step)
grouped = group_flat_records(flat_rows, drop_empty=True)

Each submodule exposes the same loader API — schema_path() -> Path (for LinkML CLI tools) and schema_view() -> SchemaView (for introspection). Schemas load via importlib.resources, so they work from any install location — regular installs, zipped wheels, or air-gapped environments.

What's in it

linkml_redcap.data_dictionary

A LinkML meta-schema for the REDCap data dictionary CSV:

  • DataDictionary — tree root, a complete DD
  • Field — one CSV row, all 18 REDCap columns as typed/constrained slots
  • Choice — structured representation of one permissible value
  • Instrument — logical grouping of contiguous fields sharing a form_name

Enums cover REDCap's native value spaces (FieldType, TextValidationType incl. the European comma-decimal variants, CustomAlignment, IdentifierStatus, MatrixRanking).

This schema is vendor-neutral REDCap only. RareLink ontology conventions — variable-name prefixes (snomedct_…), the structured field_annotation grammar, and BioPortal/ontology curation — are not defined here; they live in rarelink.

linkml_redcap.record

The single source of truth for the REDCap record representation — the actual shape of REDCap record data, in the two serializations the data uses:

  • FlatRecord (abstract) — one flat-export row; a project specialises it via is_a: FlatRecord and adds one typed slot per field variable.
  • StructuredRecord (abstract) — the lossless record-grouped serialization: one object per record_id with its repeating-instrument instances nested. A project specialises it (is_a), marks itself tree_root, and adds its typed non-repeating instrument slots.
  • RepeatedElement — the (redcap_repeat_instrument, redcap_repeat_instance) wrapper for one repeating-instrument instance. The canonical version of the class rarelink and cieinr currently redeclare per repo — import this one.
  • Structural slots: record_id, redcap_event_name, redcap_repeat_instrument, redcap_repeat_instance, redcap_data_access_group, redcap_survey_identifier, repeated_elements.
  • Enums FormCompleteStatus (0/1/2), YesNo, TrueFalse, CheckboxState — REDCap's universal value spaces.
  • Types redcap_date / redcap_datetime / redcap_time / redcap_integer / redcap_number / redcap_email — one typed primitive per REDCap value space, so a project can give every field a defined range (never a bare string where REDCap constrains the value, never Any).
  • grouping.group_flat_records / ungroup_records — the lossless flat ⇄ structured conversion (the cardinality change linkml-map can't express), depending only on REDCap-native fields.

The structured envelope here is deliberately generic, abstract and convention-free — REDCap's own record-and-repeats model, serialized hierarchically. It defines no typed instrument classes, variable-naming rules, or ontology/Phenopacket mappings: that semantic layer is the consuming project's (the instrument classes that specialise these) and rarelink's (the conventions and the Phenopacket/FHIR engine). Nothing RareLink-specific is defined here.

See instructions/MAPPING_WORKFLOW.md for the full flat → grouped → structured pipeline and how it feeds Phenopacket/FHIR export.

Why this exists

REDCap data dictionaries are unstructured CSVs governed by implicit rules, and REDCap record exports are flat and wide. Making both machine-readable lets downstream tools validate any DD or export, generate DDs from higher-level specs, and transform flat exports into structured models with linkml-map instead of hand-written, per-instrument Python — the same envelope serving RareLink-CDM, the MII KDS-SE editions, CIEINR, and any other RareLink-based model.

Backwards compatibility

This package is an import dependency, so its schema ids and public class/slot/enum/type names are treated as a stable API: additive minor releases only, deprecate-don't-remove, breaking changes ⇒ major bump. The policy is in instructions/COMPATIBILITY.md and enforced by tests/test_public_surface.py, which fails CI if any public name disappears.

Ecosystem

  • RareLink — the REDCap-based rare disease interoperability framework that defines the rules built on these primitives
  • rd-cdm — ontology code systems & versions

Development

git clone https://github.com/linkml/linkml-redcap
cd linkml-redcap
poetry install --with dev

# Validate the schemas
poetry run linkml-lint src/linkml_redcap/data_dictionary/schema/redcap_data_dictionary.yaml
poetry run linkml-lint src/linkml_redcap/record/schema/redcap_record.yaml

# Run tests
poetry run pytest

Releases are tag-triggered: push a vX.Y.Z tag matching the version in pyproject.toml and GitHub Actions builds and publishes to PyPI via OIDC trusted publishing.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

linkml_redcap-0.1.0.tar.gz (20.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

linkml_redcap-0.1.0-py3-none-any.whl (22.1 kB view details)

Uploaded Python 3

File details

Details for the file linkml_redcap-0.1.0.tar.gz.

File metadata

  • Download URL: linkml_redcap-0.1.0.tar.gz
  • Upload date:
  • Size: 20.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for linkml_redcap-0.1.0.tar.gz
Algorithm Hash digest
SHA256 22d6affee0a252e467434026f2428ee4183771a07a6f36b6427724c0b946423a
MD5 1547050187f5e6ef3782d147871f2e5c
BLAKE2b-256 c42f419563bef537faac53c53c2f64c1a11337569d8f3f62272e969310f36e12

See more details on using hashes here.

Provenance

The following attestation bundles were made for linkml_redcap-0.1.0.tar.gz:

Publisher: pypi-publish.yaml on linkml/linkml-redcap

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file linkml_redcap-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: linkml_redcap-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 22.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for linkml_redcap-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 268505c8a1784d778023a447e9510c6dc450de1460d3588b2d7ce103861df1ed
MD5 c8cc646f9ad1e4e9013c68b0827f5af0
BLAKE2b-256 044fadd44868eaae9105f56908c6f5fc83befe9ed156bc62f407745f8f46e883

See more details on using hashes here.

Provenance

The following attestation bundles were made for linkml_redcap-0.1.0-py3-none-any.whl:

Publisher: pypi-publish.yaml on linkml/linkml-redcap

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page