Data quality and characterization metrics for Cumulus

Data Metrics

A Cumulus-based implementation of the qualifier metrics.

Implemented Metrics

The following qualifier metrics are implemented (per June 2024 qualifier definitions).

* These are US Core profile-based metrics, and the following profiles are not yet implemented:

  • Implantable Device (due to the difficulty of identifying implantable records)
  • The various Vital Signs sub-profiles like Blood Pressure (just haven't gotten around to them yet)

Installing

pip install cumulus-library-data-metrics

Running the Metrics

These metrics are designed as a Cumulus Library study and are run using the cumulus-library command.

Local Ndjson

First, you'll want to organize your ndjson into the following file tree format:

root/
  condition/
    my-conditions.ndjson
  medicationrequest/
    1.ndjson
    2.ndjson
  patient/
    Patient.ndjson

(This is the same format that Cumulus ETL writes out when using --output-format=ndjson.)
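If your ndjson files currently sit in one flat folder, a small script can sort them into that layout. The following is a minimal sketch, not part of this package: `organize_ndjson` is a hypothetical helper, and it assumes each file holds a single resource type, which it reads from the `resourceType` field of the first non-blank line. Folder names are lowercased to match the tree shown above.

```python
import json
import shutil
from pathlib import Path


def organize_ndjson(source_dir: str, root_dir: str) -> dict[str, list[str]]:
    """Copy each .ndjson file in source_dir into root_dir/<resourcetype>/.

    Hypothetical helper: assumes one resource type per file and reads it
    from the first non-blank line of that file.
    """
    root = Path(root_dir)
    placed: dict[str, list[str]] = {}
    for path in sorted(Path(source_dir).glob("*.ndjson")):
        with path.open(encoding="utf-8") as f:
            first = next((line for line in f if line.strip()), None)
        if first is None:
            continue  # empty file, nothing to classify
        resource_type = json.loads(first).get("resourceType")
        if not resource_type:
            continue  # not a FHIR resource line, leave the file alone
        folder_name = resource_type.lower()
        folder = root / folder_name
        folder.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, folder / path.name)
        placed.setdefault(folder_name, []).append(path.name)
    return placed
```

Calling `organize_ndjson("path/to/flat/files", "path/to/ndjson/root")` copies rather than moves the files, so the originals stay untouched if the guess about a file's contents turns out to be wrong.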

Here's a sample command to run against that pile of ndjson data:

cumulus-library build \
  --db-type duckdb \
  --database output-tables.db \
  --load-ndjson-dir path/to/ndjson/root \
  --target data_metrics

You can then load output-tables.db in a DuckDB session to see the results, or read below to export the counts tables.

Athena

Here's a sample command to run against your Cumulus data in Athena:

cumulus-library build \
  --database your-glue-database \
  --workgroup your-athena-workgroup \
  --profile your-aws-credentials-profile \
  --target data_metrics

You can then see the resulting tables in Athena, or read below to export the counts tables.

Exporting Counts

For the metrics that have exportable counts (mostly the characterization metrics), you can export them with Cumulus Library by replacing build in the above commands with export ./output-folder. For example:

cumulus-library export \
  ./output-folder \
  --db-type duckdb \
  --database output-tables.db \
  --target data_metrics

Aggregate counts

This study generates CUBE output by default. If it's easier to work with simple aggregate counts of every value combination (that is, without the partial value combinations that CUBE() generates), run the build step with --option output-mode:aggregate.

That is, run it like:

cumulus-library build --option output-mode:aggregate ...
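To make the difference between the two output modes concrete, here is a small sketch of what SQL's CUBE() does compared with a plain aggregate. It illustrates the general CUBE semantics only, not this study's actual table layout: the `status` and `category` columns and the sample rows are made up for the example, and `None` stands in for a column that has been rolled up.

```python
from collections import Counter
from itertools import combinations

# Made-up sample rows; real metric tables have different columns.
rows = [
    {"status": "active", "category": "problem"},
    {"status": "active", "category": "encounter"},
    {"status": "resolved", "category": "problem"},
]
columns = ["status", "category"]


def aggregate_counts(rows, columns):
    """Count only the full value combinations (like GROUP BY status, category)."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


def cube_counts(rows, columns):
    """Count every subset of columns (like GROUP BY CUBE(status, category))."""
    counts = Counter()
    for size in range(len(columns) + 1):
        for kept in combinations(columns, size):
            for row in rows:
                # Columns outside the kept subset are rolled up to None.
                key = tuple(row[c] if c in kept else None for c in columns)
                counts[key] += 1
    return counts


aggregate = aggregate_counts(rows, columns)
cube = cube_counts(rows, columns)
```

With these three rows, `aggregate` has three entries, one per full combination. `cube` has those same three plus five partial combinations, such as `("active", None)` with a count of 2 and the grand total `(None, None)` with a count of 3.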
