Data quality and characterization metrics for Cumulus

Project description

Data Metrics

A Cumulus-based implementation of the qualifier metrics.

Implemented Metrics

The following qualifier metrics are implemented (per the June 2024 qualifier definitions).

* These are US Core profile-based metrics, and the following profiles are not yet implemented:

  • Implantable Device (due to the difficulty of identifying implantable device records)
  • The various Vital Signs sub-profiles like Blood Pressure (just haven't gotten around to them yet)

Installing

pip install cumulus-library-data-metrics
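
If you want to double-check which release you ended up with, the distribution name and version (4.0.2 at the time of writing) can be read back through the Python standard library; this is just an optional sanity check:

import importlib.metadata

# Reports the installed version of the study package, e.g. "4.0.2".
print(importlib.metadata.version("cumulus-library-data-metrics"))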

Running the Metrics

These metrics are designed as a Cumulus Library study and are run using the cumulus-library command.

Local Ndjson

First, you'll want to organize your ndjson into the following file tree format:

root/
  condition/
    my-conditions.ndjson
  medicationrequest/
    1.ndjson
    2.ndjson
  patient/
    Patient.ndjson

(This is the same format that Cumulus ETL writes out when using --output-format=ndjson.)

Here's a sample command to run against that pile of ndjson data:

cumulus-library build \
  --db-type duckdb \
  --database output-tables.db \
  --load-ndjson-dir path/to/ndjson/root \
  --target data_metrics

And then you can load output-tables.db in a DuckDB session and see the results. Or read below to export the counts tables.
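
If you'd rather inspect those results from Python than from an interactive DuckDB shell, a minimal sketch like the following works (it assumes only that the duckdb Python package is installed; the "count" name filter is a convenience guess, so adjust it to match the table names you actually see):

import duckdb

# Open the database file that cumulus-library wrote; read_only avoids accidental changes.
con = duckdb.connect("output-tables.db", read_only=True)

# List every table the data_metrics study created.
tables = [row[0] for row in con.sql("SHOW TABLES").fetchall()]
print("\n".join(tables))

# Peek at the first few rows of the counts-style tables.
for name in tables:
    if "count" in name:
        print(con.sql(f'SELECT * FROM "{name}" LIMIT 5'))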

Athena

Here's a sample command to run against your Cumulus data in Athena:

cumulus-library build \
  --database your-glue-database \
  --workgroup your-athena-workgroup \
  --profile your-aws-credentials-profile \
  --target data_metrics

And then you can see the resulting tables in Athena. Or read below to export the counts tables.
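
If you'd rather pull the Athena results into Python than browse them in the console, one option is the awswrangler package; this is only a sketch (awswrangler is not something this study requires), and it assumes the study's tables share the data_metrics prefix, using the same placeholder database and workgroup names as the command above:

import awswrangler as wr

DATABASE = "your-glue-database"       # same placeholders as in the build command
WORKGROUP = "your-athena-workgroup"

# List the tables in your Glue database and keep the ones this study created.
tables = wr.catalog.tables(database=DATABASE)
metric_tables = [t for t in tables["Table"] if t.startswith("data_metrics")]
print(metric_tables)

# Spot-check one of them via Athena.
df = wr.athena.read_sql_query(
    f"SELECT * FROM {metric_tables[0]} LIMIT 10",
    database=DATABASE,
    workgroup=WORKGROUP,
)
print(df)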

Exporting Counts

For the metrics that have exportable counts (mostly the characterization metrics), you can export them with Cumulus Library by replacing build in the commands above with export ./output-folder. Like so:

cumulus-library export \
  ./output-folder \
  --db-type duckdb \
  --database output-tables.db \
  --target data_metrics
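
The export step writes the counts out as flat files under ./output-folder; the exact layout and formats depend on your Cumulus Library version, but CSV output is typical. A quick, layout-agnostic way to look them over (assuming pandas is available):

from pathlib import Path

import pandas as pd

# Glob recursively so we don't have to assume a particular folder layout for the export.
for csv_path in sorted(Path("./output-folder").rglob("*.csv")):
    df = pd.read_csv(csv_path)
    print(csv_path, df.shape)
    print(df.head())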

Aggregate counts

This study generates CUBE output by default. If it's easier to work with simple aggregate counts of every value combination (that is, without the partial value combinations that CUBE() generates), run the build step with --option output-mode:aggregate.

That is, run it like:

cumulus-library build --option output-mode:aggregate ...
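
To make the difference concrete, here is a standalone toy illustration (an in-memory DuckDB table, not the study's real output): CUBE emits rows for every partial combination, with NULLs standing in for the "all values" slots, while the aggregate mode keeps only the fully specified combinations.

import duckdb

con = duckdb.connect()  # in-memory database, purely for illustration

con.execute("""
    CREATE TABLE toy AS
    SELECT * FROM (VALUES ('Condition', 2023),
                          ('Condition', 2024),
                          ('Patient',   2024)) AS t(resource, year)
""")

# Default (CUBE) output: includes partial combinations such as ('Condition', NULL).
print(con.sql(
    "SELECT resource, year, COUNT(*) AS cnt FROM toy "
    "GROUP BY CUBE (resource, year) ORDER BY 1, 2"
))

# Aggregate-style output: only the fully specified combinations.
print(con.sql(
    "SELECT resource, year, COUNT(*) AS cnt FROM toy "
    "GROUP BY resource, year ORDER BY 1, 2"
))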

Download files

Download the file for your platform.

Source Distribution

cumulus_library_data_metrics-4.0.2.tar.gz (64.4 kB)

Built Distribution

cumulus_library_data_metrics-4.0.2-py3-none-any.whl

File details

Details for the file cumulus_library_data_metrics-4.0.2.tar.gz.

File hashes

Hashes for cumulus_library_data_metrics-4.0.2.tar.gz
Algorithm    Hash digest
SHA256       b1177168c4e466950a15838d475b7c9d96352e07ec3c71505d1a72e1212887df
MD5          e9f8634417f237f309071941a6ca571a
BLAKE2b-256  cca7b874be2a1a72ebcb1630a4b5419465a48023c65a36b7a2e589ac139c21be

File details

Details for the file cumulus_library_data_metrics-4.0.2-py3-none-any.whl.

File hashes

Hashes for cumulus_library_data_metrics-4.0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       2c42d174997131aee67789cc7c205377cce34416b8d9fa02e20b4be12a173b97
MD5          e35266f2f522ada125603ebd6921f97a
BLAKE2b-256  e177f1df486713a6654719f599aef5f6da220269095dadd6a6b8528b38ba0638
