Data quality and characterization metrics for Cumulus
Data Metrics
A Cumulus-based implementation of the qualifier metrics.
Implemented Metrics
The following qualifier metrics are implemented (per the June 2024 qualifier definitions).
- c_pt_count
- c_pt_deceased_count
- c_resource_count
- c_resources_per_pt
- c_system_use
- c_us_core_v4_count *
- q_date_recent
- q_ref_target_pop
- q_ref_target_valid
- q_system_use
- q_valid_us_core_v4 *
* These are US Core profile-based metrics, and the following profiles are not yet implemented:
- Implantable Device (due to the difficulty in identifying implantable device records)
- The various Vital Signs sub-profiles like Blood Pressure (just haven't gotten around to them yet)
Installing
pip install cumulus-library-data-metrics
Running the Metrics
These metrics are designed as a Cumulus Library study and are run using the cumulus-library command.
Local Ndjson
First, you'll want to organize your ndjson into the following file tree format:
root/
condition/
my-conditions.ndjson
medicationrequest/
1.ndjson
2.ndjson
patient/
Patient.ndjson
(This is the same format that Cumulus ETL writes out when using --output-format=ndjson.)
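If your ndjson currently sits in a single flat folder, a small script can sort it into that layout. Here's a minimal sketch (not part of this package) that reads the resourceType from the first line of each file and moves the file into a matching lowercase directory; the source and destination paths are placeholders:

```python
# Sketch: sort a flat folder of ndjson into the layout above.
# "flat-ndjson" and "root" are placeholder paths -- adjust to your setup.
import json
from pathlib import Path

source = Path("flat-ndjson")  # unsorted .ndjson files
root = Path("root")           # destination tree, as shown above

for path in source.glob("*.ndjson"):
    with open(path) as f:
        first_line = f.readline().strip()
    if not first_line:
        continue  # skip empty files
    resource_type = json.loads(first_line)["resourceType"]
    target_dir = root / resource_type.lower()
    target_dir.mkdir(parents=True, exist_ok=True)
    path.rename(target_dir / path.name)
```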
Here's a sample command to run against that pile of ndjson data:
cumulus-library build \
--db-type duckdb \
--database output-tables.db \
--load-ndjson-dir path/to/ndjson/root \
--target data_metrics
And then you can load output-tables.db in a DuckDB session and see the results. Or read below to export the counts tables.
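For example, with DuckDB's Python API (table names vary by metric, so list them first; the commented query uses a placeholder name):

```python
# Browse the study's output tables in the DuckDB database file.
import duckdb

con = duckdb.connect("output-tables.db")
print(con.sql("SHOW TABLES"))  # see what the study created
# Then query any table of interest, e.g. (placeholder name):
# print(con.sql("SELECT * FROM some_metric_table LIMIT 10"))
```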
Athena
Here's a sample command to run against your Cumulus data in Athena:
cumulus-library build \
--database your-glue-database \
--workgroup your-athena-workgroup \
--profile your-aws-credentials-profile \
--target data_metrics
And then you can see the resulting tables in Athena. Or read below to export the counts tables.
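If you'd rather pull a result table into Python, one option is the awswrangler package (a separate dependency, not required by this study); the table name below is a placeholder:

```python
# Query a study output table in Athena into a pandas DataFrame.
# awswrangler is an optional extra dependency; the table name is a placeholder.
import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="SELECT * FROM some_metric_table LIMIT 10",
    database="your-glue-database",
    workgroup="your-athena-workgroup",
)
print(df.head())
```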
Exporting Counts
For the metrics that have exportable counts (the characterization metrics, mostly), you can easily export those using Cumulus Library by replacing build in the above commands with export ./output-folder.
Like so:
cumulus-library export \
./output-folder \
--db-type duckdb \
--database output-tables.db \
--target data_metrics
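Once exported, the counts are plain files you can load with any tool. Here's a quick sketch with pandas, assuming the export wrote CSV files (check the output folder for the exact formats your version produces):

```python
# Load every exported counts CSV and print its shape.
from pathlib import Path

import pandas as pd

for csv_path in sorted(Path("output-folder").rglob("*.csv")):
    df = pd.read_csv(csv_path)
    print(csv_path.name, df.shape)
```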
Aggregate counts
This study generates CUBE output by default.
If it's easier to work with simple aggregate counts of every value combination (that is, without the partial value combinations that CUBE() generates), run the build step with --option output-mode:aggregate. That is, run it like:
cumulus-library build --option output-mode:aggregate ...
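To see the difference between the two modes, here's a toy DuckDB example (unrelated to the study's real tables): CUBE emits roll-up rows where one or more columns are NULL (meaning "any value"), while the aggregate mode only emits fully-specified combinations.

```python
# Toy illustration of CUBE output vs. plain aggregate counts.
import duckdb

con = duckdb.connect()
con.sql("""
    CREATE TABLE toy AS
    SELECT * FROM (VALUES ('female', 2020), ('female', 2021), ('male', 2020))
        AS t(gender, year)
""")

# CUBE: full combinations plus partial roll-ups (NULL = "any value")
print(con.sql(
    "SELECT gender, year, count(*) AS cnt FROM toy GROUP BY CUBE (gender, year) ORDER BY 1, 2"
))

# Aggregate-style counts: only the fully-specified combinations
print(con.sql(
    "SELECT gender, year, count(*) AS cnt FROM toy GROUP BY gender, year ORDER BY 1, 2"
))
```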
File details
Details for the file cumulus_library_data_metrics-4.0.2.tar.gz.
File metadata
- Download URL: cumulus_library_data_metrics-4.0.2.tar.gz
- Upload date:
- Size: 64.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | b1177168c4e466950a15838d475b7c9d96352e07ec3c71505d1a72e1212887df
MD5 | e9f8634417f237f309071941a6ca571a
BLAKE2b-256 | cca7b874be2a1a72ebcb1630a4b5419465a48023c65a36b7a2e589ac139c21be
File details
Details for the file cumulus_library_data_metrics-4.0.2-py3-none-any.whl.
File metadata
- Download URL: cumulus_library_data_metrics-4.0.2-py3-none-any.whl
- Upload date:
- Size: 76.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2c42d174997131aee67789cc7c205377cce34416b8d9fa02e20b4be12a173b97
MD5 | e35266f2f522ada125603ebd6921f97a
BLAKE2b-256 | e177f1df486713a6654719f599aef5f6da220269095dadd6a6b8528b38ba0638