Data quality and characterization metrics for Cumulus

These details have been verified by PyPI

Project links

Source

Owner

SMART Health IT

GitHub Statistics

Maintainers

mgarberchip

These details have not been verified by PyPI

Project links

Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Software Development :: Libraries :: Python Modules

Project description

Data Metrics

A Cumulus-based implementation of the qualifier metrics.

Implemented Metrics

The following qualifier metrics are implemented (per December 2025 qualifier definitions).

* These are US Core profile-based metrics. The following profiles are not yet implemented:

- Implantable Device (due to the difficulty in identifying implantable records)
- Some Observation profiles, including the various Vital Signs sub-profiles like Blood Pressure (just haven't gotten around to them yet)

Installing

pip install cumulus-library-data-metrics

Running the Metrics

These metrics are designed as a Cumulus Library study and are run using the cumulus-library command.

Local NDJSON

Let's say you have a collection of FHIR-formatted NDJSON files. They can all be in one folder or in organized subfolders.

Here's a sample command to run against that pile of NDJSON data:

cumulus-library build \
  --db-type duckdb \
  --database output-tables.db \
  --load-ndjson-dir path/to/ndjson/root \
  --target data_metrics

And then you can load output-tables.db in a DuckDB session and see the results. Or read below to export the counts tables.

Visualization

The metrics can also be reviewed in an interactive web interface by installing and running the open-source Cumulus Data Metrics Reporting Tool. When generating a metrics file for this view, pass --option output-mode:aggregate. For example:

cumulus-library build \
  --option output-mode:aggregate \
  --option min-bucket-size:0 \
  --db-type duckdb \
  --database src/data/metrics.duckdb \
  --target data_metrics \
  --load-ndjson-dir path/to/ndjson/root

Athena

Here's a sample command to run against your Cumulus data in Athena:

cumulus-library build \
  --database your-glue-database \
  --workgroup your-athena-workgroup \
  --profile your-aws-credentials-profile \
  --target data_metrics

And then you can see the resulting tables in Athena. Or read below to export the counts tables.

Exporting Counts

For the metrics that have exportable counts (mostly the characterization metrics), you can export them with Cumulus Library by replacing build in the above commands with export ./output-folder, like so:

cumulus-library export \
  ./output-folder \
  --db-type duckdb \
  --database output-tables.db \
  --target data_metrics

Aggregate counts

This study generates CUBE output by default. If it's easier to work with simple aggregate counts of every value combination (that is, without the partial value combinations that CUBE() generates), run the build step with --option output-mode:aggregate.

That is, run it like:

cumulus-library build --option output-mode:aggregate ...
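To make the difference between the two output modes concrete, here is a minimal pure-Python sketch. It does not use Cumulus Library or SQL. The rows, the column names, and both helper functions are hypothetical and exist only to illustrate what CUBE() adds on top of plain aggregate counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical patient rows; the column names are made up for illustration.
ROWS = [
    {"gender": "male", "age": 55},
    {"gender": "male", "age": 55},
    {"gender": "female", "age": 55},
]
COLUMNS = ("gender", "age")


def aggregate_counts(rows, columns):
    """Count each full value combination, like a plain GROUP BY."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


def cube_counts(rows, columns):
    """Count every subset of the columns, like GROUP BY CUBE(...).

    Columns left out of a subset are reported as None, mirroring SQL NULL.
    """
    counts = Counter()
    for size in range(len(columns) + 1):
        for kept in combinations(columns, size):
            for row in rows:
                key = tuple(row[c] if c in kept else None for c in columns)
                counts[key] += 1
    return counts
```

In this sketch, aggregate_counts(ROWS, COLUMNS) returns only the two full combinations, while cube_counts(ROWS, COLUMNS) also returns the partial combinations such as (None, 55) and the grand total (None, None).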

Bucket sizes

To help preserve privacy, this study drops any count result below ten.

For example, if only two male patients died at age 55, that combination of male and 55 will be dropped from the c_pt_deceased_count table.

This makes it easier to share the count results with other institutions. But if that's not a concern and you want the fine-grained details, you can run the build step with --option min-bucket-size:0 to turn this feature off. Or use another value to change the bucket threshold (the default value is 10).
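The thresholding described above can be sketched in a few lines of Python. This is an illustration only, not the study's actual implementation: the function name and the sample counts are hypothetical, and it assumes a count is kept when it is greater than or equal to the threshold.

```python
def apply_min_bucket_size(counts, min_bucket_size=10):
    """Keep only combinations whose count meets the threshold.

    A threshold of 0 keeps every combination, since no count is negative.
    """
    return {key: n for key, n in counts.items() if n >= min_bucket_size}


# Hypothetical (gender, age at death) counts, made up for illustration.
SAMPLE_COUNTS = {
    ("male", 55): 2,
    ("female", 60): 12,
    ("male", 70): 10,
}
```

With the default threshold, apply_min_bucket_size(SAMPLE_COUNTS) drops the ("male", 55) entry. Passing min_bucket_size=0 returns all three entries.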

Release history

This version

10.0.0

Mar 11, 2026

9.0.0

Jan 29, 2026

8.3.0

Jan 2, 2026

8.2.0

Nov 13, 2025

8.1.0

May 13, 2025

8.0.0

Feb 11, 2025

7.0.2

Jan 27, 2025

7.0.1

Jan 15, 2025

7.0.0

Jan 9, 2025

6.0.0

Nov 20, 2024

5.1.0

Oct 14, 2024

5.0.1

Oct 1, 2024

5.0.0

Aug 21, 2024

4.0.2

Jul 23, 2024

4.0.1

Jul 16, 2024

4.0.0

Jul 15, 2024

3.0.0

Jun 10, 2024

2.0.1

May 30, 2024

2.0.0

May 29, 2024

1.0.0

May 16, 2024

0.0.0

May 14, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cumulus_library_data_metrics-10.0.0.tar.gz (87.0 kB view details)

Uploaded Mar 11, 2026 Source

Built Distribution

cumulus_library_data_metrics-10.0.0-py3-none-any.whl (100.9 kB view details)

Uploaded Mar 11, 2026 Python 3

File details

Details for the file cumulus_library_data_metrics-10.0.0.tar.gz.

File metadata

Download URL: cumulus_library_data_metrics-10.0.0.tar.gz
Upload date: Mar 11, 2026
Size: 87.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cumulus_library_data_metrics-10.0.0.tar.gz
Algorithm	Hash digest
SHA256	`a94ce7a5875c90a33aea9b34b0a68b23ddbdd4ef264549d11b27b1198bececc2`
MD5	`6724589a8bfaadb5e3621e0cae3a4647`
BLAKE2b-256	`30484760a3d1891f79087ba3080de960d275a6e3a954f456cc948996601df97c`
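One way to check a downloaded file against the SHA256 digest listed above is a short standard-library script. This is a sketch, not an official verification procedure: the function names are hypothetical, and it assumes the source distribution has already been downloaded to a local path of your choosing.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for cumulus_library_data_metrics-10.0.0.tar.gz.
EXPECTED_SHA256 = "a94ce7a5875c90a33aea9b34b0a68b23ddbdd4ef264549d11b27b1198bececc2"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare a file's digest to the expected hex string, ignoring case."""
    return sha256_of(path) == expected.lower()
```

Calling matches_published_hash with the path to the downloaded tarball should return True when the file is intact.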


Provenance

The following attestation bundles were made for cumulus_library_data_metrics-10.0.0.tar.gz:

Publisher: pypi.yaml on smart-on-fhir/cumulus-library-data-metrics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cumulus_library_data_metrics-10.0.0.tar.gz
- Subject digest: a94ce7a5875c90a33aea9b34b0a68b23ddbdd4ef264549d11b27b1198bececc2
- Sigstore transparency entry: 1084145651
- Sigstore integration time: Mar 11, 2026
Source repository:
- Permalink: smart-on-fhir/cumulus-library-data-metrics@a9131ed6ae98b871e0335a4f8433ab0a70d72bf9
- Branch / Tag: refs/tags/v10.0.0
- Owner: https://github.com/smart-on-fhir
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yaml@a9131ed6ae98b871e0335a4f8433ab0a70d72bf9
- Trigger Event: release

File details

Details for the file cumulus_library_data_metrics-10.0.0-py3-none-any.whl.

File metadata

Download URL: cumulus_library_data_metrics-10.0.0-py3-none-any.whl
Upload date: Mar 11, 2026
Size: 100.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cumulus_library_data_metrics-10.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a79a53f3a3ea0e002fb2b01430e518b072c9dbb50e0e2ff46edce492eeae530d`
MD5	`f62eb957cc65ec7f492458bf2afa1632`
BLAKE2b-256	`fbfcaf000e6a0e60c42c5aa5a4f8b6989ea615887aa32266d1cdfd41bcbfe107`


Provenance

The following attestation bundles were made for cumulus_library_data_metrics-10.0.0-py3-none-any.whl:

Publisher: pypi.yaml on smart-on-fhir/cumulus-library-data-metrics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cumulus_library_data_metrics-10.0.0-py3-none-any.whl
- Subject digest: a79a53f3a3ea0e002fb2b01430e518b072c9dbb50e0e2ff46edce492eeae530d
- Sigstore transparency entry: 1084145741
- Sigstore integration time: Mar 11, 2026
Source repository:
- Permalink: smart-on-fhir/cumulus-library-data-metrics@a9131ed6ae98b871e0335a4f8433ab0a70d72bf9
- Branch / Tag: refs/tags/v10.0.0
- Owner: https://github.com/smart-on-fhir
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yaml@a9131ed6ae98b871e0335a4f8433ab0a70d72bf9
- Trigger Event: release
