Builds sample MEDS datasets for testing.

These details have been verified by PyPI

Project links

Issues

GitHub Statistics

Maintainers

mmd_pypi

These details have not been verified by PyPI

Project links

Homepage

Framework
- Pytest
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

MEDS Testing Helpers

Provides various utilities for testing and benchmarking MEDS packages and tools, including pytest helpers, fixtures, sample datasets, and capabilities to build larger sample datasets for benchmarking purposes.

Installation

pip install meds_testing_helpers

Testing Helpers

After installing this package via pip, you can use the provided pytest fixtures and helpers to test your MEDS pipelines and tools. These include fixtures for static datasets shipped with this package and fixtures that generate larger datasets on the fly for benchmarking purposes.

Simple Static Dataset

You can use the fixture simple_static_MEDS to access a simple, static dataset that is sharded by split (e.g., shard names are of the form train/0.parquet). To use this fixture, simply add it as an argument to your test function in pytest:

# test_my_pipeline.py


def test_my_pipeline(simple_static_MEDS):
    # The simple static dataset will be stored on disk in a temporary directory in a path given by
    # the `simple_static_MEDS` input variable.
    pass

You can also import this static dataset directly in YAML form and convert it into a MEDSDataset, a simple object-oriented representation that can be written to disk:

from meds_testing_helpers.static_sample_data import SIMPLE_STATIC_SHARDED_BY_SPLIT
from meds_testing_helpers.dataset import MEDSDataset

data = MEDSDataset.from_yaml(SIMPLE_STATIC_SHARDED_BY_SPLIT)
data.write(...)

Simple Static Dataset with Tasks

You can use the fixture simple_static_MEDS_with_task to access a dataset that is identical to the simple_static_MEDS dataset but augmented with a prediction task named boolean_value_task, which has a boolean label. Note that this way of including tasks relies on a file storage convention that MEDS does not mandate: task labels are stored in a task_labels subdirectory of the raw dataset directory.

Generated Datasets (useful for benchmarking)

You can use the fixture generated_sample_MEDS to dynamically generate a sample dataset, similar to the static dataset discussed above, with a controllable number of patients (set via the pytest argument --generated-dataset-N). Because this dataset is generated on the fly rather than shipped with the package, generation will take some time depending on the number of patients. Generation follows the relevant configuration file; more configs and data generation specifications will be added over time. Like the static datasets, this fixture exposes the dataset via a temporary path on disk:

# test_my_pipeline.py


def test_my_pipeline(generated_sample_MEDS):
    # The generated dataset will be stored on disk in a temporary directory in a path given by
    # the `generated_sample_MEDS` input variable.
    pass

You can also control the seed of the generation process via the pytest argument --generated-dataset-seed.
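
For example, to benchmark against a larger, reproducible generated dataset, pass both options on the pytest command line (e.g. pytest test_my_pipeline.py --generated-dataset-N=1000 --generated-dataset-seed=42). The sketch below assembles that command with the standard library; only the two option names come from this package, while the helper names and values are illustrative.

```python
import subprocess


def generated_dataset_pytest_cmd(test_path: str, n_subjects: int, seed: int) -> list[str]:
    """Build a pytest invocation that sizes and seeds the generated_sample_MEDS fixture."""
    return [
        "pytest",
        test_path,
        f"--generated-dataset-N={n_subjects}",
        f"--generated-dataset-seed={seed}",
    ]


def run_generated_benchmark(test_path: str, n_subjects: int, seed: int) -> None:
    # Runs pytest in a subprocess; raises CalledProcessError if any test fails.
    subprocess.run(generated_dataset_pytest_cmd(test_path, n_subjects, seed), check=True)
```

Fixing the seed keeps the generated data identical across runs, so timing differences reflect your code rather than the data.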

Building Sample Datasets

This package also contains an executable that generates sample MEDS datasets and stores them on disk. This command backs the generated_sample_MEDS pytest fixture. The CLI tool uses Hydra to manage configuration options and generates datasets according to the resulting configuration. You can run the command as follows:

build_sample_MEDS_dataset dataset_spec=sample N_subjects=500

Add do_overwrite=True to overwrite an existing dataset. You can see the full configuration options by running build_sample_MEDS_dataset --help.
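
If you drive the builder from a script, each setting is passed as a Hydra key=value override. A minimal sketch using only the overrides shown above (dataset_spec, N_subjects, do_overwrite); build_sample_cmd and build_sample_dataset are hypothetical helpers, and the output location is left to the tool's own configuration.

```python
import subprocess


def build_sample_cmd(**overrides) -> list[str]:
    """Turn keyword arguments into Hydra key=value overrides for build_sample_MEDS_dataset."""
    return ["build_sample_MEDS_dataset", *(f"{key}={value}" for key, value in overrides.items())]


def build_sample_dataset(n_subjects: int, overwrite: bool = False) -> None:
    overrides = {"dataset_spec": "sample", "N_subjects": n_subjects}
    if overwrite:
        # Only added when requested, mirroring the optional do_overwrite=True flag.
        overrides["do_overwrite"] = True
    subprocess.run(build_sample_cmd(**overrides), check=True)
```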

Inferring Dataset Generation Configs

To generate a dataset similar to a local dataset on disk, you can use the infer_MEDS_sample_gen_config CLI command to infer a dataset generation configuration from that local dataset. This command writes a YAML file to a specified path on disk, which can be used as input to the build_sample_MEDS_dataset command to generate a dataset that resembles the local dataset along some limited axes. Currently, there is no clean way to use this YAML file other than creating a new dataset_spec configuration file from it and referencing that file directly via Hydra.

infer_MEDS_sample_gen_config dataset_dir=/path/to/local/dataset output_fp=/path/to/output.yaml
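
One way to do that is sketched below, under two assumptions: that the inferred YAML can be used unmodified as an option of the dataset_spec config group, and that Hydra's standard --config-dir flag (which adds a directory to the config search path) is honored by build_sample_MEDS_dataset. The helper register_inferred_spec and the spec name are hypothetical.

```python
import shutil
from pathlib import Path


def register_inferred_spec(inferred_yaml, config_dir, spec_name: str) -> list[str]:
    """Copy an inferred generation config to <config_dir>/dataset_spec/<spec_name>.yaml.

    Returns a build command that adds config_dir to Hydra's search path via --config-dir
    (assumption: the copied file is valid as-is for the dataset_spec group).
    """
    target = Path(config_dir) / "dataset_spec" / f"{spec_name}.yaml"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(inferred_yaml, target)
    # An absolute path avoids any ambiguity about the working directory Hydra resolves against.
    return [
        "build_sample_MEDS_dataset",
        "--config-dir",
        str(Path(config_dir).resolve()),
        f"dataset_spec={spec_name}",
    ]
```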


Release history

- 0.3.1 (Nov 5, 2025)
- 0.3.0 (May 6, 2025; this version)
- 0.2.7 (Apr 17, 2025)
- 0.2.6 (Mar 28, 2025)
- 0.2.5 (Mar 28, 2025)
- 0.2.4 (Mar 16, 2025)
- 0.2.3 (Mar 3, 2025)
- 0.2.2 (Feb 28, 2025)
- 0.2.1 (Feb 28, 2025)
- 0.0.1 (Feb 28, 2025)

Download files


Source Distribution

meds_testing_helpers-0.3.0.tar.gz (40.6 kB)

Uploaded May 6, 2025 Source

Built Distribution


meds_testing_helpers-0.3.0-py3-none-any.whl (36.7 kB)

Uploaded May 6, 2025 Python 3

File details

Details for the file meds_testing_helpers-0.3.0.tar.gz.

File metadata

Download URL: meds_testing_helpers-0.3.0.tar.gz
Upload date: May 6, 2025
Size: 40.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for meds_testing_helpers-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`d53bfa6b1fd0263973369a2ab1412a481ea361b2ae07b768584dcea26de8893e`
MD5	`00a9345931b707c4eb2e94fa9e708fc5`
BLAKE2b-256	`55a8aebe0a1dd3396b0d21a74c4eb6cd00c5bdfef9be08b2414220e3fac889f0`


Provenance

The following attestation bundles were made for meds_testing_helpers-0.3.0.tar.gz:

Publisher: python-build.yaml on Medical-Event-Data-Standard/meds_testing_helpers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: meds_testing_helpers-0.3.0.tar.gz
- Subject digest: d53bfa6b1fd0263973369a2ab1412a481ea361b2ae07b768584dcea26de8893e
- Sigstore transparency entry: 207434075
- Sigstore integration time: May 6, 2025
Source repository:
- Permalink: Medical-Event-Data-Standard/meds_testing_helpers@65bce80229e871aa38b40cbbee23dbac160fe4ec
- Branch / Tag: refs/tags/0.3.0
- Owner: https://github.com/Medical-Event-Data-Standard
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-build.yaml@65bce80229e871aa38b40cbbee23dbac160fe4ec
- Trigger Event: push

File details

Details for the file meds_testing_helpers-0.3.0-py3-none-any.whl.

File metadata

Download URL: meds_testing_helpers-0.3.0-py3-none-any.whl
Upload date: May 6, 2025
Size: 36.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for meds_testing_helpers-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c58257605df940f582532b4cb3f9cc88e99aa7af232f45ec48eff4313216a47f`
MD5	`dcba225df85d6f6aa2b92323eadde32d`
BLAKE2b-256	`8bc32042d97b5a182ef53386d62d201ee0d09eb375023affba2c6c55f2d40afa`


Provenance

The following attestation bundles were made for meds_testing_helpers-0.3.0-py3-none-any.whl:

Publisher: python-build.yaml on Medical-Event-Data-Standard/meds_testing_helpers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: meds_testing_helpers-0.3.0-py3-none-any.whl
- Subject digest: c58257605df940f582532b4cb3f9cc88e99aa7af232f45ec48eff4313216a47f
- Sigstore transparency entry: 207434080
- Sigstore integration time: May 6, 2025
Source repository:
- Permalink: Medical-Event-Data-Standard/meds_testing_helpers@65bce80229e871aa38b40cbbee23dbac160fe4ec
- Branch / Tag: refs/tags/0.3.0
- Owner: https://github.com/Medical-Event-Data-Standard
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-build.yaml@65bce80229e871aa38b40cbbee23dbac160fe4ec
- Trigger Event: push

meds-testing-helpers 0.3.0
