FHIR Validator and Identifier for resource vs bundle type

FHIR Validator

Background

While testing Google’s FHIR Store and following the provided documentation, we encountered an issue where the import process wasn’t working as expected. A great tool from MITRE called Synthea generates synthetic patient FHIR records, and it’s even recommended by Google in their examples. However, either due to unclear documentation or our oversight, the import of this generated data failed. After struggling with over 60,000 “invalid JSON” error messages in Google Healthcare, we realized we were missing a crucial content-structure flag. It took us an entire day to figure out the issue.

This got us thinking—what happens when you have an ETL process dealing with hundreds of thousands of files?

We explored existing FHIR validation tools, including those from HL7. However, we found that even for a small 2MB patient file, some validators took up to 6 minutes and produced over 1,000 warnings and errors—most of which were related to external terminologies and content that was valid and parsable by the FHIR store.

This led us to develop a simple validator that checks whether your FHIR files conform to the FHIR R4 schema. The goal is to reject problematic files quickly, before they clutter your logs and overwhelm your monitoring systems.

Objective

The objective of fhir-validator is to validate FHIR (Fast Healthcare Interoperability Resources) files quickly and efficiently against the FHIR schema for structure.

Most validators are rules-based and delve deep into the contents of FHIR messages. They are often embedded directly in FHIR stores or in the software used to process FHIR messages, and their output is heavily verbose.

This tool is meant to be a lightweight, fast validation that ensures conformity to the FHIR structure.

It also identifies the content structure of FHIR messages, a setting used primarily by Google FHIR Store (e.g., BUNDLE, RESOURCE, BUNDLE_PRETTY, RESOURCE_PRETTY), allowing you to determine the appropriate switch for import.

Example: CLI validation usage

$ fhir-validator --path data/samples/fhir --action identify
Content structure of data/samples/fhir/practitionerInformation1728333795898.json: BUNDLE_PRETTY
Content structure of data/samples/fhir/hospitalInformation1728333795898.json: BUNDLE_PRETTY
Content structure of data/samples/fhir/Maricela194_Heidenreich818_9a998c27-9e98-29c2-8878-e214c9cef5ed.json: BUNDLE_PRETTY
Content structure of data/samples/fhir/Laquanda221_Haag279_84a90023-0c6b-0eb9-95d6-50861e13f9b3.json: BUNDLE_PRETTY

# Performing a Google import
$ gcloud healthcare fhir-stores import gcs fhir-store \
    --dataset=fhir-dataset \
    --gcs-uri=gs://$BUCKET_NAME/*.json \
    --content-structure=bundle-pretty
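
The example above pairs the identifier output BUNDLE_PRETTY with the gcloud value bundle-pretty. A minimal sketch of that conversion follows, assuming the same lowercase-and-hyphen convention holds for the other three structures. The helper names `to_gcloud_flag` and `build_import_command` are hypothetical and are not part of fhir-validator.

```python
import shlex

# The four content structures named in this README.
KNOWN_STRUCTURES = {"BUNDLE", "RESOURCE", "BUNDLE_PRETTY", "RESOURCE_PRETTY"}


def to_gcloud_flag(content_structure: str) -> str:
    """Convert e.g. 'BUNDLE_PRETTY' to 'bundle-pretty' (assumed convention)."""
    if content_structure not in KNOWN_STRUCTURES:
        raise ValueError(f"Unknown content structure: {content_structure!r}")
    return content_structure.lower().replace("_", "-")


def build_import_command(store: str, dataset: str, gcs_uri: str,
                         content_structure: str) -> str:
    """Assemble the import command shown above as a shell-quoted string."""
    args = [
        "gcloud", "healthcare", "fhir-stores", "import", "gcs", store,
        f"--dataset={dataset}",
        f"--gcs-uri={gcs_uri}",
        f"--content-structure={to_gcloud_flag(content_structure)}",
    ]
    # Quote each argument so wildcards in the URI are not expanded by the shell.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(build_import_command("fhir-store", "fhir-dataset",
                               "gs://my-bucket/*.json", "BUNDLE_PRETTY"))
```

The bucket name in the usage line is a placeholder; substitute your own store, dataset, and URI.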

Installation

You can install fhir-validator using either pip or Poetry.

Using pip

pip install fhir-validator

Using Poetry

poetry add fhir-validator

CLI Usage

Once installed, you can use the fhir-validator CLI to validate FHIR files or identify their content structure.

$ fhir-validator --help
usage: fhir-validator [-h] [--path PATH] [--action {validate,identify}] [--chunk-size CHUNK_SIZE]

FHIR Bundle Validator and Content Structure Identifier

optional arguments:
  -h, --help            show this help message and exit
  --path PATH           File or directory path to validate or identify content structure
  --action {validate,identify}
                        Action to perform: validate the FHIR bundles or identify the content structure
  --chunk-size CHUNK_SIZE
                        Number of entries per chunk for validation (default: 100)

Validate a FHIR File:

fhir-validator --path path/to/fhir_file.json --action validate

Identify the Content Structure:

fhir-validator --path path/to/fhir_file.json --action identify

This will return

FLAG               Description
BUNDLE             The source file contains one or more lines of newline-delimited JSON (ndjson). Each line is a bundle, which contains one or more resources. If you don’t specify ContentStructure, it defaults to BUNDLE.
RESOURCE           The source file contains one or more lines of newline-delimited JSON (ndjson). Each line is a single resource.
RESOURCE_PRETTY    The entire source file is one JSON resource. The JSON can span multiple lines.
BUNDLE_PRETTY      The entire source file is one JSON bundle. The JSON can span multiple lines.
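
To make the distinction between the four flags concrete, here is a rough sketch of how a file could be classified from the descriptions above. It is an illustration only and not necessarily how fhir-validator's identify action works. The function name `guess_content_structure` is hypothetical, and the tie-break for single-line files (treated as ndjson) is an assumption.

```python
import json


def guess_content_structure(path: str) -> str:
    """Heuristically classify a file as one of the four content structures."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError(f"{path} contains no JSON")

    try:
        # The whole file parses as a single JSON document.
        first = json.loads(text)
        # Assumption: a one-line document is treated as ndjson, not "pretty".
        pretty = len(lines) > 1
    except json.JSONDecodeError:
        # Otherwise each non-blank line must be its own JSON document (ndjson).
        first = json.loads(lines[0])
        for line in lines[1:]:
            json.loads(line)  # raises if any later line is not valid JSON
        pretty = False

    if not isinstance(first, dict) or "resourceType" not in first:
        raise ValueError(f"{path} does not look like a FHIR resource")

    kind = "BUNDLE" if first["resourceType"] == "Bundle" else "RESOURCE"
    return f"{kind}_PRETTY" if pretty else kind
```

Only the first document's resourceType is inspected, so a file mixing bundles and plain resources would be labelled by its first line.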

Options:

  • --path: Specify the file or directory path to validate or identify.

  • --action: Choose validate to validate the file or identify to determine the content structure.

  • --chunk-size: (Optional) Number of entries per chunk for validation, defaults to 100.

Chunk size

The --chunk-size option breaks the file into its entry components, allowing faster validation against chunks of the JSON file.
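
As an illustration of the idea, the sketch below splits a bundle's entry list into fixed-size chunks and stops at the first chunk that fails. It is a simplified sketch, not the library's implementation: `iter_entry_chunks`, `validate_in_chunks`, and the `validate_entry` callable are hypothetical names, and early exit is only one plausible reason chunking helps.

```python
from typing import Any, Callable, Dict, Iterator, List

Entry = Dict[str, Any]


def iter_entry_chunks(bundle: Dict[str, Any],
                      chunk_size: int = 100) -> Iterator[List[Entry]]:
    """Yield successive slices of bundle['entry'], each up to chunk_size long."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    entries = bundle.get("entry", [])
    for start in range(0, len(entries), chunk_size):
        yield entries[start:start + chunk_size]


def validate_in_chunks(bundle: Dict[str, Any],
                       validate_entry: Callable[[Entry], bool],
                       chunk_size: int = 100) -> bool:
    """Return False as soon as any chunk contains an invalid entry."""
    for chunk in iter_entry_chunks(bundle, chunk_size):
        if not all(validate_entry(entry) for entry in chunk):
            return False
    return True


if __name__ == "__main__":
    demo = {"resourceType": "Bundle",
            "entry": [{"resource": {"resourceType": "Patient"}}] * 250}
    print([len(chunk) for chunk in iter_entry_chunks(demo)])
```

With 250 entries and the default chunk size of 100, the demo yields chunks of 100, 100, and 50 entries.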

Integration

You can also use fhir-validator directly in your Python code. Here’s an example of how to integrate the validation or content structure identification into a Python project:

Example: Validate a FHIR File

import json

from fhir_validator import (
    BUNDLE_PRETTY,
    compile_fhir_schema,
    identify_content_structure,
    load_consolidated_fhir_schema,
    validate_fhir_bundle_in_chunks,
)

file_path = "data/samples/fhir/Laquanda221_Haag279_84a90023-0c6b-0eb9-95d6-50861e13f9b3.json"

# Determine which content structure the file uses
content_structure = identify_content_structure(file_path)
print(f"Content structure: {content_structure}")

# Load the FHIR R4 schema (the default) and compile a validator from it
schema_json = load_consolidated_fhir_schema('schemas/r4/fhir.schema.json')
compiled_validator = compile_fhir_schema(schema_json)

# If the file is a single pretty-printed bundle, parse it and validate it
if content_structure == BUNDLE_PRETTY:
    with open(file_path, 'r') as f:
        bundle = json.load(f)
    is_valid = validate_fhir_bundle_in_chunks(bundle, compiled_validator)
    print(f"File: {file_path} is valid? {is_valid}")

This simple Python snippet demonstrates how to check the content structure of a FHIR file and, if it’s a BUNDLE_PRETTY, how to validate its content.
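
For files identified as BUNDLE (newline-delimited JSON), each line has to be parsed on its own before it can be validated. The following is a minimal sketch of that loop, not part of the library: `iter_ndjson` and `invalid_lines` are hypothetical helper names, and `validate_bundle` stands in for whatever validation callable you use.

```python
import json
from typing import Any, Callable, Dict, Iterator, List, Tuple


def iter_ndjson(path: str) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (line_number, parsed_document) for each non-blank line."""
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if line.strip():
                yield line_number, json.loads(line)


def invalid_lines(path: str,
                  validate_bundle: Callable[[Dict[str, Any]], bool]) -> List[int]:
    """Return the line numbers whose bundle fails validate_bundle."""
    return [number for number, bundle in iter_ndjson(path)
            if not validate_bundle(bundle)]
```

With the objects from the example above, `validate_bundle` could be `lambda b: validate_fhir_bundle_in_chunks(b, compiled_validator)`, which reuses the call shape shown there.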


Development

To contribute to the fhir-validator project, you’ll need to install the necessary dependencies, including the dev and test groups for development tools and testing. The pre-commit hooks are part of the dev group, and pytest is part of the test group.

Setting Up Your Development Environment

  1. Clone the repository:

    git clone https://github.com/thevgergroup/fhir-validator.git
    cd fhir-validator
  2. Install dependencies using Poetry:

    Install both the dev and test groups to ensure you have all the necessary tools for development and testing:

    poetry install --with dev,test

    This command installs the base dependencies along with the dev group (which includes tools like pre-commit) and the test group (which includes tools like pytest).

    We use pandoc to generate the README.rst for PyPI so that links are correctly structured; see [Installing Pandoc](https://pandoc.org/installing.html). Make any necessary changes in README.md, and the pre-commit hook will perform the conversion.

  3. Install the Pre-commit Hooks:

    The project uses pre-commit to automate tasks such as converting README.md to README.rst before commits. To set up the pre-commit hooks locally, run:

    poetry run pre-commit install

    This will configure the Git hooks to automatically run when you make a commit.

Tests

We use pytest; see the unit tests in the tests directory.

poetry run pytest
