FHIR Validator and Identifier for resource vs bundle type
FHIR Validator
Background
While testing Google’s FHIR Store and following the provided documentation, we encountered an issue where the import process wasn’t working as expected. A great tool from MITRE called Synthea generates synthetic patient FHIR records, and it’s even recommended by Google in their examples. However, either due to unclear documentation or our oversight, the import of this generated data failed. After struggling with over 60,000 “invalid JSON” error messages in Google Healthcare, we realized we were missing a crucial content-structure flag. It took us an entire day to figure out the issue.
This got us thinking: what happens when you have an ETL process dealing with hundreds of thousands of files?
We explored existing FHIR validation tools, including those from HL7. However, we found that even for a small 2MB patient file, some validators took up to 6 minutes and produced over 1,000 warnings and errors. Most of these related to external terminologies and to content that was valid and parsable by the FHIR store.
This led us to develop a simple validator that checks whether your FHIR files conform to the FHIR R4 schema. The goal is to quickly reject problematic files before they clutter your logs and overwhelm your monitoring systems.
Objective
The objective of fhir-validator is to quickly and efficiently validate the structure of FHIR (Fast Healthcare Interoperability Resources) files against the FHIR schema.
Most validators are rules-based and delve deep into the contents of FHIR messages. They are often embedded directly into FHIR stores or other software used to process FHIR messages, and their output is heavily verbose.
This tool is meant to be a lightweight, fast validator that ensures conformity with the FHIR structure.
The script also identifies the content structure of FHIR messages, a setting used primarily by Google FHIR Store (e.g., BUNDLE, RESOURCE, BUNDLE_PRETTY, RESOURCE_PRETTY), allowing you to determine the appropriate switch for import.
Example: CLI validation usage
$ fhir-validator --path data/samples/fhir --action identify
Content structure of data/samples/fhir/practitionerInformation1728333795898.json: BUNDLE_PRETTY
Content structure of data/samples/fhir/hospitalInformation1728333795898.json: BUNDLE_PRETTY
Content structure of data/samples/fhir/Maricela194_Heidenreich818_9a998c27-9e98-29c2-8878-e214c9cef5ed.json: BUNDLE_PRETTY
Content structure of data/samples/fhir/Laquanda221_Haag279_84a90023-0c6b-0eb9-95d6-50861e13f9b3.json: BUNDLE_PRETTY
# Performing a google import
$ gcloud healthcare fhir-stores import gcs fhir-store \
--dataset=fhir-dataset \
--gcs-uri=gs://$BUCKET_NAME/*.json \
--content-structure=bundle-pretty
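The step from the identified structure to the import switch can be sketched in Python. This is a minimal illustration, not part of the fhir-validator API: the helper name `to_gcloud_flag` and the mapping table are assumptions made for this example, based on the flag value shown in the command above and its hyphenated lowercase pattern.

```python
# Hypothetical helper, not provided by fhir-validator: maps the structure
# names printed by `--action identify` to the gcloud import switch.
GCLOUD_CONTENT_STRUCTURES = {
    "BUNDLE": "bundle",
    "RESOURCE": "resource",
    "BUNDLE_PRETTY": "bundle-pretty",
    "RESOURCE_PRETTY": "resource-pretty",
}


def to_gcloud_flag(content_structure: str) -> str:
    """Return the --content-structure switch for an identified structure."""
    try:
        value = GCLOUD_CONTENT_STRUCTURES[content_structure]
    except KeyError:
        raise ValueError(
            f"Unknown content structure: {content_structure!r}"
        ) from None
    return f"--content-structure={value}"


print(to_gcloud_flag("BUNDLE_PRETTY"))
```

Raising on an unknown name, rather than falling back to a default, keeps a mislabelled file from reaching the import step.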
Installation
You can install fhir-validator using either pip or Poetry.
Using pip
pip install fhir-validator
Using Poetry
poetry add fhir-validator
CLI Usage
Once installed, you can use the fhir-validator CLI to validate FHIR files or identify their content structure.
$ fhir-validator --help
usage: fhir-validator [-h] [--path PATH] [--action {validate,identify}] [--chunk-size CHUNK_SIZE]
FHIR Bundle Validator and Content Structure Identifier
optional arguments:
  -h, --help            show this help message and exit
  --path PATH           File or directory path to validate or identify content structure
  --action {validate,identify}
                        Action to perform: validate the FHIR bundles or identify the content structure
  --chunk-size CHUNK_SIZE
                        Number of entries per chunk for validation (default: 100)
Validate a FHIR File:
fhir-validator --path path/to/fhir_file.json --action validate
Identify the Content Structure:
fhir-validator --path path/to/fhir_file.json --action identify
This will return one of the following flags:
FLAG | Description |
---|---|
BUNDLE | The source file contains one or more lines of newline-delimited JSON (ndjson). Each line is a bundle, which contains one or more resources. If you don’t specify ContentStructure, it defaults to BUNDLE. |
RESOURCE | The source file contains one or more lines of newline-delimited JSON (ndjson). Each line is a single resource. |
RESOURCE_PRETTY | The entire source file is one JSON resource. The JSON can span multiple lines. |
BUNDLE_PRETTY | The entire source file is one JSON bundle. The JSON can span multiple lines. |
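One way to tell the four flags apart is sketched below. This is an assumption about how such a check could work, not the actual logic inside fhir-validator; the function name `sketch_identify` is hypothetical. It treats a file as ndjson when every non-empty line parses on its own, and as a "pretty" file otherwise.

```python
import json


def sketch_identify(text: str) -> str:
    """Guess the content structure of FHIR JSON held in a string."""

    def kind(obj, suffix=""):
        # A bundle is recognised by its resourceType field.
        is_bundle = isinstance(obj, dict) and obj.get("resourceType") == "Bundle"
        return ("BUNDLE" if is_bundle else "RESOURCE") + suffix

    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    try:
        # If every non-empty line parses by itself, treat the input as ndjson.
        first = json.loads(lines[0])
        for ln in lines[1:]:
            json.loads(ln)
        return kind(first)
    except json.JSONDecodeError:
        # Otherwise the whole input must be one multi-line JSON document.
        return kind(json.loads(text), "_PRETTY")


pretty_bundle = json.dumps({"resourceType": "Bundle", "entry": []}, indent=2)
print(sketch_identify(pretty_bundle))
```

Note that a file holding a single bundle on one line satisfies both the BUNDLE and BUNDLE_PRETTY descriptions; this sketch reports it as BUNDLE.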
Options:
--path: Specify the file or directory path to validate or identify.
--action: Choose validate to validate the file or identify to determine the content structure.
--chunk-size: (Optional) Number of entries per chunk for validation, defaults to 100.
Chunk size
Breaks the file into its entry components, allowing faster validation against chunks of the JSON file.
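The chunking idea can be illustrated with a short generator. This is a sketch under the assumption that chunks are consecutive slices of the bundle's `entry` array; the name `iter_entry_chunks` is hypothetical and the library's internal implementation may differ.

```python
from itertools import islice
from typing import Iterator


def iter_entry_chunks(bundle: dict, chunk_size: int = 100) -> Iterator[list]:
    """Yield consecutive lists of at most chunk_size bundle entries."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    entries = iter(bundle.get("entry", []))
    # islice pulls the next chunk_size items; an empty list ends the loop.
    while chunk := list(islice(entries, chunk_size)):
        yield chunk


sample = {"resourceType": "Bundle", "entry": [{"n": i} for i in range(250)]}
print([len(chunk) for chunk in iter_entry_chunks(sample)])
```

Each chunk can then be validated on its own, so a failure is localised to a group of entries rather than to the whole file.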
Integration
You can also use fhir-validator directly in your Python code. Here’s an example of how to integrate the validation or content structure identification into a Python project:
Example: Validate a FHIR File
import json

from fhir_validator import (compile_fhir_schema,
                            identify_content_structure,
                            load_consolidated_fhir_schema,
                            validate_fhir_bundle_in_chunks,
                            BUNDLE_PRETTY)

file_path = "data/samples/fhir/Laquanda221_Haag279_84a90023-0c6b-0eb9-95d6-50861e13f9b3.json"

# Determine which content structure the file uses
content_structure = identify_content_structure(file_path)
print(f"Content structure: {content_structure}")

# By default loads the r4 schema
schema_json = load_consolidated_fhir_schema('schemas/r4/fhir.schema.json')
compiled_validator = compile_fhir_schema(schema_json)

# If content structure is a pretty-printed bundle, validate it
if content_structure == BUNDLE_PRETTY:
    with open(file_path, 'r') as f:
        bundle = json.load(f)
    is_valid = validate_fhir_bundle_in_chunks(bundle, compiled_validator)
    print(f"File : {file_path} is valid ? {is_valid}")
This simple Python snippet demonstrates how to check the content structure of a FHIR file and, if it’s a BUNDLE_PRETTY, how to validate its content.
Development
To contribute to the fhir-validator project, you’ll need to install the necessary dependencies, including the dev and test groups for development tools and testing. The pre-commit hooks are part of the dev group, and pytest is part of the test group.
Setting Up Your Development Environment
Clone the repository:
git clone https://github.com/thevgergroup/fhir-validator.git
cd fhir-validator
Install dependencies using Poetry:
Install both the dev and test groups to ensure you have all the necessary tools for development and testing:
poetry install --with dev,test
This command installs the base dependencies along with the dev group (which includes tools like pre-commit) and the test group (which includes tools like pytest).
We use pandoc to generate the README.rst for PyPI so that links are correctly structured; see [Installing Pandoc](https://pandoc.org/installing.html). Make any necessary changes in README.md and the pre-commit hook will perform the conversion.
Install the Pre-commit Hooks:
The project uses pre-commit to automate tasks such as converting README.md to README.rst before commits. To set up the pre-commit hooks locally, run:
poetry run pre-commit install
This will configure the Git hooks to automatically run when you make a commit.
Tests
We use pytest; see the unit tests in the tests directory.
poetry run pytest