Tools to create views of FHIR data for analysis.


FHIR Views

Introduction

FHIR Views is a way to define simple, tabular views over complex FHIR data and turn them into queries that follow SQL on FHIR conventions or, in the future, target other data sources. It can be installed with a simple pip command: pip install google-fhir-views[r4,bigquery]

FHIR Views has two main concepts:

A view definition, which specifies the columns a view produces and the criteria used to filter its rows. It provides a Python API for convenience, but ultimately a view definition is a set of FHIRPath expressions, as explored below.
A view runner, which creates that view over some data source.
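Because a view definition is ultimately a resource type plus named FHIRPath expressions, it can be modeled as plain data that is independent of any runner. The sketch below is a hypothetical illustration (the ViewDefinition and Column classes are not part of the library) that emits the same dictionary shape as the SQL on FHIR v2 configuration shown later in this README.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Column:
    """A named output column computed by a FHIRPath expression (hypothetical)."""
    alias: str
    path: str


@dataclass
class ViewDefinition:
    """A runner-independent view: resource type, columns and filters (hypothetical)."""
    resource: str
    select: List[Column] = field(default_factory=list)
    where: List[str] = field(default_factory=list)

    def to_config(self) -> Dict[str, Any]:
        # Same dictionary layout as the SQL on FHIR v2 example in this README.
        return {
            'resource': self.resource,
            'select': [{'alias': c.alias, 'path': c.path} for c in self.select],
            'where': [{'path': p} for p in self.where],
        }


view = ViewDefinition(
    resource='Patient',
    select=[Column('id', 'id'), Column('gender', 'gender')],
    where=['birthDate < @1960-01-01'],
)
config = view.to_config()
print(config['select'])
```

Keeping the definition as pure data is what lets any runner, in principle, consume the same view.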

For example, let's create a simple view of patient resources for patients born before 1960:

import datetime
from google.fhir.views import bigquery_runner, r4

# Load views based on the base FHIR R4 profile definitions.
views = r4.base_r4()

# Creates a view using the base patient profile.
pats = views.view_of('Patient')

# In this case we interpret the 'current' address as one where period is empty.
# This can be adjusted to meet the needs of a specific dataset.
current = pats.address.where(pats.address.period.empty()).first()

simple_pats = pats.select([
    pats.id.alias('id'),
    pats.gender.alias('gender'),
    pats.birthDate.alias('birthdate'),
    current.line.first().alias('street'),
    current.city.alias('city'),
    current.state.alias('state'),
    current.postalCode.alias('zip')
    ]).where(
        pats.birthDate < datetime.date(1960, 1, 1))

With support for SQL on FHIR v2, the above view can also be defined as:

simple_pats_config = {
    'resource': 'Patient',
    'select': [
        {
            'alias': 'id',
            'path': 'id',
        },
        {
            'alias': 'gender',
            'path': 'gender',
        },
        {
            'alias': 'birthdate',
            'path': 'birthDate',
        },
        {
            'alias': 'street',
            'path': 'address.where(period.empty()).first().line.first()',
        },
        {
            'alias': 'city',
            'path': 'address.where(period.empty()).first().city',
        },
        {
            'alias': 'state',
            'path': 'address.where(period.empty()).first().state',
        },
        {
            'alias': 'zip',
            'path': 'address.where(period.empty()).first().postalCode',
        },
    ],
    'where': [
        {'path': 'birthDate < @1960-01-01'},
    ],
}

views = r4.base_r4()
simple_pats = views.from_view_definition(simple_pats_config)
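The four address columns in the configuration above repeat the same "current address" prefix. Because the configuration is an ordinary Python dictionary, one option is to build those entries programmatically. The helper below (current_address_columns) is hypothetical and not a library function; it is only a sketch of how to reduce that repetition.

```python
from typing import Dict, List

# FHIRPath prefix selecting the first address without a period (the 'current' one).
CURRENT_ADDRESS = 'address.where(period.empty()).first()'


def current_address_columns(fields: Dict[str, str]) -> List[Dict[str, str]]:
    """Builds select entries for sub-fields of the current address (hypothetical helper).

    `fields` maps each output alias to a FHIRPath path relative to the address.
    """
    return [
        {'alias': alias, 'path': f'{CURRENT_ADDRESS}.{sub_path}'}
        for alias, sub_path in fields.items()
    ]


address_columns = current_address_columns({
    'street': 'line.first()',
    'city': 'city',
    'state': 'state',
    'zip': 'postalCode',
})
for column in address_columns:
    print(column['alias'], '->', column['path'])
```

These entries could then be appended to the id, gender and birth date entries in the 'select' list.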

If you run the above in a Jupyter notebook or similar tool, you'll notice that the view builder supports tab suggestions that match the fields in the FHIR resource of the given profile. In fact, the builder is just a Pythonic way to build FHIRPath expressions to be used by the runner, with suggestions available by pressing tab:

[Image: tab suggestions in a notebook]
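To illustrate how attribute access can produce FHIRPath text, here is a deliberately tiny, hypothetical builder. It is not the library's implementation: the real builder also knows the fields of each FHIR profile (which is what powers tab completion) and accepts expressions such as pats.address.period inside where(), whereas this toy uses a separate relative root named item. It only shows how a Python expression like pats.birthDate < datetime.date(1960, 1, 1) can be rendered as birthDate < @1960-01-01.

```python
import datetime
from typing import Any


class Path:
    """A minimal FHIRPath expression builder (hypothetical, for illustration only)."""

    def __init__(self, expr: str = '') -> None:
        self._expr = expr

    def __getattr__(self, name: str) -> 'Path':
        # Attribute access appends a path segment, e.g. Path().address.city.
        if name.startswith('_'):
            raise AttributeError(name)
        return Path(f'{self._expr}.{name}' if self._expr else name)

    def first(self) -> 'Path':
        return Path(f'{self._expr}.first()')

    def empty(self) -> 'Path':
        return Path(f'{self._expr}.empty()')

    def where(self, criteria: 'Path') -> 'Path':
        return Path(f'{self._expr}.where({criteria})')

    def __lt__(self, other: Any) -> 'Path':
        # Dates become FHIRPath date literals of the form @YYYY-MM-DD.
        if isinstance(other, datetime.date):
            literal = f'@{other.isoformat()}'
        else:
            literal = repr(other)
        return Path(f'{self._expr} < {literal}')

    def __str__(self) -> str:
        return self._expr


pats = Path()  # Root of the view: paths start at the resource.
item = Path()  # Root relative to the current element inside where().
current = pats.address.where(item.period.empty()).first()
print(current.city)
print(pats.birthDate < datetime.date(1960, 1, 1))
```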

That builder is convenient for Python users, but you can also see the FHIRPath expressions themselves by getting the string representation of the view, for example by running print(simple_pats). Notice that every column and the 'where' criteria are defined by FHIRPath expressions, and that each column's name is given by the alias() function appended to its expression:

View<http://hl7.org/fhir/StructureDefinition/Patient.select(
  id.alias(id),
  gender.alias(gender),
  birthDate.alias(birthdate),
  address.where(period.empty()).first().line.first().alias(street),
  address.where(period.empty()).first().city.alias(city),
  address.where(period.empty()).first().state.alias(state),
  address.where(period.empty()).first().postalCode.alias(zip)
).where(
  birthDate < @1960-01-01
)>

In other words, a runner implementation uses the FHIRPath expressions to select and filter the underlying data. The example below uses a BigQuery runner, which translates FHIRPath expressions into SQL; runners for Apache Spark and for evaluating views directly over JSON are planned. A view definition could also be exported as a simple JSON structure and passed to a remote service that evaluates the FHIRPath expressions and returns the view to the user.
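Since a runner only needs to evaluate each column's expression against each resource and apply the filters, a runner that works directly on JSON can be sketched in a few lines. This is a hypothetical toy, not one of the planned runners: it supports only dotted field navigation (taking the first value at each step), ignores the "current address" filter, and expresses the where criteria as a Python predicate instead of parsing FHIRPath. The patient records are made-up sample data.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Resource = Dict[str, Any]


def get_path(resource: Resource, dotted: str) -> Optional[Any]:
    """Follows a dotted path such as 'address.city', taking the first list element at each step."""
    value: Any = resource
    for key in dotted.split('.'):
        if isinstance(value, list):
            value = value[0] if value else None
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    if isinstance(value, list):
        value = value[0] if value else None
    return value


def run_view(resources: Iterable[Resource], columns: Dict[str, str],
             predicate: Callable[[Resource], bool]) -> List[Dict[str, Any]]:
    """Evaluates each column path on every resource that satisfies the predicate."""
    return [
        {alias: get_path(r, path) for alias, path in columns.items()}
        for r in resources if predicate(r)
    ]


patients = [
    {'resourceType': 'Patient', 'id': 'a', 'gender': 'male', 'birthDate': '1950-02-01',
     'address': [{'city': 'Boston', 'line': ['1 Main St']}]},
    {'resourceType': 'Patient', 'id': 'b', 'gender': 'female', 'birthDate': '1985-07-30'},
]
rows = run_view(
    patients,
    columns={'id': 'id', 'gender': 'gender', 'street': 'address.line', 'city': 'address.city'},
    # Full YYYY-MM-DD dates compare correctly as strings; missing dates are excluded.
    predicate=lambda r: r.get('birthDate', '9999') < '1960-01-01',
)
print(rows)
```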

Now that we've defined a view, let's run it against a real dataset. We'll run this over BigQuery:

# Get a BigQuery client. This may require additional authentication to access
# BigQuery, depending on your notebook environment. Typically the client
# and runner are created only once at the start of a notebook.
from google.cloud import bigquery as bq
client = bq.Client()
runner = bigquery_runner.BigQueryRunner(
    client,
    fhir_dataset='bigquery-public-data.fhir_synthea',
    snake_case_resource_tables=True)

runner.to_dataframe(simple_pats, limit=5)

This produces the following table:

	id	gender	birthdate	street	city	state	zip
0	6759d2b7-38b4-4798-97c0-d171a53e013a	male	1916-03-21	659 Bayer Wall Apt 61	Boston	Massachusetts	02108
1	41dbee4d-d355-413f-a040-93ca037fe646	male	1951-12-05	226 Sipes Ranch Unit 37	Lynnfield	Massachusetts	01940
2	e194d708-8989-4e0c-a8e1-eda7351672ce	male	1947-09-24	638 Pouros Wall Suite 52	Lynnfield	Massachusetts	01940
3	4bccdc85-c040-45dd-ada3-a55064439a01	male	1943-06-20	825 Jakubowski Extension	Tewksbury	Massachusetts	01876
4	8dca4c3c-d2d5-460f-9168-5f18e5d29b2b	male	1945-12-13	319 Cronin Light	Hubbardston	Massachusetts	01452

That's it! The returned dataframe contains the example patients selected by the view, pulled from the FHIR data stored in BigQuery. The examples below show more sophisticated use cases, such as turning a FHIR view into a BigQuery view or incorporating clinical content from code value sets.

At this time, BigQuery is the only supported data source (through the BigQuery runner), but future runners may support other data stores, FHIR servers, or FHIR bulk extracts on disk.

Working with code values

Most meaningful analysis of healthcare data involves navigating clinical terminologies, typically through value sets: collections of codes that together represent a clinical concept. In some cases these value sets come from an established authority like the Value Set Authority Center; in others they are defined and maintained locally for custom use cases.

FHIR Views offers a convenient mechanism to create and use such value sets in your queries. Here is an example that defines a collection of LOINC codes indicating LDL results:

LDL_TEST = r4.value_set('urn:example:value_set:ldl').with_codes(
    'http://loinc.org', ['18262-6', '18261-8', '12773-8']).build()

Now we can easily query observations with a view that uses the FHIRPath memberOf function:

# Creates the base observation view for convenience, typically done once per
# base type in a notebook.
obs = views.view_of('Observation')

ldl_obs = obs.select([
    obs.subject.idFor('Patient').alias('patient'),
    # Below is a Pythonic shorthand -- users could type
    # `obs.value.ofType('Quantity').value` instead for the FHIRPath ofType
    # expression, but the shorthand helps autocompletion
    obs.valueQuantity.value.alias('value'),
    obs.valueQuantity.unit.alias('unit'),
    obs.code.coding.display.first().alias('test'),
    obs.effectiveDateTime.alias('effectiveTime')
    ]).where(obs.code.memberOf(LDL_TEST))

runner.to_dataframe(ldl_obs, limit=5)

	patient	value	unit	test	effectiveTime
0	903156da-ca5d-4ec3-ad36-073a9437afe4	153.058	mg/dL	Low Density Lipoprotein Cholesterol	2014-06-20 11:30:15+00:00
1	3d268dce-fed4-4bc7-b156-c78e810c5183	149.379	mg/dL	Low Density Lipoprotein Cholesterol	2013-06-10 16:20:36+00:00
2	fdf7c87b-1c8f-4d09-8d51-e622f747a7c8	88.047	mg/dL	Low Density Lipoprotein Cholesterol	2013-10-07 00:08:45+00:00
3	9007c0ff-a0ad-48dc-adc2-c0908c06fba8	108.145	mg/dL	Low Density Lipoprotein Cholesterol	2016-03-18 10:31:54+00:00
4	cc5a2dd6-37b6-4f15-9da7-53f3b85e3370	64.5849	mg/dL	Low Density Lipoprotein Cholesterol	2012-05-26 13:27:46+00:00
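Conceptually, a value set like LDL_TEST is a set of (system, code) pairs, and memberOf applied to a CodeableConcept is true when at least one of its codings is in that set. The plain-Python sketch below models that check over JSON CodeableConcepts; it is an assumption-level illustration of the semantics, not how the BigQuery runner implements it (that runner evaluates the expression in SQL). The member_of function is hypothetical.

```python
from typing import Any, Dict, FrozenSet, Tuple

# The LDL value set from above, modeled as (system, code) pairs.
LDL_CODES: FrozenSet[Tuple[str, str]] = frozenset(
    ('http://loinc.org', code) for code in ('18262-6', '18261-8', '12773-8'))


def member_of(codeable_concept: Dict[str, Any],
              value_set: FrozenSet[Tuple[str, str]]) -> bool:
    """True if any coding in the CodeableConcept matches a (system, code) pair in the value set."""
    return any(
        (coding.get('system'), coding.get('code')) in value_set
        for coding in codeable_concept.get('coding', [])
    )


ldl = {'coding': [{'system': 'http://loinc.org', 'code': '18262-6'}]}
other_test = {'coding': [{'system': 'http://loinc.org', 'code': '2345-7'}]}
print(member_of(ldl, LDL_CODES), member_of(other_test, LDL_CODES))
```

Matching on the (system, code) pair rather than the code alone avoids false matches when different code systems reuse the same code string.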

Working with external value sets and terminology services

You can also work with value sets defined by external terminology services. To do so, you must first create a terminology service client.

This example uses the UMLS terminology service from the NIH. To access this terminology service, you need to sign up for a UMLS account. Then replace 'your-umls-api-key' below with the API key found on your profile page.

from google.fhir.r4.terminology import terminology_service_client

tx_client = terminology_service_client.TerminologyServiceClient({
    'http://cts.nlm.nih.gov/fhir/': ('apikey', 'your-umls-api-key'),
})
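The dictionary passed to TerminologyServiceClient maps a terminology server's base URL to a pair of credentials. The ('apikey', key) pair suggests HTTP Basic authentication with the literal username apikey and your API key as the password. The standard-library sketch below shows what such a header looks like and how a value set URL could be matched to a configured server by prefix; auth_header_for is a hypothetical helper, not the client's actual code.

```python
import base64
from typing import Dict, Optional, Tuple

Credentials = Tuple[str, str]


def auth_header_for(url: str, servers: Dict[str, Credentials]) -> Optional[str]:
    """Returns a Basic Authorization header for the server whose base URL prefixes `url` (hypothetical)."""
    for base_url, (user, password) in servers.items():
        if url.startswith(base_url):
            token = base64.b64encode(f'{user}:{password}'.encode('utf-8')).decode('ascii')
            return f'Basic {token}'
    return None  # No credentials configured for this server.


servers = {'http://cts.nlm.nih.gov/fhir/': ('apikey', 'your-umls-api-key')}
header = auth_header_for(
    'http://cts.nlm.nih.gov/fhir/ValueSet/2.16.840.1.113762.1.4.1029.5', servers)
print(header)
```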

Before making queries against an externally-defined value set, you must first get the codes defined by the value set and write them to a BigQuery table. You only need to perform this step once. After doing so, you'll be able to reference the value set definitions you've written in future queries.

injury_value_set_url = 'http://cts.nlm.nih.gov/fhir/ValueSet/2.16.840.1.113762.1.4.1029.5'
wound_disorder_value_set_url = 'http://cts.nlm.nih.gov/fhir/ValueSet/2.16.840.1.113762.1.4.1219.178'
runner.materialize_value_set_expansion((injury_value_set_url, wound_disorder_value_set_url), tx_client)
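Conceptually, materializing an expansion means obtaining each value set's expanded list of codes (as returned by the FHIR ValueSet $expand operation) and storing those codes as rows. The sketch below flattens a FHIR ValueSet expansion, whose expansion.contains entries carry system and code and may nest further contains entries, into (value set URL, version, system, code) rows. The row layout and the sample value set are assumptions for illustration; the table schema the runner actually writes may differ.

```python
from typing import Any, Dict, Iterator, List, Tuple

Row = Tuple[str, str, str, str]


def _walk(contains: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    # Entries may be nested hierarchically; yield every entry that carries a code.
    for entry in contains:
        if 'code' in entry:
            yield entry
        yield from _walk(entry.get('contains', []))


def expansion_rows(value_set: Dict[str, Any]) -> List[Row]:
    """Flattens ValueSet.expansion.contains into (url, version, system, code) rows."""
    url = value_set.get('url', '')
    version = value_set.get('version', '')
    contains = value_set.get('expansion', {}).get('contains', [])
    return [(url, version, e.get('system', ''), e['code']) for e in _walk(contains)]


value_set = {
    'resourceType': 'ValueSet',
    'url': 'urn:example:value_set:demo',
    'version': '1',
    'expansion': {'contains': [
        {'system': 'http://loinc.org', 'code': '18262-6'},
        {'system': 'http://loinc.org', 'code': '18261-8',
         'contains': [{'system': 'http://loinc.org', 'code': '12773-8'}]},
    ]},
}
rows = expansion_rows(value_set)
for row in rows:
    print(row)
```

Storing the expansion once is what allows later queries to check membership with a table lookup instead of calling the terminology service again.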

To make queries against an externally-defined value set which you've saved to BigQuery, you can simply refer to its URL.

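# Creates the base condition view, as was done for observations above.
cond = views.view_of('Condition')

injury_conds = cond.select([
    cond.id.alias('id'),
    cond.subject.idFor('Patient').alias('patientId'),
    cond.code.alias('codes')
    ]).where(cond.code.memberOf(injury_value_set_url))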

runner.create_database_view(injury_conds, 'injury_conditions')

Saving FHIR Views as BigQuery Views

While runner.to_dataframe is convenient for retrieving data for local analysis, it's often useful to create such flattened views in BigQuery itself. Such views can be queried with much simpler SQL, or used by a variety of business intelligence and other data analysis tools.

For this reason, the BigQueryRunner offers a create_database_view method that converts the view definition into a BigQuery view. The view can then be queried as if it were a regular table, and it reflects updates to the underlying data. Here's an example:

runner.create_database_view(ldl_obs, 'ldl_observations')

By default, the view is created in the fhir_dataset used by the runner, but this isn't always desirable; for example, a user may want to do their analysis in their own, isolated dataset. It's therefore common to specify a view_dataset when creating the runner, which becomes the target for any views it creates. Here's an example:

runner = bigquery_runner.BigQueryRunner(
    client,
    fhir_dataset='bigquery-public-data.fhir_synthea',
    view_dataset='example_project.diabetic_care_example',
    snake_case_resource_tables=True)
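A BigQuery view is a named, saved query, so creating one comes down to a CREATE VIEW statement in the target dataset. The sketch below shows the dataset fallback described above (use view_dataset if given, otherwise fhir_dataset) and the general shape of BigQuery view DDL. The view_ddl function and the SELECT body are hypothetical placeholders; the SQL the runner actually generates from a view definition is more involved.

```python
from typing import Optional


def view_ddl(view_name: str, select_sql: str, fhir_dataset: str,
             view_dataset: Optional[str] = None) -> str:
    """Builds BigQuery DDL for a view, defaulting to the FHIR dataset (hypothetical helper)."""
    target_dataset = view_dataset or fhir_dataset
    return f'CREATE OR REPLACE VIEW `{target_dataset}.{view_name}` AS\n{select_sql}'


ddl = view_ddl(
    'ldl_observations',
    'SELECT 1 AS placeholder',  # Stand-in for the SQL generated from the view definition.
    fhir_dataset='bigquery-public-data.fhir_synthea',
    view_dataset='example_project.diabetic_care_example',
)
print(ddl)
```

Because a public dataset such as bigquery-public-data.fhir_synthea is read-only for ordinary users, the fallback to fhir_dataset is mainly useful when the FHIR data lives in a dataset you own.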

Release history

0.10.0 (this version): Sep 22, 2023
0.9.3: Jun 5, 2023
0.9.2: Apr 17, 2023
0.9.1: Apr 4, 2023
0.9.0: Mar 31, 2023

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

google_fhir_views-0.10.0-py3-none-any.whl (128.7 kB)

Uploaded Sep 22, 2023 Python 3

Hashes for google_fhir_views-0.10.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`df335de8d23d8682f04e48a8ada3d71ac2a24555ef52050143839119fd18b399`
MD5	`1857bc39923f66ea354857612020f18f`
BLAKE2b-256	`f51d551a033a2aae3187a543118148bbc0ab96a35b1d87e5179be910a53b9103`