HealthSage AI LLM - from clinical note to FHIR

Introduction

HealthSage AI's LLM is a fine-tuned version of Meta's Llama 2 13B that creates structured information (FHIR resources) from unstructured clinical notes (plain text).

The model is optimized to process English notes and populate 10 FHIR resource types. For a full description of the scope and limitations, see the "Performance and limitations" section below.

Getting started

This repository consists of the following modules:

  • training - all scripts that have been used to train the model.
  • evaluation - scripts that validate the generated FHIR resources for adherence to the FHIR specification.
  • inference - running inference on the model, either locally or in a containerized environment.
  • demo - a simple demo of the system, using a docker-compose setup.
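As a rough illustration of the kind of check that comes before full FHIR validation, the sketch below parses a model output string and verifies only that it is a JSON Bundle whose entries each carry a resource with a resourceType. It is not the repository's evaluation code: the function name `check_bundle_shape` is hypothetical, and the check covers a tiny fraction of what the FHIR specification requires.

```python
import json


def check_bundle_shape(raw_output):
    """Return a list of structural problems found in a model output string.

    Hypothetical helper for illustration; an empty list means the basic
    Bundle shape looks plausible, not that the output is valid FHIR.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]

    if not isinstance(data, dict) or data.get("resourceType") != "Bundle":
        return ["top-level resource is not a Bundle"]

    entries = data.get("entry", [])
    if not isinstance(entries, list):
        return ["Bundle.entry is not a list"]

    problems = []
    for index, entry in enumerate(entries):
        resource = entry.get("resource") if isinstance(entry, dict) else None
        if not isinstance(resource, dict) or "resourceType" not in resource:
            problems.append(f"entry[{index}] has no resource with a resourceType")
    return problems


if __name__ == "__main__":
    sample = '{"resourceType": "Bundle", "entry": [{"resource": {"resourceType": "Patient"}}, {}]}'
    print(check_bundle_shape(sample))
```

A check like this is cheap enough to run on every generation, so that only outputs with a plausible shape are passed on to a full FHIR validator.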

The easiest way to get started is to run one of the Jupyter Notebooks on Google Colab or a similar service, e.g. for inference:

  • inference-note-to-fhir-colab-notebook.ipynb

A second step could be to deploy the starter kit for an end-to-end demo of the system: docker compose -p demo up. This step requires a capable GPU.

Lastly, you could fine-tune the model on your own data by modifying the training scripts and running them on a GPU.

Published resources

The datasets and models are regularly published to Hugging Face. The inference API is pushed as a Docker image to Docker Hub.

Licensing

The code has been made available under the GNU AGPL 3.0 license. Contact HealthSage AI (hello@healthsage.ai) for commercial licensing options.

Contributing

Contributions are welcome in any form! Please open an issue or a pull request.

Validation

The current version of the Note-to-FHIR model is released as Beta for testing and development purposes only. It is not validated for clinical use.

Performance and limitations

Scope of the model

This open sourced Beta model is trained within the following scope:

  • FHIR R4
  • 10 Resource types:
    1. Bundle
    2. Patient
    3. Encounter
    4. Practitioner
    5. Organization
    6. Immunization
    7. Observation
    8. Condition
    9. AllergyIntolerance
    10. Procedure
  • English language
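Because the model is trained on a fixed set of resource types, a downstream consumer may want to flag anything else that shows up in a predicted Bundle. The sketch below does this for a Bundle already parsed into a Python dict. The names `IN_SCOPE_TYPES` and `out_of_scope_types` are hypothetical and not part of this package.

```python
# Resource types listed as in scope for this release (copied from the list above).
IN_SCOPE_TYPES = frozenset({
    "Bundle",
    "Patient",
    "Encounter",
    "Practitioner",
    "Organization",
    "Immunization",
    "Observation",
    "Condition",
    "AllergyIntolerance",
    "Procedure",
})


def out_of_scope_types(bundle):
    """Return the sorted resource types in a Bundle dict that are not in scope.

    Hypothetical helper: assumes the usual FHIR JSON layout, where each
    Bundle entry holds its resource under the "resource" key.
    """
    found = {
        entry.get("resource", {}).get("resourceType")
        for entry in bundle.get("entry", [])
    }
    return sorted(t for t in found if t is not None and t not in IN_SCOPE_TYPES)


if __name__ == "__main__":
    example = {
        "resourceType": "Bundle",
        "entry": [
            {"resource": {"resourceType": "Patient", "id": "1"}},
            {"resource": {"resourceType": "MedicationRequest", "id": "2"}},
        ],
    }
    print(out_of_scope_types(example))  # ['MedicationRequest']
```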

The following features are out of scope of the current release:

  • Support for coding systems such as SNOMED CT and LOINC.
  • FHIR extensions and profiles
  • Any language, resource type or FHIR version not mentioned under "in scope".

We are continuously training our model and will regularly release updates that address some of these items and more.

Furthermore, please note:

  • No relative dates: HealthSage AI Note-to-FHIR will not produce accurate FHIR datetime fields from text that contains relative time information such as "today" or "yesterday", since the LLM is not aware of the current date. Similarly, an age statement such as "Patient John Doe is 50 years old." will not yield an accurate birth date, since the exact day and month of birth are unknown and the current date is not available to the model.
  • Designed as Patient-centric: HealthSage AI Note-to-FHIR is trained on notes describing one patient each.
  • <4k context window: Each training example for this application contained at most 3686 tokens, which is 90% of the context window of Llama 2 (4096 tokens).
  • Explicit null: If a certain FHIR element is not present in the provided text, it is explicitly predicted as NULL. Explicitly modeling the absence of information reduces the chance of hallucinations.
  • Uses Bundles: For consistency and simplicity, all predicted FHIR resources are Bundled.
  • Conservative estimates: Our model is designed to stick to the information explicitly provided in the text.
  • IDs are local: ID fields and references are local enumerations (1, 2, 3, etc.). They are not yet tested on referential correctness.
  • Generation design: The model is designed to generate a separate resource if there is information about that resource in the text beyond what can be described in reference fields of related resources.
  • Test results: Our preliminary results suggest that HealthSage AI Note-to-FHIR is superior to the GPT-4 foundation model within the scope of our application in terms of FHIR syntax and ability to replicate the original FHIR resources in our test dataset. We are currently analyzing our model on its performance for out-of-distribution data and out-of-scope data.
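Two of the points above, explicit nulls and untested local references, suggest a small post-processing step before a predicted Bundle is handed to other FHIR tooling. The sketch below is one possible approach, not part of this package. It assumes that NULL predictions arrive as JSON null (None after parsing) and that references take the form "ResourceType/id"; both are assumptions about the output format, and the helper names are hypothetical.

```python
def strip_nulls(value):
    """Recursively drop None values and any dicts or lists left empty by that.

    Assumption: the model's explicit NULL predictions are JSON null.
    """
    if isinstance(value, dict):
        cleaned = {key: strip_nulls(item) for key, item in value.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, {}, [])}
    if isinstance(value, list):
        cleaned = [strip_nulls(item) for item in value]
        return [v for v in cleaned if v not in (None, {}, [])]
    return value


def find_dangling_references(bundle):
    """List reference strings that do not match any resource in the Bundle.

    Assumption: references look like "Patient/1" and point at Bundle entries.
    """
    known = set()
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if "resourceType" in resource and "id" in resource:
            known.add(f'{resource["resourceType"]}/{resource["id"]}')

    dangling = []

    def walk(node):
        if isinstance(node, dict):
            ref = node.get("reference")
            if isinstance(ref, str) and ref not in known:
                dangling.append(ref)
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(bundle)
    return dangling


# Made-up example Bundle, for illustration only.
example_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "1", "birthDate": None,
                      "name": [{"family": "Doe", "given": None}]}},
        {"resource": {"resourceType": "Condition", "id": "2",
                      "subject": {"reference": "Patient/1"},
                      "encounter": {"reference": "Encounter/3"}}},
    ],
}

if __name__ == "__main__":
    cleaned = strip_nulls(example_bundle)
    print(cleaned["entry"][0]["resource"])
    print(find_dangling_references(cleaned))  # ['Encounter/3']
```

Stripping nulls first keeps the explicit-null convention internal to the model output, while the reference check surfaces the referential errors that the note above says have not yet been tested.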

