Remarkably simple testing and validation of AI/NLP applications in a healthcare context.

HealthChain 💫 🏥

Simplify developing, testing and validating AI and NLP applications in a healthcare context 💫 🏥.

Building applications that integrate with electronic health record systems (EHRs) is complex, and so is designing reliable, reactive algorithms involving unstructured data. Let's try to change that.

pip install healthchain

First time here? Check out our Docs page!

Came here from NHS RPySOC 2024 ✨? See the CDS sandbox walkthrough.

Features

  • 🛠️ Build custom pipelines or use pre-built ones for your healthcare NLP and ML tasks
  • 🏗️ Add built-in CDA and FHIR parsers to connect your pipeline to interoperability standards
  • 🧪 Test your pipelines in full healthcare-context aware sandbox environments
  • 🗃️ Generate synthetic healthcare data for testing and development
  • 🚀 Deploy sandbox servers locally with FastAPI

Why use HealthChain?

  • EHR integrations are manual and time-consuming - HealthChain abstracts away complexities so you can focus on AI development, not EHR configurations.
  • It's difficult to track and evaluate multiple integration instances - HealthChain provides a framework to test the real-world resilience of your whole system, not just your models.
  • Most healthcare data is unstructured - HealthChain is optimized for real-time AI and NLP applications that deal with realistic healthcare data.
  • Built by health tech developers, for health tech developers - HealthChain is tech stack agnostic, modular, and easily extensible.

Pipeline

Pipelines provide a flexible way to build and manage processing workflows for NLP and ML tasks that integrate easily with complex healthcare systems.

Building a pipeline

from healthchain.io.containers import Document
from healthchain.pipeline import Pipeline
from healthchain.pipeline.components import TextPreProcessor, SpacyNLP, TextPostProcessor

# Initialize the pipeline
nlp_pipeline = Pipeline[Document]()

# Add TextPreProcessor component
preprocessor = TextPreProcessor(tokenizer="spacy")
nlp_pipeline.add_node(preprocessor)

# Add Model component (assuming we have a pre-trained model)
spacy_nlp = SpacyNLP.from_model_id("en_core_sci_md", source="spacy")
nlp_pipeline.add_node(spacy_nlp)

# Add TextPostProcessor component
postprocessor = TextPostProcessor(
    postcoordination_lookup={
        "heart attack": "myocardial infarction",
        "high blood pressure": "hypertension"
    }
)
nlp_pipeline.add_node(postprocessor)

# Build the pipeline
nlp = nlp_pipeline.build()

# Use the pipeline
result = nlp(Document("Patient has a history of heart attack and high blood pressure."))

print(f"Entities: {result.nlp.spacy_doc.ents}")
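To make the role of a lookup like postcoordination_lookup more concrete, here is a minimal sketch in plain Python of mapping extracted entity strings to preferred terms. It is not HealthChain's implementation: refine_entities is a hypothetical helper, and the case-insensitive matching is an assumption made for illustration.

```python
def refine_entities(entities: list[str], lookup: dict[str, str]) -> list[str]:
    """Map each entity string to its preferred term, leaving unknown entities unchanged."""
    # Lowercase the keys once so matching ignores case (an assumption of this sketch).
    lowered = {key.lower(): value for key, value in lookup.items()}
    return [lowered.get(entity.lower(), entity) for entity in entities]


lookup = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
}

refined = refine_entities(["heart attack", "High blood pressure", "asthma"], lookup)
print(refined)  # ['myocardial infarction', 'hypertension', 'asthma']
```

Entities without an entry in the lookup pass through untouched, so the mapping can be extended gradually.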

Adding connectors

Connectors give your pipelines the ability to interface with EHRs.

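from healthchain.io import CdaConnector
from healthchain.io.containers import Document
from healthchain.models import CdaRequest
from healthchain.pipeline import Pipeline

# Start from a pipeline that has not been built yet;
# add processing components with add_node as in the previous example
pipeline = Pipeline[Document]()

# Register the connector on both the input and output side of the pipeline
cda_connector = CdaConnector()

pipeline.add_input(cda_connector)
pipeline.add_output(cda_connector)

# Build the pipeline into a callable
pipe = pipeline.build()

# Placeholder document: replace with real CDA XML
cda_data = CdaRequest(document="<CDA XML content>")
output = pipe(cda_data)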

Using pre-built pipelines

Pre-built pipelines are use-case-specific, end-to-end workflows that come with connectors and models already built in.

from healthchain.pipeline import MedicalCodingPipeline
from healthchain.models import CdaRequest

# Load from model ID
pipeline = MedicalCodingPipeline.from_model_id(
    model="blaze999/Medical-NER", task="token-classification", source="huggingface"
)

# Or load from local model
pipeline = MedicalCodingPipeline.from_local_model("./path/to/model", source="spacy")

cda_data = CdaRequest(document="<CDA XML content>")
output = pipeline(cda_data)

Sandbox

Sandboxes provide a staging environment for testing and validating your pipeline in a realistic healthcare context.

Clinical Decision Support (CDS)

CDS Hooks is an HL7 published specification for clinical decision support.

When is this used? CDS hooks are triggered at certain events during a clinician's workflow in an electronic health record (EHR), e.g. when a patient record is opened or when an order is selected.

What information is sent: the context of the event and FHIR resources that are requested by your service, for example, the patient ID and information on the encounter and conditions they are being seen for.

What information is returned: “cards” displaying text, actionable suggestions, or links to launch a SMART app from within the workflow.
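The request and response described above can be sketched as plain Python dictionaries. This is only an illustration of the general shape of a CDS Hooks exchange, not HealthChain's data model: build_request and build_card_response are hypothetical helpers, the IDs are made up, and the "patient" prefetch key is an arbitrary name a service might choose.

```python
import json
import uuid


def build_request(user_id: str, patient_id: str, encounter_id: str) -> dict:
    """Assemble a CDS Hooks style request for the encounter-discharge hook."""
    return {
        "hook": "encounter-discharge",
        "hookInstance": str(uuid.uuid4()),  # unique ID for this hook invocation
        "context": {
            "userId": user_id,
            "patientId": patient_id,
            "encounterId": encounter_id,
        },
        # FHIR resources the service asked for in advance (keys are service-defined)
        "prefetch": {
            "patient": {"resourceType": "Patient", "id": patient_id},
        },
    }


def build_card_response(summary: str, label: str = "Example CDS service") -> dict:
    """Wrap a short summary in a single informational card."""
    # The CDS Hooks specification asks for summaries under 140 characters.
    if len(summary) >= 140:
        raise ValueError("Card summary must be shorter than 140 characters")
    return {
        "cards": [
            {
                "summary": summary,
                "indicator": "info",  # one of: info, warning, critical
                "source": {"label": label},
            }
        ]
    }


request = build_request("Practitioner/example", "patient-123", "encounter-456")
response = build_card_response("Discharge summary is ready for review.")
print(json.dumps(response, indent=2))
```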

import healthchain as hc

from healthchain.pipeline import SummarizationPipeline
from healthchain.use_cases import ClinicalDecisionSupport
from healthchain.models import Card, CdsFhirData, CDSRequest
from healthchain.data_generator import CdsDataGenerator
from typing import List

@hc.sandbox
class MyCDS(ClinicalDecisionSupport):
    def __init__(self) -> None:
        self.pipeline = SummarizationPipeline.from_model_id(
            "facebook/bart-large-cnn", source="huggingface"
        )
        self.data_generator = CdsDataGenerator()

    # Sets up an instance of a mock EHR client of the specified workflow
    @hc.ehr(workflow="encounter-discharge")
    def ehr_database_client(self) -> CdsFhirData:
        return self.data_generator.generate()

    # Define your application logic here
    @hc.api
    def my_service(self, data: CDSRequest) -> CDSRequest:
        result = self.pipeline(data)
        return result

Clinical Documentation

The ClinicalDocumentation use case implements a real-time Clinical Documentation Improvement (CDI) service. It helps convert free-text medical documentation into coded information that can be used for billing, quality reporting, and clinical decision support.

When is this used? Triggered when a clinician opts in to CDI functionality (e.g. Epic NoteReader) and signs or pends a note after writing it.

What information is sent: A CDA (Clinical Document Architecture) document which contains continuity of care data and free-text data, e.g. a patient's problem list and the progress note that the clinician has entered in the EHR.
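As a rough illustration of the kind of document described above, the following sketch reads section titles and free text from a heavily simplified CDA-like XML string using only the standard library. The sample document and the extract_sections helper are hypothetical; real CDA documents carry far more structure, and this is not how HealthChain parses them.

```python
import xml.etree.ElementTree as ET

# CDA elements live in the HL7 v3 namespace
CDA_NS = {"hl7": "urn:hl7-org:v3"}

# A made-up, minimal document with a problem list and a progress note
SAMPLE_CDA = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component>
    <structuredBody>
      <component>
        <section>
          <title>Problem List</title>
          <text>Hypertension</text>
        </section>
      </component>
      <component>
        <section>
          <title>Progress Note</title>
          <text>Patient reports chest pain on exertion.</text>
        </section>
      </component>
    </structuredBody>
  </component>
</ClinicalDocument>"""


def extract_sections(cda_xml: str) -> dict[str, str]:
    """Return a mapping of section title to the section's plain text."""
    root = ET.fromstring(cda_xml)
    sections = {}
    for section in root.iterfind(".//hl7:section", CDA_NS):
        title = section.findtext("hl7:title", default="", namespaces=CDA_NS).strip()
        text_element = section.find("hl7:text", CDA_NS)
        # itertext() also collects text nested inside child markup
        text = "".join(text_element.itertext()).strip() if text_element is not None else ""
        sections[title] = text
    return sections


sections = extract_sections(SAMPLE_CDA)
for title, text in sections.items():
    print(f"{title}: {text}")
```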

import healthchain as hc

from healthchain.pipeline import MedicalCodingPipeline
from healthchain.use_cases import ClinicalDocumentation
from healthchain.models import CcdData, CdaRequest, CdaResponse

@hc.sandbox
class NotereaderSandbox(ClinicalDocumentation):
    def __init__(self):
        self.pipeline = MedicalCodingPipeline.from_model_id(
            "en_core_sci_md", source="spacy"
        )

    # Load an existing CDA file
    @hc.ehr(workflow="sign-note-inpatient")
    def load_data_in_client(self) -> CcdData:
        with open("/path/to/cda/data.xml", "r") as file:
            xml_string = file.read()

        return CcdData(cda_xml=xml_string)

    @hc.api
    def my_service(self, data: CdaRequest) -> CdaResponse:
        annotated_ccd = self.pipeline(data)
        return annotated_ccd

Running a sandbox

Add the following lines to the end of your mycds.py file:

cds = MyCDS()
cds.run_sandbox()

This will populate your EHR client using the data generation method you have defined, send requests to your server for processing, and save the data in the ./output directory.

Then run:

healthchain run mycds.py

By default, the server runs at http://127.0.0.1:8000, and you can interact with the exposed endpoints at /docs.
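A quick way to confirm the server is reachable is to request the /docs page from a separate script. The sketch below uses only the standard library and assumes the default host and port mentioned above; docs_url and server_is_up are hypothetical helper names, not part of HealthChain.

```python
import http.client
from urllib.request import urlopen


def docs_url(host: str = "127.0.0.1", port: int = 8000) -> str:
    """Build the URL of the interactive API docs page."""
    return f"http://{host}:{port}/docs"


def server_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any connection or HTTP error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses, and malformed replies
        return False


if __name__ == "__main__":
    url = docs_url()
    print(f"{url} reachable: {server_is_up(url)}")
```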

Road Map

  • 🎛️ Versioning and artifact management for pipelines and sandbox EHR configurations
  • ❓ Testing and evaluation framework for pipelines and use cases
  • 🧠 Multi-modal pipelines that have built-in NLP to utilize unstructured data
  • ✨ Improvements to synthetic data generator methods
  • 👾 Frontend UI for EHR client and visualization features
  • 🚀 Production deployment options

Contribute

We are always eager to hear feedback and suggestions, especially if you are a developer or researcher working with healthcare systems!

Acknowledgement

This repository makes use of CDS Hooks developed by Boston Children’s Hospital.
