Remarkably simple testing and validation of AI/NLP applications in a healthcare context.

HealthChain 💫 🏥

Simplify developing, testing and validating AI and NLP applications in a healthcare context 💫 🏥.

Building applications that integrate with electronic health record systems (EHRs) is complex, and so is designing reliable, reactive algorithms involving unstructured data. Let's try to change that.

pip install healthchain

First time here? Check out our Docs page!

Came here from NHS RPySOC 2024 ✨? CDS sandbox walkthrough

Features

  • 🛠️ Build custom pipelines or use pre-built ones for your healthcare NLP and ML tasks
  • 🏗️ Add built-in CDA and FHIR parsers to connect your pipeline to interoperability standards
  • 🧪 Test your pipelines in full healthcare-context aware sandbox environments
  • 🗃️ Generate synthetic healthcare data for testing and development
  • 🚀 Deploy sandbox servers locally with FastAPI

Why use HealthChain?

  • EHR integrations are manual and time-consuming - HealthChain abstracts away complexities so you can focus on AI development, not EHR configurations.
  • It's difficult to track and evaluate multiple integration instances - HealthChain provides a framework to test the real-world resilience of your whole system, not just your models.
  • Most healthcare data is unstructured - HealthChain is optimized for real-time AI and NLP applications that deal with realistic healthcare data.
  • Built by health tech developers, for health tech developers - HealthChain is tech stack agnostic, modular, and easily extensible.

Pipeline

Pipelines provide a flexible way to build and manage processing pipelines for NLP and ML tasks that can easily integrate with complex healthcare systems.

Building a pipeline

from healthchain.io.containers import Document
from healthchain.pipeline import Pipeline
from healthchain.pipeline.components import TextPreProcessor, SpacyNLP, TextPostProcessor

# Initialize the pipeline
nlp_pipeline = Pipeline[Document]()

# Add TextPreProcessor component
preprocessor = TextPreProcessor(tokenizer="spacy")
nlp_pipeline.add_node(preprocessor)

# Add Model component (assuming we have a pre-trained model)
spacy_nlp = SpacyNLP.from_model_id("en_core_sci_md", source="spacy")
nlp_pipeline.add_node(spacy_nlp)

# Add TextPostProcessor component
postprocessor = TextPostProcessor(
    postcoordination_lookup={
        "heart attack": "myocardial infarction",
        "high blood pressure": "hypertension"
    }
)
nlp_pipeline.add_node(postprocessor)

# Build the pipeline
nlp = nlp_pipeline.build()

# Use the pipeline
result = nlp(Document("Patient has a history of heart attack and high blood pressure."))

print(f"Entities: {result.nlp.spacy_doc.ents}")
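To make the intent of the `postcoordination_lookup` mapping concrete, here is a minimal plain-Python sketch of the kind of term normalization such a lookup describes. It does not use HealthChain and is not how `TextPostProcessor` is implemented; the helper name `apply_lookup` is hypothetical and exists only for illustration.

```python
import re
from typing import Dict


def apply_lookup(text: str, lookup: Dict[str, str]) -> str:
    """Replace each lookup phrase found in `text` with its mapped term.

    Hypothetical helper for illustration only; matching is case-insensitive
    and restricted to whole words.
    """
    # Process longer phrases first so overlapping keys resolve to the
    # most specific match.
    for phrase in sorted(lookup, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", flags=re.IGNORECASE)
        # Use a function as the replacement so the mapped term is inserted literally.
        text = pattern.sub(lambda _match, term=lookup[phrase]: term, text)
    return text


lookup = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
}

normalized = apply_lookup(
    "Patient has a history of heart attack and high blood pressure.", lookup
)
print(normalized)
# Patient has a history of myocardial infarction and hypertension.
```

In a real pipeline the lookup would typically be applied to extracted entities rather than raw text, so treat this only as a picture of the mapping itself.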

Adding connectors

Connectors give your pipelines the ability to interface with EHRs.

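from healthchain.io import CdaConnector
from healthchain.io.containers import Document
from healthchain.models import CdaRequest
from healthchain.pipeline import Pipeline

# Start from an unbuilt pipeline (add processing nodes as in the previous example)
pipeline = Pipeline[Document]()

cda_connector = CdaConnector()

# The same connector parses incoming CDA and formats the outgoing response
pipeline.add_input(cda_connector)
pipeline.add_output(cda_connector)

pipe = pipeline.build()

cda_data = CdaRequest(document="<CDA XML content>")
output = pipe(cda_data)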

Using pre-built pipelines

Pre-built pipelines are use-case-specific, end-to-end workflows that already have connectors and models built in.

from healthchain.pipeline import MedicalCodingPipeline
from healthchain.models import CdaRequest

# Load from model ID
pipeline = MedicalCodingPipeline.from_model_id(
    model="blaze999/Medical-NER", task="token-classification", source="huggingface"
)

# Or load from local model
pipeline = MedicalCodingPipeline.from_local_model("./path/to/model", source="spacy")

cda_data = CdaRequest(document="<CDA XML content>")
output = pipeline(cda_data)

Sandbox

Sandboxes provide a staging environment for testing and validating your pipeline in a realistic healthcare context.

Clinical Decision Support (CDS)

CDS Hooks is an HL7-published specification for clinical decision support.

When is this used? CDS hooks are triggered at certain events during a clinician's workflow in an electronic health record (EHR), e.g. when a patient record is opened or when an order is selected.

What information is sent: the context of the event and FHIR resources that are requested by your service, for example, the patient ID and information on the encounter and conditions they are being seen for.

What information is returned: “cards” displaying text, actionable suggestions, or links to launch a SMART app from within the workflow.

import healthchain as hc

from healthchain.pipeline import SummarizationPipeline
from healthchain.use_cases import ClinicalDecisionSupport
from healthchain.models import CdsFhirData, CDSRequest, CDSResponse
from healthchain.data_generator import CdsDataGenerator

@hc.sandbox
class MyCDS(ClinicalDecisionSupport):
    def __init__(self) -> None:
        self.pipeline = SummarizationPipeline.from_model_id(
            "facebook/bart-large-cnn", source="huggingface"
        )
        self.data_generator = CdsDataGenerator()

    # Sets up an instance of a mock EHR client of the specified workflow
    @hc.ehr(workflow="encounter-discharge")
    def ehr_database_client(self) -> CdsFhirData:
        return self.data_generator.generate()

    # Define your application logic here
    @hc.api
    def my_service(self, data: CDSRequest) -> CDSResponse:
        # The pipeline takes the incoming request and returns the response with cards
        result = self.pipeline(data)
        return result
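For readers new to CDS Hooks, the "what is sent" and "what is returned" descriptions above can be sketched as plain dictionaries. This is a rough illustration of the wire format based on the CDS Hooks specification, not HealthChain's `CDSRequest` or `Card` models; the helpers `make_request` and `make_card` and all IDs are made up for the example.

```python
import json
import uuid
from typing import Any, Dict


def make_request(user_id: str, patient_id: str, encounter_id: str) -> Dict[str, Any]:
    """Build a minimal CDS Hooks request body for the encounter-discharge hook."""
    return {
        "hook": "encounter-discharge",
        "hookInstance": str(uuid.uuid4()),  # unique ID for this hook invocation
        "context": {
            "userId": user_id,
            "patientId": patient_id,
            "encounterId": encounter_id,
        },
        # FHIR resources requested by the service would go here (empty in this sketch)
        "prefetch": {},
    }


def make_card(summary: str, detail: str, source_label: str) -> Dict[str, Any]:
    """Build a minimal informational card."""
    # The specification asks for a one-sentence summary under 140 characters.
    if len(summary) >= 140:
        raise ValueError("Card summary must be under 140 characters")
    return {
        "summary": summary,
        "indicator": "info",  # one of "info", "warning", "critical"
        "detail": detail,
        "source": {"label": source_label},
    }


request = make_request("Practitioner/example", "1288992", "89284")
response = {
    "cards": [
        make_card(
            summary="Discharge summary available",
            detail="A generated summary of the encounter would appear here.",
            source_label="Example CDS service",
        )
    ]
}

print(json.dumps(request["context"], indent=2))
print(json.dumps(response, indent=2))
```

In the sandbox, the mock EHR client produces the request and your `@hc.api` method produces the response, so you would not normally build these by hand.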

Clinical Documentation

The ClinicalDocumentation use case implements a real-time Clinical Documentation Improvement (CDI) service. It helps convert free-text medical documentation into coded information that can be used for billing, quality reporting, and clinical decision support.

When is this used? Triggered when a clinician opts in to a CDI functionality (e.g. Epic NoteReader) and signs or pends a note after writing it.

What information is sent: A CDA (Clinical Document Architecture) document which contains continuity of care data and free-text data, e.g. a patient's problem list and the progress note that the clinician has entered in the EHR.

import healthchain as hc

from healthchain.pipeline import MedicalCodingPipeline
from healthchain.use_cases import ClinicalDocumentation
from healthchain.models import CcdData, CdaRequest, CdaResponse

@hc.sandbox
class NotereaderSandbox(ClinicalDocumentation):
    def __init__(self):
        self.pipeline = MedicalCodingPipeline.from_model_id(
            "en_core_sci_md", source="spacy"
        )

    # Load an existing CDA file
    @hc.ehr(workflow="sign-note-inpatient")
    def load_data_in_client(self) -> CcdData:
        with open("/path/to/cda/data.xml", "r") as file:
            xml_string = file.read()

        return CcdData(cda_xml=xml_string)

    @hc.api
    def my_service(self, data: CdaRequest) -> CdaResponse:
        annotated_ccd = self.pipeline(data)
        return annotated_ccd
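To give a feel for the kind of document this use case receives, the sketch below parses a heavily simplified CDA-like XML string with the standard library and pulls out each section's title and free text. The sample document is invented and omits most of what a real CDA contains (header, template IDs, coded entries); `section_texts` is a hypothetical helper and is unrelated to HealthChain's own CDA parser.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# CDA documents use the HL7 v3 XML namespace
CDA_NS = {"hl7": "urn:hl7-org:v3"}

# Invented, heavily simplified sample for illustration only
SAMPLE_CDA = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component>
    <structuredBody>
      <component>
        <section>
          <title>Problem List</title>
          <text>Hypertension</text>
        </section>
      </component>
      <component>
        <section>
          <title>Progress Note</title>
          <text>Patient reports chest pain on exertion.</text>
        </section>
      </component>
    </structuredBody>
  </component>
</ClinicalDocument>"""


def section_texts(cda_xml: str) -> Dict[str, str]:
    """Map each section title to the plain text found in that section."""
    root = ET.fromstring(cda_xml)
    sections: Dict[str, str] = {}
    for section in root.iterfind(".//hl7:section", CDA_NS):
        title = section.findtext("hl7:title", default="", namespaces=CDA_NS)
        text_el = section.find("hl7:text", CDA_NS)
        # itertext() flattens any nested markup inside <text> into a plain string
        text = "".join(text_el.itertext()).strip() if text_el is not None else ""
        sections[title] = text
    return sections


sections = section_texts(SAMPLE_CDA)
for title, text in sections.items():
    print(f"{title}: {text}")
```

The free-text sections are what an NLP model would read, while the structured sections are where coded results would be written back.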

Running a sandbox

Make sure the following lines are included at the end of your mycds.py file:

cds = MyCDS()
cds.run_sandbox()

This will populate your EHR client with the data generation method you have defined, send requests to your server for processing, and save the data in the ./output directory.

Then run:

healthchain run mycds.py

By default, the server runs at http://127.0.0.1:8000, and you can interact with the exposed endpoints at /docs.
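If you want to confirm from a script that the sandbox server is up, a small standard-library check like the one below may help. It only assumes the default address mentioned above; `docs_url` and `is_server_up` are hypothetical helper names, not part of HealthChain.

```python
import http.client
import urllib.request


def docs_url(host: str = "127.0.0.1", port: int = 8000) -> str:
    """Return the URL of the interactive API docs for the given host and port."""
    return f"http://{host}:{port}/docs"


def is_server_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` succeeds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP errors all count as "not up"
        return False


if __name__ == "__main__":
    url = docs_url()
    state = "reachable" if is_server_up(url) else "not reachable"
    print(f"{url} is {state}")
```

Run it in a second terminal while `healthchain run mycds.py` is active.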

Road Map

  • 🎛️ Versioning and artifact management for pipelines and sandbox EHR configurations
  • ❓ Testing and evaluation framework for pipelines and use cases
  • 🧠 Multi-modal pipelines that have built-in NLP to utilize unstructured data
  • ✨ Improvements to synthetic data generator methods
  • 👾 Frontend UI for EHR client and visualization features
  • 🚀 Production deployment options

Contribute

We are always eager to hear feedback and suggestions, especially if you are a developer or researcher working with healthcare systems!

Acknowledgement

This repository makes use of CDS Hooks developed by Boston Children’s Hospital.
