An LLM-based clinical information extraction toolkit.

LLaCIE

Large Language (model) Clinical Information Extractor

This is an information extraction pipeline that specializes in running large language models across many clinical notes to abstract new variables.

The task implemented in this initial release is the extraction of presenting signs and symptoms in admission notes for patients with possible infection. This is further detailed in our publication:

  • Pak TR, Kanjilal S, McKenna CS, Hoffner-Heinike A, Rhee C, Klompas M. Syndromic Analysis of Sepsis Cohorts Using Large Language Models. JAMA Netw Open. 2025 Oct 1;8(10):e2539267. doi:10.1001/jamanetworkopen.2025.39267. PMID: 41134571; PMCID: PMC12552932.

The pipeline is designed to be extensible to many tasks. It also allows for the comparison of multiple strategies for each task by evaluating each strategy's performance against a gold standard, e.g., a human-labeled dataset.
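
As a rough illustration of what comparing strategies against a gold standard can look like, the sketch below scores two hypothetical strategies on one note using set-based precision, recall, and F1. This is not LLaCIE's evaluation code: the choice of metrics, the function name `prf1`, and the example labels are all assumptions made for illustration.

```python
from typing import Dict, Set


def prf1(predicted: Set[str], gold: Set[str]) -> Dict[str, float]:
    """Precision, recall, and F1 for one set of predicted labels vs. gold labels."""
    tp = len(predicted & gold)  # labels found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical human-created labels for a single admission note
gold = {"fever", "cough", "dyspnea"}

# Hypothetical outputs from two extraction strategies on the same note
strategies = {
    "strategy_a": {"fever", "cough"},
    "strategy_b": {"fever", "cough", "dyspnea", "nausea"},
}

scores = {name: prf1(pred, gold) for name, pred in strategies.items()}
for name, s in scores.items():
    print(f"{name}: P={s['precision']:.2f} R={s['recall']:.2f} F1={s['f1']:.2f}")
```

Here the first strategy misses a symptom (lower recall) while the second adds a spurious one (lower precision), which is the kind of trade-off a per-strategy comparison is meant to surface.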

Quickstart and demo

Docker is the quickest way to start using this package, because all dependencies (like a Postgres database) can be managed within a single container. If you are new to it, Docker Desktop is likely the easiest way to install Docker. Your Docker environment will need at least 8GB of RAM.

Clone this repo, cd into it, and run the following. This will take several minutes to build and run the container:

$ docker-compose up -d
$ docker-compose exec llacie bash

If this worked, you should now be in a shell within the container with access to the llacie CLI. Run this command to see the main menu, which outlines the basic steps of the pipeline.

$ llacie

To download the Llama model files automatically from Hugging Face, you need to request access to the Llama 3 8B model, create an access token for yourself, and save it in the container.

$ hf auth whoami
$ hf auth login   # If the prior command says, "Not logged in".
                  # If asked to "Add token as git credential?", answer no.

We can now run the example analysis on 100 synthetic admission notes, of which 20 have "gold standard" human-created labels for presenting signs/symptoms. For simplicity, the example uses a quantized version of Llama 3 8B that fits in ~6GB of RAM and runs on CPU only.

$ llacie init-db
$ llacie import-notes text examples/admission-100.txt
$ llacie sections extract -s regex
$ llacie features extract -s llama3_8b
$ llacie episode-labels extract -s pres_sx_eplab2.llama3_8b
$ llacie episode-labels import pres_sx_eplab2 examples/admission-100-labels.xlsx
$ llacie episode-labels evaluate
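
If you would rather run the steps above unattended, one option is a small wrapper that runs them in order and stops at the first failure. The sketch below is only a convenience script, not part of LLaCIE; the names `PIPELINE_STEPS` and `run_pipeline` are made up here, and the commands are copied verbatim from the list above.

```python
import shlex
import subprocess

# The Quickstart commands, in the order given above.
PIPELINE_STEPS = [
    "llacie init-db",
    "llacie import-notes text examples/admission-100.txt",
    "llacie sections extract -s regex",
    "llacie features extract -s llama3_8b",
    "llacie episode-labels extract -s pres_sx_eplab2.llama3_8b",
    "llacie episode-labels import pres_sx_eplab2 examples/admission-100-labels.xlsx",
    "llacie episode-labels evaluate",
]


def run_pipeline(steps=PIPELINE_STEPS, run=subprocess.run):
    """Run each step in order.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit status, so later steps are skipped after a failure.
    """
    for step in steps:
        print(f"$ {step}")
        run(shlex.split(step), check=True)
```

Calling `run_pipeline()` from a Python session inside the container would execute all seven steps. The `run` parameter exists only so the function can be exercised with a stand-in during testing.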

Installing from PyPI

You can install the package directly from PyPI, which requires Python ≥3.11.

$ pip install llacie

Although this will install some of the Python package dependencies, note that you will need to set up a Postgres database and configure llacie to connect to it.

Configuration

Copy .env.example to .env, and edit the variables within.
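
After editing, it can be handy to confirm that no variable was left blank. The following is a minimal sketch of such a check using only the standard library. It assumes the file holds plain KEY=VALUE lines, and the variable names in the sample are invented for illustration; the real names are the ones in .env.example.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_keys(values: Dict[str, str]) -> List[str]:
    """Return the keys whose value is still empty."""
    return [key for key, value in values.items() if not value]


# Hypothetical file contents (names invented for illustration)
sample = """
# database settings
EXAMPLE_DB_HOST=localhost
EXAMPLE_DB_PASSWORD=
"""

env = parse_env(sample)
print(unset_keys(env))
```

To check a real file, pass its contents instead, for example `parse_env(open(".env").read())`. This sketch does not handle `export` prefixes or inline comments.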

The base package runs LLMs using llama-cpp-python on CPU only, but for faster inference, you'll likely want to install vLLM. We don't do this by default because vLLM installation has to be customized to your specific hardware and CUDA version (for NVIDIA GPUs).
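
To see which of the two inference libraries is importable in your current environment, a quick check like the one below can help. It is only a diagnostic sketch and says nothing about how llacie itself selects a backend; the function name `available_backends` is made up here, and `llama_cpp` and `vllm` are the import names of llama-cpp-python and vLLM respectively.

```python
import importlib.util


def available_backends(find_spec=importlib.util.find_spec):
    """Report which inference libraries can be imported, without importing them."""
    candidates = {
        "llama-cpp-python": "llama_cpp",
        "vLLM": "vllm",
    }
    # find_spec returns None when a top-level module cannot be found
    return {name: find_spec(module) is not None for name, module in candidates.items()}


print(available_backends())
```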

Installing a development environment

Using conda

Create or activate a conda environment that includes Python 3.11 and the psycopg2 package, e.g.

$ conda create -n llacie python=3.11 psycopg2  # First time only
$ conda activate llacie                        # Subsequent times
(llacie) $

We develop this package in a venv (aka virtualenv) within this repository, because this allows the package to be installed in --editable mode, so we can work on it and use it simultaneously.

(llacie) $ python3 -m venv .venv
(llacie) $ . .venv/bin/activate

If that worked, the shell prompt is now also prefixed with (.venv). We next install the repo itself as a local module in this virtualenv. This will also automatically download and install dependencies enumerated in pyproject.toml.

Important: Installing dependencies requires a C/C++ compiler. If this step fails on the MGB Linux cluster, run module load gcc/9.3.0 and try again.

(.venv) (llacie) $ pip install -e ".[dev]"   # Quotes keep shells like zsh from expanding the brackets

If everything worked, you should be able to see the main menu by running:

(.venv) (llacie) $ llacie

Running tests

The test suite is in tests/. Currently, it runs integration tests based on the Quickstart demo, checking the command outputs and verifying that the database state is updated appropriately after each step. Common test suite invocations can be run with make:

make test-install
make test           # Runs all of the tests
make test-fast      # Runs only the quicker tests that don't require LLM inference

We automatically run the test suite for every commit pushed to this repo using GitHub Actions.

Building the package

The package is pure Python and can be built and published to PyPI using flit.

$ flit build
$ flit publish

Citation

If you use LLaCIE for your research, please cite our publication:

  • Pak TR, Kanjilal S, McKenna CS, Hoffner-Heinike A, Rhee C, Klompas M. Syndromic Analysis of Sepsis Cohorts Using Large Language Models. JAMA Netw Open. 2025 Oct 1;8(10):e2539267. doi:10.1001/jamanetworkopen.2025.39267. PMID: 41134571; PMCID: PMC12552932.
