An LLM-based clinical information extraction toolkit.

These details have not been verified by PyPI

Project description

LLaCIE

Large Language (model) Clinical Information Extractor

This is an information extraction pipeline that specializes in running large language models across many clinical notes to abstract new variables.

The task implemented in this initial release is the extraction of presenting signs and symptoms in admission notes for patients with possible infection. This is further detailed in our publication:

Pak TR, Kanjilal S, McKenna CS, Hoffner-Heinike A, Rhee C, Klompas M. Syndromic Analysis of Sepsis Cohorts Using Large Language Models. JAMA Netw Open. 2025 Oct 1;8(10):e2539267. doi:10.1001/jamanetworkopen.2025.39267. PMID: 41134571; PMCID: PMC12552932.

The pipeline is designed to be extensible to many tasks. It also allows for the comparison of multiple strategies for each task by evaluating each strategy's performance against a gold standard, e.g., a human-labeled dataset.
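To make the idea of scoring a strategy against a gold standard concrete, here is a minimal sketch in plain Python. It is not LLaCIE's evaluation code: the function name `score_strategy`, the strategy names, and the toy labels are all assumptions made for illustration, and micro-averaged precision/recall/F1 is just one reasonable choice of metric.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score_strategy(predicted: dict[str, set[str]], gold: dict[str, set[str]]) -> Scores:
    """Micro-averaged precision/recall/F1 over the notes that have gold labels."""
    tp = fp = fn = 0
    for note_id, gold_labels in gold.items():
        pred_labels = predicted.get(note_id, set())
        tp += len(pred_labels & gold_labels)   # labels found by both
        fp += len(pred_labels - gold_labels)   # predicted but not in gold
        fn += len(gold_labels - pred_labels)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


# Toy data (hypothetical): human labels and two competing strategies.
gold = {"note1": {"fever", "cough"}, "note2": {"dyspnea"}}
strategies = {
    "strategy_a": {"note1": {"fever", "cough"}, "note2": {"chest pain"}},
    "strategy_b": {"note1": {"fever"}, "note2": {"dyspnea"}},
}
for name, predicted in strategies.items():
    print(name, score_strategy(predicted, gold))
```

Only notes present in the gold set are scored, which mirrors the situation where a small subset of notes has human labels.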

Quickstart and demo

Docker is the quickest way to start using this package, because all dependencies (such as a Postgres database) can be managed within a single container. If you are new to Docker, Docker Desktop is likely the easiest way to install it. Your Docker environment will need at least 8 GB of RAM.

Clone this repo, cd into it, and run the following. This will take several minutes to build and run the container:

$ docker-compose up -d
$ docker-compose exec llacie bash

If this worked, you should now be in a shell within the container with access to the llacie CLI. Run this command to see the main menu, which outlines the basic steps of the pipeline.

$ llacie

To automatically download the Llama model files from Hugging Face, you need to request access to the Llama 3 8B model, create an access token for yourself, and save it into the container:

$ hf auth whoami
$ hf auth login   # If the prior command says, "Not logged in".
                  # If asked to "Add token as git credential?", answer no.

We can now run the example analysis on 100 synthetic admission notes, of which 20 have "gold standard" human-created labels for presenting signs/symptoms. For simplicity, the example uses a quantized version of Llama 3 8B that fits in ~6 GB of RAM and runs on CPU only.

$ llacie init-db
$ llacie import-notes text examples/admission-100.txt
$ llacie sections extract -s regex
$ llacie features extract -s llama3_8b
$ llacie episode-labels extract -s pres_sx_eplab2.llama3_8b
$ llacie episode-labels import pres_sx_eplab2 examples/admission-100-labels.xlsx
$ llacie episode-labels evaluate
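If you would rather run the demo steps unattended, one option is a small wrapper script. The sketch below simply replays the commands listed above with `subprocess` and stops at the first failure. The names `DEMO_STEPS`, `run_steps`, and the `runner` parameter are our own for illustration and are not part of the llacie package.

```python
import shlex
import subprocess

# The demo steps, copied verbatim from the commands above.
DEMO_STEPS = [
    "llacie init-db",
    "llacie import-notes text examples/admission-100.txt",
    "llacie sections extract -s regex",
    "llacie features extract -s llama3_8b",
    "llacie episode-labels extract -s pres_sx_eplab2.llama3_8b",
    "llacie episode-labels import pres_sx_eplab2 examples/admission-100-labels.xlsx",
    "llacie episode-labels evaluate",
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order.

    With the default runner, check=True raises CalledProcessError on a
    non-zero exit code, so later steps never run after a failed one.
    """
    for step in steps:
        print(f"$ {step}")
        runner(shlex.split(step), check=True)


# Inside the container, where the llacie CLI is on the PATH:
# run_steps(DEMO_STEPS)
```

The `runner` argument exists only so the sequencing can be exercised without the CLI installed, for example by passing a function that records the argument lists.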

Installing from PyPI

You can install the package directly from PyPI, which requires Python ≥3.11.

$ pip install llacie

Although this will install some of the Python package dependencies, note that you will need to set up a Postgres database and configure llacie to connect to it.

Configuration

Copy .env.example to .env, and edit the variables within.
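After editing `.env`, it can be handy to check that no variable was left blank. The following is a minimal sketch of such a check. The variable names in the sample are made up (the real ones are in `.env.example`), and the parser only handles plain `KEY=VALUE` lines; it makes no claim about how llacie itself reads the file.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Hypothetical contents; substitute Path(".env").read_text() for a real file.
sample = """
# Database connection (names here are invented for illustration)
EXAMPLE_DB_HOST=localhost
EXAMPLE_DB_PORT=5432
EXAMPLE_DB_PASSWORD=
"""
settings = parse_env(sample)
missing = [key for key, value in settings.items() if not value]
print("unset variables:", missing)
```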

The base package runs LLMs using llama-cpp-python on CPU only, but for faster inference, you'll likely want to install vLLM. We don't do this by default because vLLM installation has to be customized to your specific hardware and CUDA version (for NVIDIA GPUs).
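To see which inference libraries are importable in your current environment, a quick check like the one below may help. It is only a sketch: `available_backends` is a hypothetical helper, not a llacie function, and it relies on the import names `llama_cpp` (for llama-cpp-python) and `vllm`.

```python
import importlib.util


def available_backends(find_spec=importlib.util.find_spec) -> list[str]:
    """Return the import names of the inference libraries that are installed."""
    candidates = ["llama_cpp", "vllm"]  # llama-cpp-python and vLLM
    # find_spec returns None for a top-level module that cannot be found.
    return [name for name in candidates if find_spec(name) is not None]


print(available_backends())
```

This only confirms that the packages can be located; it does not verify that vLLM matches your GPU or CUDA version.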

Installing a development environment

Using conda

Create or activate a conda environment that includes Python 3.11 and the psycopg2 package, e.g.

$ conda create -n llacie python=3.11 psycopg2  # First time only
$ conda activate llacie                        # Subsequent times
(llacie) $

We develop this package in a venv (aka virtualenv) within this repository, as this allows the package to be installed in --editable mode, so we can work on it and use it simultaneously.

(llacie) $ python3 -m venv .venv
(llacie) $ . .venv/bin/activate

If that worked, the shell prompt is now also prefixed with (.venv). We next install the repo itself as a local module in this virtualenv. This will also automatically download and install dependencies enumerated in pyproject.toml.

Important: Installing dependencies requires a C/C++ compiler. If this step fails on the MGB Linux cluster, run module load gcc/9.3.0 and try again.

(.venv) (llacie) $ pip install -e .[dev]

If everything worked, you should be able to see the main menu by running:

(.venv) (llacie) $ llacie

Running tests

The test suite is in tests/. Currently, it runs integration tests based on the Quickstart demo, checking the command outputs and verifying that the database state is updated appropriately after each step. Common test suite invocations can be run with make:

make test-install
make test           # Runs all of the tests
make test-fast      # Runs only the quicker tests that don't require LLM inference

We automatically run the test suite for every commit pushed to this repo using GitHub Actions.

Building the package

The package is Python-only and can be built using flit.

$ flit build
$ flit publish

Citation

If you use LLaCIE for your research, please cite our publication:

Pak TR, Kanjilal S, McKenna CS, Hoffner-Heinike A, Rhee C, Klompas M. Syndromic Analysis of Sepsis Cohorts Using Large Language Models. JAMA Netw Open. 2025 Oct 1;8(10):e2539267. doi:10.1001/jamanetworkopen.2025.39267. PMID: 41134571; PMCID: PMC12552932.

Project details

Release history

Version	Release date
1.0.4 (this version)	Dec 17, 2025
1.0.3	Nov 19, 2025
1.0.2	Nov 12, 2025
1.0.1	Nov 12, 2025
1.0.0	Nov 12, 2025

Download files

Source Distribution

llacie-1.0.4.tar.gz (3.1 MB), uploaded Dec 17, 2025, Source

Built Distribution

llacie-1.0.4-py3-none-any.whl (2.7 MB), uploaded Dec 17, 2025, Python 3

File details

Details for the file llacie-1.0.4.tar.gz.

File metadata

Download URL: llacie-1.0.4.tar.gz
Upload date: Dec 17, 2025
Size: 3.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: python-requests/2.32.5

File hashes

Hashes for llacie-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`9488d955ba6e4272ecd9e93c2283a66acda2b760eb5095edb67acf39b990b233`
MD5	`9c8a1d2e293214dd511a1ca4865d5b63`
BLAKE2b-256	`1837ab3d477a25d5db22de582f6198be11d29e94400dd92302836b42df04580a`

File details

Details for the file llacie-1.0.4-py3-none-any.whl.

File metadata

Download URL: llacie-1.0.4-py3-none-any.whl
Upload date: Dec 17, 2025
Size: 2.7 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: python-requests/2.32.5

File hashes

Hashes for llacie-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2788c5dc36cf2bfa06d59bcac3e52e0ca25b989b445661e328162a8f5fe1d423`
MD5	`e39aca103f99d0885f15210dc7aaedec`
BLAKE2b-256	`58d19e7305df38e1336342b30f3d0b091381a34f64fd05f86ab0ecf5a0a7cbeb`
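If you download a file manually, you can compare its digest against the SHA256 value listed above. The sketch below uses only the standard library's `hashlib`. The helper name `sha256_of` is ours, and the script assumes the wheel was saved under its original file name in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Hash a file in 1 MiB chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 of llacie-1.0.4-py3-none-any.whl, copied from the table above.
EXPECTED = "2788c5dc36cf2bfa06d59bcac3e52e0ca25b989b445661e328162a8f5fe1d423"

wheel = Path("llacie-1.0.4-py3-none-any.whl")
if wheel.exists():
    print("OK" if sha256_of(wheel) == EXPECTED else "MISMATCH")
else:
    print(f"{wheel} not found; download it first")
```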
