A framework that enables efficient extraction of structured data from unstructured text using large language models (LLMs).

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

LLM Extractinator

⚠️ This tool is a prototype in active development and may change significantly. Always verify results!

LLM Extractinator enables efficient extraction of structured data from unstructured text using large language models (LLMs). It supports configurable task definitions, CLI or Python usage, and flexible data input/output formats.

📘 Full documentation: https://DIAGNijmegen.github.io/llm_extractinator/

🔧 Installation

1. Install Ollama

On Linux

curl -fsSL https://ollama.com/install.sh | sh

On Windows or macOS

Download the installer from:
https://ollama.com/download

2. Install the Package

Create a fresh conda environment:

conda create -n llm_extractinator python=3.11
conda activate llm_extractinator

Install the package via pip:

pip install llm_extractinator

Or from source:

git clone https://github.com/DIAGNijmegen/llm_extractinator.git
cd llm_extractinator
pip install -e .

To run the most recently released models, upgrade the ollama and langchain-ollama packages to their latest versions:

pip install --upgrade ollama langchain-ollama
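
After the installation steps above, a quick sanity check can confirm that everything is in place. The following is a minimal sketch, not part of the package: the helper names `installed_version` and `environment_report` are hypothetical, and it assumes the Ollama binary is exposed on your PATH as `ollama`.

```python
import shutil
from importlib import metadata
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report() -> Dict[str, Optional[str]]:
    """Collect the location of the Ollama binary and the relevant package versions."""
    return {
        # Assumption: the Ollama installer puts an `ollama` executable on PATH.
        "ollama_binary": shutil.which("ollama"),
        "llm_extractinator": installed_version("llm_extractinator"),
        "ollama": installed_version("ollama"),
        "langchain-ollama": installed_version("langchain-ollama"),
    }


for name, value in environment_report().items():
    print(f"{name}: {value if value is not None else 'NOT FOUND'}")
```

Any entry reported as NOT FOUND points to an installation step that still needs to be completed.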

🚀 Quick Usage

CLI

extractinate --task_id 001 --model_name "phi4"

Python

from llm_extractinator import extractinate

extractinate(task_id=1, model_name="phi4")
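
To run several tasks with the same model, the call above can be wrapped in a small loop. This is a sketch under stated assumptions: `run_tasks` is a hypothetical helper, not part of the package, and it relies only on the `task_id` and `model_name` keyword arguments shown above. The runner is passed in as an argument so the loop can be tried out without a running Ollama server.

```python
from typing import Callable, Iterable, List, Tuple


def run_tasks(
    run: Callable[..., object],
    task_ids: Iterable[int],
    model_name: str,
) -> List[Tuple[int, bool]]:
    """Call run(task_id=..., model_name=...) for each ID.

    A failing task is reported and recorded as False instead of
    stopping the remaining tasks.
    """
    outcomes: List[Tuple[int, bool]] = []
    for task_id in task_ids:
        try:
            run(task_id=task_id, model_name=model_name)
            outcomes.append((task_id, True))
        except Exception as exc:  # keep going; the failure is recorded below
            print(f"Task {task_id} failed: {exc}")
            outcomes.append((task_id, False))
    return outcomes


# With the package installed and Ollama running, usage might look like:
#   from llm_extractinator import extractinate
#   run_tasks(extractinate, [1, 2], "phi4")
```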

📁 Task Files

Each task is defined using a JSON file stored in the tasks/ directory.

Filename format:

TaskXXX_name.json
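
The naming scheme can be generated and checked programmatically. The sketch below assumes that XXX is the task ID zero-padded to three digits (as suggested by `--task_id 001`); this is an inference, not a documented rule, and both helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: XXX is exactly three digits, followed by an underscore and a name.
TASK_FILE_PATTERN = re.compile(r"^Task(\d{3})_(.+)\.json$")


def task_filename(task_id: int, name: str) -> str:
    """Build a task filename such as Task001_products.json."""
    return f"Task{task_id:03d}_{name}.json"


def parse_task_filename(filename: str) -> Optional[Tuple[int, str]]:
    """Return (task_id, name) for a matching filename, or None otherwise."""
    match = TASK_FILE_PATTERN.match(filename)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


print(task_filename(1, "products"))
print(parse_task_filename("Task001_products.json"))
```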

Example contents:

{
  "Description": "Extract product data from text.",
  "Data_Path": "products.csv",
  "Input_Field": "text",
  "Parser_Format": "product_parser.py"
}
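
A task file like the one above can also be written from Python. This is a minimal sketch: `write_task_file` is a hypothetical helper, and treating the four keys from the example as required is an assumption (the tool may accept further keys, so consult the full documentation).

```python
import json
from pathlib import Path

# Assumption: the four keys shown in the example are the ones every task needs.
EXPECTED_KEYS = ("Description", "Data_Path", "Input_Field", "Parser_Format")


def write_task_file(tasks_dir: Path, filename: str, task: dict) -> Path:
    """Check a task definition for the expected keys and save it as JSON."""
    missing = [key for key in EXPECTED_KEYS if key not in task]
    if missing:
        raise ValueError(f"Task definition is missing keys: {missing}")
    tasks_dir.mkdir(parents=True, exist_ok=True)
    path = tasks_dir / filename
    path.write_text(json.dumps(task, indent=2), encoding="utf-8")
    return path


# Example usage (writes tasks/Task001_products.json in the working directory):
#   write_task_file(Path("tasks"), "Task001_products.json", {
#       "Description": "Extract product data from text.",
#       "Data_Path": "products.csv",
#       "Input_Field": "text",
#       "Parser_Format": "product_parser.py",
#   })
```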

Parser_Format refers to a .py file in tasks/parsers/ that defines a Pydantic OutputParser class used to structure the LLM output.
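
For illustration, a parser file such as product_parser.py might look like the sketch below. The class name OutputParser follows the description above; the field names (`product_name`, `price`, `in_stock`) are purely hypothetical and should be replaced with whatever your task needs to extract.

```python
from typing import Optional

from pydantic import BaseModel, Field


class OutputParser(BaseModel):
    """Structure of the data to extract from each input text (hypothetical fields)."""

    product_name: str = Field(description="Name of the product mentioned in the text")
    price: Optional[float] = Field(
        default=None, description="Price of the product, if the text states one"
    )
    in_stock: Optional[bool] = Field(
        default=None, description="Whether the text says the product is available"
    )


# Pydantic validates types on construction; optional fields default to None.
example = OutputParser(product_name="Widget", price=9.5)
print(example.product_name, example.price, example.in_stock)
```

Field descriptions are worth writing carefully, on the assumption that they are passed to the model as guidance for what to extract.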

🛠️ Visual Schema Builder (Optional)

You can visually design the output schema using:

build-parser

This launches a web UI to create a Pydantic OutputParser model, which defines the structure of the extracted data. Additional models can be added and nested for complex formats.

The resulting .py file should be saved in:

tasks/parsers/

And referenced in your task JSON under the Parser_Format key.
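
Before running a task, it can help to confirm that the file named under Parser_Format actually exists in tasks/parsers/. The following is a sketch only: `resolve_parser` is a hypothetical helper, and it assumes the directory layout described above.

```python
import json
from pathlib import Path


def resolve_parser(task_file: Path, tasks_dir: Path) -> Path:
    """Return the parser file referenced by a task JSON, or raise if it is missing."""
    task = json.loads(task_file.read_text(encoding="utf-8"))
    # Assumption: Parser_Format holds a bare filename located in tasks/parsers/.
    parser_path = tasks_dir / "parsers" / task["Parser_Format"]
    if not parser_path.is_file():
        raise FileNotFoundError(
            f"{task_file.name} references {task['Parser_Format']}, "
            f"but {parser_path} does not exist"
        )
    return parser_path


# Example usage:
#   resolve_parser(Path("tasks/Task001_products.json"), Path("tasks"))
```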

👉 See parser docs for full usage.

📄 Citation

If you use this tool, please cite it using the DOI 10.5281/zenodo.15089764.

🤝 Contributing

We welcome contributions! See the full contributing guide in the docs.
