A framework that enables efficient extraction of structured data from unstructured text using large language models (LLMs).
LLM Extractinator
⚠️ This tool is a prototype in active development and may change significantly. Always verify results!
LLM Extractinator enables efficient extraction of structured data from unstructured text using large language models (LLMs). It supports configurable task definitions, CLI or Python usage, a point‑and‑click GUI Studio, and flexible data input/output formats.
📘 Full documentation: https://DIAGNijmegen.github.io/llm_extractinator/
🔧 Installation
1. Install Ollama
On Linux
curl -fsSL https://ollama.com/install.sh | sh
On Windows or macOS
Download the installer from: https://ollama.com/download
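After installation, the Ollama server listens on http://localhost:11434 by default. The sketch below checks whether the server is reachable by requesting Ollama's /api/tags endpoint, which lists locally pulled models. ollama_is_running is a hypothetical helper for illustration and is not part of llm_extractinator.

```python
import http.client
import json
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server responds at base_url (hypothetical helper)."""
    try:
        # GET /api/tags returns the locally pulled models as JSON.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            json.load(resp)
        return True
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, refused connections and timeouts are all OSError subclasses;
        # ValueError covers a response body that is not valid JSON.
        return False


if __name__ == "__main__":
    print("Ollama server reachable:", ollama_is_running())
```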
2. Install the Package
Create a fresh conda environment:
conda create -n llm_extractinator python=3.11
conda activate llm_extractinator
Install the package via pip:
pip install llm_extractinator
Or from source:
git clone https://github.com/DIAGNijmegen/llm_extractinator.git
cd llm_extractinator
pip install -e .
Tip: to run the latest models, keep the Ollama Python client packages up to date:
pip install --upgrade ollama langchain-ollama
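To confirm which versions are installed, for example after running the upgrade above, you can use Python's standard importlib.metadata. This is an illustrative sketch and not a command provided by llm_extractinator. The distribution names are the ones used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for pkg, ver in installed_versions(["llm_extractinator", "ollama", "langchain-ollama"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```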
🖥️ Interactive Studio GUI (beta)
Starting with v0.4, Extractinator ships with a Streamlit-based Studio for designing, running, and monitoring extraction tasks without writing any code:
launch-extractinator # opens http://localhost:8501 in your browser
Features
| Feature | Description |
|---|---|
| 🗂️ Project Manager | Create / select datasets, parsers and tasks with file previews |
| 🔧 Parser Builder | Visual Pydantic schema designer (nested models supported) |
| 🚀 One‑click Runs | Configure model, sampling & advanced flags, then watch live logs |
| 🛠️ Task JSON Wizard | Step‑by‑step helper to generate valid TaskXXX.json files |
| 🆘 Help bubbles everywhere | Inline docs so you never lose context |
The Studio is fully optional: anything you configure in it can also be run from the CLI or the Python API.
🚀 Quick Usage
GUI
launch-extractinator # recommended for new users
CLI
extractinate --task_id 001 --model_name "phi4"
Python
from llm_extractinator import extractinate
extractinate(task_id=1, model_name="phi4")
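The CLI call passes the task ID as the zero-padded string 001, while the Python call passes the integer 1; both appear to refer to the same task. If you want to drive the CLI from another script, a hypothetical helper like the one below builds the command line. It uses only the two flags shown above and assumes the ID is zero-padded to three digits.

```python
import subprocess
from typing import List


def build_cli_command(task_id: int, model_name: str) -> List[str]:
    """Build an `extractinate` command line (hypothetical helper).

    Only --task_id and --model_name are used, as in the example above;
    the task ID is zero-padded to three digits (assumption).
    """
    return ["extractinate", "--task_id", f"{task_id:03d}", "--model_name", model_name]


def run_task(task_id: int, model_name: str) -> None:
    # Raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(build_cli_command(task_id, model_name), check=True)


if __name__ == "__main__":
    print(build_cli_command(1, "phi4"))  # ['extractinate', '--task_id', '001', '--model_name', 'phi4']
```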
📁 Task Files
Each task is defined by a JSON file stored in tasks/.
Filename format (XXX is the three-digit task ID, e.g. 001):
TaskXXX_name.json
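To illustrate the naming convention, the hypothetical helper below locates the task file for a numeric ID. It assumes XXX is the task ID zero-padded to three digits and that name can be any suffix. It is not part of the package's API.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_task_file(task_id: int, tasks_dir: str = "tasks") -> Optional[Path]:
    """Return the TaskXXX_name.json file for task_id, or None (hypothetical helper)."""
    # Assumption: XXX is the task ID zero-padded to three digits, e.g. 1 -> "001".
    matches = sorted(Path(tasks_dir).glob(f"Task{task_id:03d}_*.json"))
    return matches[0] if matches else None


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is written to the real tasks/ folder.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "Task001_products.json").write_text("{}", encoding="utf-8")
        print(find_task_file(1, tmp))  # .../Task001_products.json
        print(find_task_file(2, tmp))  # None
```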
Example:
{
    "Description": "Extract product data from text.",
    "Data_Path": "products.csv",
    "Input_Field": "text",
    "Parser_Format": "product_parser.py"
}
Parser_Format points to a .py file in tasks/parsers/ that defines a Pydantic model named OutputParser; this model is used to structure the LLM output.
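The example task definition can also be written from Python. The sketch below uses a hypothetical write_task_file helper (not part of llm_extractinator) that applies the TaskXXX_name.json naming convention and saves the file with the standard json module.

```python
import json
import tempfile
from pathlib import Path


def write_task_file(task_id: int, name: str, config: dict, tasks_dir: str = "tasks") -> Path:
    """Save a task definition as TaskXXX_name.json (hypothetical helper)."""
    path = Path(tasks_dir) / f"Task{task_id:03d}_{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)  # create the tasks folder if needed
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return path


# Same keys and values as the example task definition.
product_task = {
    "Description": "Extract product data from text.",
    "Data_Path": "products.csv",
    "Input_Field": "text",
    "Parser_Format": "product_parser.py",
}

if __name__ == "__main__":
    # Demo in a throwaway directory so the real tasks/ folder is left alone.
    with tempfile.TemporaryDirectory() as tmp:
        saved = write_task_file(1, "products", product_task, tasks_dir=tmp)
        print(saved.name)  # Task001_products.json
        print(json.loads(saved.read_text(encoding="utf-8")) == product_task)  # True
```

Calling write_task_file(1, "products", product_task) without tasks_dir would create tasks/Task001_products.json relative to the current working directory.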
🛠️ Visual Schema Builder (optional)
If you prefer a graphical approach to designing parsers, run:
build-parser
This starts the same builder embedded in the Studio, letting you assemble nested Pydantic models visually. Save the resulting .py file in tasks/parsers/ and reference it via Parser_Format.
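If you would rather write the parser by hand, a file such as tasks/parsers/product_parser.py might look like the sketch below. It assumes the file defines a Pydantic model named OutputParser, as the Parser_Format description indicates. The product fields are illustrative placeholders, not a schema shipped with the tool. Nesting is expressed by using one BaseModel as the type of another model's field.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Price(BaseModel):
    """Nested model: a price with its currency (illustrative fields)."""

    amount: float = Field(description="Numeric price value")
    currency: Optional[str] = Field(default=None, description="Currency code, e.g. EUR")


class OutputParser(BaseModel):
    """Top-level schema the LLM output is structured into (illustrative fields)."""

    product_name: str = Field(description="Name of the product")
    price: Optional[Price] = Field(default=None, description="Listed price, if any")
    features: List[str] = Field(default_factory=list, description="Notable product features")


if __name__ == "__main__":
    # A plain dict for the nested field is validated into a Price instance.
    example = OutputParser(product_name="Desk lamp", price={"amount": 24.99, "currency": "EUR"})
    print(example)
```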
👉 Read the parser docs for full details.
📄 Citation
If you use this tool, please cite: https://doi.org/10.5281/zenodo.15089764
🤝 Contributing
We welcome pull requests! See the contributing guide for details.
File details
Details for the file llm_extractinator-0.5.3.tar.gz.
File metadata
- Download URL: llm_extractinator-0.5.3.tar.gz
- Size: 38.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91c2c213bbed4a659c92924e7f4f489eaf007bddd16d67e114725069944f988e |
| MD5 | 3ad2de6af92110b19b921a6e37aab9d5 |
| BLAKE2b-256 | 3bdbe3392c0d19efd00ef6b6b6490a0bf9ebe52e231bca6af174ce5b7aa9a097 |
File details
Details for the file llm_extractinator-0.5.3-py3-none-any.whl.
File metadata
- Download URL: llm_extractinator-0.5.3-py3-none-any.whl
- Size: 39.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 20b9b56fa841a5cb7da692c44df4de7d0e70184441dff98a5796fd9b74c43225 |
| MD5 | acbff4dd11053073363a2f6a13d60566 |
| BLAKE2b-256 | 59405abab6fe77dc3b3c4f04692db7e8542f5534a8439a581a6d26de4bd13961 |