LLM Extractinator
⚠️ This tool is a prototype in active development and may change significantly. Always verify results!
LLM Extractinator enables efficient extraction of structured data from unstructured text using large language models (LLMs). It supports configurable task definitions, CLI or Python usage, and flexible data input/output formats.
📘 Full documentation: https://DIAGNijmegen.github.io/llm_extractinator/
🔧 Installation
1. Install Ollama
On Linux
curl -fsSL https://ollama.com/install.sh | sh
On Windows or macOS
Download the installer from:
https://ollama.com/download
2. Install the Package
Create a fresh conda environment:
conda create -n llm_extractinator python=3.11
conda activate llm_extractinator
Install the package via pip:
pip install llm_extractinator
Or from source:
git clone https://github.com/DIAGNijmegen/llm_extractinator.git
cd llm_extractinator
pip install -e .
To run the most recently released models, upgrade the ollama and langchain-ollama packages to their latest versions:
pip install --upgrade ollama langchain-ollama
🚀 Quick Usage
CLI
extractinate --task_id 001 --model_name "phi4"
Python
from llm_extractinator import extractinate
extractinate(task_id=1, model_name="phi4")
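When several tasks need to run back to back, the CLI call above can be wrapped in a small script. The following is a minimal sketch, not part of the package: `build_command` and `run_tasks` are hypothetical helper names, only the two flags shown above are used, and the task ID is zero-padded to three digits on the assumption that the CLI expects the same form as the `001` example.

```python
import subprocess


def build_command(task_id: int, model_name: str) -> list[str]:
    """Assemble the CLI call shown above; the ID is zero-padded to match `001`."""
    return ["extractinate", "--task_id", f"{task_id:03d}", "--model_name", model_name]


def run_tasks(task_ids: list[int], model_name: str = "phi4") -> None:
    """Run each task in turn and stop at the first command that fails."""
    for task_id in task_ids:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(task_id, model_name), check=True)
```

Calling `run_tasks([1, 2])` would then invoke `extractinate` once per task, provided the package and Ollama are installed.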
📁 Task Files
Each task is defined using a JSON file stored in the tasks/ directory.
Filename format:
TaskXXX_name.json
Example contents:
{
  "Description": "Extract product data from text.",
  "Data_Path": "products.csv",
  "Input_Field": "text",
  "Parser_Format": "product_parser.py"
}
Parser_Format refers to a .py file in tasks/parsers/ that defines a Pydantic OutputParser class used to structure the LLM output.
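A task file can also be generated from a script instead of written by hand. The sketch below is illustrative only: `task_filename` and `write_task_file` are hypothetical helpers, the `XXX` in the filename is assumed to be a zero-padded three-digit ID, and the four keys from the example are treated as required even though the tool itself may accept other keys.

```python
import json
from pathlib import Path

# Keys taken from the example task file above (assumed here to be required).
EXAMPLE_KEYS = ("Description", "Data_Path", "Input_Field", "Parser_Format")


def task_filename(task_id: int, name: str) -> str:
    """Build a TaskXXX_name.json filename, assuming XXX is a zero-padded 3-digit ID."""
    return f"Task{task_id:03d}_{name}.json"


def write_task_file(tasks_dir: Path, task_id: int, name: str, config: dict) -> Path:
    """Check that the example keys are present, then write the task file."""
    missing = [key for key in EXAMPLE_KEYS if key not in config]
    if missing:
        raise ValueError(f"task config is missing keys: {missing}")
    tasks_dir.mkdir(parents=True, exist_ok=True)
    path = tasks_dir / task_filename(task_id, name)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path
```

For instance, `write_task_file(Path("tasks"), 1, "products", config)` with the example contents as `config` writes `tasks/Task001_products.json`.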
🛠️ Visual Schema Builder (Optional)
You can visually design the output schema using:
build-parser
This launches a web UI to create a Pydantic OutputParser model, which defines the structure of the extracted data. Additional models can be added and nested for complex formats.
The resulting .py file should be saved in:
tasks/parsers/
And referenced in your task JSON under the Parser_Format key.
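For readers who prefer to write the parser file by hand, the sketch below shows what such a file might contain and a quick pre-flight check on it. Everything here is an assumption except the class name `OutputParser`: the text embedded in `EXAMPLE_PARSER` assumes the class subclasses Pydantic's `BaseModel`, the nested `Product` model and all field names are invented for illustration, and `defines_output_parser` is a hypothetical helper rather than part of the package.

```python
import ast

# Hypothetical contents of tasks/parsers/product_parser.py.
# Product and its fields are illustrative; only the name OutputParser
# comes from the documentation above.
EXAMPLE_PARSER = '''\
from typing import List, Optional

from pydantic import BaseModel, Field


class Product(BaseModel):
    name: str = Field(description="Product name as written in the text")
    price: Optional[float] = Field(default=None, description="Price, if stated")


class OutputParser(BaseModel):
    products: List[Product] = Field(description="All products mentioned")
'''


def defines_output_parser(source: str) -> bool:
    """Return True if the source has a top-level class named OutputParser."""
    tree = ast.parse(source)  # raises SyntaxError if the file is malformed
    return any(
        isinstance(node, ast.ClassDef) and node.name == "OutputParser"
        for node in tree.body
    )
```

The check only parses the file and never imports it, so it can run before the file is referenced in a task JSON, even in an environment where Pydantic is not installed.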
👉 See parser docs for full usage.
📄 Citation
If you use this tool, please cite it using DOI 10.5281/zenodo.15089764
🤝 Contributing
We welcome contributions! See the full contributing guide in the docs.