A lightweight Python framework for portable, versioned, reusable LLM extraction tasks.

`xtrllm`

xtrllm separates two things that many other libraries conflate:

The engine — prompt → structured output → log (stable, ships with the package)
The task — schema + prompt strategy + edge case handling (yours, lives in your repo)
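To make the split concrete, here is a minimal sketch of the idea, not xtrllm's actual implementation: the task object carries the schema and prompt logic, while a generic engine function only builds the prompt, calls a model, validates the reply, and records a log entry. `SentimentTask`, `run_engine`, and `fake_model` are hypothetical names, the model is faked so the example runs offline, and Pydantic v2 is assumed for `model_validate_json`.

```python
from typing import Callable

from pydantic import BaseModel


# --- The task: lives in your repo (hypothetical example) ---
class SentimentSchema(BaseModel):
    label: str


class SentimentTask:
    name = "sentiment"
    schema = SentimentSchema
    system_prompt = "Classify the sentiment of the text as positive or negative. Reply in JSON."

    def build_prompt(self, text: str) -> str:
        return f"Text: {text}"


# --- The engine: generic, knows nothing about any specific task (hypothetical) ---
def run_engine(task, model: Callable[[str, str], str], log: list, **inputs) -> BaseModel:
    prompt = task.build_prompt(**inputs)           # task-specific prompt
    raw = model(task.system_prompt, prompt)        # model returns raw JSON text
    result = task.schema.model_validate_json(raw)  # raises if the reply does not fit the schema
    log.append({"task": task.name, "prompt": prompt, "response": raw})
    return result


# Stand-in for a real LLM call so the sketch runs offline.
def fake_model(system: str, prompt: str) -> str:
    return '{"label": "positive"}'


log: list = []
result = run_engine(SentimentTask(), fake_model, log, text="Great ruling.")
print(result.label)  # positive
```

Because `run_engine` only touches `name`, `schema`, `system_prompt`, and `build_prompt`, a different task can be swapped in without changing the engine.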

🛠️ Installation

Option 1: Install the latest stable release directly from PyPI using pip:

```shell
pip install xtrllm
```

Option 2: Clone the repository

```shell
# 1. Clone the repo
git clone https://github.com/mauriciomm7/xtrllm.git
cd xtrllm
# 2. Install in editable mode with pip
pip install -e .
```

Quickstart

The xtrllm package ships with a few example tasks that I built for my own projects. The intended use, however, is that you create a custom extraction task for your own project, say eulex/tasks/classify_actor.py: a single Python file containing the Pydantic schema, the prompt function, and the task class, which returns the model's output as an object validated against that schema.

The workflow is intentionally simple. First, the user loads a task directory containing Python files with Pydantic schemas, prompt builders, and task classes. Next, an LLMXtractor instance is initialized with the desired task and language model. When the extractor is called, it submits the input to the model and returns a structured output validated against the predefined Pydantic schema. This validation layer ensures that only outputs conforming to the expected structure are returned, improving reliability and reducing the likelihood of malformed responses propagating through downstream pipelines.

```python
from xtrllm import load_tasks, LLMXtractor

# LOAD your paper's tasks from eulex/tasks under the "eulex" namespace
load_tasks("eulex/tasks", namespace="eulex")

# DEFINE which task and model the extractor runs
extractor = LLMXtractor(task="eulex/eu_lawyers", model="gpt-4.1-mini")
# Polish input: "Court bailiff at the District Court in Szczecinek"
result = extractor("Komornik Sądowy przy Sądzie Rejonowym w Szczecinku")

# SEE the validated result
print(result.value)
# ACT_TYPE_GOV_OFFIC_CIT
```

In this example, the eu_lawyers task classifies an actor into a predefined actor taxonomy. The model determines the appropriate category, while Pydantic validation guarantees that the returned classification is a valid schema-compliant entry before it can be stored or used elsewhere in the system.
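The guarantee comes from constraining the schema to the taxonomy itself. A minimal sketch, assuming the task's schema exposes a single `value` field typed as a `Literal` of the allowed labels; only `ACT_TYPE_GOV_OFFIC_CIT` comes from the example above, while the other labels and the `ActorClassification` name are hypothetical placeholders. Pydantic v2 is assumed.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

# Hypothetical actor taxonomy; only the first label appears in the example above.
ActorType = Literal[
    "ACT_TYPE_GOV_OFFIC_CIT",
    "ACT_TYPE_PRIVATE_PERSON",
    "ACT_TYPE_COMPANY",
]


class ActorClassification(BaseModel):
    value: ActorType  # the model's answer must be one of the taxonomy labels


# A compliant reply parses into a typed object.
ok = ActorClassification.model_validate_json('{"value": "ACT_TYPE_GOV_OFFIC_CIT"}')
print(ok.value)  # ACT_TYPE_GOV_OFFIC_CIT

# A label outside the taxonomy is rejected before it reaches downstream code.
try:
    ActorClassification.model_validate_json('{"value": "BAILIFF"}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```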

Batch Processing

In principle, you could loop over your inputs and collect the results one by one. More often than not, however, you will be working with a DataFrame whose columns supply the task's input parameters, and the goal is to add a new output column. For these cases, use batch processing with DataFrameLLMXtractor.

In this example, I use a task that classifies whether a sentence of legal text has positive or negative valence toward a set of entities related to the EU legal order, such as the European Court of Justice, EU law, and related institutions. The task takes two parameters, snippet_text and entities_str, which are expected to correspond to columns in your DataFrame. Passing them as a dictionary mapping means you do not need to rename your DataFrame columns manually: for example, if the task expects snippet_text but your column is called txt_snippet, the mapping resolves this internally.
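Conceptually, the mapping tells the extractor where to find each task parameter in a row. A minimal plain-Python sketch, using a list of dicts in place of a DataFrame so it runs without pandas; `run_rows` and `fake_valence` are hypothetical, and the direction of the mapping (task parameter as key, column name as value) is an assumption, since the example below uses identical names on both sides.

```python
from typing import Callable


def run_rows(rows: list[dict], extractor: Callable[..., str],
             parameters: dict[str, str], result_col: str) -> list[dict]:
    """Hypothetical sketch: call `extractor` once per row, pulling each task
    parameter from the column named in `parameters` (task parameter -> column)."""
    out = []
    for row in rows:
        kwargs = {param: row[column] for param, column in parameters.items()}
        out.append({**row, result_col: extractor(**kwargs)})  # keep inputs, add output column
    return out


# Stand-in extractor so the sketch runs offline.
def fake_valence(snippet_text: str, entities_str: str) -> str:
    return "positive" if "upholds" in snippet_text else "negative"


rows = [
    {"txt_snippet": "The Court upholds the primacy of EU law.", "entities": "CJEU; EU law"},
    {"txt_snippet": "The national court refuses to follow the CJEU.", "entities": "CJEU"},
]
result = run_rows(
    rows,
    fake_valence,
    parameters={"snippet_text": "txt_snippet", "entities_str": "entities"},
    result_col="valence",
)
print([r["valence"] for r in result])  # ['positive', 'negative']
```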

```python
import pandas as pd
from xtrllm import load_tasks, DataFrameLLMXtractor

# LOAD TASKS from directory
load_tasks(path=r"C:/uot25rev/tasks", namespace="uot25rev")

# DEFINE batch extractor; `parameters` links the task's parameters
# to DataFrame columns (the names happen to be identical here)
classify_Sent_LLMXtractor = DataFrameLLMXtractor(
    task="uot25rev/valence_for_entities",
    model="gpt-4.1-mini",
    parameters={
        "snippet_text": "snippet_text",
        "entities_str": "entities_str",
    },
    result_col="valence",  # name of the new output column
)

# LOAD analysis DataFrame
cmlr_sents_df = pd.read_parquet("C:/uot25rev/data/cmlr_snippets_base.parquet")

# RUN the batch extractor in parallel over all rows
result_df = classify_Sent_LLMXtractor.run_parallel(cmlr_sents_df, rpm=1_000)
```
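The `rpm` argument reads like a requests-per-minute cap. A minimal sketch of rate-limited parallel execution with the standard library, assuming that reading of `rpm`; `run_parallel_sketch` and `fake_call` are hypothetical and this is not xtrllm's implementation. Call start times are spaced 60 / rpm seconds apart, while a thread pool lets slow calls overlap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_parallel_sketch(items: list, fn: Callable, rpm: int, max_workers: int = 8) -> list:
    """Apply fn to every item in parallel, starting at most `rpm` calls per minute.
    Results come back in input order."""
    interval = 60.0 / rpm  # minimum gap between call starts
    lock = threading.Lock()
    next_start = [time.monotonic()]  # earliest time the next call may start

    def throttled(item):
        with lock:  # reserve the next free start slot
            now = time.monotonic()
            wait = next_start[0] - now
            next_start[0] = max(next_start[0], now) + interval
        if wait > 0:
            time.sleep(wait)
        return fn(item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(throttled, items))


def fake_call(text: str) -> int:
    time.sleep(0.01)  # pretend network latency
    return len(text)


start = time.monotonic()
lengths = run_parallel_sketch(["a", "bb", "ccc", "dddd"], fake_call, rpm=6_000)
elapsed = time.monotonic() - start
print(lengths)  # [1, 2, 3, 4]
```

With rpm=6_000 the four calls start roughly 10 ms apart, so the batch cannot finish faster than the cap allows even though the pool has spare workers.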

Logging

By default, every call is automatically logged to data/llm/logs.db, an SQLite database written via Simon Willison's llm library. Do not edit it manually: it is the skip guard's source of truth and your full audit trail. To browse it, check out Simon Willison's Datasette project.
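Because the log is an ordinary SQLite file, it can be queried read-only with the standard library. A minimal sketch of a skip-guard check, assuming the database contains the llm library's `responses` table with `model` and `prompt` columns (inspect your own file's schema, for example in Datasette, before relying on this); `already_logged` is a hypothetical helper, not xtrllm's own guard. The file is opened in SQLite's read-only URI mode, so the check cannot modify the audit trail.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def already_logged(db_path: str, model: str, prompt: str) -> bool:
    """Hypothetical skip guard: True if this exact (model, prompt) pair was already logged."""
    # mode=ro opens the file read-only, so this check can never alter the log.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        row = conn.execute(
            "SELECT 1 FROM responses WHERE model = ? AND prompt = ? LIMIT 1",
            (model, prompt),
        ).fetchone()
    return row is not None


# Demo on a toy database that mimics only the columns the guard reads.
with tempfile.TemporaryDirectory() as tmp:
    db = os.path.join(tmp, "logs.db")
    with closing(sqlite3.connect(db)) as conn:
        conn.execute("CREATE TABLE responses (model TEXT, prompt TEXT, response TEXT)")
        conn.execute(
            "INSERT INTO responses VALUES (?, ?, ?)",
            ("gpt-4.1-mini", "Text: hello", '{"label": "positive"}'),
        )
        conn.commit()
    seen = already_logged(db, "gpt-4.1-mini", "Text: hello")
    unseen = already_logged(db, "gpt-4.1-mini", "Text: goodbye")

print(seen, unseen)  # True False
```

In real use, `db_path` would point at data/llm/logs.db.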

Writing a Task

Each extraction task is defined in a single Python file. A task consists of three components:

A Pydantic schema that defines the expected output structure.
A task class that specifies the task name and output schema.
A prompt builder that converts user inputs into the prompt sent to the language model.

For example:

```python
# tasks/ajps2026/eu_lawyers.py
from typing import Optional

from pydantic import BaseModel

from xtrllm.core.base import BaseTask


class EULawyersSchema(BaseModel):
    """Output structure the model's reply must satisfy."""
    lawyers: list[str]
    count: int
    confidence: Optional[float] = None


class EULawyersTask(BaseTask):
    name = "eu_lawyers"
    schema = EULawyersSchema

    system_prompt = "You extract the names of lawyers appearing in EU court judgments."

    def build_prompt(self, text: str, context: str = "") -> str:
        # Combine the optional context and the input text into the user prompt
        return f"Context: {context}\n\nText: {text}"
```

When this task is executed, the model receives the generated prompt and must return an output that conforms to EULawyersSchema. Any response that fails schema validation is automatically rejected, ensuring that downstream code always receives a predictable structure.
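A minimal sketch of what that rejection looks like on the Pydantic side, assuming Pydantic v2 (`model_validate_json`). The schema is repeated from the task file so the snippet runs on its own; `parse_reply` is a hypothetical helper, not part of xtrllm, and returning `None` on failure is only one possible policy (raising or retrying are others).

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


# Same schema as in tasks/ajps2026/eu_lawyers.py, repeated so this runs standalone.
class EULawyersSchema(BaseModel):
    lawyers: list[str]
    count: int
    confidence: Optional[float] = None


def parse_reply(raw: str) -> Optional[EULawyersSchema]:
    """Return a validated object, or None if the reply does not fit the schema."""
    try:
        return EULawyersSchema.model_validate_json(raw)
    except ValidationError as err:
        print(f"rejected: {err.error_count()} validation error(s)")
        return None


good = parse_reply('{"lawyers": ["A. Smith", "B. Jones"], "count": 2}')
bad = parse_reply('{"lawyers": "A. Smith"}')  # wrong type for "lawyers", "count" missing

print(good.count)  # 2
print(bad)         # None
```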

Tasks require no manual registration or configuration. Once the task file exists inside a loaded task directory, load_tasks() will discover and register it automatically.
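For intuition, here is one way such discovery can work, sketched with the standard library only; it is an assumption about the mechanism, not the source of xtrllm's `load_tasks()`. `discover_tasks` is hypothetical, and its rule "any class with a string `name` and a `build_prompt` method is a task" stands in for whatever check the real loader uses (for example, subclassing `BaseTask`).

```python
import importlib.util
import inspect
import sys
import tempfile
import textwrap
from pathlib import Path


def discover_tasks(path: str, namespace: str) -> dict:
    """Hypothetical loader: import every .py file under `path` and register each
    class defined there that has a string `name` and a `build_prompt` method."""
    registry = {}
    for file in sorted(Path(path).rglob("*.py")):
        if file.name.startswith("_"):
            continue  # skip __init__.py and private helper modules
        module_name = f"{namespace}.{file.stem}"
        spec = importlib.util.spec_from_file_location(module_name, str(file))
        module = importlib.util.module_from_spec(spec)
        sys.modules[module_name] = module  # lets libraries such as Pydantic resolve the module
        spec.loader.exec_module(module)
        for _, cls in inspect.getmembers(module, inspect.isclass):
            if cls.__module__ != module_name:
                continue  # ignore classes the file merely imports
            if isinstance(getattr(cls, "name", None), str) and callable(getattr(cls, "build_prompt", None)):
                registry[f"{namespace}/{cls.name}"] = cls
    return registry


# Demo: a throwaway task directory containing one task file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "echo_task.py").write_text(textwrap.dedent('''
        class EchoTask:
            name = "echo"

            def build_prompt(self, text: str) -> str:
                return text
    '''))
    tasks = discover_tasks(tmp, namespace="demo")

print(sorted(tasks))  # ['demo/echo']
```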

Swap Providers — Zero Code Changes

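```python
from xtrllm import LLMXtractor

# Same task, different providers: only the model string changes
LLMXtractor(task="ajps2026/eu_lawyers", model="gpt-4o-mini")
LLMXtractor(task="ajps2026/eu_lawyers", model="claude-3-5-haiku-latest")
LLMXtractor(task="ajps2026/eu_lawyers", model="ollama/llama3")
```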

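Any model available through the llm plugin ecosystem can be used; providers other than OpenAI require the matching llm plugin (for example llm-anthropic or llm-ollama) to be installed.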

Example for Replication Files

In most cases, xtrllm will not be a substantial part of a project but one additional tool in a larger processing pipeline. In those cases, one possible way to organize the project is:

```text
paper_abc/
├── tasks/
│   ├── eu_lawyers.py
│   └── get_citations.py
├── notebooks/
│   └── main.ipynb
└── results/
```

🎓 Citation

If you use this framework in academic research, please cite:

Mandujano Manríquez, M. (2026). xtrllm: A lightweight Python framework for portable, versioned, reusable LLM extraction tasks. GitHub: https://github.com/mauriciomm7/xtrllm

```bibtex
@misc{mandujano2026xtrllm,
  author       = {Mauricio Mandujano Manríquez},
  title        = {{xtrllm}: A lightweight Python framework for portable, versioned, reusable {LLM} extraction tasks},
  year         = {2026},
  howpublished = {\url{https://github.com/mauriciomm7/xtrllm}},
  note         = {GitHub repository}
}
```

📄 License

This project is licensed under the MIT License.

Project details

Version 0.0.1, released Jun 4, 2026.

Download files

Source Distribution

xtrllm-0.0.1.tar.gz (30.1 kB)

Uploaded Jun 4, 2026 Source

Built Distribution

xtrllm-0.0.1-py3-none-any.whl (33.1 kB)

Uploaded Jun 4, 2026 Python 3

File details

Details for the file xtrllm-0.0.1.tar.gz.

File metadata

Download URL: xtrllm-0.0.1.tar.gz
Upload date: Jun 4, 2026
Size: 30.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for xtrllm-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`a296f3d46d79cc77ad585d9ace63e88969b3f8d8b21399b633d0e3e9725bb7ad`
MD5	`a3742ad86904282ff7bb496dd104c164`
BLAKE2b-256	`efad8dabd0975abf21ef03b09a2f235bf29b210645f909ce130dba7a17bb0239`

File details

Details for the file xtrllm-0.0.1-py3-none-any.whl.

File metadata

Download URL: xtrllm-0.0.1-py3-none-any.whl
Upload date: Jun 4, 2026
Size: 33.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for xtrllm-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3b1751d3724d75b51e99a74189f47a0d44d4c1b3fac8bd807f156561d33fb714`
MD5	`ea0e39468f1cd323e1a9ec776c32b163`
BLAKE2b-256	`b701bb8325bee705fb13fb06f904e3200a8b73e239810f02e9c512d793b70ce7`
