Python port of data-tamer using LiteLLM for structured outputs and batching
data-tamer
Lightweight Python wrappers (built on LiteLLM) for transforming data with structured outputs, compact prompts for lower token usage, and batching utilities. Strict structured outputs are supported via Pydantic models or JSON Schema.
Install
Install from PyPI with pip or uv:
pip install data-tamer
# or with uv
uv add data-tamer
Basic usage mirrors the TypeScript API and its prompt-compaction behavior:
import os

from pydantic import BaseModel

from data_tamer import transform_object, transform_batch


class Person(BaseModel):
    name: str
    age: int | None


# Choose a LiteLLM model id; set provider API keys via env (e.g., OPENAI_API_KEY, OPENROUTER_API_KEY)
model = os.environ.get("LITELLM_MODEL", "gpt-4o-mini")

# Single transform from guidance only
single = transform_object(
    model=model,
    schema=Person,
    prompt_context={
        "instructions": "Extract name and age. Use null when unknown.",
    },
)
print(single["data"])  # -> Person(name=..., age=...)

# Batch transform from compact prompt
inputs = [
    "Jane Doe, 29",
    "Mr. Smith, unknown age",
    {"text": "Alice, 41"},
]
results = transform_batch(
    model=model,
    schema=Person,
    items=inputs,
    batch_size=2,
    prompt_context={
        "instructions": "Extract name and age. Use null when unknown.",
    },
)
print(results)  # list of Person-like dicts
Streaming structured output is supported via data_tamer.stream_transform_object (LiteLLM streaming under the hood).
Async batching
For higher throughput, use the async variant with concurrency:
import asyncio
import os

from pydantic import BaseModel

from data_tamer import async_transform_batch


class Person(BaseModel):
    name: str
    age: int | None


async def main():
    model = os.environ.get("LITELLM_MODEL", "gpt-4o-mini")
    inputs = [f"User {i}, {20 + (i % 40)}" for i in range(100)]
    results = await async_transform_batch(
        model=model,
        schema=Person,
        items=inputs,
        batch_size=10,
        concurrency=5,
        prompt_context={"instructions": "Extract name and age"},
    )
    print(len(results))


asyncio.run(main())
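To illustrate what batching with bounded concurrency means in general, here is a minimal standalone sketch using asyncio.Semaphore. It is not data-tamer's implementation: `chunk`, `process_batch`, and `run_batches` are hypothetical names, and `process_batch` is a stub standing in for a real LLM request.

```python
import asyncio


def chunk(items, batch_size):
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


async def process_batch(batch):
    """Hypothetical stand-in for one LLM request covering a whole batch."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [{"input": item} for item in batch]


async def run_batches(items, batch_size, concurrency):
    """Process batches with at most `concurrency` in flight, keeping input order."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(batch):
        async with semaphore:
            return await process_batch(batch)

    # gather returns results in the order the awaitables were passed in
    per_batch = await asyncio.gather(*(guarded(b) for b in chunk(items, batch_size)))
    return [result for batch in per_batch for result in batch]


results = asyncio.run(run_batches([f"User {i}" for i in range(100)], 10, 5))
print(len(results))
```

With 100 inputs and a batch size of 10, this issues 10 stubbed requests, never more than 5 at once.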
Prompt Compaction
The prompt builder:
- De-duplicates schema guidance and uses short, strict JSON directions.
- Truncates per-item input via char_limit_per_item.
- Supports optional system, instructions, and few-shot examples.
- Items are raw inputs (strings or objects). Place guidance/instructions in prompt_context.system / prompt_context.instructions.
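The points above can be pictured with a small standalone sketch. This is an assumption about the general shape of such a builder, not the library's actual code: `truncate`, `build_compact_prompt`, and the `schema_hint` parameter are hypothetical, and only `char_limit_per_item` is a name taken from this README.

```python
import json


def truncate(text, char_limit):
    """Cut text to char_limit characters, marking the cut with three dots."""
    return text if len(text) <= char_limit else text[:char_limit] + "..."


def build_compact_prompt(items, schema_hint, instructions=None, char_limit_per_item=200):
    """Assemble one prompt: schema guidance once, then numbered, truncated items."""
    lines = []
    if instructions:
        lines.append(instructions)
    # The schema guidance appears a single time, not once per item.
    lines.append(f"Return only a JSON array, one object per item, matching: {schema_hint}")
    for index, item in enumerate(items, start=1):
        # Objects are serialized compactly; strings are used as-is.
        text = item if isinstance(item, str) else json.dumps(item, separators=(",", ":"))
        lines.append(f"{index}. {truncate(text, char_limit_per_item)}")
    return "\n".join(lines)


prompt = build_compact_prompt(
    ["Jane Doe, 29", {"text": "Alice, 41"}],
    schema_hint='{"name": str, "age": int|null}',
    instructions="Extract name and age. Use null when unknown.",
)
print(prompt)
```

Stating the schema once and serializing objects without extra whitespace is what keeps the token count low as the batch grows.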
API
- transform_object(model, schema, items|prompt_context, ...)
  - Generates a single structured object. If items are provided, a compact prompt is built; otherwise use prompt_context with instructions.
  - schema can be a Pydantic model class or a JSON Schema dict. When supported by the provider, LiteLLM enforces structured output. We also parse JSON and, for dict schemas, validate locally via jsonschema as a fallback.
- stream_transform_object(...)
  - Streams text chunks and allows awaiting the final parsed object.
- transform_batch(model, schema, items, batch_size=..., concurrency=...)
  - Splits inputs into batches, builds compact prompts, and parses array outputs. Uses threads when concurrency > 1.
- async_transform_batch(...)
  - Async variant with concurrency control via asyncio.
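The "split into batches, use threads when concurrency > 1" behavior described for transform_batch can be sketched as follows. This is a generic illustration under assumptions, not the library's code: `split_batches`, `process_batch`, and `run_batches` are hypothetical names, and `process_batch` is a stub in place of a real model call.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batches(items, batch_size):
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_batch(batch):
    """Hypothetical stand-in for one LLM call returning one result per item."""
    return [str(item).upper() for item in batch]


def run_batches(items, batch_size=2, concurrency=1):
    """Run batches sequentially, or on a thread pool when concurrency > 1."""
    batches = split_batches(items, batch_size)
    if concurrency > 1:
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            # Executor.map yields results in input order, so output stays aligned.
            per_batch = list(pool.map(process_batch, batches))
    else:
        per_batch = [process_batch(batch) for batch in batches]
    # Flatten the per-batch arrays back into one list of results.
    return [result for batch in per_batch for result in batch]


print(run_batches(["a", "b", "c"], batch_size=2, concurrency=2))
```

Threads suit this workload because each batch spends most of its time waiting on network I/O.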
Notes
- Providers (LiteLLM): pass a model id string (e.g., gpt-4o-mini, openrouter/google/gemini-2.5-flash-lite) and set the corresponding API key in env (OPENAI_API_KEY, OPENROUTER_API_KEY, etc.).
- Structured outputs:
  - Pydantic: pass a BaseModel subclass as schema. LiteLLM will request structured responses when supported; we parse JSON regardless.
  - JSON Schema: pass a dict; we set LiteLLM response_format={"type":"json_schema",...} and also validate locally with jsonschema.
  - Helpers: pydantic_json_schema, pydantic_array_json_schema generate dict schemas from Pydantic models.
- OpenRouter: set OPENROUTER_API_KEY and pick an OpenRouter model id via LITELLM_MODEL, e.g., openrouter/google/gemini-2.5-flash-lite.
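For the JSON Schema path, a rough sketch of the two steps (building a response_format and checking the parsed reply) might look like this. The README elides the exact payload, so the nested "json_schema" object with "name", "schema", and "strict" is an assumption based on the OpenAI-style format; `make_response_format` and `parse_and_check` are hypothetical helpers, and the required-keys check is only a simplified stand-in for full jsonschema validation.

```python
import json


def make_response_format(name, schema):
    """Wrap a JSON Schema dict in an OpenAI-style json_schema response format."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }


def parse_and_check(reply_text, schema):
    """Parse a JSON reply and check required keys (stand-in for jsonschema)."""
    data = json.loads(reply_text)
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data


person_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": ["integer", "null"]}},
    "required": ["name", "age"],
    "additionalProperties": False,
}

response_format = make_response_format("person", person_schema)
person = parse_and_check('{"name": "Jane Doe", "age": 29}', person_schema)
print(response_format["type"], person)
```

Validating locally as well as requesting structured output matters because not every provider enforces the schema on its side.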
Examples
- examples/generate_object_example.py: basic structured generation
- examples/transform_batch_example.py: batching with compact prompts
- examples/jsonschema_example.py: JSON Schema with validation
- examples/legacy_contacts.py: real-world cleanup with OpenRouter (default Gemini model)
File details
Details for the file data_tamer-0.1.3.tar.gz.
File metadata
- Download URL: data_tamer-0.1.3.tar.gz
- Upload date:
- Size: 11.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2f91c6a38a1cbdb1222b0983905d51f908d4de9625ec2f9cde06ad85c1b3607 |
| MD5 | 27eec20652bff866c32f1d4fc20dd6b4 |
| BLAKE2b-256 | 70a303222ec65be323268ff7a94696c9bd27eaf9b9158edf2a98e8faca3fdb1d |
Provenance
The following attestation bundles were made for data_tamer-0.1.3.tar.gz:
Publisher: pypi-publish.yml on seb-lewis/data-tamer-py
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: data_tamer-0.1.3.tar.gz
  - Subject digest: c2f91c6a38a1cbdb1222b0983905d51f908d4de9625ec2f9cde06ad85c1b3607
  - Sigstore transparency entry: 634342342
  - Sigstore integration time:
- Permalink: seb-lewis/data-tamer-py@83ca0324206fcbe59b63df1ef59745452ea8a050
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/seb-lewis
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@83ca0324206fcbe59b63df1ef59745452ea8a050
- Trigger Event: push
File details
Details for the file data_tamer-0.1.3-py3-none-any.whl.
File metadata
- Download URL: data_tamer-0.1.3-py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77b753023122ed4228775413bd7f0bb2e020dd553476a6903e5ed13e4906c3ad |
| MD5 | f4a02058328b39d0ab7e9a593ef2e637 |
| BLAKE2b-256 | 7919a502018b14b2d5366a7477407b373eab9d298a3b31cb02a5de0c19be525e |
Provenance
The following attestation bundles were made for data_tamer-0.1.3-py3-none-any.whl:
Publisher: pypi-publish.yml on seb-lewis/data-tamer-py
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: data_tamer-0.1.3-py3-none-any.whl
  - Subject digest: 77b753023122ed4228775413bd7f0bb2e020dd553476a6903e5ed13e4906c3ad
  - Sigstore transparency entry: 634342346
  - Sigstore integration time:
- Permalink: seb-lewis/data-tamer-py@83ca0324206fcbe59b63df1ef59745452ea8a050
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/seb-lewis
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@83ca0324206fcbe59b63df1ef59745452ea8a050
- Trigger Event: push