Python port of data-tamer using LiteLLM for structured outputs and batching

data-tamer

Lightweight Python wrappers (built on LiteLLM) for transforming data with structured outputs, compact prompts for lower token usage, and batching utilities. Strict structured outputs are supported via Pydantic models or JSON Schema.

Install

Install from PyPI with pip or uv:

pip install data-tamer
# or with uv
uv add data-tamer

Basic usage mirrors the TypeScript (TS) API, including its prompt-compaction behavior:

from pydantic import BaseModel
import os
from data_tamer import transform_object, transform_batch


class Person(BaseModel):
    name: str
    age: int | None

# Choose a LiteLLM model id; set provider API keys via env (e.g., OPENAI_API_KEY, OPENROUTER_API_KEY)
model = os.environ.get("LITELLM_MODEL", "gpt-4o-mini")

# Single transform from guidance only
single = transform_object(
    model=model,
    schema=Person,
    prompt_context={
        "instructions": "Extract name and age. Use null when unknown.",
    },
)
print(single["data"])  # -> Person(name=..., age=...)

# Batch transform from compact prompt
inputs = [
    "Jane Doe, 29",
    "Mr. Smith, unknown age",
    {"text": "Alice, 41"},
]

results = transform_batch(
    model=model,
    schema=Person,
    items=inputs,
    batch_size=2,
    prompt_context={
        "instructions": "Extract name and age. Use null when unknown.",
    },
)
print(results)  # list of Person-like dicts

Streaming structured output is supported via data_tamer.stream_transform_object (LiteLLM streaming under the hood).
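
The streaming pattern described above (text chunks first, parsed object at the end) can be illustrated without any network call. The sketch below is not the data_tamer implementation and does not show the signature of stream_transform_object. The names fake_chunks and collect_stream are hypothetical, and a plain dataclass stands in for the Pydantic model so that only the standard library is needed.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class Person:
    name: str
    age: Optional[int]


def fake_chunks() -> Iterator[str]:
    # Hypothetical stand-in for a provider stream: the JSON arrives in fragments.
    yield '{"name": "Jane'
    yield ' Doe", "a'
    yield 'ge": 29}'


def collect_stream(chunks: Iterable[str]) -> Tuple[List[str], Person]:
    """Accumulate text chunks, then parse the full buffer once the stream ends."""
    seen: List[str] = []
    for chunk in chunks:
        seen.append(chunk)  # a caller could display partial text here
    data = json.loads("".join(seen))  # only the complete buffer is valid JSON
    return seen, Person(name=data["name"], age=data.get("age"))


chunks, person = collect_stream(fake_chunks())
print(person)
```

No individual fragment is valid JSON on its own, which is why parsing is deferred until the stream is exhausted.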

Async batching

For higher throughput, use the async variant with concurrency:

import asyncio
from pydantic import BaseModel
import os
from data_tamer import async_transform_batch


class Person(BaseModel):
    name: str
    age: int | None


async def main():
    model = os.environ.get("LITELLM_MODEL", "gpt-4o-mini")
    inputs = [f"User {i}, {20 + (i % 40)}" for i in range(100)]
    results = await async_transform_batch(
        model=model,
        schema=Person,
        items=inputs,
        batch_size=10,
        concurrency=5,
        prompt_context={"instructions": "Extract name and age"},
    )
    print(len(results))


asyncio.run(main())
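
To make the relationship between batch_size and concurrency concrete, here is a minimal standard-library sketch of one way such a scheduler could work. It is an assumption about the general technique, not the library's code: chunked, fake_llm_batch, and run_batches are hypothetical names, and the model call is replaced by a stub.

```python
import asyncio


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def fake_llm_batch(batch):
    # Hypothetical stand-in for one model call per batch.
    await asyncio.sleep(0)
    return [{"text": item} for item in batch]


async def run_batches(items, batch_size, concurrency):
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(batch):
        async with semaphore:  # at most `concurrency` batches in flight
            return await fake_llm_batch(batch)

    # gather() returns results in the order the awaitables were passed in.
    per_batch = await asyncio.gather(
        *(worker(batch) for batch in chunked(items, batch_size))
    )
    return [row for rows in per_batch for row in rows]


results = asyncio.run(
    run_batches([f"User {i}" for i in range(25)], batch_size=10, concurrency=5)
)
print(len(results))
```

With 25 items and batch_size=10, three batches are created (10, 10, and 5 items), and the flattened result keeps the input order.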

Prompt Compaction

The prompt builder:

- De-duplicates schema guidance and uses short, strict JSON directions.
- Truncates per-item input via char_limit_per_item.
- Supports optional system, instructions, and few-shot examples.

Items are raw inputs (strings or objects). Place guidance and instructions in prompt_context.system and prompt_context.instructions.
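
As an illustration of the ideas above, the following sketch builds a prompt that states the guidance once and truncates each item. It is a hypothetical builder, not the one shipped in data_tamer: the function name build_compact_prompt, the schema_hint parameter, and the exact prompt wording are assumptions. Only char_limit_per_item comes from the description above.

```python
import json


def build_compact_prompt(items, instructions, schema_hint, char_limit_per_item=200):
    """Hypothetical builder: one copy of the guidance, then numbered, truncated items."""
    lines = [instructions, f"Return a JSON array of objects: {schema_hint}", "Items:"]
    for index, item in enumerate(items, start=1):
        # Strings pass through as-is; objects are serialized without extra whitespace.
        text = item if isinstance(item, str) else json.dumps(item, separators=(",", ":"))
        lines.append(f"{index}. {text[:char_limit_per_item]}")
    return "\n".join(lines)


prompt = build_compact_prompt(
    ["Jane Doe, 29", {"text": "Alice, 41"}, "x" * 500],
    instructions="Extract name and age. Use null when unknown.",
    schema_hint='{"name": string, "age": integer|null}',
    char_limit_per_item=20,
)
print(prompt)
```

The instructions and schema hint appear once regardless of how many items follow, so the per-item cost is limited to the truncated input itself.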

API

transform_object(model, schema, items|prompt_context, ...)
- Generates a single structured object. If items are provided, a compact prompt is built; otherwise use prompt_context with instructions.
- schema can be a Pydantic model class or a JSON Schema dict. When supported by the provider, LiteLLM enforces structured output. We also parse JSON and, for dict schemas, validate locally via jsonschema as a fallback.
stream_transform_object(...)
- Streams text chunks and allows awaiting the final parsed object.
transform_batch(model, schema, items, batch_size=..., concurrency=...)
- Splits inputs into batches, builds compact prompts, and parses array outputs. Uses threads when concurrency > 1.
async_transform_batch(...)
- Async variant with concurrency control via asyncio.
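
The thread-based path mentioned for transform_batch can be pictured with a short standard-library sketch. This is an assumed shape, not the library's implementation: run_batches and fake_llm_batch are hypothetical names, and the stub simply upper-cases its input in place of a blocking model call.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_llm_batch(batch):
    # Hypothetical stand-in for one blocking model call that returns an array.
    return [item.upper() for item in batch]


def run_batches(items, batch_size, concurrency=1):
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if concurrency > 1:
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            # Executor.map yields results in input order, whichever call finishes first.
            per_batch = list(pool.map(fake_llm_batch, batches))
    else:
        per_batch = [fake_llm_batch(batch) for batch in batches]
    # Flatten the per-batch arrays back into one list aligned with the inputs.
    return [row for rows in per_batch for row in rows]


serial = run_batches(["a", "b", "c", "d", "e"], batch_size=2)
threaded = run_batches(["a", "b", "c", "d", "e"], batch_size=2, concurrency=3)
print(serial == threaded)
```

Threads suit this case because each call spends most of its time waiting on the network rather than computing.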

Notes

Providers (LiteLLM): pass a model id string (e.g., gpt-4o-mini, openrouter/google/gemini-2.5-flash-lite) and set the corresponding API key in env (OPENAI_API_KEY, OPENROUTER_API_KEY, etc.). Alternatively, pass credentials directly via provider_options, e.g. provider_options={"api_key": "sk-...", "api_base": "https://..."}.
Structured outputs:
- Pydantic: pass a BaseModel subclass as schema. LiteLLM will request structured responses when supported; we parse JSON regardless.
- JSON Schema: pass a dict; we set LiteLLM response_format={"type":"json_schema",...} and also validate locally with jsonschema.
- Helpers: pydantic_json_schema, pydantic_array_json_schema generate dict schemas from Pydantic models.
OpenRouter: set OPENROUTER_API_KEY and pick an OpenRouter model id via LITELLM_MODEL, e.g., openrouter/google/gemini-2.5-flash-lite. Or pass provider_options={"api_key": "..."} with an OpenRouter model id.
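
The "parse JSON, then validate locally" fallback described in these notes can be sketched as follows. To stay dependency-free, the example uses a small hand-rolled check that covers only required keys and simple types. It is a stand-in for the jsonschema library, not a replacement for it, and parse_and_check and PERSON_SCHEMA are hypothetical names.

```python
import json

_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def _matches(value, type_name):
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return type_name == "boolean"
    return isinstance(value, _TYPES[type_name])


def parse_and_check(text, schema):
    """Parse model text as JSON, then check required keys and simple types."""
    data = json.loads(text)
    errors = [f"missing required key: {k}" for k in schema.get("required", []) if k not in data]
    for key, rule in schema.get("properties", {}).items():
        if key not in data:
            continue
        allowed = rule["type"] if isinstance(rule["type"], list) else [rule["type"]]
        if not any(_matches(data[key], t) for t in allowed):
            errors.append(f"{key}: expected one of {allowed}")
    return data, errors


PERSON_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": ["integer", "null"]}},
    "required": ["name", "age"],
}

ok_data, ok_errors = parse_and_check('{"name": "Jane Doe", "age": null}', PERSON_SCHEMA)
bad_data, bad_errors = parse_and_check('{"name": "Mr. Smith", "age": "unknown"}', PERSON_SCHEMA)
print(ok_errors, bad_errors)
```

A local check like this catches responses that are valid JSON but do not match the schema, which matters when the provider does not enforce structured output itself.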

Examples

examples/generate_object_example.py — basic structured generation
examples/transform_batch_example.py — batching with compact prompts
examples/jsonschema_example.py — JSON Schema with validation
examples/legacy_contacts.py — real-world cleanup with OpenRouter (default Gemini model)

Release history

0.1.5 (Oct 24, 2025)
0.1.4 (Oct 24, 2025), this version
0.1.3 (Oct 23, 2025)
0.1.2 (Oct 22, 2025)

Download files

Source Distribution

data_tamer-0.1.4.tar.gz (11.6 kB view details)

Uploaded Oct 24, 2025 Source

Built Distribution

data_tamer-0.1.4-py3-none-any.whl (12.0 kB view details)

Uploaded Oct 24, 2025 Python 3

File details

Details for the file data_tamer-0.1.4.tar.gz.

File metadata

Download URL: data_tamer-0.1.4.tar.gz
Upload date: Oct 24, 2025
Size: 11.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for data_tamer-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`d0f988a14c2b5aaa073a42d10df79c18810203b354d2d7de949eaeef1e7c1984`
MD5	`ec63b4a0eb103dd88a2c1c1f96996135`
BLAKE2b-256	`101b7c3cf2b018d2cdcb9404c3469333ea535fa060fff94810d8b4ad77f64eea`

Provenance

The following attestation bundles were made for data_tamer-0.1.4.tar.gz:

Publisher: pypi-publish.yml on seb-lewis/data-tamer-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: data_tamer-0.1.4.tar.gz
- Subject digest: d0f988a14c2b5aaa073a42d10df79c18810203b354d2d7de949eaeef1e7c1984
- Sigstore transparency entry: 637592381
- Sigstore integration time: Oct 24, 2025
Source repository:
- Permalink: seb-lewis/data-tamer-py@b9206609de4a6f2069fe5afef8f63485f8792b94
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/seb-lewis
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@b9206609de4a6f2069fe5afef8f63485f8792b94
- Trigger Event: push

File details

Details for the file data_tamer-0.1.4-py3-none-any.whl.

File metadata

Download URL: data_tamer-0.1.4-py3-none-any.whl
Upload date: Oct 24, 2025
Size: 12.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for data_tamer-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`64036a176b4bf34a3e714cab067a49f0560d57d3091c4758b7a84c32702ff3bd`
MD5	`5ed75793def4d930efb30cd06bb5f609`
BLAKE2b-256	`06c2d6d4f49790d4a7fe9a85ad9fd59881814498f8144898dd2dd4af8c8be7ee`

Provenance

The following attestation bundles were made for data_tamer-0.1.4-py3-none-any.whl:

Publisher: pypi-publish.yml on seb-lewis/data-tamer-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: data_tamer-0.1.4-py3-none-any.whl
- Subject digest: 64036a176b4bf34a3e714cab067a49f0560d57d3091c4758b7a84c32702ff3bd
- Sigstore transparency entry: 637592385
- Sigstore integration time: Oct 24, 2025
Source repository:
- Permalink: seb-lewis/data-tamer-py@b9206609de4a6f2069fe5afef8f63485f8792b94
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/seb-lewis
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@b9206609de4a6f2069fe5afef8f63485f8792b94
- Trigger Event: push
