Skip to main content

JSON conversion pipelines via LLM-generated JSONata rules with caching.

Project description

translet

JSON conversions via JSONata rules generated by an LLM and cached in a database (through dbset).

On the first call to convert, translet:

  1. builds a cache key (from an explicit name or from the structural shape of input + target);
  2. if no rule is cached — asks the LLM to produce a JSONata expression and stores it;
  3. applies the rule via jsonata-python;
  4. validates the result against the target spec (JSON Schema or sample);
  5. on failure — regenerates the rule with error context and retries (up to 2 times by default).

Subsequent calls with the same shape hit the cache — no LLM round-trip.

Contents

Installation

pip install translet[all-llm]

all-llm installs the openai SDK, which translet uses to talk to OpenAI, Azure OpenAI, Groq, and NVIDIA NIM. Pick one if you prefer: translet[openai], translet[azure], translet[groq], translet[nvidia].

Quick start

from translet import Translet

t = Translet.from_env()  # reads TRANSLET_LLM_*, TRANSLET_DB_*, etc.

result = t.transjson.convert(
    {"user": {"name": "Alice", "age": 30}},
    target_sample={"name": "x", "age": 0},
)
# {"name": "Alice", "age": 30}

Three ways to specify a target

convert(source, *, target_schema=None, target_sample=None, description=None, name=None) accepts one of three target specifications. The choice affects both the LLM prompt and post-validation.

1. JSON Schema — strict validation

schema = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
    },
    "required": ["full_name", "age"],
}

result = t.transjson.convert(
    {"first_name": "Alice", "last_name": "Smith", "age": 30},
    target_schema=schema,
)
# {"full_name": "Alice Smith", "age": 30}

After rule generation the result is validated through jsonschema. A mismatch raises ValidationError and triggers regeneration (up to max_retries).

2. Sample (target_sample) — structural match

The LLM gets a concrete example of the desired shape. Validation checks key/type compatibility against the sample.

result = t.transjson.convert(
    {"first_name": "Alice", "last_name": "Smith"},
    target_sample={"full_name": "Alice Smith"},
)

3. Description + explicit name

When the target shape is trivial or scalar, describe the task in plain text. Always pass name= to give the cache an explicit key.

result = t.transjson.convert(
    {"items": [{"v": 1}, {"v": 2}, {"v": 3}]},
    description="sum of all v values",
    name="sum_v_rule",
)
# 6

Cache keys

  • name="foo" → key name:foo (stable, survives changes to the input shape).
  • No name → key hash:<sha> derived from input shape + target. Any structural change yields a fresh key.

Configuration via .env

Translet.from_env(env_file=...) loads a .env-style file and reads environment variables. Standard KEY=VALUE lines, # comments, optional surrounding quotes are supported.

from translet import Translet

t = Translet.from_env(env_file=".env")

Environment variables

Variable Purpose
TRANSLET_LLM_PROVIDER openai / azure / groq / nvidia
TRANSLET_LLM_MODEL Model name (required)
TRANSLET_LLM_BASE_URL Optional base URL override
OPENAI_API_KEY / AZURE_OPENAI_API_KEY / GROQ_API_KEY / NVIDIA_API_KEY Provider-specific API key (recommended — matches the SDK convention)
TRANSLET_API_KEY Generic fallback when no provider-specific key is set
AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_API_VERSION Azure only
TRANSLET_DB_PATH dbset connection string (default: sqlite:///translet.db)
TRANSLET_DB_TABLE Table name (default: translet_rules)
TRANSLET_TTL_SECONDS Rule TTL (default: no TTL)
TRANSLET_MAX_RETRIES Number of regenerate attempts (default: 2)

The provider-specific key takes precedence over TRANSLET_API_KEY.

Overriding parameters from code

Translet.from_env(...) accepts kwargs that override values from .env / os.environ:

from pathlib import Path
from translet import Translet

DB_PATH = Path("./cache/rules.db")

t = Translet.from_env(
    env_file=".env",
    db_path=f"sqlite:///{DB_PATH}",   # overrides TRANSLET_DB_PATH
    max_retries=5,                    # overrides TRANSLET_MAX_RETRIES
    api_key="sk-...",                 # routed into the provider-specific env var
)

Available kwargs: provider, model, base_url, api_key, db_path, db_table, ttl_seconds, max_retries. None (the default) keeps whatever is already in .env / the environment. Pass override=True to force load_dotenv to overwrite existing os.environ values from the file.

Loading .env separately from Translet

from translet import load_dotenv

load_dotenv(".env")                # setdefault semantics
load_dotenv(".env", override=True)  # force overwrite

Async

import asyncio
from translet import AsyncTranslet

async def main():
    t = await AsyncTranslet.from_env(env_file=".env")
    result = await t.transjson.aconvert(
        {"user": {"name": "Alice"}},
        target_sample={"name": "x"},
    )
    print(result)

asyncio.run(main())

AsyncTranslet.from_env accepts the same overrides as the sync version.

Manual wiring (without from_env)

from dbset import connect
from translet import Translet, TransletConfig
from translet.llm import openai
from translet.store import DbSetStore

db = connect("sqlite:///translet.db")
t = Translet(
    llm=openai("gpt-4o", api_key="..."),
    store=DbSetStore(db, table="my_rules"),
    config=TransletConfig(max_retries=2, ttl_seconds=None),
)

LLM factories: openai, azure, groq, nvidia (sync) and aopenai, aazure, agroq, anvidia (async).

DbSetStore does not own the connection — close db explicitly (db.close()). This lets a single connection back several stores / tables.

Cache management

# Explicit invalidation by name or by raw key
t.transjson.invalidate("sum_v_rule")
t.transjson.invalidate("name:full_name_v1")
t.transjson.invalidate("hash:abc123...")

# Manual TTL eviction (returns the number of removed rules)
removed = t.transjson.evict_expired(ttl_seconds=86400)

If ttl_seconds is set on TransletConfig, calling evict_expired() without an argument uses that as the default.

Statistics

compute_stats(rules) aggregates the rule cache; format_stats(stats) renders it as text.

from translet import Translet, compute_stats, format_stats

t = Translet.from_env()
rules = t.store.list(limit=10000)
print(format_stats(compute_stats(rules, top=10)))

RuleStats fields: total_rules, total_uses, total_successes, total_failures, success_rate, by_provider, by_model, top_by_usage, oldest_created, newest_created, last_used.

A CLI helper for quick checks:

python examples/show_stats.py --env-file .env
python examples/show_stats.py --db-path "sqlite:///./cache/rules.db" --top 10

Error handling

All exceptions inherit from TransletError:

from translet import (
    ConversionError,      # all retries exhausted
    RuleGenerationError,  # LLM failed to produce a usable JSONata expression
    JsonataError,         # JSONata compile/eval failure
    ValidationError,      # result didn't match the schema/sample
    StoreError,           # storage backend failure
)

try:
    result = t.transjson.convert(source, target_sample=sample)
except ConversionError as exc:
    print(f"failed for key={exc.key}, last error: {exc.last_error!r}")

The on_failure field of TransletConfig controls behaviour: "regenerate" (default — retry on failure) or "raise" (fail fast).

Extending

Custom LLM provider

from translet.llm import LLMClient, Message

class MyLLM:
    provider = "my-provider"
    model = "my-model"

    def complete(self, messages: list[Message], *, temperature: float = 0.0, max_tokens: int = 2048) -> str:
        ...  # return the response text

t = Translet(llm=MyLLM(), store=store)

LLMClient is a runtime_checkable Protocol — explicit inheritance is optional, structural compatibility is enough. The async counterpart is AsyncLLMClient with an acomplete method.

Custom store

Implement translet.store.RuleStore (or AsyncRuleStore) — methods get, put, touch, delete, evict_expired, list. DbSetStore is a reference implementation.

Custom system prompt

from translet import TransletConfig

config = TransletConfig(system_prompt="You are a JSONata generator. Output only the expression.")

For full prompt control, pass your own prompt_builder=YourPromptBuilder() (see translet.transjson.PromptBuilder).

Development

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,all-llm]"
pytest -q

Project layout:

src/translet/
  core.py              # Translet / AsyncTranslet, from_env, load_dotenv
  llm/                 # LLMClient Protocol + OpenAI-compatible clients
  store/               # RuleStore Protocol + DbSetStore
  transjson/           # convert pipeline (generate → JSONata → validate → retry)
  stats.py             # cache-statistics aggregation
  exceptions.py
examples/
  simple_nvidia.py        # minimal NVIDIA NIM example
  simple_from_env.py      # universal example driven by .env
  show_stats.py           # CLI: cache statistics

Build and publish

python -m build
python -m twine check dist/*
python -m twine upload dist/*

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

translet-0.1.0.tar.gz (23.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

translet-0.1.0-py3-none-any.whl (25.0 kB view details)

Uploaded Python 3

File details

Details for the file translet-0.1.0.tar.gz.

File metadata

  • Download URL: translet-0.1.0.tar.gz
  • Upload date:
  • Size: 23.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for translet-0.1.0.tar.gz
Algorithm Hash digest
SHA256 19569ec4144c6e3ea415291778245382f8d3f9a4d31254caba31a1bdac6adb90
MD5 aa3ad568012120a9852b45d58904cf1e
BLAKE2b-256 9d480e600e1eb73087eab7c81305c23c1ff1c4615ce05871073f14e288446636

See more details on using hashes here.

File details

Details for the file translet-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: translet-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 25.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for translet-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 883a868426a63f4371c28d484810c6e5799622618370c7a44a8fc681fcb159b1
MD5 2437848db92efc6c4dfb416bf94255b4
BLAKE2b-256 c231085a0ce7b0278dae8f85ca07254a3a942e08f381d293a61edea72f71e488

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page