JSON conversion pipelines via LLM-generated JSONata rules with caching.
translet
JSON conversions via JSONata rules generated by an LLM and cached in a database (through dbset).
On the first call to convert, translet:
- builds a cache key (from an explicit name or from the structural shape of input + target);
- if no rule is cached — asks the LLM to produce a JSONata expression and stores it;
- applies the rule via jsonata-python;
- validates the result against the target spec (JSON Schema or sample);
- on failure — regenerates the rule with error context and retries (up to 2 times by default).
Subsequent calls with the same shape hit the cache — no LLM round-trip.
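The steps above can be sketched as a small loop. This is an illustrative sketch, not translet's implementation: `convert_cached`, `apply_path`, `must_be_str`, and `fake_generate` are hypothetical names, the cache is a plain dict, and a dotted path stands in for a JSONata expression so the example runs with the standard library alone.

```python
from typing import Any, Callable, Optional


def convert_cached(
    source: Any,
    key: str,
    cache: dict,
    generate: Callable[[Any, Optional[str]], str],
    apply: Callable[[str, Any], Any],
    validate: Callable[[Any], None],
    max_retries: int = 2,
) -> Any:
    """Look up a rule, else generate one; apply, validate, regenerate on failure."""
    rule = cache.get(key)
    last_error: Optional[str] = None
    # One initial attempt plus up to max_retries regenerations.
    for _ in range(max_retries + 1):
        if rule is None:
            rule = generate(source, last_error)  # stands in for the LLM call
        try:
            result = apply(rule, source)
            validate(result)
        except Exception as exc:
            last_error = repr(exc)  # error context for the next generation
            cache.pop(key, None)    # never keep a rule that just failed
            rule = None
            continue
        cache[key] = rule
        return result
    raise RuntimeError(f"conversion failed for key={key}: {last_error}")


# Stand-in rule language (assumption): a dotted path instead of JSONata.
def apply_path(rule: str, source: Any) -> Any:
    value = source
    for part in rule.split("."):
        value = value[part]  # a wrong path raises KeyError
    return value


def must_be_str(result: Any) -> None:
    if not isinstance(result, str):
        raise ValueError(f"expected str, got {type(result).__name__}")


calls = []  # error context seen by the fake generator, one entry per call


def fake_generate(source: Any, last_error: Optional[str]) -> str:
    calls.append(last_error)
    # The first attempt is deliberately wrong to exercise the retry path.
    return "user.nam" if last_error is None else "user.name"


cache = {}
first = convert_cached({"user": {"name": "Alice"}}, "name:demo", cache,
                       fake_generate, apply_path, must_be_str)
second = convert_cached({"user": {"name": "Bob"}}, "name:demo", cache,
                        fake_generate, apply_path, must_be_str)
print(first, second, len(calls))
```

In this sketch the first call invokes the generator twice (one failed rule, one regenerated rule), and the second call is served from the cache without calling the generator again.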
Contents
- Installation
- Quick start
- Three ways to specify a target
- Configuration via .env
- Async
- Manual wiring (without from_env)
- Cache management
- Statistics
- Error handling
- Extending
- Development
Installation
pip install translet[all-llm]
The all-llm extra installs the openai SDK, which translet uses to talk to OpenAI, Azure OpenAI, Groq, and NVIDIA NIM. To install support for a single provider instead, use one of: translet[openai], translet[azure], translet[groq], translet[nvidia].
Quick start
from translet import Translet
t = Translet.from_env() # reads TRANSLET_LLM_*, TRANSLET_DB_*, etc.
result = t.transjson.convert(
    {"user": {"name": "Alice", "age": 30}},
    target_sample={"name": "x", "age": 0},
)
# {"name": "Alice", "age": 30}
Three ways to specify a target
convert(source, *, target_schema=None, target_sample=None, description=None, name=None) accepts one of three target specifications. The choice affects both the LLM prompt and post-validation.
1. JSON Schema — strict validation
schema = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
    },
    "required": ["full_name", "age"],
}
result = t.transjson.convert(
    {"first_name": "Alice", "last_name": "Smith", "age": 30},
    target_schema=schema,
)
# {"full_name": "Alice Smith", "age": 30}
After rule generation the result is validated through jsonschema. A mismatch raises ValidationError and triggers regeneration (up to max_retries).
2. Sample (target_sample) — structural match
The LLM gets a concrete example of the desired shape. Validation checks key/type compatibility against the sample.
result = t.transjson.convert(
    {"first_name": "Alice", "last_name": "Smith"},
    target_sample={"full_name": "Alice Smith"},
)
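To make "key/type compatibility" concrete, here is one possible structural check. It is a sketch under assumptions, not translet's actual validator: `matches_sample` is a hypothetical helper, extra keys in the result are tolerated, a list is compared against the sample's first element, and scalars must have exactly the same Python type.

```python
from typing import Any


def matches_sample(result: Any, sample: Any) -> bool:
    """Return True if result has the keys and value types that sample has."""
    if isinstance(sample, dict):
        return isinstance(result, dict) and all(
            key in result and matches_sample(result[key], value)
            for key, value in sample.items()
        )
    if isinstance(sample, list):
        if not isinstance(result, list):
            return False
        # An empty sample list accepts any list; otherwise every item
        # must match the shape of the sample's first element.
        return not sample or all(matches_sample(item, sample[0]) for item in result)
    # bool is a subclass of int in Python, so compare exact types.
    return type(result) is type(sample)


ok = matches_sample({"full_name": "Alice Smith"}, {"full_name": "x"})
wrong_type = matches_sample({"full_name": 1}, {"full_name": "x"})
missing_key = matches_sample({}, {"full_name": "x"})
print(ok, wrong_type, missing_key)
```

A stricter variant could reject extra keys or accept int where the sample has a float; which behaviour translet uses is not stated here.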
3. Description + explicit name
When the target shape is trivial or scalar, describe the task in plain text. Always pass name= to give the cache an explicit key.
result = t.transjson.convert(
    {"items": [{"v": 1}, {"v": 2}, {"v": 3}]},
    description="sum of all v values",
    name="sum_v_rule",
)
# 6
Cache keys
- name="foo" → key name:foo (stable, survives changes to the input shape).
- No name → key hash:<sha> derived from input shape + target. Any structural change yields a fresh key.
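The two key forms can be illustrated with a short sketch. The exact hashing scheme translet uses is not documented here, so everything below is an assumption: `shape_of` and `cache_key` are hypothetical helpers that reduce a value to its structure (leaf values become type names) and hash the result with SHA-256.

```python
import hashlib
import json
from typing import Any, Optional


def shape_of(value: Any) -> Any:
    """Replace every leaf with its type name, keeping only the structure."""
    if isinstance(value, dict):
        return {key: shape_of(item) for key, item in value.items()}
    if isinstance(value, list):
        # Assumption: a list is represented by the shape of its first element.
        return [shape_of(value[0])] if value else []
    return type(value).__name__


def cache_key(source: Any, target: Any, name: Optional[str] = None) -> str:
    if name is not None:
        return f"name:{name}"  # explicit names ignore the input shape
    payload = json.dumps(
        {"source": shape_of(source), "target": shape_of(target)},
        sort_keys=True,  # key order must not affect the hash
    )
    return "hash:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


same_a = cache_key({"user": {"name": "Alice", "age": 30}}, {"name": "x", "age": 0})
same_b = cache_key({"user": {"age": 41, "name": "Bob"}}, {"name": "y", "age": 1})
changed = cache_key({"user": {"name": "Bob"}}, {"name": "y", "age": 1})
named = cache_key({"anything": 1}, None, name="foo")
print(same_a == same_b, same_a == changed, named)
```

Inputs that differ only in values or key order share a key in this sketch, while dropping a field produces a different one.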
Configuration via .env
Translet.from_env(env_file=...) loads a .env-style file and reads environment variables. Standard KEY=VALUE lines, # comments, optional surrounding quotes are supported.
from translet import Translet
t = Translet.from_env(env_file=".env")
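For reference, the file format described above (KEY=VALUE lines, # comments, optional surrounding quotes) can be parsed in a few lines. `parse_dotenv` is a hypothetical stand-in written for illustration; translet's own loader may handle edge cases such as inline comments or `export` prefixes differently.

```python
def parse_dotenv(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


example = """
# LLM settings
TRANSLET_LLM_PROVIDER=openai
TRANSLET_LLM_MODEL="gpt-4o"

TRANSLET_DB_TABLE='my_rules'
"""
parsed = parse_dotenv(example)
print(parsed)
```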
Environment variables
| Variable | Purpose |
|---|---|
| TRANSLET_LLM_PROVIDER | openai / azure / groq / nvidia |
| TRANSLET_LLM_MODEL | Model name (required) |
| TRANSLET_LLM_BASE_URL | Optional base URL override |
| OPENAI_API_KEY / AZURE_OPENAI_API_KEY / GROQ_API_KEY / NVIDIA_API_KEY | Provider-specific API key (recommended, matches the SDK convention) |
| TRANSLET_API_KEY | Generic fallback when no provider-specific key is set |
| AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_API_VERSION | Azure only |
| TRANSLET_DB_PATH | dbset connection string (default: sqlite:///translet.db) |
| TRANSLET_DB_TABLE | Table name (default: translet_rules) |
| TRANSLET_TTL_SECONDS | Rule TTL (default: no TTL) |
| TRANSLET_MAX_RETRIES | Number of regenerate attempts (default: 2) |
The provider-specific key takes precedence over TRANSLET_API_KEY.
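The precedence rule can be pictured as follows. This is a sketch, not translet's code: `resolve_api_key` and `PROVIDER_KEY_VARS` are hypothetical, and the provider-to-variable mapping is inferred from the table above.

```python
from typing import Mapping, Optional

# Assumed mapping from TRANSLET_LLM_PROVIDER values to key variables.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "azure": "AZURE_OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "nvidia": "NVIDIA_API_KEY",
}


def resolve_api_key(provider: str, env: Mapping[str, str]) -> Optional[str]:
    """Prefer the provider-specific variable, then fall back to TRANSLET_API_KEY."""
    specific = env.get(PROVIDER_KEY_VARS[provider])
    return specific or env.get("TRANSLET_API_KEY")


both = resolve_api_key("groq", {"GROQ_API_KEY": "gsk-1", "TRANSLET_API_KEY": "generic"})
fallback = resolve_api_key("groq", {"TRANSLET_API_KEY": "generic"})
print(both, fallback)
```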
Overriding parameters from code
Translet.from_env(...) accepts kwargs that override values from .env / os.environ:
from pathlib import Path
from translet import Translet
DB_PATH = Path("./cache/rules.db")
t = Translet.from_env(
    env_file=".env",
    db_path=f"sqlite:///{DB_PATH}",  # overrides TRANSLET_DB_PATH
    max_retries=5,                   # overrides TRANSLET_MAX_RETRIES
    api_key="sk-...",                # routed into the provider-specific env var
)
Available kwargs: provider, model, base_url, api_key, db_path, db_table, ttl_seconds, max_retries. None (the default) keeps whatever is already in .env / the environment. Pass override=True to force load_dotenv to overwrite existing os.environ values from the file.
Loading .env separately from Translet
from translet import load_dotenv
load_dotenv(".env") # setdefault semantics
load_dotenv(".env", override=True) # force overwrite
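The difference between the two calls is the usual setdefault-versus-assign distinction. The sketch below shows it on a plain dict rather than os.environ; `apply_env` is a hypothetical helper, not part of translet.

```python
def apply_env(values: dict, environ: dict, override: bool = False) -> dict:
    """Merge parsed .env values into environ."""
    for key, value in values.items():
        if override:
            environ[key] = value            # the file wins
        else:
            environ.setdefault(key, value)  # an existing value wins
    return environ


file_values = {"TRANSLET_MAX_RETRIES": "5", "TRANSLET_DB_TABLE": "my_rules"}

kept = apply_env(file_values, {"TRANSLET_MAX_RETRIES": "2"})
forced = apply_env(file_values, {"TRANSLET_MAX_RETRIES": "2"}, override=True)
print(kept["TRANSLET_MAX_RETRIES"], forced["TRANSLET_MAX_RETRIES"])
```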
Async
import asyncio
from translet import AsyncTranslet
async def main():
    t = await AsyncTranslet.from_env(env_file=".env")
    result = await t.transjson.aconvert(
        {"user": {"name": "Alice"}},
        target_sample={"name": "x"},
    )
    print(result)

asyncio.run(main())
AsyncTranslet.from_env accepts the same overrides as the sync version.
Manual wiring (without from_env)
from dbset import connect
from translet import Translet, TransletConfig
from translet.llm import openai
from translet.store import DbSetStore
db = connect("sqlite:///translet.db")
t = Translet(
    llm=openai("gpt-4o", api_key="..."),
    store=DbSetStore(db, table="my_rules"),
    config=TransletConfig(max_retries=2, ttl_seconds=None),
)
LLM factories: openai, azure, groq, nvidia (sync) and aopenai, aazure, agroq, anvidia (async).
DbSetStore does not own the connection — close db explicitly (db.close()). This lets a single connection back several stores / tables.
Cache management
# Explicit invalidation by name or by raw key
t.transjson.invalidate("sum_v_rule")
t.transjson.invalidate("name:full_name_v1")
t.transjson.invalidate("hash:abc123...")
# Manual TTL eviction (returns the number of removed rules)
removed = t.transjson.evict_expired(ttl_seconds=86400)
If ttl_seconds is set on TransletConfig, calling evict_expired() without an argument uses that as the default.
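As an illustration of what TTL eviction does, the sketch below removes entries older than a given TTL from an in-memory mapping and returns the count. It assumes age is measured from a rule's creation time; whether translet measures from creation or last use is not stated here, and `evict_older_than` is a hypothetical name.

```python
import time
from typing import Optional


def evict_older_than(created_at: dict, ttl_seconds: float,
                     now: Optional[float] = None) -> int:
    """created_at maps cache key -> creation timestamp (seconds since epoch)."""
    now = time.time() if now is None else now
    expired = [key for key, created in created_at.items()
               if now - created > ttl_seconds]
    for key in expired:
        del created_at[key]
    return len(expired)


rules = {"name:sum_v_rule": 0.0, "hash:abc123": 90_000.0}
removed = evict_older_than(rules, ttl_seconds=86_400, now=100_000.0)
print(removed, sorted(rules))
```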
Statistics
compute_stats(rules) aggregates the rule cache; format_stats(stats) renders it as text.
from translet import Translet, compute_stats, format_stats
t = Translet.from_env()
rules = t.store.list(limit=10000)
print(format_stats(compute_stats(rules, top=10)))
RuleStats fields: total_rules, total_uses, total_successes, total_failures, success_rate, by_provider, by_model, top_by_usage, oldest_created, newest_created, last_used.
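To show how a few of these fields relate to each other, here is a small aggregation over plain dicts. It is only a sketch: the record keys (`uses`, `successes`, `failures`, `provider`) and the definition of success_rate as successes divided by successes plus failures are assumptions, and `summarize` is not the library's compute_stats.

```python
from collections import Counter


def summarize(records: list, top: int = 3) -> dict:
    successes = sum(r["successes"] for r in records)
    failures = sum(r["failures"] for r in records)
    attempts = successes + failures
    by_usage = sorted(records, key=lambda r: r["uses"], reverse=True)
    return {
        "total_rules": len(records),
        "total_uses": sum(r["uses"] for r in records),
        "total_successes": successes,
        "total_failures": failures,
        # Avoid dividing by zero on an empty cache.
        "success_rate": successes / attempts if attempts else 0.0,
        "by_provider": dict(Counter(r["provider"] for r in records)),
        "top_by_usage": [r["key"] for r in by_usage[:top]],
    }


records = [
    {"key": "name:a", "provider": "openai", "uses": 10, "successes": 9, "failures": 1},
    {"key": "hash:b", "provider": "groq", "uses": 3, "successes": 3, "failures": 0},
    {"key": "name:c", "provider": "openai", "uses": 7, "successes": 6, "failures": 2},
]
summary = summarize(records, top=2)
print(summary["total_uses"], summary["top_by_usage"])
```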
A CLI helper for quick checks:
python examples/show_stats.py --env-file .env
python examples/show_stats.py --db-path "sqlite:///./cache/rules.db" --top 10
Error handling
All exceptions inherit from TransletError:
from translet import (
    ConversionError,      # all retries exhausted
    RuleGenerationError,  # LLM failed to produce a usable JSONata expression
    JsonataError,         # JSONata compile/eval failure
    ValidationError,      # result didn't match the schema/sample
    StoreError,           # storage backend failure
)
try:
    result = t.transjson.convert(source, target_sample=sample)
except ConversionError as exc:
    print(f"failed for key={exc.key}, last error: {exc.last_error!r}")
The on_failure field of TransletConfig controls behaviour: "regenerate" (default — retry on failure) or "raise" (fail fast).
Extending
Custom LLM provider
from translet.llm import LLMClient, Message
class MyLLM:
    provider = "my-provider"
    model = "my-model"

    def complete(self, messages: list[Message], *, temperature: float = 0.0, max_tokens: int = 2048) -> str:
        ...  # return the response text

t = Translet(llm=MyLLM(), store=store)
LLMClient is a runtime_checkable Protocol — explicit inheritance is optional, structural compatibility is enough. The async counterpart is AsyncLLMClient with an acomplete method.
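The structural check works the same way for any runtime_checkable Protocol in the standard library. The sketch below defines a look-alike protocol, `CompletesText`, rather than importing translet's LLMClient, and it represents messages as plain dicts with a "content" key; both are assumptions made so the example runs on its own.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class CompletesText(Protocol):
    """Look-alike of an LLM client protocol (hypothetical, for illustration)."""

    provider: str
    model: str

    def complete(self, messages: list, *, temperature: float = 0.0,
                 max_tokens: int = 2048) -> str: ...


class EchoLLM:  # note: no inheritance from CompletesText
    provider = "echo"
    model = "echo-1"

    def complete(self, messages, *, temperature=0.0, max_tokens=2048):
        return messages[-1]["content"]


class NotAnLLM:
    provider = "x"
    model = "y"  # has the attributes but no complete() method


is_client = isinstance(EchoLLM(), CompletesText)
is_not_client = isinstance(NotAnLLM(), CompletesText)
print(is_client, is_not_client)
```

Note that isinstance against a runtime_checkable Protocol only checks that the members exist; it does not verify signatures or return types.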
Custom store
Implement translet.store.RuleStore (or AsyncRuleStore) — methods get, put, touch, delete, evict_expired, list. DbSetStore is a reference implementation.
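As a starting point, an in-memory store with the six method names listed above might look like this. Only the method names come from the text; the parameter lists, the record layout, and the `now` arguments (added to make the example deterministic) are assumptions, so check the RuleStore protocol and DbSetStore for the real signatures.

```python
import time
from typing import Any, Optional


class InMemoryRuleStore:
    """Toy store keyed by cache key; signatures are assumed, not translet's."""

    def __init__(self) -> None:
        self._rules = {}

    def get(self, key: str) -> Optional[dict]:
        return self._rules.get(key)

    def put(self, key: str, expression: str, now: Optional[float] = None) -> None:
        ts = time.time() if now is None else now
        self._rules[key] = {"key": key, "expression": expression,
                            "created": ts, "last_used": ts, "uses": 0}

    def touch(self, key: str, now: Optional[float] = None) -> None:
        rule = self._rules[key]
        rule["uses"] += 1
        rule["last_used"] = time.time() if now is None else now

    def delete(self, key: str) -> bool:
        return self._rules.pop(key, None) is not None

    def evict_expired(self, ttl_seconds: float, now: Optional[float] = None) -> int:
        current = time.time() if now is None else now
        expired = [k for k, r in self._rules.items()
                   if current - r["created"] > ttl_seconds]
        for k in expired:
            del self._rules[k]
        return len(expired)

    def list(self, limit: int = 100) -> Any:
        # Inside a method, `list` still refers to the builtin.
        return list(self._rules.values())[:limit]


store = InMemoryRuleStore()
store.put("name:a", "user.name", now=0.0)
store.put("hash:b", "$sum(items.v)", now=90_000.0)
store.touch("name:a", now=10.0)
uses_after_touch = store.get("name:a")["uses"]
evicted = store.evict_expired(ttl_seconds=86_400, now=100_000.0)
remaining = [rule["key"] for rule in store.list()]
print(uses_after_touch, evicted, remaining)
```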
Custom system prompt
from translet import TransletConfig
config = TransletConfig(system_prompt="You are a JSONata generator. Output only the expression.")
For full prompt control, pass your own prompt_builder=YourPromptBuilder() (see translet.transjson.PromptBuilder).
Development
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,all-llm]"
pytest -q
Project layout:
src/translet/
    core.py         # Translet / AsyncTranslet, from_env, load_dotenv
    llm/            # LLMClient Protocol + OpenAI-compatible clients
    store/          # RuleStore Protocol + DbSetStore
    transjson/      # convert pipeline (generate → JSONata → validate → retry)
    stats.py        # cache-statistics aggregation
    exceptions.py
examples/
    simple_nvidia.py    # minimal NVIDIA NIM example
    simple_from_env.py  # universal example driven by .env
    show_stats.py       # CLI: cache statistics
Build and publish
python -m build
python -m twine check dist/*
python -m twine upload dist/*
License
MIT