Thin Python SDK for Latence TRACE.
Latence TRACE Python SDK
Real-time protection for AI agents. Verify answers, redact private data, reduce wasted context, and log decisions without replacing your stack.
pip install latence
from latence import Latence
trace = Latence(api_key="lat_...")
score = trace.grounding.rag(
    query="Can I promise this customer a refund?",
    response_text="Yes, the refund will arrive within 48 hours.",
    raw_context="Refunds require manual finance approval before timelines are promised.",
)
print(score.risk_band)
print(score.runtime_decision)
Why This Exists
Agents are moving from demos into real workflows. That means private data, unsupported answers, prompt attacks, tool drift, and wasted memory are no longer abstract research problems. They become support escalations, broken automations, audit gaps, token waste, and user trust issues.
TRACE is the protection layer that sits next to your RAG pipeline, coding agent, or tool-using workflow. Your agent keeps running. TRACE checks the turn and returns evidence plus a decision your application can route.
What TRACE Does
TRACE is intentionally small at the SDK layer. The heavy work lives in your TRACE runtime deployment; this package is the thin Python interface.
- Verify RAG answers against retrieved context.
- Score coding-agent output against codebase context.
- Redact private data before it spreads through tools, logs, or prompts.
- Compress and repair long-running context with InfiniMem.
- Roll up an agent session into review-ready signals.
- Persist caller-carried state without forcing sticky server sessions.
Proof Points
These are runtime proof points from the TRACE freeze evidence, not SDK-only microbenchmarks. See the linked artifacts for full context.
- Grounding: local managed-runtime 360 reported 1.00 AUROC for grounded vs. ungrounded RAG cases.
- Coding agents: local 360 reported 1.00 AUROC for code phantom detection.
- Wasted context: held-out unused-context classification reported 1.00 precision and 1.00 recall.
- Latency: local concurrency burst reported about 368 ms RAG p95 and 334 ms code p95 in the managed-runtime proof.
- Privacy: redaction returns labels, offsets, scores, redacted output, entity counts, and timings for logging-ready GDPR workflows.
- Memory: InfiniMem is designed for up to 90% context reduction while keeping hot context available to the agent.
- Guard checks: Prompt Guard warmup proved torch.compile enabled on CUDA in the runtime proof.
Evidence:
How It Works
- Deploy or access a TRACE runtime: RunPod, FastAPI, VPC, or on-prem.
- Install latence.
- Send the agent turn to the product path that matches the workflow.
- Route on risk_band, runtime_decision, scores, spans, and evidence.
- Store only the audit evidence your policy allows.
from latence import Latence
trace = Latence(
    api_key="lat_...",
    base_url="https://your-trace-endpoint.example.com",
)
Environment variables are supported:
export LATENCE_TRACE_API_KEY="lat_..."
export LATENCE_TRACE_URL="https://your-trace-endpoint.example.com"
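If you would rather resolve these variables yourself and fail early when one is missing, a small helper can run before the client is built. The following is a minimal sketch, not part of the SDK: `TraceSettings` and `load_trace_settings` are hypothetical names, and only the two environment variable names come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TraceSettings:
    """Hypothetical container for the two values the client needs."""

    api_key: str
    base_url: str


def load_trace_settings(env: Optional[Mapping[str, str]] = None) -> TraceSettings:
    """Read LATENCE_TRACE_API_KEY and LATENCE_TRACE_URL, failing early if absent."""
    env = os.environ if env is None else env
    missing = [
        name
        for name in ("LATENCE_TRACE_API_KEY", "LATENCE_TRACE_URL")
        if not env.get(name)
    ]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    # Strip a trailing slash so later URL joins stay predictable.
    return TraceSettings(
        api_key=env["LATENCE_TRACE_API_KEY"],
        base_url=env["LATENCE_TRACE_URL"].rstrip("/"),
    )
```

The result can then be passed explicitly, for example `Latence(api_key=settings.api_key, base_url=settings.base_url)`, which keeps configuration errors at startup rather than at the first request.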
The SDK Surface
The SDK mirrors the TRACE product API directly:
- Privacy: client.privacy.redact(...)
- RAG grounding: client.grounding.rag(...)
- Code grounding: client.grounding.code(...)
- Text compression: client.compression.text(...)
- Message compression: client.compression.messages(...)
- Memory update: client.memory.step(...)
- Stateless rollup: client.rollup(...)
- Caller-carried sessions: client.session(...)
Latence is synchronous. AsyncLatence exposes the same surface for asyncio
services.
from latence import AsyncLatence, Latence
Base dependencies are only httpx and pydantic. Runtime and model packages
such as torch, transformers, triton, FastAPI, and vLLM are not SDK
dependencies.
Start With The Path You Need
RAG Agents
Use TRACE when your answer must be grounded in retrieved context.
score = trace.grounding.rag(
    query="What is the refund policy?",
    response_text=agent_answer,
    raw_context=retrieved_context,
)
if score.risk_band.value != "green":
    send_to_review(score)
Example: RAG grounding with guard checks
Coding Agents
Use TRACE when an agent explains, edits, or reasons over code.
score = trace.grounding.code(
    query="Does this patch add retry handling?",
    response_text=agent_answer,
    raw_context=code_context,
    extra={"response_language_hint": "python"},
)
Example: Code grounding
Privacy
Use TRACE before customer data enters prompts, tools, traces, or logs.
redacted = trace.privacy.redact(
    text="Email jane@example.com and charge IBAN DE89370400440532013000.",
)
print(redacted.redacted_text)
print(redacted.unique_labels)
Example: Privacy redaction
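One way to keep private data out of logs is to redact every record before a handler sees it. The sketch below shows that pattern with the standard `logging` module. It is an illustration under assumptions: `RedactingFilter` and `fake_redact` are hypothetical names, and `fake_redact` is a regex stand-in that masks only email-like strings, not TRACE's detection.

```python
import logging
import re
from typing import Callable


def fake_redact(text: str) -> str:
    """Hypothetical stand-in for a redaction call; masks only email-like strings."""
    return re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "[EMAIL]", text)


class RedactingFilter(logging.Filter):
    """Rewrite each log record through a redaction callable before handlers see it."""

    def __init__(self, redact: Callable[[str], str]) -> None:
        super().__init__()
        self._redact = redact

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so interpolated arguments are redacted too.
        record.msg = self._redact(record.getMessage())
        record.args = None
        return True
```

In an application the callable could wrap the SDK call shown above, for example `lambda t: trace.privacy.redact(text=t).redacted_text`. That adds a runtime request per log record, so it probably suits low-volume audit logs better than hot paths.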
Compression And Memory
Use TRACE when long-running workflows start dragging dead context forward.
compressed = trace.compression.text(
    "Long retrieved context...",
    compression_rate=0.4,
)
memory = trace.memory.step(
    turn_text="User asked for manual refund approval.",
    prior_memory_state=current_state,
)
current_state = memory.next_memory_state
Examples: Compression, Memory step
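The memory step above returns a new state that the caller passes into the next step. Across several turns that becomes a simple loop, sketched here with a stub so it runs without a TRACE runtime. `FakeMemory`, `FakeStepResult`, and `run_turns` are hypothetical names; only `turn_text`, `prior_memory_state`, and `next_memory_state` come from this README, and the real state should be treated as opaque.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class FakeStepResult:
    """Stand-in for a memory step response; only next_memory_state is modeled."""

    next_memory_state: Any


class FakeMemory:
    """Hypothetical stub that mimics the call shape of trace.memory.step."""

    def step(self, turn_text: str, prior_memory_state: Optional[Any] = None) -> FakeStepResult:
        # The stub appends turns to a list; the real state is opaque to the caller.
        history = list(prior_memory_state or [])
        history.append(turn_text)
        return FakeStepResult(next_memory_state=history)


def run_turns(memory: Any, turns: Iterable[str], state: Optional[Any] = None) -> Any:
    """Feed each turn to memory.step and carry the returned state forward."""
    for turn_text in turns:
        result = memory.step(turn_text=turn_text, prior_memory_state=state)
        state = result.next_memory_state
    return state
```

Because the state travels with the caller, the loop can stop and resume on another worker as long as the last returned state is stored somewhere.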
Sessions
TRACE runtimes can stay stateless while the SDK carries state for your agent.
from latence import FileSessionStorage, Latence
trace = Latence()
session = trace.session(
    session_id="support-run-42",
    storage=FileSessionStorage(".trace-sessions"),
)
session.event("tool", "loaded refund policy")
session.memory_step(turn_text="Keep finance approval as required context.")
score = session.rag(
    query="Can I promise the refund?",
    response_text="Yes, the refund is guaranteed in 48 hours.",
    raw_context="Refunds require manual finance approval.",
)
session.save()
Docs: Sessions
Example: Session facade
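To make "the SDK carries state" concrete, the sketch below shows one possible shape for caller-side persistence: one JSON file per session, written atomically. It is an illustration only and is not how `FileSessionStorage` is implemented. `JsonFileStore`, its methods, the state layout, and `demo_roundtrip` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class JsonFileStore:
    """Hypothetical caller-side session store: one JSON file per session."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, session_id: str) -> Path:
        return self.root / f"{session_id}.json"

    def load(self, session_id: str) -> Dict[str, Any]:
        path = self._path(session_id)
        if not path.exists():
            # A new session starts with no events.
            return {"session_id": session_id, "events": []}
        return json.loads(path.read_text(encoding="utf-8"))

    def save(self, state: Dict[str, Any]) -> None:
        path = self._path(state["session_id"])
        # Write to a temporary file, then rename, so a crash cannot leave a half-written file.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(state), encoding="utf-8")
        tmp.replace(path)


def demo_roundtrip() -> Dict[str, Any]:
    """Save one event in a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as root:
        store = JsonFileStore(root)
        state = store.load("support-run-42")
        state["events"].append({"kind": "tool", "text": "loaded refund policy"})
        store.save(state)
        return store.load("support-run-42")
```

Keeping the state on the caller side is what lets the runtime stay stateless: any replica can serve the next request as long as the caller sends the state it needs.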
For Whom
TRACE is for teams building or operating:
- RAG products where unsupported answers are expensive.
- Coding agents that need codebase-grounded reasoning over many steps.
- Support agents that touch customer records and policies.
- Legal, finance, healthcare, or regulated workflows that need evidence.
- Internal agent platforms where observability, retries, and human review matter.
It is also useful for framework authors and platform teams that need one consistent protection API across LangGraph, LangChain, LlamaIndex, n8n, Cursor, Claude Code, Codex, and custom agent runners.
Integrations
Direct calls are the recommended path. Optional helpers live under
latence.integrations.
pip install "latence[langchain]"
pip install "latence[llama_index]"
pip install "latence[openai]"
pip install "latence[haystack]"
Docs: Integrations
Example: Async batch
Async
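import asyncio
from latence import AsyncLatence

async def main() -> None:
    # "async with" must run inside a coroutine, so the call is wrapped in main().
    async with AsyncLatence() as trace:
        score = await trace.grounding.rag(
            query="What changed?",
            response_text="The policy now allows refunds.",
            raw_context="The policy still requires manual approval.",
        )
    print(score.risk_band)

asyncio.run(main())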
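When many turns need scoring at once, the async client can be combined with `asyncio.gather` and a semaphore to cap concurrency. The following is a minimal sketch with a stub in place of the network call so it runs on its own. `fake_rag_score` and `score_batch` are hypothetical names, and the substring check is only a placeholder for a real grounding score.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fake_rag_score(item: Dict[str, str]) -> str:
    """Hypothetical stand-in for an awaited grounding call; returns a band string."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return "green" if item["response_text"] in item["raw_context"] else "red"


async def score_batch(
    items: List[Dict[str, str]],
    score_one: Callable[[Dict[str, str]], Awaitable[str]],
    max_concurrency: int = 4,
) -> List[str]:
    """Score items concurrently, never running more than max_concurrency at once."""
    limiter = asyncio.Semaphore(max_concurrency)

    async def guarded(item: Dict[str, str]) -> str:
        async with limiter:
            return await score_one(item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(item) for item in items))
```

With the real client, `score_one` could be a small coroutine that awaits `trace.grounding.rag(...)` for one item. The concurrency cap is a caller-side choice and should be sized to what your runtime deployment can absorb.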
Now What
If you are integrating TRACE into an agent:
- Run the quickstart.
- Pick one product path: RAG, code, privacy, compression, memory, or session.
- Add one route in your app for green, amber, and red.
- Log request ID, risk band, runtime decision, and redacted evidence.
- Replay a few real failures and tune your thresholds.
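The routing step in the list above can be as small as one function that maps the three bands to application actions. This is a sketch under assumptions: `RiskBand` is a stand-in enum (the SDK's own type may differ), and `route_turn` and the action names `deliver`, `review`, and `block` are hypothetical.

```python
from enum import Enum


class RiskBand(str, Enum):
    """Stand-in for the band on a TRACE score; the SDK's own type may differ."""

    GREEN = "green"
    AMBER = "amber"
    RED = "red"


def route_turn(band_value: str) -> str:
    """Map a risk band string to an application action (action names are hypothetical)."""
    try:
        band = RiskBand(band_value)
    except ValueError:
        # Unknown bands are treated as the most cautious case.
        return "block"
    if band is RiskBand.GREEN:
        return "deliver"
    if band is RiskBand.AMBER:
        return "review"
    return "block"
```

A call site might look like `action = route_turn(score.risk_band.value)`. Defaulting unknown values to the strictest action means a new band added on the runtime side cannot silently pass through.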
If you are publishing or validating this SDK:
python -m pip install -e ".[dev]"
python -m pytest
python -m ruff check .
python scripts/check_contract.py --manifest ../latence-trace/docs/core_freeze/api_surface_manifest.json
python -m build
python -m twine check dist/*
Clean-wheel smoke testing should run outside the repo root:
python -m venv /tmp/latence-sdk-smoke
/tmp/latence-sdk-smoke/bin/pip install dist/*.whl
cd /tmp && /tmp/latence-sdk-smoke/bin/python - <<'PY'
from importlib.metadata import distribution
from latence import Latence
requires = distribution("latence").requires or []
for forbidden in ("torch", "transformers", "triton", "fastapi", "vllm"):
    assert not any(req.lower().startswith(forbidden) for req in requires), requires
assert Latence(base_url="http://localhost:8090")
print("latence SDK smoke passed")
PY
Migration
Primary imports:
from latence import Latence, AsyncLatence
Preview aliases remain available so existing TRACE preview code can move first and clean up names later:
from latence import LatenceTraceClient, AsyncLatenceTraceClient