Official Python SDK for the Latence API
Latence Python SDK
Catch hallucinations, drift, and unused context before your users do.
Groundedness scoring for RAG pipelines and AI coding agents, plus a one-call pipeline that turns messy input files into clean markdown and knowledge graphs, and an open-source, high-performance retrieval engine.
Charge your RAG pipelines and harnesses based on real data.
Quickstart
pip install latence
export LATENCE_API_KEY="lat_..."
from latence import Latence
client = Latence() # reads LATENCE_API_KEY from the environment
r = client.experimental.trace.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
)
print(r.score, r.band, r.context_coverage_ratio, r.context_unused_ratio)
That's it. You now know whether the answer was grounded, how much of your retrieved context was actually used, and whether to trust it.
Step 1 — Trace your answers
Three lanes, one mental model. Pick the one that matches what your app is doing right now.
RAG groundedness — did the answer actually come from your context?
from latence import Latence
client = Latence()
r = client.experimental.trace.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
)
print(r.score) # 0.0 - 1.0
print(r.band) # "green" | "amber" | "red" | "unknown"
print(r.context_coverage_ratio) # how much of the answer is grounded in context
print(r.context_unused_ratio) # how much retrieved context was dead weight
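One way to act on these fields is a small gate in front of the user. The sketch below is only an illustration: `TraceResult` is a local stand-in that mirrors the four fields printed above (it is not the SDK's class), and the `min_score` threshold and the "serve" / "review" / "block" labels are assumptions you would tune for your own app.

```python
from dataclasses import dataclass


@dataclass
class TraceResult:
    """Local stand-in for the fields printed above; not the SDK's own class."""
    score: float
    band: str
    context_coverage_ratio: float
    context_unused_ratio: float


def decide(result, min_score=0.7):
    """Return "serve", "review", or "block" (a hypothetical policy)."""
    if result.band == "green" and result.score >= min_score:
        return "serve"
    if result.band == "red":
        return "block"
    # amber, unknown, or a green band with a score below the threshold
    return "review"


grounded = TraceResult(0.93, "green", 0.95, 0.10)
shaky = TraceResult(0.41, "red", 0.20, 0.75)
print(decide(grounded))  # serve
print(decide(shaky))     # block
```

In a real app you would pass the object returned by `trace.rag(...)` instead of the stand-in, assuming it exposes the same attribute names.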
Code agents — catch phantom APIs and drift turn-over-turn
Chain turns with the opaque next_session_state handoff. The SDK never forces you to track session internals.
turn1 = client.experimental.trace.code(
    response_text="def add(a, b): return a + b",
    raw_context="# utils.py\ndef sub(a, b): return a - b",
    response_language_hint="python",
)
turn2 = client.experimental.trace.code(
    response_text="def mul(a, b): return a * b",
    raw_context="# utils.py\ndef sub(a, b): return a - b",
    response_language_hint="python",
    session_state=turn1.next_session_state,  # chain turns
)
print(turn2.band)
print(turn2.session_signals.recommendation) # "continue" | "re_anchor" | "fresh_chat"
Hosted Trace pricing is $0.008/request by default. For higher-cost quality
mode, pass profile="quality" to trace.rag(...) or trace.code(...);
quality requests bill at $0.016/request.
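To budget a workload, the two per-request prices above can be combined with simple arithmetic. This is a rough sketch: `estimate_cost` is a hypothetical helper, it uses only the two prices quoted above, and it ignores any discounts, credits, or other billing rules that may apply to your account.

```python
from decimal import Decimal

# Per-request prices quoted above, in USD.
PRICE_PER_REQUEST = {
    "default": Decimal("0.008"),
    "quality": Decimal("0.016"),
}


def estimate_cost(default_requests, quality_requests=0):
    """Estimated hosted Trace spend in USD for the given request counts."""
    return (
        PRICE_PER_REQUEST["default"] * default_requests
        + PRICE_PER_REQUEST["quality"] * quality_requests
    )


print(estimate_cost(10_000))         # 80.000
print(estimate_cost(10_000, 2_500))  # 120.000
```

`Decimal` keeps the totals exact, which avoids float rounding noise when the counts get large.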
Session rollup — one scoreboard for a live session
Stateless, CPU-only, sub-ms on the pod. Safe to call on every keystroke.
rollup = client.experimental.trace.rollup(turns=[turn1, turn2])
print(rollup.noise_pct) # fraction of turns flagged as noise
print(rollup.retrieval_waste_pct) # fraction of retrieved context left unused
print(rollup.model_drift_pct) # fraction of turns with drift
print(rollup.reason_code_histogram) # why the turns failed, aggregated
print(rollup.risk_band_trail) # per-turn band, chronological
print(rollup.recommendations) # actionable session-level advice
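The band trail is handy for spotting a session that is getting worse rather than one that had a single bad turn. The helper below is a sketch under one assumption: that `risk_band_trail` is a list of band strings, oldest first. `trailing_non_green` is a hypothetical name, not part of the SDK.

```python
def trailing_non_green(band_trail):
    """Count consecutive non-green bands at the end of the trail."""
    streak = 0
    for band in reversed(band_trail):
        if band == "green":
            break
        streak += 1
    return streak


trail = ["green", "amber", "green", "amber", "red"]
print(trailing_non_green(trail))  # 2
```

A streak of two or more might be a reasonable point to look at `rollup.recommendations` before continuing; the exact cut-off is a judgment call.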
What the signals tell you to do next
The numbers above are not diagnostics. They are routing rules:
| Signal | Meaning | Next step |
|---|---|---|
| band amber/red, low context_coverage_ratio | The answer isn't grounded in what you retrieved. | Upgrade data quality: your upstream documents are the bottleneck. |
| High context_unused_ratio, retrieval_waste_pct > 30% | You retrieved the wrong chunks. | Upgrade retrieval: your retriever is the bottleneck. |
| session_signals.recommendation = "re_anchor" / "fresh_chat" on the code lane | Session drift is compounding. | Reset the agent's context on the next turn. |
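The table above can be sketched as a small routing function. Treat this as an illustration only: `next_steps` and the returned labels are hypothetical, the `low_coverage` and `high_unused` cut-offs are assumptions (the table only says "low" and "high"), and `waste` is assumed to be a fraction between 0 and 1, so 30% becomes `0.30`.

```python
def next_steps(band, coverage, unused, waste=None, recommendation=None,
               low_coverage=0.5, high_unused=0.5, waste_limit=0.30):
    """Map Trace signals to the follow-up actions listed in the table."""
    steps = []
    # Row 1: poorly grounded answer, so the upstream documents are suspect.
    if band in ("amber", "red") and coverage < low_coverage:
        steps.append("upgrade_data_quality")
    # Row 2: most retrieved context went unused, so the retriever is suspect.
    if unused > high_unused or (waste is not None and waste > waste_limit):
        steps.append("upgrade_retrieval")
    # Row 3: the code lane reports compounding drift.
    if recommendation in ("re_anchor", "fresh_chat"):
        steps.append("reset_agent_context")
    return steps


print(next_steps("red", coverage=0.2, unused=0.1))
# ['upgrade_data_quality']
print(next_steps("green", coverage=0.9, unused=0.1, waste=0.4,
                 recommendation="fresh_chat"))
# ['upgrade_retrieval', 'reset_agent_context']
```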
Full reference: Trace docs and SDK tutorial §18.
Async
Every method above has an await-able twin under AsyncLatence:
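import asyncio
from latence import AsyncLatence

async def main():
    # `async with` and `await` must run inside a coroutine
    async with AsyncLatence() as client:
        r = await client.experimental.trace.rag(
            response_text="Paris is the capital of France.",
            raw_context="France's capital city is Paris.",
        )
        print(r.score, r.band)

asyncio.run(main())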
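The async client is most useful when you score many answers at once. The sketch below shows one common pattern, bounded concurrency with `asyncio.Semaphore` and `asyncio.gather`. To stay runnable without network access it uses `fake_trace`, a hypothetical stand-in for the awaitable trace call; the `limit` of 4 is an arbitrary assumption, not a documented rate limit.

```python
import asyncio


async def fake_trace(answer, context):
    """Hypothetical stand-in for the awaitable trace call; no network I/O."""
    await asyncio.sleep(0.01)
    return {"answer": answer, "band": "unknown"}  # placeholder result


async def trace_all(pairs, limit=4):
    """Score (answer, context) pairs concurrently, at most `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def bounded(answer, context):
        async with gate:
            return await fake_trace(answer, context)

    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(a, c) for a, c in pairs))


pairs = [(f"answer {i}", "some context") for i in range(10)]
results = asyncio.run(trace_all(pairs))
print(len(results), results[0]["answer"])  # 10 answer 0
```

Swapping `fake_trace` for a call on an `AsyncLatence` client should keep the same shape, since the semaphore only wraps the awaited call.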
Step 2 — Upgrade data quality
Trace is showing low coverage or amber/red bands? The model is rarely the problem. It's usually the upstream data: un-OCR'd PDFs, missing entities, unresolved references. The Latence Data Intelligence Pipeline cleans that in one call.
job = client.pipeline.run(files=["contract.pdf"])
pkg = job.wait_for_completion()
print(pkg.document.markdown) # clean markdown
print(pkg.entities.summary) # {"total": 142, "by_type": {...}}
print(pkg.knowledge_graph.summary.total_relations) # 87
pkg.download_archive("./results.zip")
Smart defaults: OCR → entity extraction → relation extraction. Configure any step explicitly:
job = client.pipeline.run(
    files=["contract.pdf"],
    steps={
        "ocr": {"mode": "performance"},
        "redaction": {"mode": "balanced", "redact": True},
        "extraction": {"label_mode": "hybrid", "threshold": 0.3},
        "relation_extraction": {"resolve_entities": True},
    },
)
Every run returns a structured DataPackage:
- pkg.document: markdown + per-page layout (OCR)
- pkg.entities: entity list + summary (extraction)
- pkg.knowledge_graph: entities + relations + graph summary (relation extraction)
- pkg.redaction: cleaned text + PII list (redaction)
- pkg.compression: compressed text + ratio (compression)
- pkg.quality: per-stage confidence, latency, cost
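When logging pipeline runs it can help to boil a package down to a few headline numbers. The sketch below assumes the attribute paths shown in the earlier example (`pkg.document.markdown`, `pkg.entities.summary`, `pkg.knowledge_graph.summary.total_relations`) and further assumes that a section may be missing when its step did not run. `summarize` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in for a real `DataPackage`.

```python
from types import SimpleNamespace


def summarize(pkg):
    """Collect headline numbers, skipping sections that are absent."""
    out = {}
    document = getattr(pkg, "document", None)
    if document is not None:
        out["markdown_chars"] = len(document.markdown)
    entities = getattr(pkg, "entities", None)
    if entities is not None:
        # assumes a dict-like summary, as the earlier example's comment suggests
        out["entities"] = entities.summary["total"]
    graph = getattr(pkg, "knowledge_graph", None)
    if graph is not None:
        out["relations"] = graph.summary.total_relations
    return out


# Stand-in package using the values from the example above.
pkg = SimpleNamespace(
    document=SimpleNamespace(markdown="# Contract\n"),
    entities=SimpleNamespace(summary={"total": 142, "by_type": {}}),
    knowledge_graph=SimpleNamespace(summary=SimpleNamespace(total_relations=87)),
)
print(summarize(pkg))
```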
Power users: the typed PipelineBuilder accepts YAML and validates client-side. See docs/pipelines.md for the full orchestration reference (DAG execution, resumable jobs, progress callbacks).
Corpus-level: Dataset Intelligence
Feed pipeline outputs into client.experimental.dataset_intelligence_service to build corpus-wide knowledge graphs, ontologies, and enriched feature spaces with incremental ingestion:
| Tier | Method | What it does |
|---|---|---|
| 1 | di.enrich() | Semantic feature vectors (CPU-only, fast) |
| 2 | di.build_graph() | Entity resolution, knowledge graph, link prediction |
| 3 | di.build_ontology() | Concept clustering, hierarchy induction |
| Full | di.run() | All three tiers sequentially |
See docs/dataset_intelligence.md.
Step 3 — Upgrade retrieval
If Trace keeps flagging a high context_unused_ratio, or the session rollup shows retrieval_waste_pct > 30%, your model isn't the problem — your retrieval engine is shipping the wrong chunks.
→ ColSearch — High Performance Late Interaction and multimodal search engine
ColSearch is our late-interaction retrieval engine: token-level ColBERT recall, native multimodal search over PDFs and images, and a drop-in replacement for the retrieval step in your RAG stack. Wire it in and context_unused_ratio collapses.
Error handling
from latence import (
    LatenceError, AuthenticationError, InsufficientCreditsError,
    RateLimitError, JobError, JobTimeoutError, TransportError,
)
try:
    r = client.experimental.trace.rag(
        response_text="Paris is the capital of France.",
        raw_context="France's capital city is Paris.",
    )
except AuthenticationError:
    ...  # 401
except InsufficientCreditsError:
    ...  # 402
except RateLimitError as e:
    ...  # 429, retry after e.retry_after
except JobError as e:
    ...  # pipeline job failed; check e.is_resumable
except TransportError:
    ...  # network / DNS
The SDK retries on 429 and 5xx with exponential backoff (default 2 retries, respects Retry-After).
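For intuition, the retry rule described above can be sketched in a few lines. This is not the SDK's implementation: `should_retry` and `retry_delay` are hypothetical helpers, and the `base` and `cap` values are assumptions, since the actual delays (and any jitter) are not documented here.

```python
def should_retry(status, attempt, max_retries=2):
    """Retry only on 429 and 5xx, and only while attempts remain."""
    retryable = status == 429 or 500 <= status < 600
    return retryable and attempt < max_retries


def retry_delay(attempt, retry_after=None, base=0.5, cap=8.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)  # the server's Retry-After hint wins
    return min(cap, base * 2 ** attempt)  # exponential growth, capped


print(should_retry(429, attempt=0))   # True
print(should_retry(404, attempt=0))   # False
print(retry_delay(1))                 # 1.0
print(retry_delay(0, retry_after=3))  # 3.0
```

With `max_retries=2`, a request is attempted at most three times in total: the original call plus two retries.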
Configuration
export LATENCE_API_KEY="lat_your_key"
from latence import Latence
import latence
client = Latence(
    api_key="lat_...",       # or LATENCE_API_KEY env var
    base_url="https://...",  # or LATENCE_BASE_URL env var
    timeout=60.0,            # request timeout (default: 60s)
    max_retries=2,           # retry attempts (default: 2)
)
latence.setup_logging("DEBUG") # logs every HTTP request/response
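The comments above suggest each setting can come from either an argument or an environment variable. The sketch below shows one plausible precedence, explicit argument first and environment second. That order is an assumption, not confirmed SDK behavior, and `resolve` is a hypothetical helper.

```python
import os


def resolve(explicit, env_var, default=None, environ=os.environ):
    """Explicit argument first, then the environment variable, then a default."""
    if explicit is not None:
        return explicit
    return environ.get(env_var, default)


# A plain dict stands in for the process environment in this demo.
env = {"LATENCE_API_KEY": "lat_from_env"}
print(resolve(None, "LATENCE_API_KEY", environ=env))            # lat_from_env
print(resolve("lat_explicit", "LATENCE_API_KEY", environ=env))  # lat_explicit
print(resolve(None, "LATENCE_BASE_URL", environ=env))           # None
```

A check like this at startup lets an app fail fast with a clear message when no key is configured, rather than on the first request.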
Resources
| Resource | Location | Covers |
|---|---|---|
| Trace reference | docs/trace.md | Parameters and full response schema |
| Full tutorial | SDK_TUTORIAL.md | Every service, every parameter |
| API docs | docs.latence.ai | |
| Portal | app.latence.ai | |
MIT License • latence.ai