Official Python SDK for Latence API

Latence Python SDK

Catch hallucinations, drift, and unused context before your users do.
Groundedness scoring for RAG pipelines and AI coding agents, a one-call path to upgrade data quality (from messy input files to fully generated markdown and knowledge graphs), and a high-performance open-source retrieval engine.

Charge your RAG pipelines and harnesses based on real data.

Quickstart • Trace • Upgrade Data Quality • Upgrade Retrieval • Trace Reference • Full Tutorial


Quickstart

pip install latence
export LATENCE_API_KEY="lat_..."

from latence import Latence

client = Latence()  # reads LATENCE_API_KEY from the environment

r = client.experimental.trace.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
)
print(r.score, r.band, r.context_coverage_ratio, r.context_unused_ratio)

That's it. You now know whether the answer was grounded, how much of your retrieved context was actually used, and whether to trust it.


Step 1 — Trace your answers

Three lanes, one mental model. Pick the one that matches what your app is doing right now.

RAG groundedness — did the answer actually come from your context?

from latence import Latence

client = Latence()

r = client.experimental.trace.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
)

print(r.score)                   # 0.0 - 1.0
print(r.band)                    # "green" | "amber" | "red" | "unknown"
print(r.context_coverage_ratio)  # how much of the answer is grounded in context
print(r.context_unused_ratio)    # how much retrieved context was dead weight

Code agents — catch phantom APIs and drift turn-over-turn

Chain turns with the opaque next_session_state handoff. The SDK never forces you to track session internals.

turn1 = client.experimental.trace.code(
    response_text="def add(a, b): return a + b",
    raw_context="# utils.py\ndef sub(a, b): return a - b",
    response_language_hint="python",
)

turn2 = client.experimental.trace.code(
    response_text="def mul(a, b): return a * b",
    raw_context="# utils.py\ndef sub(a, b): return a - b",
    response_language_hint="python",
    session_state=turn1.next_session_state,   # chain turns
)

print(turn2.band)
print(turn2.session_signals.recommendation)   # "continue" | "re_anchor" | "fresh_chat"
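
For longer sessions, the handoff above can be wrapped in a small loop. This is a minimal sketch, not part of the SDK: `chain_turns` is a hypothetical helper, and it assumes only what the example above shows, namely that each result exposes `next_session_state` and that the trace call accepts a `session_state` keyword.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def chain_turns(
    trace_fn: Callable[..., Any],
    turns: Iterable[Dict[str, Any]],
) -> List[Any]:
    """Call trace_fn once per turn, threading the opaque session state through.

    `trace_fn` is whatever callable performs one trace (for example
    client.experimental.trace.code). `turns` is an iterable of keyword-argument
    dicts, one per turn. Hypothetical helper, not an SDK function.
    """
    results: List[Any] = []
    state: Optional[Any] = None
    for kwargs in turns:
        if state is None:
            # First turn: no prior state to hand over.
            result = trace_fn(**kwargs)
        else:
            # Later turns: pass the previous turn's state back unchanged.
            result = trace_fn(**kwargs, session_state=state)
        results.append(result)
        state = result.next_session_state
    return results
```

With a real client this would be called as `chain_turns(client.experimental.trace.code, [turn1_kwargs, turn2_kwargs])`; the state is never inspected, only forwarded.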

Hosted Trace costs $0.008/request by default. For the higher-cost quality mode, pass profile="quality" to trace.rag(...) or trace.code(...); quality requests are billed at $0.016/request.
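
To budget for a workload, the per-request prices above multiply out directly. The sketch below is illustrative only: `estimate_trace_cost` is a hypothetical helper, and the `"default"` key is just a label for requests sent without `profile="quality"`.

```python
from decimal import Decimal

# Per-request prices quoted above, in USD. "default" is a label used here
# for requests that do not pass profile="quality"; it is not an SDK value.
PRICE_PER_REQUEST = {
    "default": Decimal("0.008"),
    "quality": Decimal("0.016"),
}


def estimate_trace_cost(requests: int, profile: str = "default") -> Decimal:
    """Return the estimated USD cost of `requests` hosted Trace calls."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    if profile not in PRICE_PER_REQUEST:
        raise ValueError(f"unknown profile: {profile!r}")
    return PRICE_PER_REQUEST[profile] * requests
```

At 10,000 requests this works out to $80 on the default profile and $160 on the quality profile.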

Session rollup — one scoreboard for a live session

Stateless, CPU-only, sub-ms on the pod. Safe to call on every keystroke.

rollup = client.experimental.trace.rollup(turns=[turn1, turn2])

print(rollup.noise_pct)              # fraction of turns flagged as noise
print(rollup.retrieval_waste_pct)    # fraction of retrieved context left unused
print(rollup.model_drift_pct)        # fraction of turns with drift
print(rollup.reason_code_histogram)  # why the turns failed, aggregated
print(rollup.risk_band_trail)        # per-turn band, chronological
print(rollup.recommendations)        # actionable session-level advice

What the signals tell you to do next

The numbers above are not diagnostics. They are routing rules:

| Signal | Meaning | Next step |
| --- | --- | --- |
| band amber/red, low context_coverage_ratio | The answer isn't grounded in what you retrieved. | Upgrade data quality: your upstream documents are the bottleneck. |
| High context_unused_ratio, retrieval_waste_pct > 30% | You retrieved the wrong chunks. | Upgrade retrieval: your retriever is the bottleneck. |
| session_signals.recommendation = "re_anchor" / "fresh_chat" on the code lane | Session drift is compounding. | Reset the agent's context on the next turn. |
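
The routing rules above can be sketched as a plain function. This is an illustration, not SDK code: `next_steps` and the returned labels are hypothetical, the 0.5 cut-offs for "low" coverage and "high" unused context are assumptions, and only the 30% waste figure comes from the table. It also assumes the ratios are fractions between 0 and 1.

```python
from typing import List, Optional

# Assumed thresholds. Only WASTE_LIMIT is taken from the table above.
LOW_COVERAGE = 0.5   # hypothetical cut-off for a "low" context_coverage_ratio
HIGH_UNUSED = 0.5    # hypothetical cut-off for a "high" context_unused_ratio
WASTE_LIMIT = 0.30   # retrieval_waste_pct > 30%


def next_steps(
    band: str,
    context_coverage_ratio: float,
    context_unused_ratio: float,
    retrieval_waste_pct: Optional[float] = None,
    recommendation: Optional[str] = None,
) -> List[str]:
    """Map Trace signals to the follow-up actions listed in the table."""
    steps: List[str] = []
    # Row 1: ungrounded answers point at the upstream documents.
    if band in ("amber", "red") or context_coverage_ratio < LOW_COVERAGE:
        steps.append("upgrade_data_quality")
    # Row 2: unused context points at the retriever.
    wasteful = retrieval_waste_pct is not None and retrieval_waste_pct > WASTE_LIMIT
    if context_unused_ratio > HIGH_UNUSED or wasteful:
        steps.append("upgrade_retrieval")
    # Row 3: drift on the code lane calls for a context reset.
    if recommendation in ("re_anchor", "fresh_chat"):
        steps.append("reset_context")
    return steps
```

In practice the inputs would come from a trace result (`r.band`, `r.context_coverage_ratio`, `r.context_unused_ratio`) and, where available, from the rollup and `session_signals`.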

Full reference: Trace docs and SDK tutorial §18.

Async

Every method above has an await-able twin under AsyncLatence:

from latence import AsyncLatence

async with AsyncLatence() as client:
    r = await client.experimental.trace.rag(
        response_text="Paris is the capital of France.",
        raw_context="France's capital city is Paris.",
    )
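
One reason to reach for the async client is scoring many answers at once. The sketch below shows one way to do that with bounded concurrency. `trace_many` is a hypothetical helper, not an SDK method, and it assumes only that the trace call is an awaitable taking keyword arguments.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Iterable, List


async def trace_many(
    trace_fn: Callable[..., Awaitable[Any]],
    items: Iterable[Dict[str, Any]],
    max_concurrency: int = 8,
) -> List[Any]:
    """Run trace_fn over many inputs, at most max_concurrency at a time.

    Results come back in the same order as `items`.
    Hypothetical helper, not part of the SDK.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(kwargs: Dict[str, Any]) -> Any:
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await trace_fn(**kwargs)

    return await asyncio.gather(*(one(kwargs) for kwargs in items))
```

Inside the `async with AsyncLatence() as client:` block this would be awaited as `await trace_many(client.experimental.trace.rag, batch)`, where `batch` is a list of `{"response_text": ..., "raw_context": ...}` dicts.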

Step 2 — Upgrade data quality

Trace is showing low coverage or amber/red bands? The model is rarely the problem. It's usually the upstream data: un-OCR'd PDFs, missing entities, unresolved references. The Latence Data Intelligence Pipeline cleans that in one call.

job = client.pipeline.run(files=["contract.pdf"])
pkg = job.wait_for_completion()

print(pkg.document.markdown)                         # clean markdown
print(pkg.entities.summary)                          # {"total": 142, "by_type": {...}}
print(pkg.knowledge_graph.summary.total_relations)   # 87
pkg.download_archive("./results.zip")

Smart defaults: OCR → entity extraction → relation extraction. Configure any step explicitly:

job = client.pipeline.run(
    files=["contract.pdf"],
    steps={
        "ocr": {"mode": "performance"},
        "redaction": {"mode": "balanced", "redact": True},
        "extraction": {"label_mode": "hybrid", "threshold": 0.3},
        "relation_extraction": {"resolve_entities": True},
    },
)

Every run returns a structured DataPackage:

  • pkg.document — markdown + per-page layout (OCR)
  • pkg.entities — entity list + summary (extraction)
  • pkg.knowledge_graph — entities + relations + graph summary (relation extraction)
  • pkg.redaction — cleaned text + PII list (redaction)
  • pkg.compression — compressed text + ratio (compression)
  • pkg.quality — per-stage confidence, latency, cost

Power users: the typed PipelineBuilder accepts YAML and validates client-side. See docs/pipelines.md for the full orchestration reference (DAG execution, resumable jobs, progress callbacks).

Corpus-level: Dataset Intelligence

Feed pipeline outputs into client.experimental.dataset_intelligence_service to build corpus-wide knowledge graphs, ontologies, and enriched feature spaces with incremental ingestion:

| Tier | Method | What it does |
| --- | --- | --- |
| 1 | di.enrich() | Semantic feature vectors (CPU-only, fast) |
| 2 | di.build_graph() | Entity resolution, knowledge graph, link prediction |
| 3 | di.build_ontology() | Concept clustering, hierarchy induction |
| Full | di.run() | All three tiers sequentially |

See docs/dataset_intelligence.md.


Step 3 — Upgrade retrieval

If Trace keeps flagging a high context_unused_ratio, or the session rollup shows retrieval_waste_pct > 30%, your model isn't the problem — your retrieval engine is shipping the wrong chunks.

ColSearch: high-performance late-interaction and multimodal search engine

ColSearch is our late-interaction retrieval engine: token-level ColBERT recall, native multimodal search over PDFs and images, and a drop-in replacement for the retrieval step in your RAG stack. Wire it in and context_unused_ratio collapses.


Error handling

from latence import (
    LatenceError, AuthenticationError, InsufficientCreditsError,
    RateLimitError, JobError, JobTimeoutError, TransportError,
)

try:
    r = client.experimental.trace.rag(
        response_text="Paris is the capital of France.",
        raw_context="France's capital city is Paris.",
    )
except AuthenticationError:
    ...  # 401
except InsufficientCreditsError:
    ...  # 402
except RateLimitError as e:
    ...  # 429, retry after e.retry_after
except JobError as e:
    ...  # pipeline job failed; check e.is_resumable
except TransportError:
    ...  # network / DNS

The SDK retries on 429 and 5xx with exponential backoff (default 2 retries, respects Retry-After).
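
For intuition, the retry policy described above looks roughly like the following. This is a generic sketch of exponential backoff with a Retry-After override, not the SDK's actual implementation: `RetryableError`, `call_with_backoff`, and the 0.5-second base delay are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for a 429 or 5xx failure."""

    def __init__(self, status: int, retry_after: Optional[float] = None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to max_retries times on RetryableError."""
    attempt = 0
    while True:
        try:
            return fn()
        except RetryableError as exc:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            if exc.retry_after is not None:
                # Honour the server's Retry-After hint when present.
                delay = exc.retry_after
            else:
                # Otherwise double the wait on each attempt.
                delay = base_delay * (2 ** attempt)
            sleep(delay)
            attempt += 1
```

The `sleep` parameter is injectable so the policy can be exercised in tests without actually waiting.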


Configuration

export LATENCE_API_KEY="lat_your_key"

from latence import Latence
import latence

client = Latence(
    api_key="lat_...",       # or LATENCE_API_KEY env var
    base_url="https://...",  # or LATENCE_BASE_URL env var
    timeout=60.0,            # request timeout (default: 60s)
    max_retries=2,           # retry attempts (default: 2)
)

latence.setup_logging("DEBUG")  # logs every HTTP request/response

Resources

  • Trace reference: docs/trace.md (parameters and full response schema)
  • Full tutorial: SDK_TUTORIAL.md (every service, every parameter)
  • API docs: docs.latence.ai
  • Portal: app.latence.ai

MIT License • latence.ai
