Official Python SDK for Latence AI API

Latence AI Python SDK

Catch hallucinations, drift, and unused context before your users do.
Groundedness scoring for RAG pipelines and AI coding agents, with a one-call path to upgrade data quality — from messy input files to fully generated markdown and knowledge graphs — as well as a high-performance retrieval engine (OSS).

Charge your RAG pipelines and harnesses based on real data.


Quickstart

pip install latence
export LATENCE_API_KEY="lat_..."

from latence import Latence

client = Latence()  # reads LATENCE_API_KEY from the environment

r = client.experimental.trace.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
)
print(r.score, r.band, r.context_coverage_ratio, r.context_unused_ratio)

That's it. You now know whether the answer was grounded, how much of your retrieved context was actually used, and whether to trust it.


Step 1 — Trace your answers

Three lanes, one mental model. Pick the one that matches what your app is doing right now.

RAG groundedness — did the answer actually come from your context?

from latence import Latence

client = Latence()

r = client.experimental.trace.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
)

print(r.score)                   # 0.0 - 1.0
print(r.band)                    # "green" | "amber" | "red" | "unknown"
print(r.context_coverage_ratio)  # how much of the answer is grounded in context
print(r.context_unused_ratio)    # how much retrieved context was dead weight

Code agents — catch phantom APIs and drift turn-over-turn

Chain turns with the opaque next_session_state handoff. The SDK never forces you to track session internals.

turn1 = client.experimental.trace.code(
    response_text="def add(a, b): return a + b",
    raw_context="# utils.py\ndef sub(a, b): return a - b",
    response_language_hint="python",
)

turn2 = client.experimental.trace.code(
    response_text="def mul(a, b): return a * b",
    raw_context="# utils.py\ndef sub(a, b): return a - b",
    response_language_hint="python",
    session_state=turn1.next_session_state,   # chain turns
)

print(turn2.band)
print(turn2.session_signals.recommendation)   # "continue" | "re_anchor" | "fresh_chat"
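
The two-turn handoff above extends to any number of turns. The following is a minimal sketch, assuming the trace call is passed in as a callable so the loop can be exercised without network access. `chain_turns` and `fake_trace` are hypothetical helpers for illustration and are not part of the SDK.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, List, Optional


def chain_turns(trace_code: Callable[..., Any], turns: Iterable[dict]) -> List[Any]:
    """Call trace_code once per turn, threading next_session_state forward.

    trace_code is expected to behave like client.experimental.trace.code;
    each item in turns is a dict of keyword arguments for one call.
    """
    results: List[Any] = []
    state: Optional[Any] = None
    for kwargs in turns:
        if state is not None:
            # Only pass session_state once a previous turn has produced one.
            kwargs = {**kwargs, "session_state": state}
        result = trace_code(**kwargs)
        state = result.next_session_state  # opaque; never inspected here
        results.append(result)
    return results


def fake_trace(**kwargs: Any) -> SimpleNamespace:
    """Hypothetical stand-in for the real call, used only to exercise the loop."""
    previous = kwargs.get("session_state") or 0
    return SimpleNamespace(band="green", next_session_state=previous + 1)


demo = chain_turns(
    fake_trace,
    [
        {"response_text": "def add(a, b): return a + b", "raw_context": "# utils.py"},
        {"response_text": "def mul(a, b): return a * b", "raw_context": "# utils.py"},
    ],
)
print([r.next_session_state for r in demo])  # [1, 2]
```

With a real client, `client.experimental.trace.code` would be passed as `trace_code`; the loop itself never looks inside the session state.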

Session rollup — one scoreboard for a live session

Stateless, CPU-only, sub-ms on the pod. Safe to call on every keystroke.

rollup = client.experimental.trace.rollup(turns=[turn1, turn2])

print(rollup.noise_pct)              # fraction of turns flagged as noise
print(rollup.retrieval_waste_pct)    # fraction of retrieved context left unused
print(rollup.model_drift_pct)        # fraction of turns with drift
print(rollup.reason_code_histogram)  # why the turns failed, aggregated
print(rollup.risk_band_trail)        # per-turn band, chronological
print(rollup.recommendations)        # actionable session-level advice

What the signals tell you to do next

The numbers above are not diagnostics. They are routing rules:

| Signal | Meaning | Next step |
| --- | --- | --- |
| band amber/red, low context_coverage_ratio | The answer isn't grounded in what you retrieved. | Upgrade data quality: your upstream documents are the bottleneck. |
| High context_unused_ratio, retrieval_waste_pct > 30% | You retrieved the wrong chunks. | Upgrade retrieval: your retriever is the bottleneck. |
| session_signals.recommendation = "re_anchor" / "fresh_chat" on the code lane | Session drift is compounding. | Reset the agent's context on the next turn. |
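
These routing rules can be expressed as a small function. This is only a sketch under stated assumptions: the 30% waste figure comes from the table, while `LOW_COVERAGE`, `HIGH_UNUSED`, the rule priority, and the returned labels are illustrative placeholders. It also assumes the ratios are fractions between 0.0 and 1.0, so 30% is written as 0.30.

```python
from typing import Optional

# Assumed cut-offs. Only the 30% waste figure comes from the table above;
# LOW_COVERAGE and HIGH_UNUSED are illustrative placeholders to tune.
LOW_COVERAGE = 0.5
HIGH_UNUSED = 0.5
WASTE_LIMIT = 0.30  # assumes retrieval_waste_pct is a 0.0-1.0 fraction


def next_step(
    band: str,
    context_coverage_ratio: float,
    context_unused_ratio: float,
    retrieval_waste_pct: Optional[float] = None,
    recommendation: Optional[str] = None,
) -> str:
    """Map Trace signals to one of the next steps in the table."""
    # Drift advice from the code lane is checked first (assumed priority).
    if recommendation in ("re_anchor", "fresh_chat"):
        return "reset_agent_context"
    if band in ("amber", "red") and context_coverage_ratio < LOW_COVERAGE:
        return "upgrade_data_quality"
    if context_unused_ratio > HIGH_UNUSED or (
        retrieval_waste_pct is not None and retrieval_waste_pct > WASTE_LIMIT
    ):
        return "upgrade_retrieval"
    return "no_action"


print(next_step("red", 0.2, 0.1))
print(next_step("green", 0.9, 0.1, retrieval_waste_pct=0.4))
```

Keeping the thresholds as module-level constants makes it easy to tune them against your own traffic rather than treating the placeholder values as recommendations.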

Full reference: Trace docs and SDK tutorial §18.

Async

Every method above has an await-able twin under AsyncLatence:

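import asyncio
from latence import AsyncLatence

async def main():
    async with AsyncLatence() as client:
        r = await client.experimental.trace.rag(
            response_text="Paris is the capital of France.",
            raw_context="France's capital city is Paris.",
        )
        print(r.score, r.band)

asyncio.run(main())  # `async with` is only valid inside a coroutine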
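
The async client pays off when many answers need scoring at once. Below is a hedged sketch of bounded-concurrency fan-out using only the standard library. `trace_many`, `fake_trace_rag`, and the `max_concurrency` default are hypothetical and chosen for illustration; the fake coroutine stands in for the real call so the example runs without network access.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Awaitable, Callable, List, Tuple


async def trace_many(
    trace_rag: Callable[..., Awaitable[Any]],
    pairs: List[Tuple[str, str]],
    max_concurrency: int = 5,
) -> List[Any]:
    """Score (response_text, raw_context) pairs concurrently, preserving order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(response_text: str, raw_context: str) -> Any:
        async with semaphore:  # cap the number of in-flight requests
            return await trace_rag(
                response_text=response_text, raw_context=raw_context
            )

    return await asyncio.gather(*(one(r, c) for r, c in pairs))


async def fake_trace_rag(response_text: str, raw_context: str) -> SimpleNamespace:
    """Hypothetical stand-in for the real coroutine; no network involved."""
    await asyncio.sleep(0)
    return SimpleNamespace(band="green" if response_text else "unknown")


results = asyncio.run(
    trace_many(
        fake_trace_rag,
        [("Paris is the capital.", "France's capital is Paris."), ("", "")],
    )
)
print([r.band for r in results])  # ['green', 'unknown']
```

The semaphore keeps the number of simultaneous requests below a fixed ceiling, which helps stay clear of rate limits when the list of pairs is long.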

Step 2 — Upgrade data quality

Trace is showing low coverage or amber/red bands? The model is rarely the problem. It's usually the upstream data: un-OCR'd PDFs, missing entities, unresolved references. The Latence Data Intelligence Pipeline cleans that in one call.

job = client.pipeline.run(files=["contract.pdf"])
pkg = job.wait_for_completion()

print(pkg.document.markdown)                         # clean markdown
print(pkg.entities.summary)                          # {"total": 142, "by_type": {...}}
print(pkg.knowledge_graph.summary.total_relations)   # 87
pkg.download_archive("./results.zip")

Smart defaults: OCR → entity extraction → relation extraction. Configure any step explicitly:

job = client.pipeline.run(
    files=["contract.pdf"],
    steps={
        "ocr": {"mode": "performance"},
        "redaction": {"mode": "balanced", "redact": True},
        "extraction": {"label_mode": "hybrid", "threshold": 0.3},
        "relation_extraction": {"resolve_entities": True},
    },
)
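
Because the steps argument is a plain dict, a misspelled step name is easy to miss. As a rough sketch, a pre-flight check can catch that before a job is submitted. `check_steps` and `KNOWN_STEPS` are hypothetical: the set holds only the four step names shown in the example above, and the service may well accept others, so extend it as needed.

```python
from typing import Any, Dict

# Step names taken from the example above; the real service may accept more.
KNOWN_STEPS = {"ocr", "redaction", "extraction", "relation_extraction"}


def check_steps(steps: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Reject misspelled step names and non-dict options before submitting."""
    unknown = sorted(set(steps) - KNOWN_STEPS)
    if unknown:
        raise ValueError(f"unknown pipeline steps: {unknown}")
    for name, options in steps.items():
        if not isinstance(options, dict):
            raise TypeError(f"options for step {name!r} must be a dict")
    return steps


steps = check_steps(
    {
        "ocr": {"mode": "performance"},
        "extraction": {"label_mode": "hybrid", "threshold": 0.3},
    }
)
print(sorted(steps))  # ['extraction', 'ocr']
```

The returned dict is unchanged, so it can be handed straight to the steps parameter of client.pipeline.run.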

Every run returns a structured DataPackage:

  • pkg.document — markdown + per-page layout (OCR)
  • pkg.entities — entity list + summary (extraction)
  • pkg.knowledge_graph — entities + relations + graph summary (relation extraction)
  • pkg.redaction — cleaned text + PII list (redaction)
  • pkg.compression — compressed text + ratio (compression)
  • pkg.quality — per-stage confidence, latency, cost

Power users: the typed PipelineBuilder accepts YAML and validates client-side. See docs/pipelines.md for the full orchestration reference (DAG execution, resumable jobs, progress callbacks).

Corpus-level: Dataset Intelligence

Feed pipeline outputs into client.experimental.dataset_intelligence_service (abbreviated as di in the table below) to build corpus-wide knowledge graphs, ontologies, and enriched feature spaces with incremental ingestion:

| Tier | Method | What it does |
| --- | --- | --- |
| 1 | di.enrich() | Semantic feature vectors (CPU-only, fast) |
| 2 | di.build_graph() | Entity resolution, knowledge graph, link prediction |
| 3 | di.build_ontology() | Concept clustering, hierarchy induction |
| Full | di.run() | All three tiers sequentially |

See docs/dataset_intelligence.md.


Step 3 — Upgrade retrieval

If Trace keeps flagging a high context_unused_ratio, or the session rollup shows retrieval_waste_pct > 30%, your model isn't the problem — your retrieval engine is shipping the wrong chunks.

ColSearch — High Performance Late Interaction and multimodal search engine

ColSearch is our late-interaction retrieval engine: token-level ColBERT recall, native multimodal search over PDFs and images, and a drop-in replacement for the retrieval step in your RAG stack. Wire it in and context_unused_ratio should fall, because fewer irrelevant chunks reach the model.
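
For readers new to late interaction, the scoring idea behind ColBERT-style retrieval fits in a few lines. This is a generic MaxSim sketch in pure Python, not ColSearch's implementation: the toy two-dimensional vectors and the `dot` and `maxsim` helpers are assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """ColBERT-style late interaction: for each query token take its best
    matching document token, then sum those maxima."""
    if not doc_tokens:
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # covers both query tokens
doc_b = [[1.0, 0.0], [0.9, 0.1]]              # covers only the first

ranked = sorted(
    {"doc_a": doc_a, "doc_b": doc_b}.items(),
    key=lambda kv: maxsim(query, kv[1]),
    reverse=True,
)
print([name for name, _ in ranked])  # ['doc_a', 'doc_b']
```

Because every query token is matched independently, a document that covers all parts of the query outranks one that matches a single part very well, which is the property that tends to reduce retrieved-but-unused context.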


Error handling

from latence import (
    LatenceError, AuthenticationError, InsufficientCreditsError,
    RateLimitError, JobError, JobTimeoutError, TransportError,
)

try:
    r = client.experimental.trace.rag(
        response_text="Paris is the capital of France.",
        raw_context="France's capital city is Paris.",
    )
except AuthenticationError:
    ...  # 401
except InsufficientCreditsError:
    ...  # 402
except RateLimitError as e:
    ...  # 429, retry after e.retry_after
except JobError as e:
    ...  # pipeline job failed; check e.is_resumable
except TransportError:
    ...  # network / DNS
except LatenceError:
    ...  # any other SDK error (assumes LatenceError is the common base class)

The SDK retries on 429 and 5xx with exponential backoff (default 2 retries, respects Retry-After).
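
To make the retry policy concrete, here is a sketch of how a delay schedule of this kind is commonly computed. It is not the SDK's actual implementation: `retry_delay`, the `base` and `cap` values, and the use of jitter are all assumptions. Only the two behaviors named above (exponential backoff and honoring Retry-After) come from the text.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 8.0,
    jitter: bool = True,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    A server-supplied Retry-After value wins; otherwise the delay doubles
    each attempt, capped at `cap`, with optional full jitter.
    """
    if retry_after is not None:
        return max(0.0, retry_after)
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, delay) if jitter else delay


print([retry_delay(n, jitter=False) for n in range(5)])  # [0.5, 1.0, 2.0, 4.0, 8.0]
print(retry_delay(2, retry_after=3.0))                   # 3.0
```

If you add your own retry loop around the client, consider lowering max_retries so the two layers do not multiply the total number of attempts.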


Configuration

export LATENCE_API_KEY="lat_your_key"

import latence
from latence import Latence

client = Latence(
    api_key="lat_...",       # or LATENCE_API_KEY env var
    base_url="https://...",  # or LATENCE_BASE_URL env var
    timeout=60.0,            # request timeout (default: 60s)
    max_retries=2,           # retry attempts (default: 2)
)

latence.setup_logging("DEBUG")  # logs every HTTP request/response

Resources

| Resource | Location |
| --- | --- |
| Trace reference | docs/trace.md (parameters and full response schema) |
| Full tutorial | SDK_TUTORIAL.md (every service, every parameter) |
| API docs | docs.latence.ai |
| Portal | app.latence.ai |

MIT License • latence.ai
