Drop-in Tessera integration for LlamaIndex. One line of config routes your existing OpenAI / Anthropic / Mistral / Groq / Cohere LLM through Tessera's auto-route + auto-cache + auto-compress + auto-batch proxy.

`tessera-llamaindex`

Drop-in cost optimization for LlamaIndex. One line of config routes your existing OpenAI / Anthropic / MistralAI / Groq / Cohere LLM through the Tessera optimization proxy, which applies auto-routing to cheaper-equivalent models, exact-match and provider prompt-cache hits, prompt compression with a per-stack quality canary, and batch arbitrage on async-tolerant calls. Free Sandbox tier: 60M tokens/month, no card. Paid tiers: flat monthly subscription by token volume; you keep 100% of savings.

Companion to tessera-sdk (vanilla provider SDKs), tessera-langchain (LangChain integration), tessera-vercel-ai (Vercel AI SDK integration), tessera-mastra (Mastra Agent framework integration), tessera-pydantic-ai (Pydantic AI integration), tessera-crewai (CrewAI multi-agent integration), and tessera-autogen (AutoGen 0.4+ multi-agent integration). Same proxy, same mechanic stack, LlamaIndex-shaped API.

Install

pip install tessera-llamaindex
# Plus whichever LlamaIndex provider package you use:
pip install llama-index-llms-openai          # or llama-index-llms-anthropic / -mistralai / -groq / -cohere

Get a free Tessera API key (60M tokens/mo, no card) at tesseraai.io/dev.

Quickstart

from llama_index.llms.openai import OpenAI
from tessera_llamaindex import tessera_openai_config

llm = OpenAI(
    model="gpt-4o",
    api_key="sk-...",                              # your OpenAI key, unchanged
    **tessera_openai_config(api_key="tk_..."),    # one line, routes through Tessera
)

# Existing LlamaIndex code (queries, RAG pipelines, agents, sub-question
# engines, multi-step reasoning) runs unchanged.
response = llm.complete("Summarize this document in 3 bullets.")

The same pattern applies to the other four providers. For example, with Anthropic:

from llama_index.llms.anthropic import Anthropic
from tessera_llamaindex import tessera_anthropic_config

llm = Anthropic(
    model="claude-sonnet-4-5-20250929",
    api_key="sk-ant-...",
    **tessera_anthropic_config(api_key="tk_..."),
)

Provider support — verified constructor signatures

Field names were verified at runtime against installed LlamaIndex 0.6+ provider packages:

Provider	Tessera config function	LlamaIndex class	URL param	Headers approach
OpenAI	`tessera_openai_config`	`llama_index.llms.openai.OpenAI`	`api_base`	`default_headers`
Anthropic	`tessera_anthropic_config`	`llama_index.llms.anthropic.Anthropic`	`base_url`	`default_headers`
Mistral	`tessera_mistral_config`	`llama_index.llms.mistralai.MistralAI`	`endpoint`	`additional_kwargs.http_headers`
Groq	`tessera_groq_config`	`llama_index.llms.groq.Groq`	`api_base`	`default_headers` (via OpenAILike inheritance)
Cohere	`tessera_cohere_config`	`llama_index.llms.cohere.Cohere`	`base_url`	`additional_kwargs.headers`

Generic dispatcher: tessera_config(provider, api_key=...) returns the right kwargs dict regardless of provider.
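
To make the table concrete, the sketch below builds the kwargs shape each provider expects from the URL parameter and header location listed above. It is not the package's implementation: the function name `sketch_tessera_kwargs`, the header name `X-Tessera-Key`, and the bare `https://api.tesseraai.io` base URL are assumptions made for illustration.

```python
from typing import Any, Dict

# Assumed values for illustration only; the real package may differ.
PROXY_URL = "https://api.tesseraai.io"  # host from the Architecture section; any path is unknown
HEADER_NAME = "X-Tessera-Key"           # hypothetical header name

# (URL parameter, header location) per provider, copied from the table above.
SHAPES = {
    "openai": ("api_base", "default_headers"),
    "anthropic": ("base_url", "default_headers"),
    "mistral": ("endpoint", "additional_kwargs.http_headers"),
    "groq": ("api_base", "default_headers"),
    "cohere": ("base_url", "additional_kwargs.headers"),
}


def sketch_tessera_kwargs(provider: str, api_key: str) -> Dict[str, Any]:
    """Build constructor kwargs in the shape the table lists for `provider`."""
    try:
        url_param, header_path = SHAPES[provider]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider!r}") from None

    headers = {HEADER_NAME: api_key}
    kwargs: Dict[str, Any] = {url_param: PROXY_URL}

    # A dotted path such as "additional_kwargs.http_headers" nests one level deep.
    outer, _, inner = header_path.partition(".")
    kwargs[outer] = {inner: headers} if inner else headers
    return kwargs
```

In this sketch the OpenAI and Groq shapes come out identical, which matches the table's note that Groq inherits from OpenAILike.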

Each constructor shape is locked into CI via tests/test_e2e.py: if a future LlamaIndex release changes the kwargs an __init__ accepts, the regression test fails before we ship.
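
A check of this kind can be sketched with the standard library's `inspect.signature`. The contents of tests/test_e2e.py are not shown here, so the helper `missing_kwargs` and the stand-in class `FakeLLM` below are hypothetical; a real test would import the LlamaIndex provider class instead.

```python
import inspect
from typing import Any, Dict, List, Optional


class FakeLLM:
    """Stand-in for a provider class; not a LlamaIndex type."""

    def __init__(
        self,
        model: str,
        api_key: str = "",
        api_base: str = "",
        default_headers: Optional[Dict[str, str]] = None,
    ) -> None:
        self.model = model


def missing_kwargs(cls: type, kwargs: Dict[str, Any]) -> List[str]:
    """Return the kwargs that cls.__init__ would not accept by name."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs catch-all accepts any name, so a name check cannot detect drift.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(name for name in kwargs if name not in params)
```

For classes whose `__init__` takes `**kwargs`, a name check like this reports nothing, so a stronger test would construct the object and inspect the resulting fields.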

What Tessera does on every request

Same mechanic stack as the main tessera-sdk. Each mechanic is opt-in per workload, observable per request, and bypassed when its quality canary drops below the per-stack 0.95 floor.

Mechanic	What it does	Typical savings
Auto-route (m1)	Route to a cheaper-equivalent model, gated by a daily promptfoo canary on your eval set	15–35% on routed calls
Auto-cache (m2)	sha256 cache on the canonical request body, 7-day TTL	5–40% depending on prompt repetition
Auto-compress (m3)	Per-role heuristic compression (system + user toggles independent)	5–15% on prompt tokens
Prompt cache (m6)	Inject provider-native cache headers (OpenAI 50% off, Anthropic 90% off cache reads)	50–90% on cached prefixes
Context prune (m7)	Conservative trim on long conversations + RAG attachments	5–25% on multi-turn workloads
Output-length ceiling (m9)	A daily job fits a ceiling to the p90 completion length per workload	5–15% on completion cost
Batch arbitrage (m10)	Route async-tolerant calls to provider Batch APIs (50% off)	50% on batch-eligible traffic
Per-provider circuit breaker	(Reliability primitive.) Rolling 5xx-rate state machine per upstream.	n/a — keeps the savings stack honest
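
As an illustration of the Auto-cache row, here is a minimal in-memory sketch of an exact-match cache keyed by a sha256 hash of the request body with a 7-day TTL. The proxy's implementation is closed, so this is only one plausible reading: treating "canonical" as JSON with sorted keys and compact separators is an assumption, and the names `cache_key` and `ExactCache` are hypothetical.

```python
import hashlib
import json
import time
from typing import Any, Dict, Optional, Tuple

TTL_SECONDS = 7 * 24 * 60 * 60  # 7-day TTL, as stated in the table


def cache_key(body: Dict[str, Any]) -> str:
    """Hash a canonical JSON encoding so key order and whitespace do not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ExactCache:
    """In-memory exact-match cache with a fixed TTL."""

    def __init__(self, ttl: float = TTL_SECONDS) -> None:
        self._ttl = ttl
        self._store: Dict[str, Tuple[float, Any]] = {}

    def put(self, body: Dict[str, Any], value: Any, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._store[cache_key(body)] = (now, value)

    def get(self, body: Dict[str, Any], now: Optional[float] = None) -> Optional[Any]:
        now = time.time() if now is None else now
        key = cache_key(body)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if now - stored_at > self._ttl:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value
```

Two requests that differ only in key order hash to the same key here, which is the point of canonicalizing before hashing.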

Pricing

Free Sandbox — 60M tokens/month, 30 requests/minute, observability-only mechanics, no card. Forever.
Paid tiers — flat monthly subscription by token volume: Starter $199 (≤1B), Growth $999 (≤5B), Scale $3,999 (≤20B), Enterprise custom (20B+). You keep 100% of measured savings.
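
The tier boundaries above can be written as a small lookup. This sketch assumes monthly token volume is the only selection criterion; it ignores the Free Sandbox's request-rate limit and its observability-only mechanics, which may push a low-volume workload onto a paid tier anyway. The function name `pick_tier` is hypothetical.

```python
from typing import Optional, Tuple

FREE_LIMIT = 60_000_000  # Free Sandbox: 60M tokens/month

# (tier name, monthly price in USD, token ceiling per month), from the list above.
PAID_TIERS = [
    ("Starter", 199, 1_000_000_000),
    ("Growth", 999, 5_000_000_000),
    ("Scale", 3_999, 20_000_000_000),
]


def pick_tier(tokens_per_month: int) -> Tuple[str, Optional[int]]:
    """Return (tier name, monthly USD price); price is None for custom pricing."""
    if tokens_per_month < 0:
        raise ValueError("tokens_per_month must be non-negative")
    if tokens_per_month <= FREE_LIMIT:
        return ("Free Sandbox", 0)
    for name, price, ceiling in PAID_TIERS:
        if tokens_per_month <= ceiling:
            return (name, price)
    return ("Enterprise", None)  # above 20B: custom pricing
```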

Existing customers of the other Tessera packages keep their rate_locked_pct (if any) on this package — same tk_… key, same billing record.

FAQ

Q: How is this different from `tessera-sdk`, `tessera-langchain`, `tessera-vercel-ai`?

Same proxy, same mechanics, same billing. The four packages target different code surfaces:

tessera-sdk — patches provider SDK constructors directly (OpenAI, Anthropic, etc.) via one-line tessera.activate(key). Use when calling provider SDKs without a framework.
tessera-langchain — wires into LangChain ChatModel constructors.
tessera-vercel-ai — wires into Vercel AI SDK createX provider factories.
tessera-llamaindex (this package) — wires into LlamaIndex llama_index.llms.* LLM constructors.

Pick whichever fits your codebase. Side-by-side install is supported.

Q: Does this break my RAG pipeline / query engine / agent?

No. The LlamaIndex LLM object behaves identically — complete, chat, stream_complete, stream_chat all work unchanged. Index queries, retrievers, sub-question engines, OpenAI Agents, multi-step reasoning chains all use the LLM's standard complete/chat interface and route through Tessera transparently.

Q: What happens if Tessera's proxy is down?

Your application gets HTTP errors instead of LLM responses. On the proxy side, a per-provider circuit breaker tracks rolling 5xx rates and skips degraded providers in auto-route decisions. Cross-provider failover (re-routing to a different provider entirely when an upstream is down) is on the roadmap, not shipped yet.
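
For readers unfamiliar with the pattern, a rolling 5xx-rate breaker can be sketched as follows. The proxy's breaker is closed-source, so the window length, threshold, minimum sample count, and cooldown below are assumptions, and this simplified version has only open and closed states (no half-open probing).

```python
from collections import deque
from typing import Deque, Optional, Tuple


class RollingBreaker:
    """Per-upstream breaker that opens when the recent 5xx rate crosses a threshold."""

    def __init__(
        self,
        window_s: float = 60.0,
        threshold: float = 0.5,
        min_samples: int = 10,
        cooldown_s: float = 30.0,
    ) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self.min_samples = min_samples
        self.cooldown_s = cooldown_s
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, was_5xx)
        self._opened_at: Optional[float] = None

    def record(self, status: int, now: float) -> None:
        """Record one upstream response and open the breaker if the rate is too high."""
        self._events.append((now, 500 <= status <= 599))
        # Drop events that have fallen out of the rolling window.
        while self._events and now - self._events[0][0] > self.window_s:
            self._events.popleft()
        failures = sum(1 for _, failed in self._events if failed)
        if len(self._events) >= self.min_samples and failures / len(self._events) >= self.threshold:
            self._opened_at = now

    def allows(self, now: float) -> bool:
        """Return True if the upstream may be used for routing decisions."""
        if self._opened_at is None:
            return True
        if now - self._opened_at >= self.cooldown_s:
            # Cooldown elapsed: clear history and let traffic try the upstream again.
            self._opened_at = None
            self._events.clear()
            return True
        return False
```

A router would keep one breaker per upstream and skip any provider for which `allows` returns False.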

Q: What happens to my OpenAI / Anthropic rate limits?

They pass through. Tessera does not aggregate quotas across customers. Your provider rate limits apply normally; the proxy enforces only the Tessera tier limits (30 rpm Free Sandbox, 60 rpm Production by default — higher on request).
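
If you would rather stay under the tier limit than handle rejected requests, a client-side throttle is one option. The sketch below uses a sliding 60-second window; whether the proxy counts requests with a sliding or a fixed window is not documented here, so treat the window semantics as an assumption. The class name `MinuteWindow` is hypothetical.

```python
from collections import deque
from typing import Deque


class MinuteWindow:
    """Allow at most `max_per_minute` requests in any 60-second span."""

    def __init__(self, max_per_minute: int = 30) -> None:
        self.max_per_minute = max_per_minute
        self._sent: Deque[float] = deque()  # timestamps of admitted requests

    def try_acquire(self, now: float) -> bool:
        """Return True and record the request if it fits in the current window."""
        # Forget requests that are at least 60 seconds old.
        while self._sent and now - self._sent[0] >= 60.0:
            self._sent.popleft()
        if len(self._sent) < self.max_per_minute:
            self._sent.append(now)
            return True
        return False
```

Callers pass a monotonic timestamp (for example from `time.monotonic()`) and either wait or queue the request when `try_acquire` returns False.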

Q: Are you storing my prompts and completions?

No. We log only token counts, cost deltas, mechanics_stack, and provider response status. Prompts and completions are never persisted. Full data-handling details are at tesseraai.io/security.

Q: Why does Mistral use `additional_kwargs.http_headers` instead of `default_headers`?

LlamaIndex's MistralAI wrapper doesn't expose a top-level default_headers argument — it forwards additional_kwargs.http_headers to the underlying mistralai SDK on each request. The Tessera Mistral config function returns the correct shape for this. You don't need to know this; the config function abstracts it. Same story for Cohere (additional_kwargs.headers).

Q: Can I use this with LlamaIndex's `Settings.llm = ...` global pattern?

Yes — just construct the LLM the same way and assign it:

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from tessera_llamaindex import tessera_openai_config

Settings.llm = OpenAI(model="gpt-4o", api_key="sk-...", **tessera_openai_config(api_key="tk_..."))

Architecture

Open-source SDK ↔ closed-source proxy. This package is a thin client that adds one HTTP hop. The actual mechanic decisions run inside the Tessera Cloudflare Worker proxy at api.tesseraai.io. The wire format is open; the mechanic implementations are closed.

License

Apache-2.0. See LICENSE.

Versioning

Semver. Wire format compatibility committed across minor releases; breaking changes only on major bumps.

Security

Coordinated disclosure address: security@tesseraai.io.

About Tessera

Tessera is the substrate layer for LLM cost optimization, also called the Optimize Layer in our product surface. A thin proxy that sits in your application's request-path, applies a conservative cascade of optimization mechanics, and measures every saved dollar against an audit-immutable baseline. We bill a flat monthly subscription by token volume (Starter $199, Growth $999, Scale $3,999, Enterprise custom); you keep 100% of measured savings. No per-token gateway fee; the category we operate in is "LLM cost optimizer," distinct from per-token AI gateways and observability dashboards.

Where observability tools tell you what you spent and AI gateways re-shape the request without measuring the cost delta, Tessera is the layer that does both, and proves the measured savings line by line. The verified-savings ledger at ledger.tesseraai.io shows every original-vs-actual cost pair, snapshot-pinned to a pricing_catalog version captured at request time. Mid-contract price changes don't retroactively alter past savings. This is the FinOps-friendly model for AI inference: every line of the bill traces to a code-enforced rule.
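
The snapshot-pinning idea can be illustrated with a toy ledger entry. Everything specific here is hypothetical: the catalog layout, the model names, the prices, and the `LedgerEntry` fields are invented for the example and do not describe the real ledger schema.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Tuple

# Hypothetical catalog: version -> model -> (input, output) USD per 1M tokens.
CATALOG: Dict[str, Dict[str, Tuple[Decimal, Decimal]]] = {
    "v1": {
        "model-a": (Decimal("5.00"), Decimal("15.00")),
        "model-b": (Decimal("0.50"), Decimal("1.50")),
    },
}


@dataclass(frozen=True)
class LedgerEntry:
    catalog_version: str              # pinned when the request is served
    original_model: str
    original_tokens: Tuple[int, int]  # (input, output) without optimization
    actual_model: str
    actual_tokens: Tuple[int, int]    # (input, output) actually billed upstream


def cost(version: str, model: str, tokens: Tuple[int, int]) -> Decimal:
    """Price a request against one specific catalog version."""
    in_price, out_price = CATALOG[version][model]
    return (in_price * tokens[0] + out_price * tokens[1]) / Decimal(1_000_000)


def savings(entry: LedgerEntry) -> Decimal:
    """Original cost minus actual cost, both priced at the pinned version."""
    original = cost(entry.catalog_version, entry.original_model, entry.original_tokens)
    actual = cost(entry.catalog_version, entry.actual_model, entry.actual_tokens)
    return original - actual
```

Because each entry carries its own `catalog_version`, adding a later version with different prices leaves the savings computed for earlier entries unchanged.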

Operated by Fintechagency OÜ (Tallinn, Estonia, registry code 16638667).

Developer entry: tesseraai.io/dev
Mechanic reference: tesseraai.io/how-it-works
Dashboard: ledger.tesseraai.io
Engineering blog: tesseraai.io/blog

Release history

0.1.2 (this version): May 28, 2026
0.1.1: May 24, 2026
0.1.0: May 20, 2026

Download files

Source distribution: tessera_llamaindex-0.1.2.tar.gz (21.7 kB). Uploaded May 28, 2026 via twine/6.1.0 CPython/3.13.12; Trusted Publishing not used.

Algorithm	Hash digest
SHA256	`ca5fab59bd6811e58f44e48e04a7019b897df7082ae6a0dee6518fd8bea205ae`
MD5	`ec17060c054a48b5ae6c4c25505cb6aa`
BLAKE2b-256	`ad85fdc6d6844a9396d1e7d62c6a92b6df091f80bdc44beba19b50118aa178f5`

Built distribution: tessera_llamaindex-0.1.2-py3-none-any.whl (12.8 kB, Python 3). Uploaded May 28, 2026 via twine/6.1.0 CPython/3.13.12; Trusted Publishing not used.

Algorithm	Hash digest
SHA256	`7e6533c9dc2b9d27c442c42eedf2aa8e93150399417a9cc65756695e7ea78651`
MD5	`5e6a38d0ab69844e70db2016f0349c32`
BLAKE2b-256	`85d03b32b977b836226dd43aeab51895a1a5151ac8dc14f1ff0b87bfea30320f`
