Drop-in Tessera integration for LlamaIndex. One line of config routes your existing OpenAI / Anthropic / Mistral / Groq / Cohere LLM through Tessera's auto-route + auto-cache + auto-compress + auto-batch proxy. Free tier: 60M tokens/mo. Production: 20% of measured savings.
tessera-llamaindex
Drop-in cost optimization for LlamaIndex. One line of config routes your existing OpenAI / Anthropic / MistralAI / Groq / Cohere LLM through the Tessera optimization proxy: auto-routing to cheaper equivalent models, exact-match and provider prompt-cache hits, prompt compression gated by a per-stack quality canary, and batch arbitrage on async-tolerant calls. Free Dev tier: 60M tokens/month, no card. Production: 20% of measured savings, $0 if we save you nothing.
Companion to tessera-sdk (vanilla provider SDKs), tessera-langchain (LangChain integration), tessera-vercel-ai (Vercel AI SDK integration), tessera-mastra (Mastra Agent framework integration), tessera-pydantic-ai (Pydantic AI integration), tessera-crewai (CrewAI multi-agent integration), and tessera-autogen (AutoGen 0.4+ multi-agent integration). Same proxy, same mechanic stack, LlamaIndex-shaped API.
Install
```shell
pip install tessera-llamaindex
# Plus whichever LlamaIndex provider package you use:
pip install llama-index-llms-openai  # or llama-index-llms-anthropic / -mistralai / -groq / -cohere
```
Get a free Tessera API key (60M tokens/mo, no card) — tesseraai.io/dev.
Quickstart
```python
from llama_index.llms.openai import OpenAI
from tessera_llamaindex import tessera_openai_config

llm = OpenAI(
    model="gpt-4o",
    api_key="sk-...",  # your OpenAI key, unchanged
    **tessera_openai_config(api_key="tsr_..."),  # one line, routes through Tessera
)

# Existing LlamaIndex code (queries, RAG pipelines, agents, sub-question
# engines, multi-step reasoning) runs unchanged.
response = llm.complete("Summarize this document in 3 bullets.")
print(response.text)
```
The same pattern applies to the other four providers. For example, Anthropic:
```python
from llama_index.llms.anthropic import Anthropic
from tessera_llamaindex import tessera_anthropic_config

llm = Anthropic(
    model="claude-sonnet-4-5-20250929",
    api_key="sk-ant-...",  # your Anthropic key, unchanged
    **tessera_anthropic_config(api_key="tsr_..."),
)
```
Provider support — verified constructor signatures
Field names runtime-verified against installed LlamaIndex 0.6+ provider packages:
| Provider | Tessera config function | LlamaIndex class | URL param | Headers approach |
|---|---|---|---|---|
| OpenAI | tessera_openai_config | llama_index.llms.openai.OpenAI | api_base | default_headers |
| Anthropic | tessera_anthropic_config | llama_index.llms.anthropic.Anthropic | base_url | default_headers |
| Mistral | tessera_mistral_config | llama_index.llms.mistralai.MistralAI | endpoint | additional_kwargs.http_headers |
| Groq | tessera_groq_config | llama_index.llms.groq.Groq | api_base | default_headers (via OpenAILike inheritance) |
| Cohere | tessera_cohere_config | llama_index.llms.cohere.Cohere | base_url | additional_kwargs.headers |
Generic dispatcher: tessera_config(provider, api_key=...) returns the right kwargs dict regardless of provider.
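To make the table concrete, here is a hypothetical sketch of how a provider-aware builder could place the proxy URL and key header in each class's expected kwargs. It is not the tessera_llamaindex implementation: PROXY_BASE, KEY_HEADER, the provider strings, and sketch_tessera_kwargs are all placeholders chosen for illustration.

```python
# Hypothetical sketch of a provider-aware kwargs builder. NOT the
# tessera_llamaindex implementation: PROXY_BASE and KEY_HEADER are
# placeholders, and the real package may use different URLs and headers.
PROXY_BASE = "https://api.tesseraai.io/v1"  # hypothetical path
KEY_HEADER = "X-Tessera-Key"  # hypothetical header name

# provider -> (URL kwarg, path to the headers dict), following the table above
_SHAPES: dict[str, tuple[str, tuple[str, ...]]] = {
    "openai": ("api_base", ("default_headers",)),
    "anthropic": ("base_url", ("default_headers",)),
    "mistral": ("endpoint", ("additional_kwargs", "http_headers")),
    "groq": ("api_base", ("default_headers",)),
    "cohere": ("base_url", ("additional_kwargs", "headers")),
}


def sketch_tessera_kwargs(provider: str, api_key: str) -> dict:
    """Build constructor kwargs that point a LlamaIndex LLM at a proxy."""
    try:
        url_param, header_path = _SHAPES[provider.lower()]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider!r}") from None
    kwargs: dict = {url_param: PROXY_BASE}
    # Walk the header path, nesting dicts (e.g. additional_kwargs -> http_headers).
    target = kwargs
    for key in header_path[:-1]:
        target = target.setdefault(key, {})
    target[header_path[-1]] = {KEY_HEADER: api_key}
    return kwargs
```

In real code, use the package's own tessera_config(provider, api_key=...) and unpack its result into the constructor; the sketch only shows why one dict shape cannot fit all five classes.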
Each constructor shape is locked into CI via tests/test_e2e.py: if a future LlamaIndex release changes the kwargs an __init__ accepts, the test fails before we ship.
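A constructor-shape check along these lines can be sketched with the standard library's inspect.signature. This is a hypothetical illustration, not the contents of tests/test_e2e.py; FakeMistralAI is a stand-in for the real provider class and rejected_kwargs is an illustrative helper.

```python
import inspect
from typing import Optional


class FakeMistralAI:
    """Hypothetical stand-in for a provider LLM class, for illustration only."""

    def __init__(self, model: str, endpoint: Optional[str] = None,
                 additional_kwargs: Optional[dict] = None) -> None:
        self.model = model
        self.endpoint = endpoint
        self.additional_kwargs = additional_kwargs or {}


def rejected_kwargs(cls: type, kwargs: dict) -> list:
    """Return the kwarg names that cls's constructor signature does not declare.

    If the constructor accepts **kwargs, every name binds, so a signature check
    alone cannot catch a renamed field; a fuller test would also construct the
    object and assert where each value ended up.
    """
    params = inspect.signature(cls).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(name for name in kwargs if name not in params)


# A CI test would assert that the kwargs a config function returns still fit.
assert rejected_kwargs(FakeMistralAI, {"endpoint": "https://example.invalid",
                                       "additional_kwargs": {}}) == []
```

The **kwargs caveat matters in practice: many wrapper classes accept arbitrary keywords, so a robust test both binds the kwargs and inspects the constructed object.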
What Tessera does on every request
Same mechanic stack as the main tessera-sdk. Each mechanic is opt-in per workload, observable per request, and bypassed whenever its quality canary score drops below the per-stack floor of 0.95.
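The gating rule can be sketched as a filter over the opted-in mechanics. This is a hypothetical illustration of the rule as stated (opt-in set, a canary score per mechanic, a 0.95 floor); active_mechanics is an illustrative name, and how the proxy stores scores or groups a "stack" is not specified here.

```python
CANARY_FLOOR = 0.95  # per-stack quality floor stated above


def active_mechanics(enabled: set[str], canary_scores: dict[str, float],
                     floor: float = CANARY_FLOOR) -> set[str]:
    """Return the opted-in mechanics whose latest canary score meets the floor.

    A mechanic with no recorded score is bypassed; that conservative default
    is an assumption of this sketch.
    """
    return {m for m in enabled if canary_scores.get(m, 0.0) >= floor}


# Example: m3 (auto-compress) has drifted below the floor, so it is bypassed
# while auto-route (m1) and auto-cache (m2) stay on.
print(sorted(active_mechanics({"m1", "m2", "m3"}, {"m1": 0.97, "m2": 0.96, "m3": 0.91})))
# → ['m1', 'm2']
```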
| Mechanic | What it does | Typical savings |
|---|---|---|
| Auto-route (m1) | Route to a cheaper-equivalent model gated by a daily promptfoo canary on your eval set | 15–35% on routed calls |
| Auto-cache (m2) | Exact-match SHA-256 cache keyed on the canonical request body, 7-day TTL | 5–40% depending on prompt repetition |
| Auto-compress (m3) | Per-role heuristic compression (system and user roles have independent toggles) | 5–15% on prompt tokens |
| Prompt cache (m6) | Inject provider-native cache headers — OpenAI 50% off, Anthropic 90% off cache reads | 50–90% on cached prefixes |
| Context prune (m7) | Conservative trim on long conversations + RAG attachments | 5–25% on multi-turn workloads |
| Output-length ceiling (m9) | A daily job sets each workload's output ceiling at its p90 completion length | 5–15% on completion cost |
| Batch arbitrage (m10) | Route async-tolerant calls to provider Batch APIs (50% off) | 50% on batch-eligible traffic |
| Per-provider circuit breaker | (Reliability primitive.) Rolling 5xx-rate state machine per upstream. | n/a — keeps the savings stack honest |
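The auto-cache (m2) row describes an exact-match cache keyed by a SHA-256 hash of the canonical request body with a 7-day TTL. A minimal in-memory sketch follows; the canonicalization (sorted-key, whitespace-free JSON) and the ExactCache class are assumptions for illustration, and the real cache runs inside the proxy, not in this package.

```python
import hashlib
import json
import time
from typing import Any, Callable, Optional

TTL_SECONDS = 7 * 24 * 3600  # 7-day TTL from the table


def cache_key(body: dict) -> str:
    """SHA-256 over an assumed canonical form: sorted keys, no extra whitespace."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ExactCache:
    """Hypothetical in-memory exact-match cache with a fixed TTL."""

    def __init__(self, ttl: float = TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl
        self._clock = clock
        self._store: dict = {}  # key -> (stored_at, response)

    def get(self, body: dict) -> Optional[Any]:
        key = cache_key(body)
        hit = self._store.get(key)
        if hit is None:
            return None
        stored_at, value = hit
        if self._clock() - stored_at >= self._ttl:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def put(self, body: dict, value: Any) -> None:
        self._store[cache_key(body)] = (self._clock(), value)
```

Canonicalizing before hashing is what lets two semantically identical requests whose JSON keys arrive in a different order share one cache entry.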
Pricing
- Free Dev: 60M tokens/month, 30 requests/minute, all mechanics on, no card. Forever.
- Production: for usage above 60M tokens/month or a higher rate limit. You pay 20% of measured savings and nothing else; zero savings means zero fee. Billed against a prepaid Stripe balance with a $100 minimum top-up.
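The Production fee arithmetic can be sketched as follows. This is a hypothetical illustration: it models "measured savings" as baseline cost minus actual cost for a billing period, and production_fee is an illustrative name; how Tessera measures the baseline is not specified in this README.

```python
FEE_RATE = 0.20  # Production: 20% of measured savings


def production_fee(baseline_cost: float, actual_cost: float,
                   rate: float = FEE_RATE) -> float:
    """Return the fee for one billing period.

    Savings are modeled as baseline cost (what the same traffic would have
    cost without Tessera) minus actual cost. Negative savings are clamped to
    zero, so no savings means no fee.
    """
    savings = max(0.0, baseline_cost - actual_cost)
    return rate * savings


# Hypothetical month: $5,000 baseline and $3,500 actual spend means $1,500
# saved, so the fee is 20% of that ($300).
print(production_fee(5000.0, 3500.0))
```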
Existing customers of the other Tessera packages keep their rate_locked_pct (if any) on this package — same tsr_… key, same billing record.
FAQ
Q: How is this different from tessera-sdk, tessera-langchain, tessera-vercel-ai?
Same proxy, same mechanics, same billing. The four packages target different code surfaces:
- tessera-sdk: patches provider SDK constructors directly (OpenAI, Anthropic, etc.) via a one-line tessera.activate(key). Use it when calling provider SDKs without a framework.
- tessera-langchain: wires into LangChain ChatModel constructors.
- tessera-vercel-ai: wires into Vercel AI SDK createX provider factories.
- tessera-llamaindex (this package): wires into LlamaIndex llama_index.llms.* LLM constructors.
Pick whichever fits your codebase. Side-by-side install is supported.
Q: Does this break my RAG pipeline / query engine / agent?
No. The LlamaIndex LLM object behaves identically — complete, chat, stream_complete, stream_chat all work unchanged. Index queries, retrievers, sub-question engines, OpenAI Agents, multi-step reasoning chains all use the LLM's standard complete/chat interface and route through Tessera transparently.
Q: What happens if Tessera's proxy is down?
Your application gets HTTP errors instead of LLM responses. On the proxy side, a per-provider circuit breaker tracks rolling 5xx rates and skips degraded providers in auto-route decisions. Cross-provider failover (re-routing to a different provider entirely when an upstream is down) is on the roadmap, not shipped yet.
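The proxy's breaker is closed source (see Architecture) and is described here only as tracking rolling 5xx rates per provider. A generic closed/open/half-open breaker over a rolling window of recent responses looks roughly like the sketch below; the window size, trip threshold, minimum sample count, and cooldown are hypothetical defaults, not Tessera's settings.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class CircuitBreaker:
    """Rolling-window 5xx-rate breaker for a single upstream provider.

    States: "closed" (healthy), "open" (skip this upstream), and "half_open"
    (cooldown elapsed; the next recorded result decides). All thresholds are
    hypothetical defaults for illustration.
    """

    def __init__(self, window: int = 50, trip_rate: float = 0.5,
                 min_samples: int = 10, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._results: Deque[bool] = deque(maxlen=window)  # True means a 5xx
        self._trip_rate = trip_rate
        self._min_samples = min_samples
        self._cooldown = cooldown
        self._clock = clock
        self._opened_at: Optional[float] = None
        self.state = "closed"

    def _open(self) -> None:
        self.state = "open"
        self._opened_at = self._clock()

    def allow(self) -> bool:
        """Return True if routing may send traffic to this upstream now."""
        if self.state == "open" and self._clock() - self._opened_at >= self._cooldown:
            self.state = "half_open"
        return self.state != "open"

    def record(self, status: int) -> None:
        """Feed one upstream HTTP status code into the state machine."""
        is_5xx = 500 <= status <= 599
        if self.state == "open":
            return  # no traffic should be flowing; ignore stray results
        if self.state == "half_open":
            if is_5xx:
                self._open()  # probe failed: reopen and restart the cooldown
            else:
                self.state = "closed"  # probe succeeded: reset the window
                self._results.clear()
            return
        self._results.append(is_5xx)
        n = len(self._results)
        if n >= self._min_samples and sum(self._results) / n >= self._trip_rate:
            self._open()
```

Keeping one breaker per upstream and consulting allow() before each routing decision is one way to implement "skip degraded providers"; the minimum sample count keeps a single early 5xx from tripping the breaker.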
Q: What happens to my OpenAI / Anthropic rate limits?
They pass through. Tessera does not aggregate quotas across customers. Your provider rate limits apply normally; the proxy enforces only the Tessera tier limits (30 rpm Free Dev, 60 rpm Production by default — higher on request).
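If you run batch jobs on Free Dev, you may want to pace requests client-side so they stay under the 30 rpm tier limit instead of hitting it. The sliding-window limiter below is a hypothetical helper, not part of tessera-llamaindex, and it only coordinates calls within one process.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `period`-second window."""

    def __init__(self, limit: int = 30, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._limit = limit
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def _wait_time(self) -> float:
        now = self._clock()
        # Drop timestamps that have left the rolling window.
        while self._calls and now - self._calls[0] >= self._period:
            self._calls.popleft()
        if len(self._calls) < self._limit:
            return 0.0
        return self._period - (now - self._calls[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while (delay := self._wait_time()) > 0:
            self._sleep(delay)
        self._calls.append(self._clock())
```

Call limiter.acquire() immediately before each llm.complete(...) or llm.chat(...). Multiple worker processes would need a shared limiter (for example, one backed by a common store) to respect the same budget.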
Q: Are you storing my prompts and completions?
No. We log only token counts, cost deltas, mechanics_stack, and provider response status. Prompts and completions are never persisted. Full data handling on tesseraai.io/security.
Q: Why does Mistral use additional_kwargs.http_headers instead of default_headers?
LlamaIndex's MistralAI wrapper doesn't expose a top-level default_headers argument; instead it forwards additional_kwargs.http_headers to the underlying mistralai SDK on each request. tessera_mistral_config returns kwargs in that shape, so you don't need to handle it yourself. The same applies to Cohere, which uses additional_kwargs.headers.
Q: Can I use this with LlamaIndex's Settings.llm = ... global pattern?
Yes — just construct the LLM the same way and assign it:
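```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from tessera_llamaindex import tessera_openai_config

# Components that are not given an explicit llm fall back to Settings.llm,
# so they now route through Tessera as well.
Settings.llm = OpenAI(
    model="gpt-4o",
    api_key="sk-...",
    **tessera_openai_config(api_key="tsr_..."),
)
```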
Architecture
Open-source SDK ↔ closed-source proxy. This package is a thin client that adds one HTTP hop. The actual mechanic decisions run inside the Tessera Cloudflare Worker proxy at api.tesseraai.io. The wire format is open; the mechanic implementations are closed.
License
Apache-2.0. See LICENSE.
Versioning
Semver. Wire-format compatibility is committed across minor releases; breaking changes ship only in major versions.
Security
Coordinated disclosure address: security@tesseraai.io.
Built by Tessera — Fintechagency OÜ, Tallinn, Estonia (registry 16638667).
File details: tessera_llamaindex-0.1.0.tar.gz (source distribution)
- Size: 19.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13

| Algorithm | Hash digest |
|---|---|
| SHA256 | b9d577632fcfb65a30e217fbf22ddefe562939b4f8e8879d5f98acc3078faa7b |
| MD5 | 8350a6b25e210ad0e63eec3ea404d37e |
| BLAKE2b-256 | c35114932ecc0b3f6d0de59f8b857629f7e50416a6c5e817bddc9e33a4aa3998 |
File details: tessera_llamaindex-0.1.0-py3-none-any.whl (built distribution)
- Size: 12.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13

| Algorithm | Hash digest |
|---|---|
| SHA256 | dbcd2cef5211eaad1630107b113d8df1b7a03f3e5a66643cb0c5288870f5deab |
| MD5 | 5d9a7c0ebb721015116fe9cb410d4ebd |
| BLAKE2b-256 | cb51ccadb9c39fb272f304bbb7b65fb13265519eab61604fc30a6721162266c3 |