LATTICE
LLM Transport & Efficiency Layer
Make every LLM call cheaper, faster, and safer, without changing your model.
LATTICE is an intelligent transport proxy that sits between your application and any LLM provider. It applies network-layer optimizations — congestion control, binary framing, delta encoding, speculation, batching — plus a safety-gated compression pipeline with 18 transforms. Your app sends standard OpenAI API requests; LATTICE makes them smaller, faster, safer, and cache-friendly.
It is not a router. LATTICE never changes your model, never falls back between providers, never guesses. You route to exactly one provider per request. LATTICE optimizes the transport and execution.
Table of Contents
- Installation
- Quick Start
- Architecture
- Novel Technology
- Compression Pipeline
- Safety
- Observability
- Supported Providers
- CLI Reference
- Agent Integration
- Development
- Documentation
- License
Installation
pip install lattice-transport
Optional dependencies:
pip install "lattice-transport[redis]" # Multi-process session store
pip install "lattice-transport[mcp]" # MCP tool support
pip install "lattice-transport[all]" # Everything
Requirements: Python 3.10+. No external services needed for single-process mode.
Quick Start
# Start the proxy
lattice proxy run --port 8787
# Point any OpenAI SDK at it
export OPENAI_BASE_URL=http://localhost:8787/v1
# Or route an agent through it
lattice lace claude
# Or use the SDK
from lattice import LatticeClient
client = LatticeClient()
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Explain transport protocols"}],
)
print(response.choices[0].message.content)
Every request is automatically compressed, cached, and optimized. Zero code changes in proxy mode.
Architecture
┌──────────────────────────┐
│ Application / Agent │
│ (Claude, Cursor, Codex, │
│ OpenAI SDK, curl) │
└────────────┬─────────────┘
│ OpenAI API format
┌────────────▼─────────────┐
│ LATTICE PROXY :8787 │
│ │
┌────────────────┼───────────────────────┐ │
│ │ │ │
▼ ▼ ▼ │
┌─────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Session │ │ Transform │ │ Semantic │
│ Manager │ │ Pipeline │ │ Cache │
│ │ │ │ │ │
│ Memory or │ │ 18 transforms │ │ Exact-hash │
│ Redis store │ │ priority-ordered│ │ + approximate │
│ │ │ risk-gated │ │ semantic match │
│ CAS version │ │ expansion-capped│ │ LRU + TTL │
└──────┬──────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
└──────────────────┼────────────────────┘
│
┌───────────▼──────────────────┐
│ DirectHTTPProvider │
├──────────────────────────────┤
│ ProviderRegistry (17 adapt) │
│ ConnectionPool (HTTP/2) │
│ StreamStallDetector │
│ TACC Congestion Controller │
└───────────┬──────────────────┘
│
▼
┌───────────────────────┐
│ LLM Provider │
│ (exactly one per req)│
└───────────────────────┘
Request Flow
1. Client sends OpenAI-compatible POST /v1/chat/completions
2. SessionManager creates or retrieves session (with CAS versioning)
3. 18 transforms run in priority order, each gated by:
config → policy → runtime budget → risk gate → expansion guard
4. SemanticCache checks exact hash, then approximate fingerprint
5. [cache miss] Provider adapter serializes → HTTP/2 pool → provider
6. [streaming] StallDetector monitors per-provider tolerance windows
7. TACC controller manages concurrency window (token-based, not request-count)
8. Response deserialized → pipeline reverse pass → OpenAI JSON → client
9. Session updated, response cached, headers attached
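The two-stage cache lookup in step 4 can be sketched as follows. This is illustrative only: the class name and the bag-of-words fingerprint are stand-ins (LATTICE's actual approximate semantic matching is more sophisticated), shown to make the exact-then-approximate ordering concrete.

```python
import hashlib


class TwoStageCache:
    """Exact-hash lookup first, then an approximate fingerprint fallback."""

    def __init__(self):
        self.exact = {}   # sha256(prompt) -> response
        self.approx = {}  # fingerprint -> response

    @staticmethod
    def _fingerprint(prompt: str) -> frozenset:
        # Crude stand-in for a semantic fingerprint: order- and
        # case-insensitive bag of words.
        return frozenset(prompt.lower().split())

    def put(self, prompt: str, response: str):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        self.exact[key] = response
        self.approx[self._fingerprint(prompt)] = response

    def get(self, prompt: str):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.exact:          # stage 1: exact hash
            return self.exact[key]
        return self.approx.get(self._fingerprint(prompt))  # stage 2: approximate
```

A reworded-but-equivalent prompt misses the exact hash yet can still hit the approximate stage, which is the point of running both.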
Novel Technology
LATTICE adapts classical systems techniques for LLM workloads. These are not LLM features — they are transport, network, and execution innovations.
TACC
Token-Aware Congestion Control — AIMD-style adaptive concurrency. Manages per-provider admission using token pressure (not request counts: a 100K-token request uses more provider capacity than a 10-token one). Priority-ordered waiting queue, stall-aware window collapse, cache-aware latency smoothing. → Deep Dive
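The core idea can be sketched as an AIMD loop over a token budget rather than a request count. This is a minimal illustration, not LATTICE's implementation; the class name, window sizes, and increment are assumed example values.

```python
class TokenAIMD:
    """Admit requests by estimated token cost, AIMD-style."""

    def __init__(self, window_tokens: int = 20_000, min_window: int = 2_000):
        self.window = window_tokens  # current token budget for this provider
        self.min_window = min_window
        self.in_flight = 0           # tokens currently admitted

    def try_admit(self, est_tokens: int) -> bool:
        # A 100K-token request consumes far more budget than a 10-token one.
        if self.in_flight + est_tokens <= self.window:
            self.in_flight += est_tokens
            return True
        return False  # caller parks the request in a priority queue

    def on_success(self, est_tokens: int, additive: int = 500):
        self.in_flight -= est_tokens
        self.window += additive  # additive increase on healthy completion

    def on_stall(self, est_tokens: int):
        self.in_flight -= est_tokens
        # multiplicative decrease when the provider shows distress
        self.window = max(self.min_window, self.window // 2)
```

A stall halves the window (floored at the minimum), so one misbehaving provider throttles quickly without starving entirely.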
Binary Framing
15-byte fixed header format. 17 semantic frame types (PING, REQUEST, STREAM_CHUNK, RESUME_TOKEN...). CRC32 per-frame integrity. Semantic boundary flags for sentence/tool/reasoning boundaries. O(1) parsing — no JSON overhead per chunk. → Deep Dive
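A 15-byte fixed header with a per-frame CRC32 can be packed and verified with the standard library alone. The field layout below is an assumed example to show the shape of the idea; LATTICE's actual wire format may order or size fields differently.

```python
import struct
import zlib

# version(1) + type(1) + flags(1) + stream_id(4) + length(4) + crc32(4) = 15 bytes
HEADER = struct.Struct("!BBBIII")
FRAME_STREAM_CHUNK = 0x03  # assumed frame-type code, for illustration


def encode_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return HEADER.pack(1, frame_type, flags, stream_id, len(payload), crc) + payload


def decode_frame(buf: bytes):
    # O(1) fixed-offset parse: no JSON decoding per chunk.
    version, ftype, flags, sid, length, crc = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch: corrupted frame")
    return ftype, flags, sid, payload
```

Because the header is fixed-size, a reader always knows exactly where the payload starts and how many bytes to take, which is what makes per-chunk parsing constant-time.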
Delta Encoding
After turn 1, sends only new messages — server reconstructs full context from session store. CAS-style optimistic concurrency via anchor versioning prevents lost updates. Graceful fallback on version/sequence mismatch. → Deep Dive
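The server-side reconstruction step can be sketched with a CAS check on an anchor version. The names here (`Session`, `apply_delta`) are illustrative, not LATTICE's API; they show why a stale anchor forces a fallback rather than a silent lost update.

```python
class Session:
    def __init__(self):
        self.messages = []
        self.version = 0  # anchor version for optimistic concurrency


def apply_delta(session: Session, new_messages: list, expected_version: int):
    """Append only the new messages if the client's anchor is current."""
    if expected_version != session.version:
        # Version mismatch: the client reconciles by resending full context.
        return None
    session.messages.extend(new_messages)
    session.version += 1
    return session.messages
```

Two clients racing on the same session can both read version 0, but only the first write succeeds; the second gets `None` and falls back gracefully instead of clobbering history.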
Stream Architecture
Per-provider dynamic stall detection with phase-aware tolerance multipliers (first_chunk=1.5×, streaming=1.0×, thinking=2.0×, tool_call=1.2×). Token velocity tracking catches trickle-stalls. Multi-stream multiplex (QUIC-inspired) with independent lifecycle per stream. HMAC-signed resume tokens with circular replay windows. → Deep Dive
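The phase multipliers above compose with a per-provider base tolerance like so. The multipliers come from the text; the base value and function name are assumed for illustration.

```python
PHASE_MULTIPLIER = {
    "first_chunk": 1.5,  # providers are slowest before the first token
    "streaming": 1.0,
    "thinking": 2.0,     # reasoning phases legitimately go quiet longer
    "tool_call": 1.2,
}


def stall_timeout(base_tolerance_s: float, phase: str) -> float:
    """Seconds of silence tolerated before this stream is declared stalled."""
    return base_tolerance_s * PHASE_MULTIPLIER[phase]
```

With a 10-second base tolerance, a stream in a thinking phase gets 20 seconds of quiet before the detector fires, while mid-stream silence is flagged after 10.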
Request Batching
Groups independent requests sharing model/temperature/tools into single provider calls. 30-60% per-request overhead reduction from shared prompts. Streaming requests excluded. Compatibility-keyed grouping. → Deep Dive
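Compatibility-keyed grouping can be sketched as bucketing requests by the parameters a provider call must share. Field names follow the OpenAI request shape; the grouping function itself is an illustrative stand-in for LATTICE's batcher.

```python
import json
from collections import defaultdict


def compat_key(req: dict) -> tuple:
    """Requests are batchable only if model, temperature, and tools match."""
    tools = json.dumps(req.get("tools", []), sort_keys=True)
    return (req["model"], req.get("temperature", 1.0), tools)


def group_requests(reqs: list) -> list:
    groups = defaultdict(list)
    for r in reqs:
        if r.get("stream"):
            # Streaming requests are excluded: each stays in its own group.
            groups[("stream", id(r))].append(r)
        else:
            groups[compat_key(r)].append(r)
    return list(groups.values())
```

Only the first two requests below share a key, so three provider calls go out instead of four; the saved per-request overhead is where the 30-60% reduction comes from.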
Speculative Execution
Sidecar prediction of next-turn content. Rule-based (zero-cost). Runs in parallel with real request — discard if wrong, instant if right. Never blocks the main request. Confidence threshold ≥0.7. → Deep Dive
Compression Pipeline
18 transforms run in priority order; the table below shows a representative selection. Every transform is safety-classified and risk-gated.
| P | Transform | Safety | What It Does |
|---|---|---|---|
| 1 | content_profiler | SAFE | Classifies content type, computes 0-100 risk score |
| 2 | runtime_contract | SAFE | Enforces transform time budget per-complexity tier |
| 9 | cache_arbitrage | SAFE | Reorders for KV-cache alignment, sets provider hints |
| 10 | prefix_optimizer | SAFE | Deduplicates common message prefixes |
| 15 | message_dedup | CONDITIONAL | Removes exact/near-duplicate turns |
| 20 | reference_sub | CONDITIONAL | UUIDs, URLs, paths → <ref_N> short references |
| 22 | rate_distortion | CONDITIONAL | Extractive text compression of long-form content |
| 24 | grammar_compress | CONDITIONAL | Grammar-based structured data compression |
| 25 | dictionary_compress | CONDITIONAL | Learned phrase dictionary (HPACK-style) |
| 25 | format_conversion | CONDITIONAL | Markdown tables, JSON → compact CSV/TSV |
| 30 | tool_filter | SAFE | Strips internal fields from tool output |
| 40 | output_cleanup | SAFE | Normalizes whitespace, trims boilerplate |
Execution transforms (outside main pipeline): batching, speculative execution, delta encoding, auto-continuation.
Safety
Every transform is classified into one of three buckets. A 0-100 semantic risk score (8 dimensions) gates CONDITIONAL and DANGEROUS transforms.
| Risk Score | SAFE | CONDITIONAL | DANGEROUS | Expansion Guard |
|---|---|---|---|---|
| LOW (0-20) | ✓ | ✓ | ✓ | tokens × 1.5 max |
| MEDIUM (20-40) | ✓ | ✓ | ✗ | tokens × 1.5 max |
| HIGH (40-60) | ✓ | ✗ | ✗ | tokens × 1.5 max |
| CRITICAL (>60) | ✓ | ✗ | ✗ | tokens × 1.5 max |
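The gating rule the table expresses can be written down directly. This is a sketch of the policy, not LATTICE's code; the tier boundaries follow the table (treating a score of exactly 20 or 40 as the lower tier), and the 1.5× cap applies uniformly.

```python
def allowed_buckets(risk_score: float) -> set:
    """Which safety classes may run at a given semantic risk score."""
    if risk_score <= 20:   # LOW
        return {"SAFE", "CONDITIONAL", "DANGEROUS"}
    if risk_score <= 40:   # MEDIUM
        return {"SAFE", "CONDITIONAL"}
    return {"SAFE"}        # HIGH and CRITICAL: only provably safe transforms


def expansion_ok(tokens_before: int, tokens_after: int) -> bool:
    """Expansion guard: no transform chain may grow the prompt past 1.5x."""
    return tokens_after <= tokens_before * 1.5
```

Note the asymmetry: risk only ever removes transforms from consideration, it never adds them, so a misjudged score fails toward doing less.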
Observability
Every request returns routing metadata. Full runtime state in /stats.
curl http://localhost:8787/stats | jq
Key surfaces:
- /stats — Full JSON: transforms, sessions, pools, TACC state, maintenance, downgrades, ignored chunks
- /metrics — Prometheus format: counters, gauges, latency histograms per provider
- Response headers — x-lattice-compression, x-lattice-session-id, x-lattice-delta, x-lattice-cost-usd
- Maintenance — Background cleanup every 60s (stale streams, cache expiry), visible in /stats/maintenance
Supported Providers
17 direct adapters. No routing — one provider per request.
| Provider | Prefix | HTTP/2 | Cache | Streaming |
|---|---|---|---|---|
| OpenAI | openai/ | ✅ | AUTO_PREFIX | SSE delta |
| Anthropic | anthropic/, claude- | ✅ | EXPLICIT_BREAKPOINT | SSE |
| Groq | groq/ | ✅ | — | SSE |
| DeepSeek | deepseek/ | ✅ | — | SSE |
| Mistral | mistral/ | ✅ | — | SSE |
| Cohere | cohere/ | ✅ | — | SSE |
| Gemini | gemini/, google/ | ✅ | EXPLICIT_CONTEXT | SSE |
| Vertex AI | vertex/ | ✅ | EXPLICIT_CONTEXT | SSE |
| Azure | azure/ | ✅ | AUTO_PREFIX | SSE |
| Bedrock | bedrock/ | ✅ | EXPLICIT_BREAKPOINT | SSE |
| Ollama | ollama/ | — | — | SSE |
| Ollama Cloud | ollama-cloud/ | ✅ | — | SSE |
| OpenRouter | openrouter/ | ✅ | — | SSE |
| Fireworks | fireworks/ | ✅ | — | SSE |
| Together | together/ | ✅ | — | SSE |
| Perplexity | perplexity/ | ✅ | — | SSE |
| AI21 | ai21/ | ✅ | — | SSE |
CLI Reference
lattice proxy run --port 8787 # Start foreground
lattice proxy start --port 8787 # Start daemon
lattice proxy stop # Graceful shutdown
lattice proxy status # PID, uptime, health
lattice init # Auto-detect + configure agents
lattice lace claude # Route agent through proxy
lattice unlace claude # Restore original config
lattice info # Version, transforms, config
lattice status # Proxy + agent health
lattice health # Connectivity check
lattice doctor # Diagnose routing issues
lattice config # Resolved configuration
Agent Integration
Route coding agents through LATTICE with a single command:
lattice lace claude # Claude Code
lattice lace codex # OpenAI Codex
lattice lace cursor # Cursor
lattice lace opencode # OpenCode
lattice lace copilot # GitHub Copilot
lattice lace starts the proxy, configures the agent's environment, launches the agent, and cleans up on exit. No permanent changes.
For permanent configuration: lattice init patches agent config files. lattice unlace reverses.
Development
git clone https://github.com/Harsh-Daga/lattice
cd lattice
uv sync # Install all deps
uv run pytest # 1584 tests, 7 skipped
# Lint + typecheck
uv run ruff check src/
uv run mypy src/lattice/
# Run benchmarks
uv run python benchmarks/evals/cli.py --suite all \
--providers ollama-cloud \
--provider-model ollama-cloud=kimi-k2.6:cloud \
--iterations 1 --warmup 0 --provider-warmup 0
Documentation
| Section | Documents |
|---|---|
| Getting Started | Quick Start · Installation · CLI |
| Concepts | Architecture · Proxy · SDK · Observability · Safety |
| Novel Tech | TACC · Binary Framing · Delta Encoding · Streaming · Batching & Speculation |
| Compression | Transforms · Caching · Protocol |
| Providers | 17 Providers |
| Evaluation | Benchmarks |
| Operations | Agent Integrations |
License
MIT © Harsh Daga
GitHub · Issues · PyPI · Changelog · Contributing