tessen

The harness for your AI agents. One line to install. Captures every model call, every tool use, every retry. Maps your codebase. Learns your team's commit style. Generates frame-aware PR fixes — without your source ever leaving your machine.

Tame the beast in your Agentic Workflows. Use Tessen.

import tessen
tessen.init()

That's the entire integration. Drop it before you construct your LLM client.

Why

It's 3am. You're paged. Your agent burned $2,000 in API spend in two hours, looping on the same broken tool 47 times because the SDK swallowed a 502 and the agent silently retried. Your dashboard says the request succeeded. Your traces show one span. Your logs are noise.

You cannot defend what you cannot see — and the tools that watch your agent treat each model call like a web request instead of like a program that thinks. Tessen treats it like a program. Every thinking block, every tool call, every cache decision, every retry — recorded structurally, in your process, on your disk. When the page comes in, you have the receipts.

Install

pip install tessen
# or, with the optional log viewer
pip install "tessen[viewer]"

Core install has zero hard dependencies. Tessen patches the vendor SDKs that are already importable in your process — Anthropic, OpenAI, Google Gemini, Cohere, Mistral — without forcing any of them on you.

Overhead

Well under 1 ms per captured call — measured. Tessen sits between your code and the vendor's network round-trip (which is hundreds of milliseconds to several seconds), so the overhead is invisible. Run tessen bench to verify on your machine. The regression test fails the build if mean per-event overhead crosses 1 ms.

Typical numbers (Apple Silicon, APFS): ~75 µs mean, ~105 µs p99. The dominant cost is the fsync() after each disk write; set TESSEN_DISABLE_FSYNC=1 to skip it if you want lower latency at the cost of crash durability.
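To get a feel for why fsync() dominates, here is a rough stand-alone sketch that times appending one JSON line to a temporary file, with and without fsync. It is not what tessen bench runs; `mean_append_cost_us` and the sample payload are hypothetical, and the numbers will vary by disk and filesystem.

```python
import json
import os
import tempfile
import time


def mean_append_cost_us(n_events: int = 200, use_fsync: bool = True) -> float:
    """Return the mean microseconds spent appending one JSON line to a temp file."""
    payload = json.dumps({"event_id": "x", "duration_ms": 842.1}) + "\n"
    with tempfile.NamedTemporaryFile("w", suffix=".jsonl") as fh:
        start = time.perf_counter()
        for _ in range(n_events):
            fh.write(payload)
            fh.flush()                 # push Python's buffer to the OS
            if use_fsync:
                os.fsync(fh.fileno())  # ask the OS to persist the data to the device
        elapsed = time.perf_counter() - start
    return elapsed / n_events * 1e6
```

Comparing `mean_append_cost_us(use_fsync=True)` with `mean_append_cost_us(use_fsync=False)` on your own machine shows roughly how much of the per-event cost is the durability guarantee.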

What runs locally, free, today

Capture (always on)

import tessen
tessen.init()

Every model call from your agent — full request, full response with thinking blocks decoded, cache-token usage, call-site, timing — written as JSONL to ~/.tessen/logs/<agent>/<date>.jsonl. Trunk-patched across all 5 vendors: structurally vendor-rename-resistant.
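Because the capture format is plain JSONL, the logs can also be read with nothing but the standard library. A minimal sketch, assuming the `<log dir>/<agent>/<date>.jsonl` layout described above; `iter_raw_events` is a hypothetical helper, not part of tessen, and it assumes date-based file names sort chronologically.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_raw_events(agent: str, log_dir: Path = Path.home() / ".tessen" / "logs") -> Iterator[dict]:
    """Yield events for one agent from its per-date JSONL files.

    Hypothetical helper: assumes the <log_dir>/<agent>/<date>.jsonl layout.
    """
    agent_dir = Path(log_dir) / agent
    if not agent_dir.is_dir():
        return
    # Sorting by file name gives chronological order if dates are ISO-formatted (an assumption).
    for path in sorted(agent_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    yield json.loads(line)
                except json.JSONDecodeError:
                    # A torn line (for example from a crash mid-write) is skipped, not fatal.
                    continue
```

For anything beyond quick inspection, the `tessen.events` module is likely the better route, since it already knows the log layout.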

See what just happened

tessen status            # which agents are active, errors, anomalies — at a glance
tessen tail my_agent     # `tail -f` for live events
tessen viewer my_agent   # per-event forensic detail

Per-event inspector. 🚨 markers on anomalies (silent retries, truncated responses, refused requests, cache-misses, malformed tool calls). HTTP status codes, vendor error types, request/response content blocks. The forensic record.

Aggregate findings

tessen analyze my_agent

Rolls up your captured events into a leadership-readable report:

## findings
  🚨 12 ERROR event(s)
     • 8× RateLimitError (429)
     • 4× NotFoundError (404)
  ⚠ 47 event(s) flagged with anomalies:
     • 34× cache_control_set_but_no_cache_activity
     • 12× truncated_response
     • 1× thinking_only_no_output

## cost
  total: $312.40 last 7 days
  claude-sonnet-4-6:  $280.10
  claude-haiku-4-5:   $32.30

## latency (ms): p50=842  p95=4012  p99=14290
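The kind of roll-up shown above can be approximated over any list of event dicts. The following is a sketch only, not tessen's analyzer: `rollup` and `percentile` are hypothetical names, the output keys are illustrative, and the nearest-rank percentile method is an assumption about how latency might be summarized.

```python
import math
from collections import Counter


def percentile(sorted_values: list, pct: float) -> float:
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def rollup(events: list) -> dict:
    """Count error events and anomaly tags, and summarize latency."""
    anomaly_counts = Counter(
        tag for e in events for tag in (e.get("quality_signals") or [])
    )
    durations = sorted(e["duration_ms"] for e in events if "duration_ms" in e)
    latency = {f"p{p}": percentile(durations, p) for p in (50, 95, 99)} if durations else {}
    return {
        "error_events": sum(1 for e in events if e.get("status") == "error"),
        "anomaly_counts": dict(anomaly_counts),
        "latency_ms": latency,
    }
```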

Map your codebase

tessen crawl /path/to/your/repo

Deterministic AST scan. Finds every call site to every supported vendor SDK, classifies the surrounding function via named structural rules (tool_use_loop_anthropic, framework_wrapped:langgraph, etc.), extracts per-function docstrings, comments, body hashes, and in-file callers. Stdlib-only. Sub-second on most repos.

What activates with TESSEN_API_KEY

When you set TESSEN_API_KEY in your environment, tessen.init() auto-activates:

# shell
export TESSEN_API_KEY=tk_live_...
# Python
tessen.init()   # same one line, but now ships to the hosted backend

What ships:

  • Events — your captured runtime stream
  • Manifest — your codebase map (the output of tessen.crawl)
  • Git style fingerprint — your team's commit/test/style conventions, author emails SHA-256 hashed, no commit messages or diffs leave your machine

What never ships:

  • Your source code
  • Your commit messages
  • Your diffs
  • Your author names or plain-text emails (emails ship only as SHA-256 hashes)
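The email hashing mentioned for the git style fingerprint can be sketched with `hashlib`. This is an illustration only: `hash_author_email` is a hypothetical name, and the strip/lowercase normalization is an assumption, not something the source states about tessen.

```python
import hashlib


def hash_author_email(email: str) -> str:
    """Return the SHA-256 hex digest of a normalized author email.

    The strip/lowercase normalization is an assumption made for this sketch.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

The digest lets two commits by the same author be grouped without the address itself appearing in the fingerprint.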

The hosted backend uses these signals to generate frame-aware PRs that fix detected agent fragility — matching your team's commit style, test discipline, library preferences, and file co-change patterns. PR specs come back via tessen.apply, which reads your local source and produces the actual diff:

tessen apply spec.json --open-pr

Your source never leaves your machine. The hosted backend works from the manifest + events + style fingerprint; the local applier produces the diff. That's the unfakeable privacy claim.

What gets captured per event

Every event carries this base shape — verified by regression test against the actual captured JSONL:

{
  "event_id": "...",
  "session_id": "...",
  "agent_name": "...",
  "ts": 1715688000.594,                   // unix timestamp (seconds, float)
  "_tessen_tier": 1,                      // 1=trunk, 2=discovery, 3=http-floor
  "provider": "anthropic",
  "surface": "messages.create",
  "call_type": "request",
  "http_method": "POST",
  "http_url": "https://api.anthropic.com/v1/messages",
  "call_site": {"file": "agent.py", "line": 101, "func": "run"},
  "duration_ms": 842.1,
  "request_body": { /* full SDK-serialized JSON */ },
  "streamed": false,
  "status": "ok",                         // "ok" | "error"
  "quality_signals": ["rate_limited"]     // anomaly tags surfaced at write time
}

The response shape depends on what happened:

| event variant | response fields present |
| --- | --- |
| Successful non-streaming call | casted_response (pydantic dump with parsed content blocks), response_body (raw JSON), response_headers (request-id, rate-limit, x-should-retry) |
| Successful streaming call | response_body (reassembled from chunks), chunks_captured (count) |
| Error (HTTP or client-side) | error (object with type, msg, tb, status_code, body, response_headers) |

Code that always needs a response payload, whichever variant it gets, should fall back in this order: event.get("casted_response") or event.get("response_body") or event.get("error", {}).get("body"). The tessen viewer and tessen analyze CLIs do this for you.
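The fallback order described above can be wrapped in a small helper. `structured_response` is a hypothetical name, not part of tessen; the field names are the ones listed for each event variant.

```python
from typing import Any, Optional


def structured_response(event: dict) -> Optional[Any]:
    """Return the best available response payload for an event, or None."""
    return (
        event.get("casted_response")            # successful non-streaming call
        or event.get("response_body")           # streaming call, or raw JSON
        or event.get("error", {}).get("body")   # error event
    )
```

Note that chaining with `or` treats an empty dict or empty string the same as a missing field, which is usually what you want here but is worth knowing.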

The trunk-patching architecture means:

  • Anthropic SDK rename? Captured anyway.
  • OpenAI changes their resource paths? Captured anyway.
  • Vendor adds a new method we haven't patched? Captured anyway.

Only a base-client rename breaks Tier 1 — at which point Tier 3 (HTTP-layer floor) keeps capturing every model call regardless. Three-layer architecture, structurally hardened.
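The idea behind patching the trunk rather than the leaves can be shown with a toy model. Everything here is hypothetical (`BaseClient`, `Messages`, `patch_trunk`, and the `captured` list are stand-ins, not tessen or vendor SDK internals); the sketch only assumes that every public SDK method funnels into one shared request method.

```python
import functools
import time


class BaseClient:
    """Toy stand-in for a vendor SDK's shared base client (hypothetical)."""

    def request(self, method: str, url: str, body: dict) -> dict:
        return {"ok": True, "echo": body}


class Messages:
    """Toy resource: every public method funnels into BaseClient.request."""

    def __init__(self, client: BaseClient) -> None:
        self._client = client

    def create(self, **body) -> dict:
        return self._client.request("POST", "/v1/messages", body)

    def create_v2(self, **body) -> dict:  # a "new method" nobody patched explicitly
        return self._client.request("POST", "/v2/messages", body)


captured = []


def patch_trunk(base_cls: type) -> None:
    """Wrap the single shared request method so every leaf call is recorded."""
    original = base_cls.request

    @functools.wraps(original)
    def wrapper(self, method, url, body):
        start = time.perf_counter()
        status = "error"
        try:
            result = original(self, method, url, body)
            status = "ok"
            return result
        finally:
            # Record the call whether it succeeded or raised.
            captured.append({
                "http_method": method,
                "http_url": url,
                "status": status,
                "duration_ms": (time.perf_counter() - start) * 1000,
            })

    base_cls.request = wrapper
```

After `patch_trunk(BaseClient)`, both `create` and `create_v2` are recorded even though neither was patched by name, which is the property the bullets above describe.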

Anomaly tags

Tessen tags events at write-time with quality_signals so the analyzer can lead with what's actionable. Filter on any tag via tessen.events.filter_events(events, quality_signal="...") or tessen analyze's findings section.

Content-level (Anthropic / OpenAI / Gemini)

  • empty_content_on_end_turn — model finished with stop_reason: end_turn but returned no content. Common silent-failure mode.
  • empty_content_on_stop — OpenAI / Mistral equivalent (finish_reason: stop with empty message.content).
  • truncated_response — stop_reason: max_tokens / length. The customer's max_tokens was hit; agent saw a cut-off response.
  • thinking_only_no_output — Anthropic extended-thinking event with thinking blocks but zero text/tool_use output.
  • refusal — model returned a refusal block (Anthropic / OpenAI policy decline).
  • invalid_tool_use_json — Anthropic tool_use.input was malformed JSON in the streaming reassembly.
  • invalid_tool_call_json — OpenAI tool_calls[].function.arguments was malformed JSON.

Cache-control (Anthropic prompt caching)

  • cache_control_set_but_no_cache_activity — request set cache_control: ephemeral but response showed zero cache_read_input_tokens / cache_creation_input_tokens. Prompt likely below the model's minimum cacheable length (1024 tokens on many Claude models, higher on some), or cache TTL expired.

HTTP-error categorization (every vendor)

  • rate_limited (429), overloaded (529), auth_error (401), permission_denied (403), not_found (404), conflict (409), unprocessable (422), invalid_request (400), connection_error (network failure pre-status), server_error (5xx other), client_error (4xx other).

19 tags total, evaluated in order: HTTP errors first (since they're most product-relevant), then content-level anomalies, then cache. Empty quality_signals array means: clean event.
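The HTTP-error categorization above amounts to a lookup with two range fallbacks. A minimal sketch for error events, using the status codes from the tag list; `http_error_tag` is a hypothetical function name, and treating a missing status code as `connection_error` is an assumption about how "network failure pre-status" is represented.

```python
from typing import Optional

# Exact status-code matches, taken from the tag list above.
_EXACT = {
    400: "invalid_request",
    401: "auth_error",
    403: "permission_denied",
    404: "not_found",
    409: "conflict",
    422: "unprocessable",
    429: "rate_limited",
    529: "overloaded",
}


def http_error_tag(status_code: Optional[int]) -> Optional[str]:
    """Map an error event's HTTP status code to an anomaly tag."""
    if status_code is None:
        return "connection_error"   # network failure before any status arrived
    if status_code in _EXACT:
        return _EXACT[status_code]
    if 500 <= status_code <= 599:
        return "server_error"       # 5xx other
    if 400 <= status_code <= 499:
        return "client_error"       # 4xx other
    return None                     # 2xx/3xx: no HTTP-error tag
```

Checking exact codes before the ranges matters: 529 would otherwise be reported as a generic `server_error` and 429 as a generic `client_error`.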

Configuration

Tessen reads everything from env vars. The Python API stays one line.

| variable | default | purpose |
| --- | --- | --- |
| TESSEN_API_KEY | unset → local-only | Activate hosted streaming when set |
| TESSEN_AGENT_NAME | inferred from module | Override agent identity |
| TESSEN_LOG_DIR | ~/.tessen/logs | Where local JSONL lands |
| TESSEN_INGEST_URL | https://api.tessen.dev/v1/ingest | Override for self-hosted |
| TESSEN_DISABLE_CRAWL | unset | =1 opts out of auto-crawl |
| TESSEN_DISABLE_GIT_LEARN | unset | =1 opts out of auto git fingerprint |
| TESSEN_LEAF_FALLBACK | unset | =1 reverts to legacy leaf-patching (compat escape hatch) |
| TESSEN_DISABLE_FSYNC | unset → fsync after every write | =1 skips fsync() per event: lower latency, but loses up to one buffered batch on power loss |

Shipper tuning (only matters when TESSEN_API_KEY is set):

| variable | default | purpose |
| --- | --- | --- |
| TESSEN_FLUSH_INTERVAL_SEC | 5.0 | Max seconds to wait before flushing a partial batch |
| TESSEN_BATCH_MAX_EVENTS | 64 | Max events per POST |
| TESSEN_QUEUE_MAX | 1024 | Bounded queue size; drop-oldest on overflow |
| TESSEN_RETRY_MAX | 3 | Retries per batch with exponential backoff |
| TESSEN_HTTP_TIMEOUT_SEC | 10.0 | Per-request HTTP timeout |
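How these knobs interact is easier to see in code. The sketch below is not tessen's shipper: `BoundedBatcher` and `backoff_delays` are hypothetical names, and the 0.5-second backoff base is an assumption (the source states only that backoff is exponential).

```python
from collections import deque


class BoundedBatcher:
    """Drop-oldest queue that hands out events in fixed-size batches."""

    def __init__(self, queue_max: int = 1024, batch_max: int = 64) -> None:
        # A deque with maxlen silently evicts the oldest item when full.
        self._queue = deque(maxlen=queue_max)
        self._batch_max = batch_max

    def put(self, event: dict) -> None:
        self._queue.append(event)

    def next_batch(self) -> list:
        """Remove and return up to batch_max events, oldest first."""
        n = min(self._batch_max, len(self._queue))
        return [self._queue.popleft() for _ in range(n)]


def backoff_delays(retry_max: int = 3, base_sec: float = 0.5) -> list:
    """Exponential backoff schedule; the 0.5 s base is an assumption, not a tessen default."""
    return [base_sec * (2 ** attempt) for attempt in range(retry_max)]
```

Dropping the oldest events on overflow keeps memory bounded if the backend is unreachable, at the price of losing the earliest part of the hosted stream; the local JSONL is written separately and is not affected by this queue.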

Robustness: every TESSEN_* env var is whitespace-tolerant. Boolean flags accept 1 / true / yes / on case-insensitively; numeric tuning vars fall back to defaults for unparseable input. A typo or accidental $UNSET_VAR expansion never crashes import tessen.
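The parsing rules just described can be sketched in a few lines. `env_flag` and `env_float` are hypothetical helpers written for illustration, not tessen's own functions; they take the environment as a parameter so they are easy to test.

```python
import os
from typing import Mapping

_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, env: Mapping[str, str] = os.environ) -> bool:
    """True if the variable is 1/true/yes/on, ignoring case and surrounding whitespace."""
    return env.get(name, "").strip().lower() in _TRUTHY


def env_float(name: str, default: float, env: Mapping[str, str] = os.environ) -> float:
    """Parse a numeric variable, falling back to the default on missing or unparseable input."""
    raw = env.get(name, "").strip()
    try:
        return float(raw)
    except ValueError:
        # Covers the empty string (unset or expanded from an unset variable) and typos.
        return default
```

Neither helper can raise on bad input, which is the property that keeps a malformed variable from breaking import.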

Power-user overrides exist as kwargs on tessen.init(). The 99% case is one function call, no args.

Vendor coverage

vendor sync async streaming (ctx-mgr) streaming (iterator) tool-use
Anthropic
OpenAI (chat.completions)
OpenAI (responses API)
Google Gemini (google.genai)
Cohere
Mistral

Framework wrappers (LangChain ChatAnthropic, LangGraph nodes, deepagents, OpenAI Agents SDK) all flow through trunk-patching automatically — no per-framework code in tessen.

CLIs at a glance

After pip install tessen, the tessen command is on your PATH. Every subcommand also works as python -m tessen.<name> if you prefer.

tessen status            # glanceable overview of every captured agent
tessen tail <agent>      # `tail -f` for live events (`-n 5` for backfill)
tessen viewer <agent>    # per-event forensic detail with anomaly markers
tessen analyze <agent>   # aggregate findings — errors, anomalies, cost, latency
tessen compare a b       # side-by-side diff of two events or sessions
tessen upload <agent>    # manually ship local JSONL to the hosted backend
tessen crawl <repo>      # deterministic AST codebase map
tessen repo-learn <repo> # privacy-preserving git style fingerprint
tessen apply <spec.json> # apply a PR spec from the hosted backend
tessen doctor            # diagnose your integration (env vars, vendor SDKs)
tessen doctor --ping     # also probe TESSEN_INGEST_URL reachability
tessen bench             # measure per-event capture overhead
tessen version           # print version

tessen status, tessen tail, tessen viewer, and tessen analyze all accept a bare agent name and resolve it under $TESSEN_LOG_DIR (default ~/.tessen/logs).

Programmatic Python API

For when the CLI isn't the right surface — building your own dashboard, integrating into a notebook, or running tessen's analysis as a step in your own pipeline:

import tessen.events as ev

# Iterate every captured event for an agent
for event in ev.read("my_agent"):
    print(event["event_id"], event["status"], event.get("quality_signals"))

# Get the last N events (handy in notebooks)
last_50 = ev.recent("my_agent", n=50)

# Filter — composable, all kwargs optional
errors_only = ev.filter_events(last_50, status="error")
rate_limited = ev.filter_events(last_50, quality_signal="rate_limited")
in_session = ev.filter_events(last_50, session_id="sess_abc")

# Aggregate — same rollup shape as `tessen analyze --json`. Carries the
# `schema_version` / `tessen_version` / `generated_at` envelope so
# scripts can branch on version changes and cache by timestamp.
report = ev.aggregate(last_50)
print(report["total_cost_usd"], report["error_events"], report["anomaly_counts"])
print(report["schema_version"])  # 1 for tessen 0.3.x

# Time-bounded reads — accepts unix float, tz-aware datetime, timedelta,
# or a duration string like "24h"/"7d"/"2w" (same shorthand as `tessen
# tail --since`).
from datetime import timedelta
recent_24h = list(ev.read("my_agent", since=timedelta(hours=24)))
recent_24h = list(ev.read("my_agent", since="24h"))  # equivalent

# Diagnostic: opt-in stats dict captures corrupt-line counts so you know
# if a writer crash left torn JSONL.
stats: dict = {}
events = list(ev.read("my_agent", stats=stats))
if stats.get("corrupt_lines"):
    print(f"⚠ {stats['corrupt_lines']} torn lines skipped")

# Discover agents (defaults to only ones with captures; include_empty=True
# for raw filesystem truth)
agents = ev.list_agents()

All these functions also accept log_dir= to override the default location.

License

MIT. The SDK is open-source. The hosted backend (analyzer + PR generation) is a separate commercial service — see tessen.dev.

Contact

hi@tessen.dev · tessen.dev
