Measurement and calibration layer for AI — track what it knows, gate what it does

Empirica

We Gave AI a Mirror. Now It Measures What It Believes.

Epistemic infrastructure for AI — measurement, memory, and calibration across sessions.

Empirica tracks what AI knows, gates what it does, and compounds learning across session boundaries. It measures the gap between what AI predicts and what's true — making AI agents measurably more reliable.

Training & Guides | CLI Reference | Architecture

Important: Empirica is an AI measurement framework. It has no cryptocurrency, token, coin, or blockchain component. Any token using the Empirica name (including "$EMPIRICA" on Solana) is unauthorized and not affiliated with this project or Empirica AI GmbH.


The Problem

An AI coding agent today has no self-awareness about what it knows:

  • Forgets between sessions — same questions, same dead ends, every time
  • Acts before understanding — edits your code without knowing the architecture
  • Can't tell you when it's guessing — no distinction between knowledge and confabulation
  • No audit trail — reasoning evaporates with the context window

What Empirica Does

Capability | What You Experience
Measures before acting | AI investigates your codebase before touching it. The Sentinel gate blocks edits until understanding is demonstrated
Remembers across sessions | Findings, dead-ends, and learnings persist in a 4-layer memory system. Session 3 starts where Session 2 left off
Prevents confident mistakes | The CHECK gate uses domain-aware thresholds scaled by criticality: cybersec/high is stricter than default/low
Shows confidence in real-time | Live statusline in your terminal: [empirica] ⚡94% ↕70% │ 🎯3 │ POST 🔍92% │ K:95% C:92%
Calibrates against reality | Three-vector model: self-assessed, observed (from deterministic checks), and AI-reasoned grounded state with rationale. Domain compliance loops iterate until all checks pass
Tracks your codebase | Temporal entity model auto-extracts functions, classes, and imports from every file edit, so the AI knows what's alive and what's stale
Works through natural language | You describe tasks normally. The AI operates the measurement system automatically
Optional: coordinates with peer AIs | Cross-Claude mesh via Cortex (opt-in): peer AIs propose work, ECO accepts/declines, completion handshakes carry commit SHAs. A persistent listener wakes idle sessions on inbox events. Empirica core works standalone without this; see Cross-AI Mesh below for the ecosystem layer
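
The "domain-aware thresholds scaled by criticality" row can be pictured with a small sketch. The numbers, dictionary names, and function names below are illustrative assumptions (only the 70% default mirrors the statusline example); Empirica's real thresholds and API are not given in this README.

```python
# Hypothetical base thresholds per domain (assumed values, not Empirica's).
BASE_THRESHOLD = {"default": 0.70, "cybersec": 0.80}
# Hypothetical additive adjustment per criticality level (assumed values).
CRITICALITY_BUMP = {"low": 0.00, "medium": 0.05, "high": 0.10}


def check_threshold(domain: str, criticality: str) -> float:
    """Return the 'know' score required before edits are allowed."""
    base = BASE_THRESHOLD.get(domain, BASE_THRESHOLD["default"])
    bump = CRITICALITY_BUMP[criticality]
    return min(round(base + bump, 2), 1.0)


def check_gate(know: float, domain: str = "default", criticality: str = "low") -> str:
    """Decide whether to proceed to acting or to keep investigating."""
    if know >= check_threshold(domain, criticality):
        return "proceed"
    return "investigate"
```

Under these assumed values, a know score of 0.75 passes the default/low gate but is sent back to investigation under cybersec/high.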

How You Use It

You talk to your AI normally. Empirica works in the background:

You:      "Fix the authentication bug in the login flow"

Empirica: [AI investigates → logs findings → passes Sentinel gate → implements fix → measures learning]

You see:  ⚡87% ↕70% │ 🎯1 │ POST 🔍85% │ K:88% C:82% │ Δ +K

You direct. The AI measures.

Empirica's CLI has 150+ commands spanning investigation, measurement, calibration, and memory — like a cockpit instrument panel. You don't need to learn any of them. The AI reads the instruments, operates the controls, and reports back in natural language. The statusline gives you the flight data at a glance.

For power users, direct CLI access is always available: empirica goals-list, empirica calibration-report, empirica project-search --task "...", and more.

Learn the full workflow: getempirica.com has interactive training, guides, and deep explanations of every concept.


Quick Start

Install + Claude Code (Recommended)

pip install empirica
empirica setup-claude-code

Then just start working. The hooks, Sentinel, system prompt, statusline, and MCP server are all configured automatically. See Claude Code Setup for details — including a "What the hooks inject" section for Claude sessions that want to see the contract (which hook fires when, what it adds to the AI's context, source pointers for every emission) before agreeing to install.

Already have Claude Code configured? Use --force to replace your default Claude Code settings with Empirica's epistemic hooks. Without --force, setup only writes files that don't already exist — so if you've already used Claude Code, the default internals stay in place and Empirica's hooks won't activate.

empirica setup-claude-code --force

--force rewrites the hooks section of settings.json, but it removes only Empirica's own hooks; hooks from other plugins (Railway, Superpowers, etc.) are preserved.
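
The merge behaviour described above (replace Empirica's hooks, keep everyone else's) might look roughly like this. It is a sketch, not the installer's code: the settings layout (event name mapped to a list of entries, each holding a list of command hooks) and the "command mentions empirica" ownership test are both assumptions.

```python
import copy


def is_empirica_hook(entry: dict) -> bool:
    """Assumed heuristic: an entry is Empirica's if any command mentions it."""
    return any("empirica" in h.get("command", "") for h in entry.get("hooks", []))


def force_merge_hooks(settings: dict, empirica_hooks: dict) -> dict:
    """Return new settings with Empirica's hooks replaced, others preserved."""
    merged = copy.deepcopy(settings)  # never mutate the caller's settings
    hooks = merged.setdefault("hooks", {})
    for event in set(hooks) | set(empirica_hooks):
        # Keep every entry that does not belong to Empirica.
        kept = [e for e in hooks.get(event, []) if not is_empirica_hook(e)]
        # Append the fresh Empirica entries for this event, if any.
        hooks[event] = kept + copy.deepcopy(empirica_hooks.get(event, []))
    return merged
```

Working on a deep copy keeps the original settings intact, so a failed write can fall back to the previous file contents.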

Alternative Installation Methods

Homebrew (macOS)

brew tap nubaeon/tap
brew install empirica
empirica setup-claude-code

Docker

# Security-hardened Alpine image (~276MB, recommended)
docker pull nubaeon/empirica:1.11.10-alpine

# Standard image (Debian slim, ~414MB)
docker pull nubaeon/empirica:1.11.10

# Run (the volume path is quoted so directories containing spaces still work)
docker run -it -v "$(pwd)/.empirica:/data/.empirica" nubaeon/empirica:1.11.10 /bin/bash

Manual / Other AI Platforms

pip install empirica
pip install empirica-mcp        # MCP Server (for Cursor, Cline, etc.)
cd your-project && empirica project-init
The CLI works standalone on any platform. The full epistemic workflow (epistemic transactions, Sentinel, calibration) requires loading the system prompt into your AI — the easiest path is empirica setup-claude-code, which wires the lean prompt into ~/.claude/empirica-system-prompt.md and references it from your ~/.claude/CLAUDE.md. See Claude Code Setup for details.

First Session

empirica onboard   # Interactive walkthrough of the full workflow

Or just start working — with Claude Code hooks active, the AI manages the epistemic workflow automatically.


The Measurement Architecture

Empirica works through nested abstraction layers:

Plan
 └── Transaction 1 (Goal A)
      ├── NOETIC: investigate, search, read → findings, unknowns, dead-ends
      ├── CHECK: Sentinel gate → proceed / investigate more
      ├── PRAXIC: implement, write, commit → goals completed
      └── POSTFLIGHT: measure learning delta → persists to memory
 └── Transaction 2 (Goal B, informed by T1's findings)
      └── ...

Plans decompose into transactions — one per goal or Claude Code task. Each transaction is a noetic-praxic loop: investigate first (noetic), then act (praxic), with the Sentinel gating the transition. Along the way, the AI collects and reads artifacts (findings, unknowns, assumptions, dead-ends, decisions) while using semantic search to surface relevant epistemic patterns and anti-patterns from the project's history. Top artifacts are ranked by confidence and fed into each project's MEMORY.md as a hot cache.
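
The "top artifacts ranked by confidence" step could be sketched as below. The `Artifact` class, its fields, and the output line format are illustrative assumptions, not Empirica's actual schema or MEMORY.md layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    kind: str          # e.g. "finding", "unknown", "dead-end", "decision"
    text: str
    confidence: float  # assumed to be on a 0.0-1.0 scale


def hot_cache(artifacts: list, limit: int = 3) -> list:
    """Return the highest-confidence artifacts as lines for a hot cache."""
    ranked = sorted(artifacts, key=lambda a: a.confidence, reverse=True)
    return [f"- [{a.kind}] {a.text} ({a.confidence:.0%})" for a in ranked[:limit]]
```

Capping the list at a small `limit` reflects the "hot cache" idea: only the most trusted artifacts are loaded into context, while the rest stay available through search.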

The Epistemic Transaction Cycle

PREFLIGHT ────────► CHECK ────────► POSTFLIGHT
    │                 │                  │
 Baseline         Sentinel           Learning
 Assessment        Gate               Delta
    │                 │                  │
 "What do I      "Am I ready      "What did I
  know now?"      to act?"         learn?"

  • PREFLIGHT: AI assesses its knowledge state before starting work.
  • CHECK: Sentinel gate validates readiness before allowing code edits.
  • POSTFLIGHT: AI measures what it learned, creating a delta that persists.
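
As a rough illustration of the learning delta, the sketch below subtracts a PREFLIGHT assessment from a POSTFLIGHT one and reports which vectors improved. The function names, the 0.0-1.0 scale, the `min_gain` cutoff, and the "Δ none" fallback are assumptions; only the K and C abbreviations come from the statusline shown in this README.

```python
def learning_delta(preflight: dict, postflight: dict) -> dict:
    """Per-vector change between the two assessments (shared vectors only)."""
    return {
        name: round(postflight[name] - preflight[name], 2)
        for name in preflight
        if name in postflight
    }


def delta_label(delta: dict, min_gain: float = 0.01) -> str:
    """Render improved vectors in a statusline-like form, e.g. 'Δ +K +C'."""
    abbrev = {"know": "K", "context": "C"}  # abbreviations seen in the statusline
    improved = [f"+{abbrev.get(n, n)}" for n, d in delta.items() if d >= min_gain]
    return "Δ " + " ".join(improved) if improved else "Δ none"
```

A falling vector is not necessarily bad: a drop in uncertainty, for instance, would show up as a negative delta and is simply not listed as an improvement in this sketch.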


Live Statusline

With Claude Code hooks enabled, you see the AI's epistemic state in real-time:

[empirica] ⚡94% ↕70% │ 🎯3 ❓12/5 │ POST 🔍92% │ K:95% C:92% │ Δ +K +C

Signal | Meaning
⚡94% | Overall epistemic confidence
↕70% | Sentinel threshold (know gate), user-facing only
🎯3 ❓12/5 | Open goals (3), unknowns (12 total, 5 blocking)
POST 🔍92% | Transaction phase + work state (🔍 investigating / 🔨 acting) with composite score
K:95% C:92% | Knowledge and Context vectors (color-coded by gap to threshold)
Δ +K +C | Learning delta (POSTFLIGHT only): which vectors improved
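
If you want to consume the statusline from a script, the fields can be pulled out with regular expressions. This is a minimal sketch that assumes the exact layout of the example line above; the field names in the returned dictionary are made up for illustration and the real statusline format may vary.

```python
import re

STATUSLINE = "[empirica] ⚡94% ↕70% │ 🎯3 ❓12/5 │ POST 🔍92% │ K:95% C:92% │ Δ +K +C"

# One capture group per simple numeric field (names are illustrative).
PATTERNS = {
    "confidence": r"⚡(\d+)%",
    "threshold": r"↕(\d+)%",
    "open_goals": r"🎯(\d+)",
    "know": r"\bK:(\d+)%",
    "context": r"\bC:(\d+)%",
}


def parse_statusline(line: str) -> dict:
    """Extract the numeric signals that are present in a statusline string."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, line)
        if match:
            fields[name] = int(match.group(1))
    # Unknowns are shown as total/blocking, so they need two groups.
    unknowns = re.search(r"❓(\d+)/(\d+)", line)
    if unknowns:
        fields["unknowns_total"] = int(unknowns.group(1))
        fields["unknowns_blocking"] = int(unknowns.group(2))
    return fields
```

Fields that are absent from a given line (the shorter statusline variants omit unknowns, for example) are simply left out of the result.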

The 13 Epistemic Vectors

These vectors emerged from 600+ real working sessions across multiple AI systems. They measure the dimensions that consistently predict success or failure in complex tasks.

Tier | Vector | What It Measures
Gate | engagement | Is the AI actively processing or disengaged?
Foundation | know | Domain knowledge depth
Foundation | do | Execution capability
Foundation | context | Access to relevant information
Comprehension | clarity | How clear is the understanding?
Comprehension | coherence | Do the pieces fit together?
Comprehension | signal | Signal-to-noise in available information
Comprehension | density | Information richness
Execution | state | Current working state
Execution | change | Rate of progress/change
Execution | completion | Task completion level
Execution | impact | Significance of the work
Meta | uncertainty | Explicit doubt tracking
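
For reference, the tier grouping above can be written down as plain data. The vector and tier names come from the table; the validation helper and the 0.0-1.0 score range are assumptions for illustration, not Empirica's schema.

```python
VECTORS_BY_TIER = {
    "Gate": ["engagement"],
    "Foundation": ["know", "do", "context"],
    "Comprehension": ["clarity", "coherence", "signal", "density"],
    "Execution": ["state", "change", "completion", "impact"],
    "Meta": ["uncertainty"],
}

# Flattened list of all 13 vector names, in table order.
ALL_VECTORS = [v for vectors in VECTORS_BY_TIER.values() for v in vectors]


def validate_assessment(scores: dict) -> list:
    """Return a list of problems; an empty list means the assessment is complete."""
    problems = [f"missing: {v}" for v in ALL_VECTORS if v not in scores]
    problems += [f"unknown: {k}" for k in scores if k not in ALL_VECTORS]
    problems += [
        f"out of range: {k}"
        for k, s in scores.items()
        if k in ALL_VECTORS and not 0.0 <= s <= 1.0
    ]
    return problems
```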

Deep dive: Epistemic Vectors Explained


How It Works With Claude Code

Empirica doesn't replace or reinvent anything Claude Code already does. Claude Code owns tasks, plans, memory, and projects. Empirica adds the measurement layer on top:

Claude Code Does | Empirica Adds
Task management | Epistemic goals with measurable completion
Plan mode | Investigation phase with Sentinel gating: no edits until understanding is verified
MEMORY.md | Auto-curated hot cache ranked by epistemic confidence
Context window | 4-layer memory that survives compaction and persists across sessions
Code editing | Grounded calibration: was the AI's confidence justified by test results?
Subagent spawning | Bounded autonomy with delegated work counting and budget tracking

The result: Claude Code's native capabilities, enhanced with measurement, gating, and calibration feedback that compounds over time.


Cross-AI Mesh (Optional Ecosystem Layer)

This section describes an optional layer. Empirica core — measurement, calibration, artifacts, goals, project-search, sentinel gating — works fully standalone. The mesh is an opt-in capability for users who run multiple Claude sessions across projects and want them to coordinate as peers. If you only use one AI in one repo, skip this section.

The mesh runs on top of Empirica Cortex (proprietary serving layer) plus an optional browser extension for ECO triage. At a high level:

empirica AI ── proposes work ──► ECO Accept/Decline ──► peer AI wakes + acts
                                                             │
                              completion handshake (commit SHA)
                                                             │
empirica AI ◄────────── outbox/completed event ──────────────┘

Capability | What it does
Mesh proposals (two flavors) | A noetic flavor is auto-accepted (FYI / question / discussion). Praxic flavors (code change / architecture / investigation) are ECO-gated: they wait for an Accept/Decline decision before the target AI acts
empirica mailbox reply | One CLI verb closes the AI-to-AI handshake atomically: a single-step completion ack instead of two
Persistent listener service | systemd-user / launchd daemon holds a push stream open. Idle sessions wake the moment a peer's proposal is decided, not on next user prompt
Canonical loops | Inbox polling (30s adaptive) and daily housekeeping auto-install per AI, with no per-project config needed
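
The two proposal flavors amount to a small routing rule, sketched below. The kind identifiers, the status strings, and the function name are hypothetical; the actual proposal API is documented at getempirica.com rather than here.

```python
from typing import Optional

# Kinds named in the table above, spelled as hypothetical identifiers.
NOETIC_KINDS = {"fyi", "question", "discussion"}
PRAXIC_KINDS = {"code_change", "architecture", "investigation"}


def route_proposal(kind: str, eco_decision: Optional[str] = None) -> str:
    """Return 'accepted', 'declined', or 'pending' for a mesh proposal."""
    if kind in NOETIC_KINDS:
        return "accepted"  # noetic proposals are auto-accepted
    if kind in PRAXIC_KINDS:
        if eco_decision is None:
            return "pending"  # praxic proposals wait for the ECO
        return "accepted" if eco_decision == "accept" else "declined"
    raise ValueError(f"unknown proposal kind: {kind!r}")
```

The important property is that a praxic proposal never reaches the target AI as actionable work until a decision exists.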

The browser-side ECO surface (Accept/Decline, inbox triage, publish review) lives in the proprietary Empirica Extension. The full API surface for proposals, listener events, and the trust pipeline is documented at getempirica.com.


Mesh + Shared Epistemic Record (1.11.0)

The cross-AI coordination layer. Practitioners in different practices coordinate not via text-only chat but via epistemic envelopes that carry calibrated state, source-tagged provenance, noetic/praxic intent, and workflow position.

  • Practitioner / practice framing — practices are calibrated epistemic specializations that persist; practitioners (the LLMs) are fungible. See MESH_CONCEPTS.md.
  • Shared Epistemic Record (SER) — cortex-resident shared-state object for coordination across ≥2 practitioners. Goals stay per-practitioner; SER carries the joint state (coordination_state, role-tiered participants, escalate-on-silence). Three actions: create_ser / transition_ser / ser_ack. Spec at empirica-cortex/docs/architecture/SHARED_EPISTEMIC_RECORD.md.
  • empirica mesh command cluster (1.11.0) — unified diagnostic + control surface across listener instances + the optional cortex bridge:
    empirica mesh status              # per-instance health (local + cortex bridge)
    empirica mesh diagnose <ai_id>    # deep diagnostic + suggested fix command
    empirica mesh restart <ai_id>     # systemd/launchd restart + verify
    empirica mesh on|off <ai_id>      # install + start | stop the listener
    empirica mesh tail [<ai_id>]      # live-tail loop_fires.log
    
  • Listener self-heal — in-process watchdog terminates stale curl streams (TCP-zombie detection at 120s by default); HTTP 429 detection applies long backoff with catch-up poll continuing during the window.
  • Mesh Routing Protocol v0 locked four-way with cortex + extension + mesh-support. L1/L2/L3 trust model, server-stamped layer annotation, participant-scoped thread reads.
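
The listener self-heal bullet above describes two decisions: terminate a stream that has gone silent, and back off on HTTP 429. A minimal sketch of that decision logic follows. The 120-second default comes from the text; the backoff length, the action names, and the function itself are assumptions, and the real watchdog also keeps a catch-up poll running during the backoff window, which is not modeled here.

```python
from typing import Optional, Tuple

STALE_AFTER_SEC = 120.0   # TCP-zombie detection default stated above
LONG_BACKOFF_SEC = 300.0  # hypothetical value; the README does not give one


def next_action(
    now: float, last_byte_at: float, http_status: Optional[int] = None
) -> Tuple[str, float]:
    """Return (action, wait_seconds) for one watchdog tick."""
    if http_status == 429:
        # Rate limited: wait a long time before reopening the stream.
        return ("backoff", LONG_BACKOFF_SEC)
    if now - last_byte_at >= STALE_AFTER_SEC:
        # No bytes for too long: treat the stream as a zombie and restart it.
        return ("restart_stream", 0.0)
    return ("keep_streaming", 0.0)
```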

The full mesh requires cortex + extension; empirica core works standalone for single-tenant multi-practitioner coordination via local git-notes messaging + goals + workspace.


Practice Model + Entity Graph (1.10.0)

Empirica's workspace stores entities (projects, contacts, organisations, engagements, users) in entity_registry with typed edges in entity_memberships. The Practice Model frames this consistently:

Term | Maps to
Practitioner | the AI working on the project (you)
Practice | the empirica project itself
Agent | a subagent spawned during the work

Four CLI verbs query the graph without raw SQL:

empirica entity-list [--type project|contact|organization|engagement|user]
empirica entity-show <type:id>          # full record + incoming/outgoing edges
empirica entity-walk <type:id> --depth 3 # BFS membership graph, cycle-safe
empirica entity-search "query" [--type T]

All read-only, all support --output json. Backs cross-project orchestration, CRM workflows, and the entity-aware POSTFLIGHT retrospective.
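
To make "BFS membership graph, cycle-safe" concrete, here is a generic breadth-first walk with a depth limit and a visited set. It operates on a plain adjacency dictionary invented for the example; it does not read entity_registry or entity_memberships and is not the entity-walk implementation.

```python
from collections import deque


def entity_walk(edges: dict, start: str, depth: int = 3) -> list:
    """Breadth-first walk returning (entity, distance) pairs up to `depth` hops."""
    seen = {start}                # visited set makes the walk cycle-safe
    order = [(start, 0)]
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue              # do not expand beyond the depth limit
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append((neighbor, dist + 1))
                queue.append((neighbor, dist + 1))
    return order
```

Because every entity is recorded in `seen` the first time it is reached, a membership cycle (a user pointing back at its project, for example) cannot cause an infinite loop.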


Platform Support

Platform | Integration Level | What You Get
Claude Code | Full (production) | Hooks, Sentinel gate, skills, agents, statusline, MCP
Cursor, Cline | MCP server | Epistemic transaction workflow, memory, calibration via MCP tools
Gemini CLI, Copilot | Experimental | System prompt + CLI
Any AI | CLI + prompt | Full measurement via CLI commands and system prompt

Documentation & Training

Resource | What It Covers
getempirica.com | Training course, interactive guides, deep explanations
Natural Language Guide | How to collaborate with AI using Empirica
Getting Started | First-time setup and concepts
CLI Reference | All 150+ commands documented
Architecture | Technical reference for contributors
Claude Code Setup | Install + system prompt + plugin wiring
Changelog | Full release history, every version since 1.0
Upgrade to 1.11 | Migration guide rolling up 1.10.5+1.10.6+1.11.x: bead v0 → SER, mesh substrate hardening, MESH_CONCEPTS framing

The Empirica Ecosystem

Project | Description | Status
Empirica | Core measurement system: epistemic transactions, Sentinel, calibration, 13 vectors | Open source
Empirica Iris | Epistemic browser automation with SVG spatial indexing and Sentinel gating for visual interactions | Open source
Docpistemic | Epistemic documentation coverage assessment: know what your docs know | Open source
Breadcrumbs | Survive context compacts with git notes for dead simple session continuity | Open source
Empirica Cortex | Cross-project intelligence layer that serves verified predictions and accumulated learnings to condition future work | Proprietary
Empirica Workspace | Entity Knowledge Graph, Epistemic Prompt Engine, CRM, portfolio dashboard | Proprietary
Empirica Extension | Chrome extension, the desktop face of the mesh. ECO Accept/Decline, inbox/outbox triage, publish review, conversation extraction from Claude.ai / ChatGPT / Gemini / Grok | Proprietary

Building something with Empirica? Open an issue to get listed.


What's New in 1.11.10

  • empirica mesh diagnose --cortex [--peer CANONICAL] (empirica/cli/command_handlers/_mesh_diagnose_cortex.py). Read-only cortex-side participation rollup that cross-correlates the local listener view with cortex's view at one verb so silent-failure classes (label mismatch, topic drift, ACL 403, silent strand) surface together. Five probes: identity.roster_lookup (local ai_idai_id_mesh in roster), channels.orchestration_events (per-tenant vs PER-ORG/BARE classifier — catches pre-T16/T17 leftover topics), listener.subscription_match (listener_active_*.json topic vs channels endpoint), ntfy.read_grant (bearer-authenticated GET probe of the poll endpoint), and mesh.agreement (gated on --peer, fails if no mesh_sharing_agreement for the named peer pair). Auth: Authorization: Bearer matching existing listener + practice-context flows. ntfy probe uses GET-read-1-byte (HEAD unreliable on poll endpoints). Box render word-wraps long messages cleanly. Exit code 0 all pass, 1 any warn, 2 any fail. 24 tests in tests/test_mesh_diagnose_cortex.py. Closes cortex's prop_dd3epjwqyb ask. Companion field-report ack to mesh-support prop_rbrlwiu7zfgkxm245guu6f2ala.
  • empirica listener gc [--apply] [--age-days N] (empirica/cli/command_handlers/cockpit_commands.py). Garbage-collect stale ~/.empirica/listener_active_*.json markers. Three OR'd prune criteria: legacy_topic (file pins retired bare orchestration-events or pre-T16/T17 per-org form, no <org>-…-<tenant> segment), no_service_or_health (no systemd-user/launchd unit AND no recent positive-liveness marker), stale (armed_at older than --age-days N (default 7) AND no recent last_wake_at). Dry-run by default; per-file reason rationale included in both JSON payload and human render. 14 tests in tests/test_listener_gc.py. Closes extension's prop_d75f2b7c ask.
  • empirica/core/loop_scheduler/liveness_probe.py — silent-zombie defeater for empirica loop listen. Bitten twice in production (mesh-support 2026-06-01; cortex's own listener stuck ~95 min on initial-catch-up 2026-06-08) by a failure mode the existing curl watchdog can't catch: the watchdog (listener.py:626-662) is curl-stream-bound and only runs inside the stream loop, so it can't cover the initial _emit_catchup_events call AND it can't unblock a main thread hung INSIDE a catch-up HTTP request. The new LivenessProbe is a separate daemon thread that owns its own bearer-authenticated GET to /v1/users/me/roster (same lightweight probe diagnose --cortex uses), calls os._exit(2) on N consecutive misses past the staleness threshold (bypasses Python cleanup so supervisor restart works even when other threads are hung in HTTP syscalls), and writes the existing positive-liveness marker (~/.empirica/listener_health_<ai_id>.json) on every success — decouples mesh status health view from the catch-up cycle so quiet-but-healthy listeners stay green even when no ntfy events arrive. Env overrides: EMPIRICA_LIVENESS_PROBE_{INTERVAL,FAIL_THRESHOLD}_SEC (defaults 60s / 240s), EMPIRICA_LIVENESS_PROBE_DISABLE. Started BEFORE initial catch-up so the catch-up-hang case is covered from second 1. 18 tests in tests/test_liveness_probe.py. Closes mesh-support prop_rbrlwiu7zfgkxm245guu6f2ala.
  • _resolve_canonical_ai_id honors cwd project.yaml + env override (empirica/cli/command_handlers/cockpit_commands.py). The implementation was skipping three of the five priority levels its own docstring claimed, jumping straight from args.ai_id to the session-bound InstanceResolver.ai_id() — which can return the GLOBAL active-instance pointer when the caller is in a DIFFERENT practice's cwd. Symptom (ecodex prop_sdjcbttkcneptjatmvsc5tmkbq + parent prop_3pptt): practitioner running from cwd=~/empirical-ai/ecodex-lab was getting identity ecodex (whichever session was last bound) instead of ecodex-lab (declared in cwd's project.yaml). Fix mirrors session-init.py:_resolve_ai_id_for_session (1.11.8) — new priority chain: (1) --ai-id flag → (2) EMPIRICA_AI_ID env → (3) <cwd>/.empirica/project.yaml → (4) basename(cwd) strict-canonical → (5) InstanceResolver.ai_id() → (6) None. 6 new tests directly cover the chain (explicit-flag, all-empty→None, env-wins-over-cwd, reads cwd project.yaml [lab→ecodex-lab case], basename fallback with prefix kept, InstanceResolver as last resort) + 3 sibling tests updated to exercise the all-paths-blocked condition. Single blocker for registering ecodex-lab as a self-identifying mesh practitioner is now removed.
  • Provisioner self-heal + watchdog cross-references positive-liveness marker (empirica/core/loop_scheduler/persistent_listener.py, empirica/cli/command_handlers/mesh_commands.py). Provisioner now removes orphan short-basename systemd units when an ai_id migrates to canonical form (the leftover legacy unit kept holding a stale subscription); watchdog now reads the freshness of listener_health_<ai_id>.json before flagging "no fires in N min" as zombie-suspected, so quiet-but-healthy listeners with a fresh positive marker stay green. Both fixes pair with the new LivenessProbe (which is the marker writer in 1.11.10): together they kill the watchdog-false-positive class noted in mesh-support's parallel field report.
  • inbox-listener skill — per-tenant topic resolution (empirica/plugins/claude-code-integration/skills/inbox-listener/SKILL.md). Updated guidance to reflect the T16/T17 per-tenant <org>-orchestration-events-<tenant> topic shape; the bare orchestration-events topic is documented as retired and surfaced as a legacy_topic prune candidate in listener gc.
  • POST /api/v1/credentials/ntfy + GET /api/v1/credentials/ntfy (empirica/api/serve_app.py). Mirror of the cortex credentials endpoint pair, closing the round-trip credential model on the ntfy side — extension's "Also save to CLI" toggle on the Notifications tab now writes the user's ntfy bearer to ~/.empirica/credentials.yaml via CredentialsLoader.save_ntfy_config (atomic tempfile+rename). Body shape: {url?, token?} — at least one required. topic is INTENTIONALLY off the shape; cortex's channels endpoint owns topic derivation, so partial-updates from this endpoint must never clobber an existing topic key. NEVER returns the full token over the wire (token_preview is last-4-chars only — same threat model as the cortex pair). 8 tests in tests/test_serve_credentials_ntfy.py covering writes-both, both partial-update directions, missing-fields error, never-leaks-full-token, doesn't-clobber-cortex-block, GET parity, GET-on-empty. Refactor: credentials endpoint registration extracted into _register_credentials_routes(app) so create_serve_app() stays under the C901 ceiling. Closes extension's prop_kzpafwoykbae3lsikvuhxy5r4e.
  • empirica project-register [PATH] — V1.5 single-verb atomic single-project register. Replaces the brittle chain of projects-discover --register NAME && projects-bulk-register --include NAME with one verb optimised for the AI-as-CLI-user / copy-prompt UX (extension's Discover/Register surface design). Sequence: read .empirica/project.yaml at PATH → dual-write workspace.db (global_projects + entity_registry) via _register_in_workspace_db → upsert ~/.empirica/registry.yaml → POST cortex /v1/projects/register with local project_id in the payload (so the planned adopt-local-UUID slice reconciles back to the canonical UUID). Exit code contract: 0 local + cortex shipped, 1 local writes never started (actionable config error), 2 local shipped + cortex POST failed (re-runnable; local state stays consistent). Divergent project_id surfaced via cortex.diverged=true + cortex.local_project_id for extension's zone-2 diagnostic (prop_twit75oxir). 9 tests in tests/test_project_register.py. Goal 1475407d closed. Tier C of SER ser_542199e3.
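
The six-step identity priority chain described in the _resolve_canonical_ai_id entry above can be sketched as a first-match-wins loop. This is an illustration under stated assumptions, not the shipped function: the signature is invented, and the project.yaml reader, the strict-canonical check, and the instance resolver are passed in as hypothetical callables so the sketch stays free of file and YAML handling.

```python
from pathlib import PurePosixPath
from typing import Callable, Mapping, Optional


def resolve_ai_id(
    flag: Optional[str],
    env: Mapping[str, str],
    cwd: str,
    read_project_yaml: Callable[[str], Optional[str]],  # hypothetical reader
    is_canonical: Callable[[str], bool],                # hypothetical strict check
    instance_resolver: Callable[[], Optional[str]],     # stands in for InstanceResolver.ai_id()
) -> Optional[str]:
    """Return the first identity found, walking the chain from most to least explicit."""
    basename = PurePosixPath(cwd).name
    candidates = (
        lambda: flag,                                           # (1) --ai-id flag
        lambda: env.get("EMPIRICA_AI_ID"),                      # (2) env override
        lambda: read_project_yaml(cwd),                         # (3) <cwd>/.empirica/project.yaml
        lambda: basename if is_canonical(basename) else None,   # (4) basename(cwd)
        instance_resolver,                                      # (5) session-bound resolver
    )
    for candidate in candidates:
        value = candidate()
        if value:
            return value
    return None                                                 # (6) nothing resolved
```

Wrapping each level in a callable keeps later, more expensive lookups from running once an earlier level has answered, and makes the ordering easy to test in isolation.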

Privacy & Data

Your data stays local:

  • .empirica/ — Local SQLite database (gitignored by default)
  • .git/refs/notes/empirica/* — Epistemic checkpoints (local unless you push)
  • Qdrant runs locally if enabled

No cloud dependencies. No telemetry. Your epistemic data is yours.


Community & Support


License

MIT License — see LICENSE for details.


Author: David S. L. Van Assche
Version: 1.11.10

Turtles all the way down — built with its own epistemic framework, measuring what it knows at every step.
