Measurement and calibration layer for AI — track what it knows, gate what it does

Empirica

We Gave AI a Mirror. Now It Measures What It Believes.

Epistemic infrastructure for AI — measurement, memory, and calibration across sessions.

Empirica tracks what AI knows, gates what it does, and compounds learning across session boundaries. It measures the gap between what AI predicts and what's true — making AI agents measurably more reliable.

Training & Guides | CLI Reference | Architecture

Important: Empirica is an AI measurement framework. It has no cryptocurrency, token, coin, or blockchain component. Any token using the Empirica name (including "$EMPIRICA" on Solana) is unauthorized and not affiliated with this project or Empirica AI GmbH.


The Problem

AI coding agents today have no self-awareness about what they know:

  • Forgets between sessions — same questions, same dead ends, every time
  • Acts before understanding — edits your code without knowing the architecture
  • Can't tell you when it's guessing — no distinction between knowledge and confabulation
  • No audit trail — reasoning evaporates with the context window

What Empirica Does

Capability | What You Experience
Measures before acting | AI investigates your codebase before touching it. The Sentinel gate blocks edits until understanding is demonstrated
Remembers across sessions | Findings, dead-ends, and learnings persist in a 4-layer memory system. Session 3 starts where Session 2 left off
Prevents confident mistakes | The CHECK gate uses domain-aware thresholds scaled by criticality: cybersec/high is stricter than default/low
Shows confidence in real-time | Live statusline in your terminal: [empirica] ⚡94% ↕70% │ 🎯3 │ POST 🔍92% │ K:95% C:92%
Calibrates against reality | Three-vector model: self-assessed, observed (from deterministic checks), and AI-reasoned grounded state with rationale. Domain compliance loops iterate until all checks pass
Tracks your codebase | Temporal entity model auto-extracts functions, classes, and imports from every file edit, so the AI knows what's alive and what's stale
Works through natural language | You describe tasks normally. The AI operates the measurement system automatically

How You Use It

You talk to your AI normally. Empirica works in the background:

You:      "Fix the authentication bug in the login flow"

Empirica: [AI investigates → logs findings → passes Sentinel gate → implements fix → measures learning]

You see:  ⚡87% ↕70% │ 🎯1 │ POST 🔍85% │ K:88% C:82% │ Δ +K

You direct. The AI measures.

Empirica's CLI has 150+ commands spanning investigation, measurement, calibration, and memory — like a cockpit instrument panel. You don't need to learn any of them. The AI reads the instruments, operates the controls, and reports back in natural language. The statusline gives you the flight data at a glance.

For power users, direct CLI access is always available: empirica goals-list, empirica calibration-report, empirica project-search --task "...", and more.

Learn the full workflow: getempirica.com has interactive training, guides, and deep explanations of every concept.


Quick Start

Install + Claude Code (Recommended)

pip install empirica
empirica setup-claude-code

Then just start working. The hooks, Sentinel, system prompt, statusline, and MCP server are all configured automatically. See Claude Code Setup for details.

Already have Claude Code configured? Use --force to replace your default Claude Code settings with Empirica's epistemic hooks. Without --force, setup only writes files that don't already exist — so if you've already used Claude Code, the default internals stay in place and Empirica's hooks won't activate.

empirica setup-claude-code --force

--force replaces hooks in settings.json but only removes Empirica's own hooks — hooks from other plugins (Railway, Superpowers, etc.) are preserved.

Alternative Installation Methods

Homebrew (macOS)

brew tap nubaeon/tap
brew install empirica
empirica setup-claude-code

Docker

# Security-hardened Alpine image (~276MB, recommended)
docker pull nubaeon/empirica:1.9.5-alpine

# Standard image (Debian slim, ~414MB)
docker pull nubaeon/empirica:1.9.5

# Run (the volume path is quoted so directories containing spaces still work)
docker run -it -v "$(pwd)/.empirica:/data/.empirica" nubaeon/empirica:1.9.5 /bin/bash

Manual / Other AI Platforms

pip install empirica
pip install empirica-mcp        # MCP Server (for Cursor, Cline, etc.)
cd your-project && empirica project-init

The CLI works standalone on any platform. The full epistemic workflow (epistemic transactions, Sentinel, calibration) requires loading the system prompt into your AI — the easiest path is empirica setup-claude-code, which wires the lean prompt into ~/.claude/empirica-system-prompt.md and references it from your ~/.claude/CLAUDE.md. See Claude Code Setup for details.

First Session

empirica onboard   # Interactive walkthrough of the full workflow

Or just start working — with Claude Code hooks active, the AI manages the epistemic workflow automatically.


The Measurement Architecture

Empirica works through nested abstraction layers:

Plan
 └── Transaction 1 (Goal A)
      ├── NOETIC: investigate, search, read → findings, unknowns, dead-ends
      ├── CHECK: Sentinel gate → proceed / investigate more
      ├── PRAXIC: implement, write, commit → goals completed
      └── POSTFLIGHT: measure learning delta → persists to memory
 └── Transaction 2 (Goal B, informed by T1's findings)
      └── ...

Plans decompose into transactions — one per goal or Claude Code task. Each transaction is a noetic-praxic loop: investigate first (noetic), then act (praxic), with the Sentinel gating the transition. Along the way, the AI collects and reads artifacts (findings, unknowns, assumptions, dead-ends, decisions) while using semantic search to surface relevant epistemic patterns and anti-patterns from the project's history. Top artifacts are ranked by confidence and fed into each project's MEMORY.md as a hot cache.

The Epistemic Transaction Cycle

PREFLIGHT ────────► CHECK ────────► POSTFLIGHT
    │                 │                  │
 Baseline         Sentinel           Learning
 Assessment        Gate               Delta
    │                 │                  │
 "What do I      "Am I ready      "What did I
  know now?"      to act?"         learn?"

PREFLIGHT: AI assesses its knowledge state before starting work. CHECK: Sentinel gate validates readiness before allowing code edits. POSTFLIGHT: AI measures what it learned, creating a delta that persists.
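
The cycle above can be sketched in a few lines of Python. This is a minimal illustration rather than Empirica's implementation: the function names, the vector dictionaries, and the 0.70 threshold (borrowed from the ↕70% value in the statusline examples) are all assumptions.

```python
# Illustrative sketch only: names and the threshold value are hypothetical,
# not Empirica's actual API.

def check_gate(vectors: dict[str, float], know_threshold: float = 0.70) -> str:
    """Return 'proceed' when self-assessed knowledge meets the threshold."""
    return "proceed" if vectors.get("know", 0.0) >= know_threshold else "investigate"


def learning_delta(preflight: dict[str, float], postflight: dict[str, float]) -> dict[str, float]:
    """Per-vector change between the PREFLIGHT baseline and POSTFLIGHT."""
    return {
        name: round(postflight[name] - preflight[name], 2)
        for name in preflight
        if name in postflight
    }


preflight = {"know": 0.55, "context": 0.60, "uncertainty": 0.50}
print(check_gate(preflight))    # knowledge is below the threshold

postflight = {"know": 0.88, "context": 0.82, "uncertainty": 0.20}
print(check_gate(postflight))   # knowledge now clears the threshold
print(learning_delta(preflight, postflight))
```

The delta is the part that persists: a positive change in know alongside a drop in uncertainty is what a "+K" style summary would report.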


Live Statusline

With Claude Code hooks enabled, you see the AI's epistemic state in real-time:

[empirica] ⚡94% ↕70% │ 🎯3 ❓12/5 │ POST 🔍92% │ K:95% C:92% │ Δ +K +C
Signal | Meaning
⚡94% | Overall epistemic confidence
↕70% | Sentinel threshold (know gate), user-facing only
🎯3 ❓12/5 | Open goals (3), unknowns (12 total, 5 blocking)
POST 🔍92% | Transaction phase + work state (🔍 investigating / 🔨 acting) with composite score
K:95% C:92% | Knowledge and Context vectors (color-coded by gap to threshold)
Δ +K +C | Learning delta (POSTFLIGHT only): which vectors improved
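
If you want to log or chart these values, the statusline can be parsed with a few regular expressions. The helper below is a hypothetical sketch based only on the example line above; the function name and dictionary keys are assumptions, and the real statusline may contain variants this does not handle.

```python
import re

# Hypothetical helper: parses the statusline layout shown above into a dict.
# The key names are illustrative and not part of Empirica itself.
STATUSLINE = "[empirica] ⚡94% ↕70% │ 🎯3 ❓12/5 │ POST 🔍92% │ K:95% C:92% │ Δ +K +C"


def parse_statusline(line: str) -> dict:
    def first_int(pattern: str):
        match = re.search(pattern, line)
        return int(match.group(1)) if match else None

    unknowns = re.search(r"❓(\d+)/(\d+)", line)
    phase = re.search(r"(\w+) [🔍🔨]", line)
    return {
        # \ufe0f is an optional emoji variation selector some terminals add.
        "confidence": first_int(r"⚡\ufe0f?(\d+)%"),
        "threshold": first_int(r"↕\ufe0f?(\d+)%"),
        "open_goals": first_int(r"🎯(\d+)"),
        "unknowns_total": int(unknowns.group(1)) if unknowns else None,
        "unknowns_blocking": int(unknowns.group(2)) if unknowns else None,
        "phase": phase.group(1) if phase else None,
        "know": first_int(r"K:(\d+)%"),
        "context": first_int(r"C:(\d+)%"),
        "improved": re.findall(r"\+([A-Za-z]+)", line),
    }


print(parse_statusline(STATUSLINE))
```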

The 13 Epistemic Vectors

These vectors emerged from 600+ real working sessions across multiple AI systems. They measure the dimensions that consistently predict success or failure in complex tasks.

Tier | Vector | What It Measures
Gate | engagement | Is the AI actively processing or disengaged?
Foundation | know | Domain knowledge depth
Foundation | do | Execution capability
Foundation | context | Access to relevant information
Comprehension | clarity | How clear is the understanding?
Comprehension | coherence | Do the pieces fit together?
Comprehension | signal | Signal-to-noise in available information
Comprehension | density | Information richness
Execution | state | Current working state
Execution | change | Rate of progress/change
Execution | completion | Task completion level
Execution | impact | Significance of the work
Meta | uncertainty | Explicit doubt tracking

Deep dive: Epistemic Vectors Explained


How It Works With Claude Code

Empirica doesn't replace or reinvent anything Claude Code already does. Claude Code owns tasks, plans, memory, and projects. Empirica adds the measurement layer on top:

Claude Code Does | Empirica Adds
Task management | Epistemic goals with measurable completion
Plan mode | Investigation phase with Sentinel gating: no edits until understanding is verified
MEMORY.md | Auto-curated hot cache ranked by epistemic confidence
Context window | 4-layer memory that survives compaction and persists across sessions
Code editing | Grounded calibration: was the AI's confidence justified by test results?
Subagent spawning | Bounded autonomy with delegated work counting and budget tracking

The result: Claude Code's native capabilities, enhanced with measurement, gating, and calibration feedback that compounds over time.


Platform Support

Platform | Integration Level | What You Get
Claude Code | Full (production) | Hooks, Sentinel gate, skills, agents, statusline, MCP
Cursor, Cline | MCP server | Epistemic transaction workflow, memory, calibration via MCP tools
Gemini CLI, Copilot | Experimental | System prompt + CLI
Any AI | CLI + prompt | Full measurement via CLI commands and system prompt

Documentation & Training

Resource | What It Covers
getempirica.com | Training course, interactive guides, deep explanations
Natural Language Guide | How to collaborate with AI using Empirica
Getting Started | First-time setup and concepts
CLI Reference | All 150+ commands documented
Architecture | Technical reference for contributors
Claude Code Setup | Install + system prompt + plugin wiring

The Empirica Ecosystem

Project | Description | Status
Empirica | Core measurement system: epistemic transactions, Sentinel, calibration, 13 vectors | Open source
Empirica Iris | Epistemic browser automation with SVG spatial indexing; Sentinel gating for visual interactions | Open source
Docpistemic | Epistemic documentation coverage assessment: know what your docs know | Open source
Breadcrumbs | Survive context compacts with git notes; dead simple session continuity | Open source
Empirica Cortex | Cross-project intelligence layer; serves verified predictions and accumulated learnings to condition future work | Proprietary
Empirica Workspace | Entity Knowledge Graph, Epistemic Prompt Engine, CRM, portfolio dashboard | Proprietary

Building something with Empirica? Open an issue to get listed.


What's New in 1.9.5

Empirica's first CI/CD harness. Three GitHub Actions workflows shipped: ci.yml (ruff + pyright + pytest matrix on Python 3.11 + 3.13 + empirica compliance-report + pip-audit), release.yml (tag-triggered PyPI publishing via OIDC trusted publishers + Docker + Homebrew tap auto-update), dependency-scan.yml (weekly pip-audit + Dependabot grouped updates). docs/architecture/CI_CD.md documents the full setup including OIDC trusted-publisher configuration steps. Compliance score on CI: 1.0 (8/8 deterministic checks).

MCP / CLI parity for --visibility + --epistemic-source. All 6 mcp__empirica__*_log tools (finding/unknown/deadend/mistake/assumption/decision) now expose both flags as enum params. The cross-Claude intelligence-sharing discipline (visibility ∈ {public, shared, local}, epistemic_source ∈ {intuition, search, mixed}) is enforceable through either MCP or bash CLI, so tagging an artifact no longer requires dropping to bash.

Cross-project artifact sharing taught. The --visibility flag and project-search --global have been available for releases, but nothing in the system prompt or docs taught AIs to use them as a coherent sharing workflow. v1.9.5 closes that gap:

  • New signal→action rows in the lean system prompt's COLLABORATIVE MODE table: cross-codebase finding → --visibility shared, starting work on a new topic → project-search --global first, cross-project log → --project-id <name>
  • New "Visibility (push side)" section in docs/reference/api/CROSS_PROJECT.md with a when-to-use-which matrix (local for tactical, shared for ecosystem patterns, public for security/reusable lessons)
  • Honest scope caveat in both surfaces: v1.9.5 --global only hits the global_learnings Qdrant collection; the richer per-project walk + push-based auto-surface at project-bootstrap are deferred goals

Cortex creds via ~/.empirica/credentials.yaml. The browser extension saves cortex url + api_key to chrome.storage; v1.9.5 wires the CLI equivalent. A cortex: block in ~/.empirica/credentials.yaml is now picked up by projects-bulk-register, source-archive Cortex sync, and POSTFLIGHT /v1/sync push. Precedence: CLI flags → env vars → credentials file. Saves having to export CORTEX_API_KEY=... in every shell.
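
The precedence rule (CLI flags → env vars → credentials file) can be illustrated with a short sketch. This is not the project's resolver: the function name is hypothetical, the "url" and "api_key" keys inside the cortex: block are assumed from the description above, and the credentials file is represented as an already-parsed dictionary to keep the example dependency-free.

```python
import os

# Assumption: the cortex: block holds "url" and "api_key" keys. The env var
# names come from the release notes; everything else here is illustrative.
ENV_VARS = {"url": "CORTEX_REMOTE_URL", "api_key": "CORTEX_API_KEY"}


def resolve_cortex_setting(name, cli_value=None, env=None, credentials=None):
    """Resolve one setting: CLI flag, then env var, then credentials file."""
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    if env.get(ENV_VARS[name]):
        return env[ENV_VARS[name]]
    return (credentials or {}).get("cortex", {}).get(name)


# Stand-in for the parsed contents of ~/.empirica/credentials.yaml.
credentials = {"cortex": {"url": "https://cortex.example", "api_key": "file-key"}}

print(resolve_cortex_setting("api_key", env={}, credentials=credentials))
print(resolve_cortex_setting("api_key", env={"CORTEX_API_KEY": "env-key"}, credentials=credentials))
print(resolve_cortex_setting("api_key", cli_value="flag-key",
                             env={"CORTEX_API_KEY": "env-key"}, credentials=credentials))
```

Each call adds one higher-priority source, so the three lines print the file value, then the environment value, then the flag value.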

projects-bulk-register simplified: sources from registry.yaml. The command was over-engineered through accumulated handoff iterations. Mid-cycle reset: registry.yaml (added in 1.9.3 for the daemon multi-project work) is already the user's curated set, so bulk-register now reads from it directly. No more Cortex /v1/collections round-trip to compute an intersection at command time: the curation happens once when you run projects-discover --register, and bulk-register just syncs what you've curated.

  • Default: reads ~/.empirica/registry.yaml
  • --from-discovered: opt-in to read the raw scanner output (discovered_projects.yaml) for "register everything I have"
  • --force-metadata-update: still kept — sets body flag for Cortex's safe-update of existing rows
  • Removed: --only-existing flag, the intersection-fetch logic (~80 LOC, 7 tests). The intersection now happens at curation, not at sync time.
  • POSTFLIGHT /v1/sync payload also enriched with name + repo_url so Cortex's auto-create on unknown project_ids no longer seeds rows with name=<UUID>, repo_url="" (EC-2 root cause from the v0.7.8 handoff).

source-archive Cortex sync. When CORTEX_REMOTE_URL + CORTEX_API_KEY are set, archiving locally now also calls Cortex's DELETE /v1/sources/{id}. Best-effort — failures never block local archive; status surfaces in response as {"cortex": {"synced": true, "status": 200}}.

workflow_commands.py split. 3933 LOC → 4 focused modules (_workflow_shared 612 + _workflow_preflight 747 + _workflow_check 1103 + _workflow_postflight 1431) plus a 61-LOC re-export shim. Largest single-file refactor in the codebase. External imports preserved.

empirica-mcp bootstrapped 319 tests. Three new test files cover the _build_cli_command / _resolve_cwd / _err_text helpers from the v1.9.3 refactor, _build_tool_schema branches, and TOOL_REGISTRY integrity (parametrized over every entry). Plus empirica-mcp/call_tool() refactored from D27 → C14 cyclomatic complexity, dropping a noqa: C901.

Internal-only docs removed from public tree. Audit pass found 2 internal docs (CHAT_OVERNIGHT_PLAN.md — David's autonomous-build brief, and PROMPT_FOR_EMPIRICA_CLAUDE_source_aware_sentinel.md — AI-to-AI handoff prompt) shipped publicly. Moved to gitignored .empirica/notes/historical/. Also fixed 3 hardcoded /home/yogapad/... paths (docker-compose.yml, diagnose_ecodex.py, KNOWN_ISSUES.md). Added forward-looking .gitignore patterns: docs/**/PROMPT_FOR_*.md and docs/**/*OVERNIGHT*PLAN*.md.

5 broken docs links fixed — references to gitignored docs/specs/ and docs/research/ drafts in committed .md files converted to plain-text refs (links resolved locally where the targets exist, broke on fresh CI checkout where they don't).

Full suite 2320 passed, 4 skipped (release-gate run).

What's New in 1.9.0

Goal-criterion bridge — quality gates that auto-evaluate

  • criterion_evaluators package — validation_method-keyed registry. Goals declare quality_gate:<metric>@<op>:<threshold> and the bridge routes to the right evaluator at POSTFLIGHT.
  • EvidenceMetricEvaluator — auto-evaluates any criterion whose metric matches an evidence bundle key (test pass-rate, ruff violations, stylometry drift, etc.).
  • Typed criterion parser: goals-create --success-criteria "quality_gate:test_pass_rate@>=:0.95" parses to typed CriterionDeclaration.
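
To make the quality_gate:<metric>@<op>:<threshold> format concrete, here is a minimal parsing sketch. Only the string format and the CriterionDeclaration name come from the notes above; the field names, the set of supported operators, and the evaluate method are assumptions for illustration and may not match the real criterion_evaluators package.

```python
import operator
import re
from dataclasses import dataclass

# Assumed operator set; the real parser may accept more or fewer.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}

PATTERN = re.compile(
    r"^quality_gate:(?P<metric>[\w.]+)@(?P<op>>=|<=|==|>|<):(?P<threshold>-?\d+(?:\.\d+)?)$"
)


@dataclass(frozen=True)
class CriterionDeclaration:
    metric: str
    op: str
    threshold: float

    def evaluate(self, evidence: dict) -> bool:
        """Compare the matching evidence value against the threshold."""
        return OPS[self.op](evidence[self.metric], self.threshold)


def parse_criterion(text: str) -> CriterionDeclaration:
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a quality_gate criterion: {text!r}")
    return CriterionDeclaration(match["metric"], match["op"], float(match["threshold"]))


criterion = parse_criterion("quality_gate:test_pass_rate@>=:0.95")
print(criterion)
print(criterion.evaluate({"test_pass_rate": 0.97}))
```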

Stylometric drift collector — voice consistency for outreach work

  • 12 prosodic markers (contractions, MTLD, sentence-length stdev, etc.)
  • Voice fingerprints at ~/.empirica/voice/<name>.fingerprint.json
  • Drift direction inference (formal_pull / informal_pull / mixed / within_tolerance)

Content-aware source provenance nudge — fires at moment of artifact creation when text shows citation but no --source. Closes 0% adoption gap.

Bulk project-link CLI: projects-discover / projects-list / projects-bulk-register (Cortex-dependent).

Live-scan semantic index: semantic_index.json regenerates when source docs are newer than the cache.

Sentinel quote-aware shell parsing — false-positive > in quoted code fixed (_has_dangerous_redirects now uses _contains_outside_quotes).
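
The idea behind quote-aware parsing can be shown with a small scanner that ignores characters inside single or double quotes. This is a hypothetical sketch of the technique, not the body of Empirica's _contains_outside_quotes, whose actual signature and edge-case handling are not shown in these notes.

```python
def contains_outside_quotes(command: str, char: str) -> bool:
    """Return True if `char` appears in `command` outside any quoted span."""
    quote = None      # the quote character we are currently inside, if any
    escaped = False   # True when the previous character was a backslash
    for ch in command:
        if escaped:
            escaped = False
            continue
        # Backslash escapes the next character, except inside single quotes.
        if ch == "\\" and quote != "'":
            escaped = True
            continue
        if quote:
            if ch == quote:
                quote = None
            continue
        if ch in ("'", '"'):
            quote = ch
            continue
        if ch == char:
            return True
    return False


print(contains_outside_quotes('echo "a > b"', ">"))       # quoted, not a redirect
print(contains_outside_quotes("echo hi > out.txt", ">"))  # real redirect
```

A check like this lets a gate flag the second command as a file write while letting the first through as a plain echo.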

Template version parameterization (Philipp #100): CLAUDE.md and empirica-system-prompt-lean.md use {{ empirica_version }} and {{ generated_date }} placeholders. Drift cannot recur.

Documentation refresh: UPGRADE_TO_1.9.md (replaces 1.7), full rewrite of PROJECT_SWITCHING_FOR_AIS.md, TMUX_MULTI_PANE_GUIDE.md cockpit section.

What's New in 1.8.20

  • empirica commit-context <sha> (new CLI). Aggregates artifacts
  • --depth N recursive walker. Walks edges from each artifact's
  • Inline edge declaration on individual *-log commands. All six
  • edge_density_nudge — POSTFLIGHT retrospective +
  • sources_discipline_nudge — same shape, counts artifacts
  • --status {planned|in_progress|completed|all|drift} flag
  • drift mode surfaces rows where the status text and
  • Default open count now uses is_completed = 0 as the canonical

What's New in 1.8.17

  • Listener subsystem — sister to cron loops, event-driven not scheduled. empirica listener register/heartbeat/list + cockpit E binding + project.yaml install hook.
  • Mechanical pause for loops — pause now cancels the next-fire CronCreate token so paused really means silent (no token bleed).
  • Cockpit sweep — domain·criticality chip per row, compliance panel with green/yellow/red glyph, services panel for scanner snapshots.

What's New in 1.8.16

  • #95 root-cause cluster closed — Cortex sync reads project_id from session row (no CWD); _run_grounded_verification accepts project_path; resolve_project_id raises ProjectNotFoundError instead of sys.exit(1). SystemExit-walks-through-Exception hazard closed at the source.
  • Per-project compliance.yaml — projects can skip_checks, declare extra_checks with regulatory mapping, override repo_hygiene sub-checks. Non-CLI/server projects no longer fail tech_docs.
  • KNOWN_ISSUES 11.29 + 11.30 — instance_isolation audit-trail entries for the subagent CLI bleed fix and the SystemExit propagation chain.

What's New in 1.8.15

  • Validate-and-heal session.project_id at session boundaries — catches the ghost-project_id pattern (cross-project --resume, ambiguous folder_name match, tmux pane reuse). Heals at post-compact CONTINUE_TRANSACTION + NEW_SESSION_PREFLIGHT and at session-init resume. Workspace.db trajectory_path is the canonical lookup — never folder_name (no 11.10/11.27 regression).
  • Voice CLI: empirica voice list / show / apply loads prosodic profiles for outreach drafting. Profiles in ~/.empirica/voice/*.yaml with project-local override at .empirica/voice/. Voice samples themselves stay in Cortex/Qdrant; this CLI is the calling surface.
  • PREFLIGHT voice_guidance block — when work_type=comms or the new voice field/--voice flag is set, response includes voice tendencies + anti-patterns scoped to platform register (mirrors the noetic_guidance pattern).
  • Subagent CLI bleed fix (#95 Issue 1): subagent-start now writes ~/.empirica/active_work_<subagent_uuid>.json with is_subagent: true so the subagent's CLI calls resolve to their own child_session_id instead of falling through to the parent's via TTY. sentinel-gate._detect_subagent reads the flag. subagent-stop cleans up.
  • POSTFLIGHT pipeline restructure (#95 Issue 3) — Stage 0 pre-validates session row + project_id BEFORE any state mutation; failure → early return with loop_state: "open". Stages 5-7 wrapped in _soft_run — failures accumulate into result["warnings"] without erasing the closed-loop reflex. No more half-success.
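
The POSTFLIGHT restructure follows a common pattern: validate first and return early, then run non-critical stages so that their failures become warnings. The sketch below illustrates that pattern under stated assumptions. The function names, the session dictionary shape, and the "closed" value are hypothetical; only loop_state: "open" on early return and the result["warnings"] accumulation come from the notes above.

```python
def soft_run(result: dict, name: str, stage) -> None:
    """Run a non-critical stage; record failures as warnings, never raise."""
    try:
        stage()
    except Exception as exc:
        result["warnings"].append(f"{name}: {exc}")


def run_postflight(session: dict, soft_stages) -> dict:
    # Stage 0: validate before any state is mutated.
    if not session.get("id") or not session.get("project_id"):
        return {"loop_state": "open", "warnings": [], "error": "invalid session"}

    result = {"loop_state": "closed", "warnings": []}  # "closed" is assumed
    for name, stage in soft_stages:
        soft_run(result, name, stage)
    return result


def failing_sync():
    raise RuntimeError("sync failed")


print(run_postflight({"id": "s1", "project_id": "p1"}, [("sync", failing_sync)]))
print(run_postflight({"id": "s1"}, [("sync", failing_sync)]))
```

The first call still closes the loop and reports the sync failure as a warning; the second returns before any stage runs.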

What's New in 1.8.14

  • Notify dispatcher: single CLI verb (empirica notify emit/config/backends/test) every loop and hook calls. Three v1 backends (stdout, rotating JSONL log, ntfy) with first-match-wins routing and fail-loud fallback to stdout when a backend isn't configured. Always-on audit at ~/.empirica/notify-dispatcher.jsonl. Cockpit + TUI surface 5 most recent emits, backend status, 24h fallback count, and a failure banner. See docs/architecture/NOTIFY.md.
  • Project-scoped TUI notifications — per-instance notifications strip now reads ~/.empirica/enp/pending.json (the file the ENP watcher actually writes). Top-bar ⊕N shows total unacked across all projects.
  • empirica goals-prune — bulk goal cleanup with four modes (test-pollution, planned, auto-stale, duplicates). Dry-run by default.
  • Empirica Cockpit — multi-instance state visibility + per-instance controls. empirica status [--all] overview, empirica tui interactive Textual app, empirica sentinel|loop|instance subcommand groups. See docs/architecture/COCKPIT.md.
  • Loop exponential backoff — empty fires lengthen the gap; found/fail snap back to base (15m → 30m → 1h → 2h → 4h cap).
  • noetic-batch CLI primitive — bundles N reads/greps/globs/investigate into one Sentinel-noetic call.
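
The loop backoff schedule listed above (15m → 30m → 1h → 2h → 4h cap) amounts to doubling on empty fires and resetting on any other outcome. A minimal sketch, with a hypothetical function name and outcome strings taken from the wording of the bullet:

```python
BASE_MINUTES = 15
CAP_MINUTES = 4 * 60


def next_interval(current_minutes: int, outcome: str) -> int:
    """Double the gap after an empty fire; snap back to base on found/fail."""
    if outcome in ("found", "fail"):
        return BASE_MINUTES
    return min(current_minutes * 2, CAP_MINUTES)


interval = BASE_MINUTES
schedule = [interval]
for _ in range(5):
    interval = next_interval(interval, "empty")
    schedule.append(interval)

print(schedule)                        # [15, 30, 60, 120, 240, 240]
print(next_interval(interval, "found"))  # 15
```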

Sentinel Reframe (1.8.0)

The Sentinel is a compliance loop coordinator. Deterministic services produce information; the AI synthesizes the grounded epistemic state.

  • Domain Registry: (work_type, domain, criticality) tuples map to compliance checklists. 4 built-in domains: default, remote-ops, cybersec, docs. CLI: domain-list, domain-show, domain-resolve
  • Domain-aware CHECK gate — uncertainty threshold scales by criticality. cybersec/high is stricter than default/low
  • Three-vector model: self_assessed, observed (from deterministic checks), and AI-reasoned grounded state with rationale
  • Compliance loop — POSTFLIGHT runs domain checklist, reports status, advises on follow-up for failed checks
  • Check-outcome Brier — AI predicts P(check passes), Brier measures against actual outcomes. Falsifiable calibration
  • Real check runners — pytest, ruff, and git status execute as subprocess checks (not stubs)
  • Test isolation — tests no longer pollute live sessions via TMUX_PANE inheritance
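
The check-outcome Brier item relies on the standard Brier score: the mean squared difference between each predicted probability and the actual 0/1 outcome, where 0 is perfect and always predicting 0.5 scores 0.25. The sketch below shows that formula only; how Empirica collects predictions and outcomes is not covered here, and the function name is illustrative.

```python
def brier_score(predictions: list[float], outcomes: list[bool]) -> float:
    """Mean squared error between predicted P(check passes) and the result."""
    if not predictions or len(predictions) != len(outcomes):
        raise ValueError("need equal-length, non-empty predictions and outcomes")
    return sum((p - float(o)) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)


# Example: predicted pass probabilities for three checks and what happened.
predicted = [0.9, 0.8, 0.3]
actual = [True, True, False]
print(round(brier_score(predicted, actual), 4))
```

Here the squared errors are 0.01, 0.04, and 0.09, so the score is 0.14 / 3, roughly 0.0467. Lower is better, which is what makes the calibration claim falsifiable.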

Previous Highlights (1.7.0–1.7.13)

  • Empirica Constitution — 12-section governance framework routing situations to mechanisms
  • Epistemic Persistence Protocol (EPP) — Calibrated position-holding under pushback, replacing AAP
  • Lean Core Prompt — 81% reduction in always-loaded context. setup-claude-code --lean
  • Cross-Project Search: --global searches ALL projects' Qdrant collections
  • Cross-Project Artifact Writing: finding-log --project-id <name> writes to another project
  • Plugin Renamed: empirica-integration → empirica. Run setup-claude-code --force
  • Brier Score Calibration — Proper scoring rule with dynamic thresholds
  • Profile Management: profile-sync, profile-prune, profile-status

Privacy & Data

Your data stays local:

  • .empirica/ — Local SQLite database (gitignored by default)
  • .git/refs/notes/empirica/* — Epistemic checkpoints (local unless you push)
  • Qdrant runs locally if enabled

No cloud dependencies. No telemetry. Your epistemic data is yours.


Community & Support


License

MIT License — see LICENSE for details.


Author: David S. L. Van Assche
Version: 1.9.5

Turtles all the way down — built with its own epistemic framework, measuring what it knows at every step.
