
Genuine AI epistemic self-assessment framework - Universal interface for single AI tracking

Project description

Empirica

We Gave AI a Mirror. Now It Measures What It Believes.


Empirica is an epistemic measurement system that makes AI agents measurably more reliable — tracking what they know, preventing action before understanding, and compounding learning across sessions.

Training & Guides | CLI Reference | Architecture


The Problem

AI coding agents today have no self-awareness of what they know:

  • Forget between sessions — same questions, same dead ends, every time
  • Act before understanding — edit your code without knowing the architecture
  • Can't tell you when they're guessing — no distinction between knowledge and confabulation
  • Leave no audit trail — reasoning evaporates with the context window

What Empirica Does

  • Measures before acting — AI investigates your codebase before touching it; the Sentinel gate blocks edits until understanding is demonstrated
  • Remembers across sessions — Findings, dead-ends, and learnings persist in a 4-layer memory system; Session 3 starts where Session 2 left off
  • Prevents confident mistakes — The CHECK gate uses thresholds computed dynamically from calibration data before allowing action
  • Shows confidence in real time — Live statusline in your terminal: [empirica] ⚡94% ↕70% │ 🎯3 │ POST 🔍92% │ K:95% C:92%
  • Calibrates against reality — Dual-track verification compares AI self-assessment against objective evidence: tests, git metrics, goal completion
  • Tracks your codebase — Temporal entity model auto-extracts functions, classes, and imports from every file edit; the AI knows what's alive and what's stale
  • Works through natural language — You describe tasks normally; the AI operates the measurement system automatically

How You Use It

You talk to your AI normally. Empirica works in the background:

You:      "Fix the authentication bug in the login flow"

Empirica: [AI investigates → logs findings → passes Sentinel gate → implements fix → measures learning]

You see:  ⚡87% ↕70% │ 🎯1 │ POST 🔍85% │ K:88% C:82% │ Δ +K

You direct. The AI measures.

Empirica's CLI has 150+ commands spanning investigation, measurement, calibration, and memory — like a cockpit instrument panel. You don't need to learn any of them. The AI reads the instruments, operates the controls, and reports back in natural language. The statusline gives you the flight data at a glance.

For power users, direct CLI access is always available: empirica goals-list, empirica calibration-report, empirica project-search --task "...", and more.

Learn the full workflow: getempirica.com has interactive training, guides, and deep explanations of every concept.


Quick Start

Install + Claude Code (Recommended)

pip install empirica
empirica setup-claude-code

Then just start working. The hooks, Sentinel, system prompt, statusline, and MCP server are all configured automatically. See Claude Code Setup for details.

Alternative Installation Methods

One-Line Installer
# Linux / macOS
curl -fsSL https://raw.githubusercontent.com/Nubaeon/empirica/main/scripts/install.py | python3 -

# Windows (PowerShell)
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/Nubaeon/empirica/main/scripts/install.py" -OutFile "install.py"
python install.py
Homebrew (macOS)
brew tap nubaeon/tap
brew install empirica
empirica setup-claude-code
Docker
# Security-hardened Alpine image (~276MB, recommended)
docker pull nubaeon/empirica:1.6.18-alpine

# Standard image (Debian slim, ~414MB)
docker pull nubaeon/empirica:1.6.18

# Run
docker run -it -v $(pwd)/.empirica:/data/.empirica nubaeon/empirica:1.6.18 /bin/bash
Manual / Other AI Platforms
pip install empirica
pip install empirica-mcp        # MCP Server (for Cursor, Cline, etc.)
cd your-project && empirica project-init

The CLI works standalone on any platform. The full epistemic workflow (epistemic transactions, Sentinel, calibration) requires loading the system prompt into your AI. See System Prompts for Claude, Copilot, Gemini, Qwen, and Roo Code.

First Session

empirica onboard   # Interactive walkthrough of the full workflow

Or just start working — with Claude Code hooks active, the AI manages the epistemic workflow automatically.


The Measurement Architecture

Empirica works through nested abstraction layers:

Plan
 └── Transaction 1 (Goal A)
      ├── NOETIC: investigate, search, read → findings, unknowns, dead-ends
      ├── CHECK: Sentinel gate → proceed / investigate more
      ├── PRAXIC: implement, write, commit → goals completed
      └── POSTFLIGHT: measure learning delta → persists to memory
 └── Transaction 2 (Goal B, informed by T1's findings)
      └── ...

Plans decompose into transactions — one per goal or Claude Code task. Each transaction is a noetic-praxic loop: investigate first (noetic), then act (praxic), with the Sentinel gating the transition. Along the way, the AI collects and reads artifacts (findings, unknowns, assumptions, dead-ends, decisions) while using semantic search to surface relevant epistemic patterns and anti-patterns from the project's history. Top artifacts are ranked by confidence and fed into each project's MEMORY.md as a hot cache.
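The plan/transaction structure above can be sketched as a minimal data model. This is an illustrative sketch only — the class and field names are hypothetical, not Empirica's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class Transaction:
    """One noetic-praxic loop: investigate, gate, act, measure.
    Hypothetical sketch -- names do not mirror Empirica's real code."""
    goal: str
    findings: list = field(default_factory=list)   # noetic artifacts
    unknowns: list = field(default_factory=list)
    confidence: float = 0.0                        # composite epistemic score

    def sentinel_allows_action(self, threshold: float = 0.70) -> bool:
        # CHECK: the gate opens only once confidence clears the threshold
        return self.confidence >= threshold

@dataclass
class Plan:
    transactions: list = field(default_factory=list)

# A plan decomposes into one transaction per goal
plan = Plan(transactions=[Transaction(goal="Fix auth bug")])
t1 = plan.transactions[0]
t1.findings.append("login flow uses JWT middleware")
t1.confidence = 0.55
assert not t1.sentinel_allows_action()   # still in the noetic phase
t1.confidence = 0.88
assert t1.sentinel_allows_action()       # gate opens; praxic phase begins
```

Transaction 2 would start from Transaction 1's persisted findings rather than from zero — that carry-over is the point of the memory layers.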

The Epistemic Transaction Cycle

PREFLIGHT ────────► CHECK ────────► POSTFLIGHT
    │                 │                  │
 Baseline         Sentinel           Learning
 Assessment        Gate               Delta
    │                 │                  │
 "What do I      "Am I ready      "What did I
  know now?"      to act?"         learn?"

PREFLIGHT: AI assesses its knowledge state before starting work. CHECK: Sentinel gate validates readiness before allowing code edits. POSTFLIGHT: AI measures what it learned, creating a delta that persists.
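The POSTFLIGHT learning delta is a per-vector comparison between the baseline and final self-assessments. A minimal sketch, assuming a simple subtraction per vector (Empirica's exact formula is not specified here):

```python
def learning_delta(preflight: dict, postflight: dict) -> dict:
    """Per-vector change between PREFLIGHT and POSTFLIGHT scores.
    Sketch only: the real system may weight or threshold these."""
    return {v: round(postflight[v] - preflight[v], 2) for v in preflight}

# Hypothetical scores on three of the vectors
pre  = {"know": 0.60, "context": 0.55, "uncertainty": 0.40}
post = {"know": 0.88, "context": 0.82, "uncertainty": 0.20}
delta = learning_delta(pre, post)
# Positive deltas on know/context mirror the statusline's "Δ +K +C"
improved = [v for v, d in delta.items() if d > 0]
```

Because the delta persists to memory, a later session can see not just what was learned but how much confidence each vector gained.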


Live Statusline

With Claude Code hooks enabled, you see the AI's epistemic state in real-time:

[empirica] ⚡94% ↕70% │ 🎯3 ❓12/5 │ POST 🔍92% │ K:95% C:92% │ Δ +K +C
  • ⚡94% — Overall epistemic confidence
  • ↕70% — Sentinel threshold (know gate); user-facing only
  • 🎯3 ❓12/5 — Open goals (3); unknowns (12 total, 5 blocking)
  • POST 🔍92% — Transaction phase + work state (🔍 investigating / 🔨 acting) with composite score
  • K:95% C:92% — Knowledge and Context vectors (color-coded by gap to threshold)
  • Δ +K +C — Learning delta (POSTFLIGHT only): which vectors improved
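As a sketch, the example statusline can be unpacked with a small parser. The regexes below assume only the layout of the example shown above; the actual format may carry more fields:

```python
import re

def parse_statusline(line: str) -> dict:
    """Extract the main signals from an Empirica statusline string.
    Format assumptions are based solely on the documented example."""
    out = {}
    if m := re.search(r"⚡(\d+)%", line):
        out["confidence"] = int(m.group(1))          # overall epistemic confidence
    if m := re.search(r"↕(\d+)%", line):
        out["threshold"] = int(m.group(1))           # Sentinel know-gate threshold
    if m := re.search(r"🎯(\d+)", line):
        out["open_goals"] = int(m.group(1))
    if m := re.search(r"❓(\d+)/(\d+)", line):
        out["unknowns_total"] = int(m.group(1))
        out["unknowns_blocking"] = int(m.group(2))
    if m := re.search(r"K:(\d+)%\s+C:(\d+)%", line):
        out["know"], out["context"] = int(m.group(1)), int(m.group(2))
    return out

status = parse_statusline(
    "[empirica] ⚡94% ↕70% │ 🎯3 ❓12/5 │ POST 🔍92% │ K:95% C:92% │ Δ +K +C"
)
```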

The 13 Epistemic Vectors

These vectors emerged from 600+ real working sessions across multiple AI systems. They measure the dimensions that consistently predict success or failure in complex tasks.

  • engagement (Gate) — Is the AI actively processing or disengaged?
  • know (Foundation) — Domain knowledge depth
  • do (Foundation) — Execution capability
  • context (Foundation) — Access to relevant information
  • clarity (Comprehension) — How clear is the understanding?
  • coherence (Comprehension) — Do the pieces fit together?
  • signal (Comprehension) — Signal-to-noise ratio in available information
  • density (Comprehension) — Information richness
  • state (Execution) — Current working state
  • change (Execution) — Rate of progress/change
  • completion (Execution) — Task completion level
  • impact (Execution) — Significance of the work
  • uncertainty (Meta) — Explicit doubt tracking

Deep dive: Epistemic Vectors Explained
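For reference, the 13 vectors group by tier as above. The composite below is a naive unweighted mean — an assumption for illustration, not Empirica's actual aggregation:

```python
# The 13 epistemic vectors, grouped by tier as in the table above
VECTORS = {
    "gate":          ["engagement"],
    "foundation":    ["know", "do", "context"],
    "comprehension": ["clarity", "coherence", "signal", "density"],
    "execution":     ["state", "change", "completion", "impact"],
    "meta":          ["uncertainty"],
}

def composite(scores: dict) -> float:
    """Naive composite: unweighted mean over all 13 vectors.
    Empirica's real aggregation is not documented here -- sketch only."""
    vals = [scores[v] for tier in VECTORS.values() for v in tier]
    return sum(vals) / len(vals)

all_vectors = [v for tier in VECTORS.values() for v in tier]
scores = {v: 0.8 for v in all_vectors}   # 13 vectors across 5 tiers
```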


How It Works With Claude Code

Empirica doesn't replace or reinvent anything Claude Code already does. Claude Code owns tasks, plans, memory, and projects. Empirica adds the measurement layer on top:

  • Task management → Epistemic goals with measurable completion
  • Plan mode → Investigation phase with Sentinel gating; no edits until understanding is verified
  • MEMORY.md → Auto-curated hot cache ranked by epistemic confidence
  • Context window → 4-layer memory that survives compaction and persists across sessions
  • Code editing → Grounded calibration: was the AI's confidence justified by test results?
  • Subagent spawning → Bounded autonomy with delegated work counting and budget tracking

The result: Claude Code's native capabilities, enhanced with measurement, gating, and calibration feedback that compounds over time.


Platform Support

  • Claude Code — Full integration (production): hooks, Sentinel gate, skills, agents, statusline, MCP
  • Cursor, Cline — MCP server: epistemic transaction workflow, memory, calibration via MCP tools
  • Gemini CLI, Copilot — Experimental: system prompt + CLI
  • Any AI — CLI + prompt: full measurement via CLI commands and system prompt

Documentation & Training

  • getempirica.com — Training course, interactive guides, deep explanations
  • Natural Language Guide — How to collaborate with AI using Empirica
  • Getting Started — First-time setup and concepts
  • CLI Reference — All 150+ commands documented
  • Architecture — Technical reference for contributors
  • System Prompts — AI prompts for Claude, Copilot, Gemini, Qwen, Roo

The Empirica Ecosystem

  • Empirica (open source) — Core measurement system: epistemic transactions, Sentinel, calibration, 13 vectors
  • Empirica Iris (open source) — Epistemic browser automation with SVG spatial indexing; Sentinel gating for visual interactions
  • Docpistemic (open source) — Epistemic documentation coverage assessment; know what your docs know
  • Breadcrumbs (open source) — Survive context compacts with git notes; dead-simple session continuity
  • Empirica Workspace (proprietary) — Entity Knowledge Graph, Epistemic Prompt Engine, CRM, portfolio dashboard

Building something with Empirica? Open an issue to get listed.


What's New in 1.6.18

  • InstanceResolver Fix — Fixed UnboundLocalError affecting users who provide session_id explicitly. 46 redundant local re-imports removed across 7 CLI handler files
  • Windows UTF-8 Subprocess Fix — run_empirica_subprocess() helper forces UTF-8 encoding for Empirica-on-Empirica subprocess calls on Windows. Contributed by @kars85
  • Bootstrap NoneType Guard — Fixed TypeError: '>' not supported between NoneType and float in drift/delta calculations during project-bootstrap. Reported by @kars85
  • Sentinel Allow List — Added gh pr diff to noetic-safe Bash commands

Previous Highlights (1.6.11–1.6.18)

  • Brier Score Calibration — Replaced MAE with proper Brier score (Murphy 1973). Miscalibration raises thresholds. Anti-gaming: AI gets directional-only feedback
  • Statusline Redesign — Threshold indicator, investigating/acting phase, color-coded K/C vectors, learning deltas
  • Instance Isolation — InstanceResolver unified API, headless/interactive mode split, DB-based file cleanup
  • Temporal Entity Model — Codebase entities tracked with temporal validity, auto-extracted from file edits
  • Windows Support — Project-embed retrieval, Qdrant collection creation, Ollama retry. Contributed by @kars85
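The Brier score mentioned above is the mean squared difference between stated confidence and the binary outcome (Murphy 1973) — lower is better, 0.0 is perfect. A minimal sketch of the metric itself (how Empirica feeds it into thresholds is not shown):

```python
def brier_score(confidences, outcomes):
    """Mean squared error between predicted probability (0..1)
    and observed 0/1 outcome. 0.0 = perfectly calibrated."""
    assert len(confidences) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(confidences, outcomes)) / len(confidences)

# An overconfident agent: claims 0.9 on tasks that succeed half the time
score = brier_score([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0])
```

A rising score signals miscalibration, which per the changelog raises the Sentinel thresholds; giving the AI only directional feedback prevents it from gaming the number.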

Privacy & Data

Your data stays local:

  • .empirica/ — Local SQLite database (gitignored by default)
  • .git/refs/notes/empirica/* — Epistemic checkpoints (local unless you push)
  • Qdrant runs locally if enabled

No cloud dependencies. No telemetry. Your epistemic data is yours.


Community & Support


License

MIT License — see LICENSE for details.


Author: David S. L. Van Assche
Version: 1.6.18

Turtles all the way down — built with its own epistemic framework, measuring what it knows at every step.


Download files

Download the file for your platform.

Source Distribution

empirica-1.6.18.tar.gz (1.2 MB)

Uploaded Source

Built Distribution


empirica-1.6.18-py3-none-any.whl (1.4 MB)

Uploaded Python 3

File details

Details for the file empirica-1.6.18.tar.gz.

File metadata

  • Download URL: empirica-1.6.18.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for empirica-1.6.18.tar.gz
  • SHA256: 45137efc511480b2f7e96e4ba8a91826abd06ac8d95d3e9ab29fbebf5747dc3f
  • MD5: ebe6652d4fabf2ce83b72a644806ee88
  • BLAKE2b-256: 4f969b8aa6d5fb7ea3b05559cf47af0d385bb689a7cdc31da1bb1d0e820ec4d6


File details

Details for the file empirica-1.6.18-py3-none-any.whl.

File metadata

  • Download URL: empirica-1.6.18-py3-none-any.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for empirica-1.6.18-py3-none-any.whl
  • SHA256: 25fddd096c0a70692e79dd7567423288f84a5d3a7761febdf01e2e80006449ae
  • MD5: f4166ede797f3e007a9604fc48c619a5
  • BLAKE2b-256: 915f277142a0e55db20d9ee0b59a841f743516c78e2006388ed3a3f96cbb6e9d

