Genuine AI epistemic self-assessment framework - Universal interface for single AI tracking
Empirica
Teaching AI to know what it knows—and what it doesn't
What is Empirica?
Empirica is an epistemic self-awareness framework that enables AI agents to genuinely understand the boundaries of their own knowledge. Instead of producing confident-sounding responses regardless of actual understanding, AI agents using Empirica can accurately assess what they know, identify gaps, and communicate uncertainty honestly.
The core insight: AI systems today lack functional self-awareness. They can't reliably distinguish between "I know this well" and "I'm guessing." Empirica provides the cognitive infrastructure to make this distinction measurable and actionable.
Why This Matters
The Problem: AI agents exhibit "confident ignorance"—they generate plausible-sounding responses about topics they don't actually understand. This leads to:
- Hallucinated facts presented as truth
- Wasted time investigating already-explored dead ends
- Knowledge lost between sessions
- No way to tell when an AI is genuinely confident vs. bluffing
The Solution: Empirica introduces epistemic vectors—quantified measures of knowledge state that AI agents track in real-time. These vectors emerged from observing what information actually matters when assessing cognitive readiness.
The 13 Foundational Vectors
These vectors weren't designed in a vacuum. They emerged from 600+ real working sessions across multiple AI systems (Claude, GPT-4, Gemini, Qwen, and others), with Claude serving as the primary development partner due to its reasoning capabilities.
The pattern proved universal: regardless of which AI system we tested, these same dimensions consistently predicted success or failure in complex tasks.
The Vector Space
| Tier | Vector | What It Measures |
|---|---|---|
| Gate | engagement | Is the AI actively processing or disengaged? |
| Foundation | know | Domain knowledge depth (≥ 0.70 = ready to act) |
| | do | Execution capability |
| | context | Access to relevant information |
| Comprehension | clarity | How clear is the understanding? |
| | coherence | Do the pieces fit together? |
| | signal | Signal-to-noise in available information |
| | density | Information richness |
| Execution | state | Current working state |
| | change | Rate of progress/change |
| | completion | Task completion level |
| | impact | Significance of the work |
| Meta | uncertainty | Explicit doubt tracking (≤ 0.35 = ready to act) |
Why These Vectors?
Readiness Gate: Through empirical observation, we found that know ≥ 0.70 AND uncertainty ≤ 0.35 reliably predicts successful task execution. Below these thresholds, investigation is needed.
The Key Insight: The uncertainty vector is explicitly tracked because AI systems naturally underreport doubt. Making it a first-class metric forces honest assessment.
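The readiness gate described above can be sketched as a simple predicate. This is an illustration only, not Empirica's actual API; the `EpistemicState` class and constant names are assumptions, while the threshold values come from the text:

```python
# Illustrative sketch of the readiness gate (hypothetical names, not Empirica's API).
# The thresholds are the empirically observed values stated above.
from dataclasses import dataclass

KNOW_THRESHOLD = 0.70         # know must be at least this high
UNCERTAINTY_THRESHOLD = 0.35  # uncertainty must be at most this low

@dataclass
class EpistemicState:
    know: float
    uncertainty: float

def ready_to_act(state: EpistemicState) -> bool:
    """Gate passes only with high knowledge AND honestly low uncertainty."""
    return (state.know >= KNOW_THRESHOLD
            and state.uncertainty <= UNCERTAINTY_THRESHOLD)

print(ready_to_act(EpistemicState(know=0.82, uncertainty=0.20)))  # True: act
print(ready_to_act(EpistemicState(know=0.82, uncertainty=0.50)))  # False: investigate
```

Note that either dimension alone is insufficient: high knowledge with high uncertainty still fails the gate, which is exactly what forces the honest doubt assessment.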
Applications Across Industries
While the vectors emerged from software development work, they map to any domain requiring knowledge assessment:
| Industry | Primary Vectors | Use Case |
|---|---|---|
| Software Development | know, context, uncertainty, completion | Code review, architecture decisions, debugging |
| Research & Analysis | know, clarity, coherence, signal | Literature review, hypothesis testing |
| Healthcare | know, uncertainty, impact | Diagnostic confidence, treatment recommendations |
| Legal | context, clarity, coherence | Case analysis, precedent research |
| Education | know, do, completion | Learning assessment, curriculum design |
| Finance | know, uncertainty, impact | Risk assessment, investment analysis |
Why Software Development First?
Software engineering provides an ideal testbed because:
- Measurable outcomes - Code either works or it doesn't
- Complex knowledge states - Requires synthesizing documentation, code, tests, and context
- Session continuity - Projects span days/weeks with context loss between sessions
- Multi-agent potential - Team collaboration benefits from shared epistemic state
Empirica was battle-tested here before expanding to other domains.
Quick Start
For End Users
Visit getempirica.com for the guided setup experience with tutorials and support.
For Developers
Install + Claude Code Integration (Recommended)
```bash
pip install empirica
empirica setup-claude-code
```
setup-claude-code is the one-command integration that installs everything Claude Code needs:
- Plugin — empirica-integration to ~/.claude/plugins/local/ (skills, agents, hooks, scripts)
- Sentinel hooks — PreToolUse gates that block praxic tools (Edit/Write/Bash) until CHECK passes
- Session lifecycle hooks — SessionStart/SessionEnd for automatic session management, SubagentStart/Stop for delegation tracking, PreCompact for epistemic state persistence across context compaction
- System prompt — Empirica prompt as @include reference in CLAUDE.md (preserves your existing instructions)
- StatusLine — Live metacognitive signal in your terminal (confidence, phase, drift)
- MCP server — Installs empirica-mcp and configures .claude/mcp.json
- Semantic layer check — Detects Ollama + nomic-embed-text + Qdrant availability (optional but recommended for cross-session memory)
```bash
# Options
empirica setup-claude-code --force            # Reinstall even if already present
empirica setup-claude-code --skip-mcp         # Skip MCP server setup
empirica setup-claude-code --skip-claude-md   # Keep existing system prompt
```
One-Line Installer (Alternative)
The installer handles pip install + Claude Code setup + demo project:
```bash
# Linux / macOS
curl -fsSL https://raw.githubusercontent.com/Nubaeon/empirica/main/scripts/install.py | python3 -
```

```powershell
# Windows (PowerShell)
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/Nubaeon/empirica/main/scripts/install.py" -OutFile "install.py"
python install.py
```
Manual Installation
```bash
# Install from PyPI
pip install empirica

# Or with all features
pip install "empirica[all]"

# MCP server (for Claude Desktop, Cursor, etc.)
pip install empirica-mcp

# Initialize in your project
cd your-project
empirica project-init
```
Note: The CLI tools work standalone, but the full epistemic workflow (CASCADE phases, calibration, Sentinel gates) requires the AI to have the system prompt loaded.
setup-claude-code handles this automatically. For other AI platforms, see System Prompts for Copilot, Gemini, Qwen, and Roo Code.
Homebrew (macOS)
```bash
brew tap nubaeon/tap
brew install empirica
empirica setup-claude-code   # Don't forget this step
```
Docker
```bash
# Standard image (Debian slim, ~414MB)
docker pull nubaeon/empirica:1.5.8

# Security-hardened Alpine image (~276MB, recommended)
docker pull nubaeon/empirica:1.5.8-alpine

# Run
docker run -it -v "$(pwd)/.empirica:/data/.empirica" nubaeon/empirica:1.5.8 /bin/bash
```
After Installation: Getting Started
Interactive Onboarding (Recommended)
```bash
empirica onboard
```
Walks you through the full workflow: CASCADE phases, 13 epistemic vectors, noetic/praxic phases, transaction discipline, goal tracking, calibration reports, and all CLI commands with examples.
Initialize Your Project
```bash
cd your-project
empirica project-init
empirica session-create --ai-id claude-code --output json
```
Then just start working — with Claude Code hooks active, the Sentinel automatically manages the epistemic workflow. Log findings as you discover them, create goals for your tasks, and let the measurement system track your learning.
Explore Documentation
```bash
# Search documentation semantically
empirica docs-explain --topic "epistemic vectors"
empirica docs-explain --topic "CASCADE workflow"

# List all available topics
empirica docs-list
```
Try the Demo Project
The one-line installer creates a demo project at ~/empirica-demo/:
```bash
cd ~/empirica-demo
cat WALKTHROUGH.md
```
Documentation
For Humans
Start here based on your role:
| Role | Start With | Then Read |
|---|---|---|
| End User | Getting Started | Empirica Explained Simply |
| Developer | Developer README | Claude Code Setup |
Documentation Structure:
```
docs/
├── human/                  # Human-readable documentation
│   ├── end-users/          # Installation, concepts, troubleshooting
│   └── developers/         # Integration, system prompts, API
│       └── system-prompts/ # AI system prompts (Claude, Copilot, etc.)
│
└── architecture/           # Technical architecture (for AI context loading)
```
For AI Integration
If you're integrating Empirica into an AI system:
- System Prompts: docs/human/developers/system-prompts/
- MCP Server: empirica-mcp/ (Model Context Protocol integration)
- Architecture Docs: docs/architecture/ (AI-optimized technical reference)
Key Guides
| Guide | Purpose |
|---|---|
| CASCADE Workflow | The PREFLIGHT → CHECK → POSTFLIGHT loop |
| Epistemic Vectors Explained | Deep dive into all 13 vectors |
| CLI Reference | Complete command documentation |
| Storage Architecture | Four-layer data persistence |
How It Works
The CASCADE Workflow
Every significant task follows this loop:
```
PREFLIGHT ────────► CHECK ────────► POSTFLIGHT
    │                 │                 │
 Baseline          Decision          Learning
 Assessment          Gate             Delta
    │                 │                 │
"What do I       "Am I ready      "What did I
 know now?"        to act?"          learn?"
```
- PREFLIGHT: The AI assesses its knowledge state before starting work.
- CHECK: The Sentinel gate validates readiness (know ≥ 0.70, uncertainty ≤ 0.35).
- POSTFLIGHT: The AI measures what it learned, creating a learning delta.
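Under stated assumptions, the loop can be sketched as follows. The function names (`assess`, `investigate`, `execute_task`) are hypothetical placeholders, not Empirica's real API; only the phase order and thresholds come from the workflow above:

```python
# Minimal sketch of the PREFLIGHT → CHECK → POSTFLIGHT loop (hypothetical names).
def cascade(task, assess, investigate, execute_task):
    baseline = assess(task)                        # PREFLIGHT: baseline assessment
    while not (baseline["know"] >= 0.70            # CHECK: decision gate
               and baseline["uncertainty"] <= 0.35):
        investigate(task)                          # below thresholds: investigate first
        baseline = assess(task)                    # re-run PREFLIGHT after investigating
    execute_task(task)                             # gate passed: do the work
    final = assess(task)                           # POSTFLIGHT: measure what was learned
    return {k: final[k] - baseline[k]              # learning delta per vector
            for k in ("know", "uncertainty")}
```

The key design point the sketch illustrates: execution is structurally unreachable until the gate passes, so "confident ignorance" cannot skip straight to action.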
Learning Compounds Across Sessions
```
Session 1: know=0.40 → know=0.65  (Δ +0.25)
              ↓ (findings persisted)
Session 2: know=0.70 → know=0.85  (Δ +0.15)
              ↓ (compound learning)
Session 3: know=0.82 → know=0.92  (Δ +0.10)
```
Each session starts higher because learnings persist. No more re-investigating the same questions.
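The compounding above is just a per-session delta over the start and end know scores; a toy illustration (the numbers are the example figures from the diagram, not real measurements):

```python
# Toy illustration of per-session learning deltas, using the example figures above.
sessions = [(0.40, 0.65), (0.70, 0.85), (0.82, 0.92)]  # (start_know, end_know)

lines = [
    f"Session {i}: know={start:.2f} → know={end:.2f} (Δ {end - start:+.2f})"
    for i, (start, end) in enumerate(sessions, 1)
]
print("\n".join(lines))
# → Session 1: know=0.40 → know=0.65 (Δ +0.25)
#   Session 2: know=0.70 → know=0.85 (Δ +0.15)
#   Session 3: know=0.82 → know=0.92 (Δ +0.10)
```

Notice the deltas shrink as the starting point rises: persisted findings leave less to re-learn each session.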
Live Metacognitive Signal
With Claude Code hooks enabled, you see epistemic state in your terminal:
```
[empirica] ⚡94% │ 🎯3 ❓12/5 │ POSTFLIGHT │ K:95% U:5% C:92% │ ✓ │ ✓ stable
```
What this tells you:
- ⚡94% — Overall epistemic confidence (⚡ high, 💡 good, 💫 uncertain, 🌑 low)
- 🎯3 ❓12/5 — Open goals (3) and unknowns (12 total, 5 blocking goals)
- POSTFLIGHT — CASCADE phase (PREFLIGHT → CHECK → POSTFLIGHT)
- K:95% U:5% C:92% — Knowledge, Uncertainty, Context scores
- ✓ / ⚠ / △ — Learning delta summary (net positive / net negative / neutral)
- ✓ stable — Drift indicator (✓ stable, ⚠ drifting, ✗ severe)
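A hedged sketch of how such a line could be assembled from the underlying scores. The function, its signature, and the confidence cutoffs for each icon are assumptions for illustration; the real hook's formatting is defined by Empirica:

```python
# Illustrative status-line renderer (hypothetical; not the actual Empirica hook).
def render_statusline(confidence, goals, unknowns, blocking, phase, know, unc, ctx):
    # Icon cutoffs are assumed for the sketch: ⚡ high, 💡 good, 💫 uncertain, 🌑 low.
    icon = ("⚡" if confidence >= 0.90 else
            "💡" if confidence >= 0.70 else
            "💫" if confidence >= 0.40 else "🌑")
    return (f"[empirica] {icon}{confidence:.0%} │ "
            f"🎯{goals} ❓{unknowns}/{blocking} │ {phase} │ "
            f"K:{know:.0%} U:{unc:.0%} C:{ctx:.0%}")

print(render_statusline(0.94, 3, 12, 5, "POSTFLIGHT", 0.95, 0.05, 0.92))
# → [empirica] ⚡94% │ 🎯3 ❓12/5 │ POSTFLIGHT │ K:95% U:5% C:92%
```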
Built With Empirica
Projects using Empirica's epistemic foundations:
| Project | Description | Use Case |
|---|---|---|
| Docpistemic | Epistemic documentation system | Self-aware documentation that tracks what it explains well vs. poorly |
| Carapace | Defensive AI shell | Security-focused AI wrapper with epistemic safety gates |
Building something with Empirica? Open an issue to get listed here.
What's New in 1.5.9
- Sentinel File-Based Control — Sentinel enable/disable via ~/.empirica/sentinel_enabled file flag. Dynamically settable without session restart (env vars required a terminal restart)
- Sentinel Bypass Fix — System prompt contained bare export commands that Claudes would execute, disabling the Sentinel. Replaced with tables + "DO NOT execute" warnings
- SessionStart Matcher Fix — setup-claude-code generated invalid matchers (new|fresh, bare compact). Fixed to valid Claude Code values (startup, compact|resume)
- MirrorDriftMonitor Removed — Vestigial drift detection superseded by grounded calibration pipeline. Removed check-drift CLI command, MCP tool, and drift module (-562 lines)
- Transaction Planning Skill — /epistemic-transaction skill gains interactive plan-transactions mode: interview → explore → decompose → plan with estimated vectors → execute
- Phantom Project Fix — Project ID resolution uses project.yaml as authoritative source, preventing self-propagating phantom project IDs
Privacy & Data
Your data stays local:
- .empirica/ — Local SQLite database (gitignored by default)
- .git/refs/notes/empirica/* — Epistemic checkpoints (local unless you push)
- Qdrant runs locally if enabled
No cloud dependencies. No telemetry. Your epistemic data is yours.
Community & Support
- Website: getempirica.com
- Issues: GitHub Issues
- Discussions: GitHub Discussions
License
MIT License — Maximum adoption, aligned with Empirica's transparency principles.
See LICENSE for details.
Author: David S. L. Van Assche
Version: 1.5.9
Turtles all the way down — built with its own epistemic framework, measuring what it knows at every step.
File details
Details for the file empirica-1.5.9.tar.gz.
File metadata
- Download URL: empirica-1.5.9.tar.gz
- Upload date:
- Size: 1.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ee68a21d64c2ff003ef16f6ddf41810daa0b82362cd95e6a4fc9380ed3e30fd5 |
| MD5 | 14b31545a0fa174a9bf24048d9f8e836 |
| BLAKE2b-256 | fc3eeedcc83ed526230468350ede7060c2aed8a0737818e154bccd82f95cf28e |
File details
Details for the file empirica-1.5.9-py3-none-any.whl.
File metadata
- Download URL: empirica-1.5.9-py3-none-any.whl
- Upload date:
- Size: 1.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e58da946ae0768068bf8dd4ff73e5a6635ebcaf82ee10f7017ee821206bdbb6e |
| MD5 | 280bbac0d6d63920ec4df623c9397ed6 |
| BLAKE2b-256 | 089ae1e6f9f87d5a671b094a10bf275b5394c2c864b82c2a1696b3295602bbe4 |