Adversarial AI validation for autonomous systems - catch reasoning flaws before they become failures
MiniCrit-7B: Adversarial AI Validation for Autonomous Systems
Catch AI reasoning flaws before they become failures
Quick Start • MCP Integration • API • Training • Benchmarks
The Problem
Autonomous AI systems fail silently. They produce confident-sounding outputs with hidden flaws: overconfidence, missing risks, logical errors, hallucinations. By the time you notice, it's too late.
Traditional testing catches bugs. MiniCrit catches bad reasoning.
The Solution
MiniCrit is a specialized AI "devil's advocate" that validates reasoning before actions are taken. It integrates with any AI system via MCP (Model Context Protocol) to provide real-time adversarial critique.
Your AI Agent   →    MiniCrit Validation   →   Safer Decisions
      ↓                       ↓                       ↓
"Buy AAPL,           "Overconfidence:          Execute with
95% confident"       only 2 data points,       reduced size
                     missing earnings          or skip
                     risk"
Results
35% Flawed Output Reduction • +0.28 Sharpe Ratio Improvement • 38,000+ Live Validations • <50ms Inference Latency
| Metric | MiniCrit-7B | GPT-4 | Claude-3 |
|---|---|---|---|
| Flaw Detection F1 | 0.82 | 0.75 | 0.78 |
| False Positive Rate | 12% | 18% | 15% |
| Latency | 45ms | 850ms | 620ms |
| Cost per 1K calls | $0.00 | $30 | $15 |
Quick Start
Option 1: pip install (Recommended)
pip install minicrit
from minicrit import MiniCrit
# Initialize (downloads model automatically)
critic = MiniCrit()
# Validate reasoning
result = critic.validate(
    "Stock will rise because it rose yesterday",
    domain="trading",
)
print(result.valid) # False
print(result.severity) # "high"
print(result.critique) # "This reasoning exhibits recency bias..."
print(result.flags) # ["overconfidence", "insufficient_evidence"]
Option 2: CLI
# Single validation
minicrit "Buy AAPL, 95% confident" --domain trading
# From file
minicrit --file rationales.txt --output results.json
# JSON output
minicrit "MACD crossover signals buy" --json
Installation Extras
# Core only
pip install minicrit
# With MCP server support (the quotes stop shells such as zsh from expanding the brackets)
pip install "minicrit[mcp]"
# With training utilities
pip install "minicrit[training]"
# Everything
pip install "minicrit[all]"
Option 3: Docker
cd docker
docker-compose up -d
curl http://localhost:8000/critique \
-H "Content-Type: application/json" \
-d '{"rationale": "Buy signal based on MACD crossover", "domain": "trading"}'
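The same request can be issued from Python. The following is a minimal sketch using only the standard library, assuming the Docker service above is listening on `localhost:8000` and accepts the JSON body shown in the curl command. The helper names `build_critique_request` and `send_critique_request` are hypothetical, and the sketch assumes only that the server replies with JSON, not any particular response schema.

```python
import json
import urllib.request


def build_critique_request(rationale, domain="trading",
                           base_url="http://localhost:8000"):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = json.dumps({"rationale": rationale, "domain": domain}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/critique",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_critique_request(request, timeout=10.0):
    """Send the request and decode the body as JSON (assumes a JSON reply)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `send_critique_request(build_critique_request("Buy signal based on MACD crossover"))` should return the parsed critique.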
Option 4: MCP (Claude Desktop / Claude Code)
Add the following to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location):
{
  "mcpServers": {
    "minicrit": {
      "command": "python3",
      "args": ["/path/to/MiniCrit-7B/src/mcp/server.py"],
      "env": {
        "MINICRIT_ADAPTER": "wmaousley/MiniCrit-7B",
        "MINICRIT_BASE_MODEL": "Qwen/Qwen2-7B-Instruct"
      }
    }
  }
}
Then in Claude: "Use validate_reasoning to check: Buy AAPL, RSI shows oversold"
MCP Integration
MiniCrit implements the Model Context Protocol (MCP), an open standard for AI tool integration backed by Anthropic, OpenAI, Google, and Microsoft.
Any MCP-compatible AI can call MiniCrit:
┌─────────────────┐         ┌─────────────────┐
│  Claude / GPT   │         │    MiniCrit     │
│  Gemini / etc.  │◄───────►│   MCP Server    │
└─────────────────┘   MCP   └─────────────────┘
         │                           │
         │              Tool: validate_reasoning
         │              Input: {rationale, domain}
         │              Output: {valid, severity, critique, flags}
         │
         ▼
  Safer AI Decisions
MCP Tools
| Tool | Description |
|---|---|
| validate_reasoning | Validate AI reasoning; returns a critique with severity |
| batch_validate | Validate multiple items efficiently |
| get_model_info | Get model status and configuration |
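Under MCP, a client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a message for `validate_reasoning`. The argument names `rationale` and `domain` follow the input shown in the diagram above; the exact input schema is whatever the server advertises, so treat the field names as assumptions. The helper `make_tool_call` is hypothetical.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC only requires that they be unique per session.
_request_ids = count(1)


def make_tool_call(tool_name, arguments):
    """Return a JSON-RPC 2.0 `tools/call` request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


message = make_tool_call(
    "validate_reasoning",
    {"rationale": "Buy AAPL, RSI shows oversold", "domain": "trading"},
)
print(json.dumps(message, indent=2))
```

In practice an MCP client library handles this framing; the sketch only shows what travels over the wire.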
Supported Domains
trading • finance • defense • cybersecurity • medical • risk_assessment • planning • general
Output Format
{
  "valid": false,
  "severity": "high",
  "critique": "This reasoning exhibits recency bias. A single day's price movement has no predictive power...",
  "confidence": 0.87,
  "flags": ["overconfidence", "insufficient_evidence", "missing_consideration"],
  "latency_ms": 42.3
}
Severity Levels: pass → low → medium → high → critical
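Because the severity levels are ordered, a caller can gate actions on a threshold. The sketch below parses the output format above and maps it to one of three actions, echoing the "execute with reduced size or skip" idea from the overview. The function name `decide` and the two thresholds are hypothetical policy choices, not part of MiniCrit.

```python
import json

# Severity levels in ascending order, as listed above.
SEVERITY_ORDER = ["pass", "low", "medium", "high", "critical"]


def decide(raw_output, block_at="high", reduce_at="medium"):
    """Map a MiniCrit result (JSON string) to 'execute', 'reduce', or 'skip'."""
    result = json.loads(raw_output)
    rank = SEVERITY_ORDER.index(result["severity"])  # ValueError on an unknown level
    if rank >= SEVERITY_ORDER.index(block_at):
        return "skip"
    if rank >= SEVERITY_ORDER.index(reduce_at):
        return "reduce"
    return "execute"


sample = (
    '{"valid": false, "severity": "high", "critique": "...", '
    '"confidence": 0.87, "flags": ["overconfidence"], "latency_ms": 42.3}'
)
print(decide(sample))  # prints "skip"
```

Raising on an unknown severity is deliberate: a gate that silently passes unrecognized levels would defeat its purpose.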
What MiniCrit Detects
- Cognitive Biases
- Logical Flaws
Example
Input:
"AAPL long: The stock has risen 3 days in a row, momentum is clearly bullish. 95% confident this continues."
MiniCrit Output:
⚠️ HIGH SEVERITY - Multiple reasoning flaws detected:
- Overconfidence: 95% confidence is not supported by the evidence provided
- Recency Bias: 3 days of price movement has minimal predictive value
- Missing Risk Factors: No consideration of upcoming earnings, macro events, or sector rotation
Flags: overconfidence, insufficient_evidence, unaddressed_risk
Architecture
┌──────────────────────────────────────────────────────────────────┐
│                        MiniCrit-7B System                        │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌────────────┐    ┌─────────────┐    ┌───────────────────────┐  │
│  │ AI System  │───▶│  MiniCrit   │───▶│   Validated Output    │  │
│  │ (Any LLM)  │    │   Server    │    │                       │  │
│  └────────────┘    └──────┬──────┘    │  • valid: bool        │  │
│                           │           │  • severity: enum     │  │
│                    ┌──────▼──────┐    │  • critique: string   │  │
│                    │  Qwen2-7B   │    │  • flags: list        │  │
│                    │    Base     │    │  • confidence: float  │  │
│                    └──────┬──────┘    └───────────────────────┘  │
│                           │                                      │
│                    ┌──────▼──────┐                               │
│                    │    LoRA     │  40.4M trainable params       │
│                    │   Adapter   │  Trained on 11.7M critiques   │
│                    └─────────────┘                               │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
Training
Model Specifications
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2-7B-Instruct |
| Method | LoRA (r=16, α=32) |
| Trainable Params | 40.4M / 7.6B (0.5%) |
| Dataset | CritiqueBank-11M |
| Hardware | TACC Vista GH200 / Lambda H100 |
| Training Loss | 3.19 → 0.44 (86% reduction) |
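The trainable-parameter figure can be sanity-checked by hand. A LoRA adapter of rank r on a linear layer with `in` inputs and `out` outputs adds r × (in + out) weights. The sketch below applies that to the layer shapes in the public Qwen2-7B configuration. It assumes the adapter targets all seven linear projections in every decoder layer, which the table does not state; under that assumption the count lands on the 40.4M reported above.

```python
# Qwen2-7B shapes, taken from the public model config (assumed, not from this README).
HIDDEN = 3584          # hidden_size
INTERMEDIATE = 18944   # intermediate_size of the MLP
LAYERS = 28            # num_hidden_layers
KV_DIM = 4 * 128       # num_key_value_heads * head_dim (grouped-query attention)
RANK = 16              # LoRA r from the table above

# (in_features, out_features) of each adapted projection in one decoder layer.
# ASSUMPTION: all seven linear projections carry an adapter.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """A is (rank x in) and B is (out x rank), so rank * (in + out) weights."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
total = per_layer * LAYERS
print(f"{total:,} trainable parameters")   # 40,370,176 trainable parameters
print(f"{total / 7.6e9:.2%} of 7.6B")      # 0.53% of 7.6B
```

With α=32 and r=16, the adapter update is scaled by α/r = 2 before being added to the frozen weights.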
Dataset: CritiqueBank-11M
| Component | Description |
|---|---|
| LogicFlaw-2.4M | Logical reasoning errors |
| FactCheck-3.2M | Factual accuracy validation |
| BiasDetect-1.8M | Cognitive bias patterns |
| RiskMissing-2.1M | Unaddressed risk factors |
| DomainSpecific-2.2M | Trading, defense, medical |
Published: DOI 10.5281/zenodo.18159342
Training Progress
[Figure: training loss versus training progress (0% to 100%); the curve falls from about 3.2 at the start to about 0.4 at the current checkpoint.]
Run Training
# TACC Vista (GH200)
sbatch scripts/train_vista.slurm
# Lambda Labs (H100)
python train_minicrit_7b.py --config configs/7b_lora.yaml
Benchmarking
Compare MiniCrit models head-to-head:
python src/benchmark/benchmark_models.py \
--eval-data data/eval_holdout.jsonl \
--model-1 wmaousley/MiniCrit-1.5B \
--model-2 wmaousley/MiniCrit-7B \
--judge-sample 200 # Optional: LLM-as-judge comparison
Metrics Computed
| Metric | Description |
|---|---|
| False Positive Rate | Valid reasoning incorrectly flagged |
| Detection F1 | Precision/recall on flaw detection |
| Latency (p50/p95/p99) | Inference speed percentiles |
| LLM Judge Score | Claude rates critique quality |
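For reference, the first three metrics can be computed from labeled predictions and raw latencies as in the sketch below. This shows the standard definitions, not the code in benchmark_models.py; the function names and the nearest-rank percentile method are assumptions made for illustration.

```python
import math


def detection_metrics(y_true, y_pred):
    """y_true / y_pred are booleans, where True means 'reasoning is flawed'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # False positive rate: share of valid reasoning that was incorrectly flagged.
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"f1": f1, "false_positive_rate": fpr}


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


labels = [True, True, True, False, False]
preds = [True, True, False, True, False]
print(detection_metrics(labels, preds))
latencies_ms = [40, 42, 45, 47, 120]
print({p: percentile(latencies_ms, p) for p in (50, 95, 99)})  # {50: 45, 95: 120, 99: 120}
```

Note how a single slow call dominates p95 and p99 on a small sample, which is why tail percentiles need a reasonably large evaluation set.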
Advanced: Improve Your Model
Generate Hard Training Examples
# Uses Claude Sonnet (~$30 for 5K examples)
export ANTHROPIC_API_KEY=your-key
python src/training/generate_hard_examples.py --count 5000
Direct Preference Optimization (DPO)
# Generate preference pairs
python src/training/generate_dpo_data.py \
--input eval_holdout.jsonl \
--model wmaousley/MiniCrit-7B \
--output dpo_pairs.jsonl
# Run DPO training
python src/training/train_dpo.py \
--model wmaousley/MiniCrit-7B \
--data dpo_pairs.jsonl \
--output minicrit-7b-dpo
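For intuition, DPO trains on pairs of a preferred and a rejected response and rewards the policy for widening its log-probability margin over a frozen reference model. The sketch below computes the standard DPO loss for a single pair in plain Python. It illustrates the published objective and is not necessarily what train_dpo.py implements; `beta=0.1` is a commonly used default and an assumption here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); this form is stable for either sign of x.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# Policy identical to the reference: the loss is log(2), roughly 0.693.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy now favors the chosen critique: the loss drops below log(2).
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

Larger values of `beta` penalize drifting from the reference model more strongly.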
See docs/MODEL_EXCELLENCE_GUIDE.md for the full improvement roadmap.
Repository Structure
MiniCrit-7B/
├── src/
│   ├── mcp/                    # MCP Server Implementation
│   │   ├── server.py           # Local stdio (Claude Desktop)
│   │   ├── server_prod.py      # Production HTTP + auth
│   │   └── server_http.py      # Basic HTTP server
│   ├── benchmark/              # Model Evaluation
│   │   └── benchmark_models.py
│   ├── training/               # Training Utilities
│   │   ├── generate_hard_examples.py
│   │   ├── generate_dpo_data.py
│   │   └── train_dpo.py
│   ├── config.py
│   ├── data.py
│   ├── model.py
│   ├── training.py
│   ├── evaluation.py
│   └── api.py
├── docker/
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── requirements.txt
├── configs/
│   ├── claude_desktop_config.json
│   ├── 7b_lora.yaml
│   └── deepspeed_gh200.json
├── scripts/
│   ├── train_vista.slurm
│   └── vista_setup.sh
├── tests/                      # 169 tests
├── docs/
│   ├── DEPLOYMENT_GUIDE.md
│   └── MODEL_EXCELLENCE_GUIDE.md
└── CHANGELOG.md
Deployment Options
Docker (Recommended)
cd docker
cp .env.example .env
# Edit .env with your settings
docker-compose up -d
Kubernetes
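apiVersion: apps/v1
kind: Deployment
metadata:
  name: minicrit
spec:
  replicas: 2
  selector:                # required by apps/v1; must match the pod template labels
    matchLabels:
      app: minicrit        # illustrative label, pick any consistent value
  template:
    metadata:
      labels:
        app: minicrit
    spec:
      containers:
        - name: minicrit
          image: antagoninc/minicrit:latest
          resources:
            limits:
              nvidia.com/gpu: 1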
Production HTTP Server
# With authentication & rate limiting
export MINICRIT_API_KEYS="key1,key2,key3"
python src/mcp/server_prod.py
See docs/DEPLOYMENT_GUIDE.md for complete instructions.
Use Cases
| Domain | Application |
|---|---|
| Quantitative Trading | Validate signals before execution |
| Defense / Intelligence | Audit AI threat assessments |
| Medical AI | Review diagnostic reasoning |
| Autonomous Vehicles | Validate planning decisions |
| Enterprise AI | Catch hallucinations before they propagate |
Citation
@software{minicrit2026,
  author = {Ousley, William Alexander and Ousley, Jacqueline Villamor},
  title = {MiniCrit: Adversarial AI Validation for Autonomous Systems},
  year = {2026},
  publisher = {Antagon Inc.},
  url = {https://github.com/antagoninc/MiniCrit-7B}
}
Acknowledgments
- Lambda Labs - GPU compute grant for H100 training
- TACC Vista - GH200 supercomputing via NAIRR Pilot
- Anthropic - MCP standard development
License
Apache 2.0 - See LICENSE
Antagon Inc.
Making AI Systems Safer Through Adversarial Testing
Website • Contact • CAGE: 17E75 • UEI: KBSGT7CZ4AH3
William Alexander Ousley - Co-Founder & CEO
Jacqueline Villamor Ousley - Co-Founder & CTO (TS/SCI)