Adversarial AI validation for autonomous systems - catch reasoning flaws before they become failures

MiniCrit-7B: Adversarial AI Validation for Autonomous Systems

🛡️ Catch AI reasoning flaws before they become failures

Quick Start • MCP Integration • API • Training • Benchmarks

🎯 The Problem

Autonomous AI systems fail silently. They produce confident-sounding outputs with hidden flaws—overconfidence, missing risks, logical errors, hallucinations. By the time you notice, it's too late.

Traditional testing catches bugs. MiniCrit catches bad reasoning.

💡 The Solution

MiniCrit is a specialized AI "devil's advocate" that validates reasoning before actions are taken. It plugs into any MCP-compatible AI system through the Model Context Protocol (MCP) and returns an adversarial critique in real time.

Your AI Agent → MiniCrit Validation → Safer Decisions
     ↓                   ↓                    ↓
  "Buy AAPL,         "Overconfidence:      Execute with
   95% confident"     only 2 data points,   reduced size
                      missing earnings       or skip
                      risk"
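
The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration rather than MiniCrit's own implementation: the `Critique` dataclass, the `gate_order` function, and the size multipliers are all hypothetical, chosen only to show what "execute with reduced size or skip" could look like in an agent loop.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a MiniCrit result; only the fields used here.
@dataclass
class Critique:
    valid: bool
    severity: str  # one of: pass, low, medium, high, critical


# Illustrative multipliers per severity (assumed values, tune for your system).
SIZE_MULTIPLIER = {
    "pass": 1.0,
    "low": 0.75,
    "medium": 0.5,
    "high": 0.0,
    "critical": 0.0,
}


def gate_order(requested_size: float, critique: Critique) -> float:
    """Return the order size to execute; 0.0 means skip the trade."""
    if critique.valid:
        return requested_size
    # Unknown severities fail closed: treat them as a skip.
    return requested_size * SIZE_MULTIPLIER.get(critique.severity, 0.0)
```

Failing closed on an unrecognized severity is a deliberate choice in this sketch: a validator that cannot be interpreted should not let an action through by default.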

📊 Results

Metric	Value
Flawed Output Reduction	35%
Sharpe Ratio Improvement	+0.28
Live Validations	38,000+
Inference Latency	<50ms

Metric	MiniCrit-7B	GPT-4	Claude-3
Flaw Detection F1	0.82	0.75	0.78
False Positive Rate	12%	18%	15%
Latency	45ms	850ms	620ms
Cost per 1K calls	$0.00	$30	$15

🚀 Quick Start

Option 1: pip install (Recommended)

pip install minicrit

from minicrit import MiniCrit

# Initialize (downloads model automatically)
critic = MiniCrit()

# Validate reasoning
result = critic.validate(
    "Stock will rise because it rose yesterday",
    domain="trading"
)

print(result.valid)      # False
print(result.severity)   # "high"
print(result.critique)   # "This reasoning exhibits recency bias..."
print(result.flags)      # ["overconfidence", "insufficient_evidence"]

Option 2: CLI

# Single validation
minicrit "Buy AAPL, 95% confident" --domain trading

# From file
minicrit --file rationales.txt --output results.json

# JSON output
minicrit "MACD crossover signals buy" --json

Installation Extras

# Core only
pip install minicrit

# With MCP server support (quotes keep shells such as zsh from globbing the brackets)
pip install "minicrit[mcp]"

# With training utilities
pip install "minicrit[training]"

# Everything
pip install "minicrit[all]"

Option 3: Docker

cd docker
docker-compose up -d
curl http://localhost:8000/critique \
  -H "Content-Type: application/json" \
  -d '{"rationale": "Buy signal based on MACD crossover", "domain": "trading"}'
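
The same request can be issued from Python with only the standard library. The sketch below mirrors the curl call above (same URL, header, and body). The helper names are hypothetical, and the assumption that the response body is a JSON object is ours, not something the server documentation here confirms.

```python
import json
import urllib.request


def build_critique_request(
    rationale: str,
    domain: str = "general",
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build (but do not send) a POST to the /critique endpoint."""
    body = json.dumps({"rationale": rationale, "domain": domain}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/critique",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def critique(rationale: str, domain: str = "general", timeout: float = 10.0) -> dict:
    """Send the request and decode the reply (needs the Docker service running)."""
    req = build_critique_request(rationale, domain)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Assumption: the server answers with a JSON object.
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending keeps the first half testable without a running container.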

Option 4: MCP (Claude Desktop / Claude Code)

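Add the following to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location). JSON does not allow comments, so the path is given here rather than inside the file: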
{
  "mcpServers": {
    "minicrit": {
      "command": "python3",
      "args": ["/path/to/MiniCrit-7B/src/mcp/server.py"],
      "env": {
        "MINICRIT_ADAPTER": "wmaousley/MiniCrit-7B",
        "MINICRIT_BASE_MODEL": "Qwen/Qwen2-7B-Instruct"
      }
    }
  }
}

Then in Claude: "Use validate_reasoning to check: Buy AAPL, RSI shows oversold"
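
If the config file already lists other MCP servers, the entry has to be merged rather than pasted over the file. A minimal sketch of that merge follows. `add_minicrit_server` is a hypothetical helper, not part of the package, and it works on an already-loaded dict so that reading and writing the file stay in your hands.

```python
import json


def add_minicrit_server(config: dict, server_path: str) -> dict:
    """Return a copy of a Claude Desktop config with a 'minicrit' entry added."""
    merged = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    servers = merged.setdefault("mcpServers", {})
    servers["minicrit"] = {
        "command": "python3",
        "args": [server_path],
        "env": {
            "MINICRIT_ADAPTER": "wmaousley/MiniCrit-7B",
            "MINICRIT_BASE_MODEL": "Qwen/Qwen2-7B-Instruct",
        },
    }
    return merged


existing = {"mcpServers": {"other": {"command": "node", "args": ["other.js"]}}}
updated = add_minicrit_server(existing, "/path/to/MiniCrit-7B/src/mcp/server.py")
print(json.dumps(updated, indent=2))
```

The function returns a new dict and leaves the input untouched, which makes it easy to diff the result against the original before writing it back.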

🔌 MCP Integration

MiniCrit implements the Model Context Protocol (MCP), an open standard for AI tool integration introduced by Anthropic and since adopted by OpenAI, Google, and Microsoft.

Any MCP-compatible AI can call MiniCrit:

┌─────────────────┐         ┌─────────────────┐
│  Claude / GPT   │         │    MiniCrit     │
│  Gemini / etc.  │◄───────►│   MCP Server    │
└─────────────────┘   MCP   └─────────────────┘
         │                           │
         │    Tool: validate_reasoning
         │    Input: {rationale, domain}
         │    Output: {valid, severity, critique, flags}
         │
         ▼
   Safer AI Decisions

MCP Tools

Tool	Description
`validate_reasoning`	Validates AI reasoning and returns a critique with a severity level
`batch_validate`	Validate multiple items efficiently
`get_model_info`	Get model status and configuration
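
On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for `validate_reasoning`, using the argument names shown in the diagram above (`rationale`, `domain`). The `tool_call` helper is hypothetical, and a real session also performs the protocol's initialization handshake before any tool call, which is omitted here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call(
    "validate_reasoning",
    {"rationale": "Buy AAPL, RSI shows oversold", "domain": "trading"},
)
wire = json.dumps(request)  # one message as it would be sent over the transport
print(wire)
```

In practice an MCP client library handles this framing; the sketch is only meant to show what the tool names in the table correspond to at the protocol level.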

Supported Domains

trading • finance • defense • cybersecurity • medical • risk_assessment • planning • general

Output Format

{
  "valid": false,
  "severity": "high",
  "critique": "This reasoning exhibits recency bias. A single day's price movement has no predictive power...",
  "confidence": 0.87,
  "flags": ["overconfidence", "insufficient_evidence", "missing_consideration"],
  "latency_ms": 42.3
}

Severity Levels: pass → low → medium → high → critical
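
Because the severity levels are ordered, a client can parse the output into typed values and compare against a threshold. The following is a sketch under assumptions: `Severity`, `ValidationResult`, `parse_result`, and `should_block` are hypothetical names, and the field set is taken from the example JSON above.

```python
import json
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Ordered as documented: pass < low < medium < high < critical."""
    PASS = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class ValidationResult:
    valid: bool
    severity: Severity
    critique: str
    confidence: float
    flags: tuple
    latency_ms: float


def parse_result(payload: str) -> ValidationResult:
    """Parse one JSON result; raises KeyError on a missing field or unknown severity."""
    raw = json.loads(payload)
    return ValidationResult(
        valid=bool(raw["valid"]),
        severity=Severity[raw["severity"].upper()],
        critique=raw["critique"],
        confidence=float(raw["confidence"]),
        flags=tuple(raw["flags"]),
        latency_ms=float(raw["latency_ms"]),
    )


def should_block(result: ValidationResult, threshold: Severity = Severity.HIGH) -> bool:
    """Block when the reasoning is invalid and at or above the threshold."""
    return (not result.valid) and result.severity >= threshold
```

The default threshold of `HIGH` is an arbitrary choice for illustration; a stricter system might block from `MEDIUM` upward.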

🔍 What MiniCrit Detects

Cognitive Biases

⚠️ Overconfidence - Certainty without evidence
⚠️ Survivorship Bias - Ignoring failures
⚠️ Confirmation Bias - Cherry-picking data
⚠️ Anchoring - Over-relying on first info
⚠️ Recency Bias - Overweighting recent events

Logical Flaws

🚫 False Causation - Correlation ≠ causation
🚫 Hasty Generalization - Small sample size
🚫 Missing Risks - Unaddressed threats
🚫 Circular Reasoning - Assuming the conclusion
🚫 False Dichotomy - Ignoring options

Example

Input:

"AAPL long: The stock has risen 3 days in a row, momentum is clearly bullish. 95% confident this continues."

MiniCrit Output:

⚠️ HIGH SEVERITY - Multiple reasoning flaws detected:

Overconfidence: 95% confidence is not supported by the evidence provided

Recency Bias: 3 days of price movement has minimal predictive value

Missing Risk Factors: No consideration of upcoming earnings, macro events, or sector rotation

Flags: overconfidence, insufficient_evidence, unaddressed_risk

🏗️ Architecture

┌──────────────────────────────────────────────────────────────────┐
│                      MiniCrit-7B System                          │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌────────────┐    ┌─────────────┐    ┌───────────────────────┐ │
│  │ AI System  │───▶│   MiniCrit  │───▶│   Validated Output    │ │
│  │ (Any LLM)  │    │   Server    │    │                       │ │
│  └────────────┘    └──────┬──────┘    │ • valid: bool         │ │
│                           │           │ • severity: enum      │ │
│                    ┌──────▼──────┐    │ • critique: string    │ │
│                    │  Qwen2-7B   │    │ • flags: list         │ │
│                    │    Base     │    │ • confidence: float   │ │
│                    └──────┬──────┘    └───────────────────────┘ │
│                           │                                      │
│                    ┌──────▼──────┐                              │
│                    │   LoRA      │  40.4M trainable params      │
│                    │  Adapter    │  Trained on 11.7M critiques  │
│                    └─────────────┘                              │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

📈 Training

Model Specifications

Parameter	Value
Base Model	`Qwen/Qwen2-7B-Instruct`
Method	LoRA (r=16, α=32)
Trainable Params	40.4M / 7.6B (0.5%)
Dataset	CritiqueBank-11M
Hardware	TACC Vista GH200 / Lambda H100
Training Loss	3.19 → 0.44 (86% reduction)
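
The trainable-parameter figure can be sanity-checked with a short calculation. The sketch below assumes the adapter targets all seven linear projections in every transformer block; the table does not state the target modules, so that is our assumption. The layer dimensions are those published in the Qwen2-7B model configuration.

```python
# Qwen2-7B dimensions from the published model configuration.
HIDDEN, INTERMEDIATE, LAYERS = 3584, 18944, 28
KV_DIM = 4 * 128  # 4 key/value heads x head dimension 128 (grouped-query attention)
RANK, ALPHA = 16, 32

# (in_features, out_features) of each adapted linear layer in one block.
# ASSUMPTION: adapters on all seven projections; the README does not say which.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, shapes: dict, layers: int) -> int:
    """Each adapter adds A (rank x in) and B (out x rank): rank * (in + out) weights."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * layers


total = lora_params(RANK, PROJECTIONS, LAYERS)
print(total)                           # 40370176
print(round(total / 7.6e9 * 100, 2))   # 0.53
print(ALPHA / RANK)                    # 2.0
```

Under that assumption the count comes to about 40.4M, in line with the table, and the α/r ratio gives a LoRA scaling factor of 2.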

Dataset: CritiqueBank-11M

Component	Description
LogicFlaw-2.4M	Logical reasoning errors
FactCheck-3.2M	Factual accuracy validation
BiasDetect-1.8M	Cognitive bias patterns
RiskMissing-2.1M	Unaddressed risk factors
DomainSpecific-2.2M	Trading, defense, medical
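
The component sizes embedded in the names above add up to the 11.7M critiques quoted in the architecture diagram. A small sketch of that arithmetic, with each component's share of the mix (the dictionary simply restates the table; nothing here comes from the dataset itself):

```python
# Example counts read off the component names in the table above.
COMPONENTS = {
    "LogicFlaw": 2_400_000,
    "FactCheck": 3_200_000,
    "BiasDetect": 1_800_000,
    "RiskMissing": 2_100_000,
    "DomainSpecific": 2_200_000,
}

total = sum(COMPONENTS.values())
shares = {name: count / total for name, count in COMPONENTS.items()}

print(total)  # 11700000
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<15} {share:6.1%}")
```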

Published: DOI 10.5281/zenodo.18159342

Training Progress

Loss
3.2 │██
2.4 │  ████
1.6 │      ██████
0.8 │            ████████████
0.4 │                        ████████████████  ← Current
    └─────────────────────────────────────────
    0%        25%        50%        75%     100%

Run Training

# TACC Vista (GH200)
sbatch scripts/train_vista.slurm

# Lambda Labs (H100)
python train_minicrit_7b.py --config configs/7b_lora.yaml

🧪 Benchmarking

Compare MiniCrit models head-to-head:

python src/benchmark/benchmark_models.py \
  --eval-data data/eval_holdout.jsonl \
  --model-1 wmaousley/MiniCrit-1.5B \
  --model-2 wmaousley/MiniCrit-7B \
  --judge-sample 200  # Optional: LLM-as-judge comparison

Metrics Computed

Metric	Description
False Positive Rate	Valid reasoning incorrectly flagged
Detection F1	Precision/recall on flaw detection
Latency (p50/p95/p99)	Inference speed percentiles
LLM Judge Score	Claude rates critique quality
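
For reference, the first three metrics can be computed as follows. This is a generic sketch of the standard definitions, not the code in `benchmark_models.py`; the function names are hypothetical, and the percentile uses the nearest-rank method, which may differ from the interpolation the benchmark script uses.

```python
import math


def detection_metrics(y_true, y_pred):
    """y_true: True if the reasoning is flawed; y_pred: True if it was flagged."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # False positive rate: share of valid reasoning that was incorrectly flagged.
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "false_positive_rate": fpr}


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence (pct in 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


metrics = detection_metrics(
    [True, True, True, False, False],
    [True, True, False, True, False],
)
latencies_ms = list(range(1, 101))
print(metrics)
print([percentile(latencies_ms, p) for p in (50, 95, 99)])  # [50, 95, 99]
```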

🔧 Advanced: Improve Your Model

Generate Hard Training Examples

# Uses Claude Sonnet (~$30 for 5K examples)
export ANTHROPIC_API_KEY=your-key
python src/training/generate_hard_examples.py --count 5000

Direct Preference Optimization (DPO)

# Generate preference pairs
python src/training/generate_dpo_data.py \
  --input eval_holdout.jsonl \
  --model wmaousley/MiniCrit-7B \
  --output dpo_pairs.jsonl

# Run DPO training
python src/training/train_dpo.py \
  --model wmaousley/MiniCrit-7B \
  --data dpo_pairs.jsonl \
  --output minicrit-7b-dpo

See docs/MODEL_EXCELLENCE_GUIDE.md for the full improvement roadmap.
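
For readers new to DPO, the objective for a single preference pair is small enough to write out. The sketch below is the standard DPO loss in plain Python, not an excerpt from `train_dpo.py`; the function name and the `beta=0.1` default are assumptions for illustration. Inputs are summed token log-probabilities of the chosen and rejected critiques under the policy being trained and under the frozen reference model.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """-log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))) for one pair."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)); branching keeps exp() from overflowing.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy equals the reference, the loss is log 2. It falls below that as the policy shifts probability toward the chosen critique relative to the reference, and rises above it in the opposite case.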

📁 Repository Structure

MiniCrit-7B/
├── src/
│   ├── mcp/                    # MCP Server Implementation
│   │   ├── server.py           # Local stdio (Claude Desktop)
│   │   ├── server_prod.py      # Production HTTP + auth
│   │   └── server_http.py      # Basic HTTP server
│   ├── benchmark/              # Model Evaluation
│   │   └── benchmark_models.py
│   ├── training/               # Training Utilities
│   │   ├── generate_hard_examples.py
│   │   ├── generate_dpo_data.py
│   │   └── train_dpo.py
│   ├── config.py
│   ├── data.py
│   ├── model.py
│   ├── training.py
│   ├── evaluation.py
│   └── api.py
├── docker/
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── requirements.txt
├── configs/
│   ├── claude_desktop_config.json
│   ├── 7b_lora.yaml
│   └── deepspeed_gh200.json
├── scripts/
│   ├── train_vista.slurm
│   └── vista_setup.sh
├── tests/                      # 169 tests
├── docs/
│   ├── DEPLOYMENT_GUIDE.md
│   └── MODEL_EXCELLENCE_GUIDE.md
└── CHANGELOG.md

🐳 Deployment Options

Docker (Recommended)

cd docker
cp .env.example .env
# Edit .env with your settings
docker-compose up -d

Kubernetes

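apiVersion: apps/v1
kind: Deployment
metadata:
  name: minicrit
spec:
  replicas: 2
  selector:                # required by apps/v1; must match the pod template labels
    matchLabels:
      app: minicrit        # label key and value are illustrative
  template:
    metadata:
      labels:
        app: minicrit
    spec:
      containers:
      - name: minicrit
        image: antagoninc/minicrit:latest
        resources:
          limits:
            nvidia.com/gpu: 1   # one GPU per replica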

Production HTTP Server

# With authentication & rate limiting
export MINICRIT_API_KEYS="key1,key2,key3"
python src/mcp/server_prod.py

See docs/DEPLOYMENT_GUIDE.md for complete instructions.
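
As a rough illustration of how a comma-separated key list such as `MINICRIT_API_KEYS` can be checked, the sketch below parses the variable and compares a presented key in constant time. It is not taken from `server_prod.py`; both function names are hypothetical and the real server may handle keys differently.

```python
import hmac
import os


def load_api_keys(env=os.environ) -> list:
    """Split MINICRIT_API_KEYS on commas, dropping blanks and surrounding spaces."""
    raw = env.get("MINICRIT_API_KEYS", "")
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_authorized(presented_key: str, valid_keys: list) -> bool:
    """Check the presented key against every configured key."""
    ok = False
    for key in valid_keys:
        # compare_digest avoids leaking key contents through response timing.
        ok |= hmac.compare_digest(presented_key.encode(), key.encode())
    return ok


keys = load_api_keys({"MINICRIT_API_KEYS": "key1, key2,,key3"})
print(keys)  # ['key1', 'key2', 'key3']
```

The loop deliberately does not return early on a match, so the time taken does not reveal which configured key matched.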

🎯 Use Cases

Domain	Application
Quantitative Trading	Validate signals before execution
Defense / Intelligence	Audit AI threat assessments
Medical AI	Review diagnostic reasoning
Autonomous Vehicles	Validate planning decisions
Enterprise AI	Catch hallucinations before they propagate

📜 Citation

@software{minicrit2026,
  author = {Ousley, William Alexander and Ousley, Jacqueline Villamor},
  title = {MiniCrit: Adversarial AI Validation for Autonomous Systems},
  year = {2026},
  publisher = {Antagon Inc.},
  url = {https://github.com/antagoninc/MiniCrit-7B}
}

🙏 Acknowledgments

Lambda Labs - GPU compute grant for H100 training
TACC Vista - GH200 supercomputing via NAIRR Pilot
Anthropic - MCP standard development

📄 License

Apache 2.0 - See LICENSE

Antagon Inc.
Making AI Systems Safer Through Adversarial Testing

Website • Contact • CAGE: 17E75 • UEI: KBSGT7CZ4AH3

William Alexander Ousley - Co-Founder & CEO
Jacqueline Villamor Ousley - Co-Founder & CTO (TS/SCI)

Release history

Version	Released
0.1.1	Jan 12, 2026
0.1.0 (this version)	Jan 12, 2026

Download files

File	Type	Size	Uploaded
minicrit-0.1.0.tar.gz	Source distribution	80.2 kB	Jan 12, 2026
minicrit-0.1.0-py3-none-any.whl	Built distribution (Python 3)	87.6 kB	Jan 12, 2026

File details

Details for the file minicrit-0.1.0.tar.gz.

File metadata

Download URL: minicrit-0.1.0.tar.gz
Upload date: Jan 12, 2026
Size: 80.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for minicrit-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`895a92588e1c506ecf5b3bcce92306ce60d950a30c1b1f7e8ef52c8efa158878`
MD5	`7c3459831bd0458028d9da57d627594d`
BLAKE2b-256	`cd3e3fe538269f5f8a2103b655a868564c0b6425e03579062104fccd0aa693d4`

File details

Details for the file minicrit-0.1.0-py3-none-any.whl.

File metadata

Download URL: minicrit-0.1.0-py3-none-any.whl
Upload date: Jan 12, 2026
Size: 87.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for minicrit-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5940705b69778c150f47502942722d8a9bf96971f526f7598dac2f72ca42b3c0`
MD5	`b02e5c7d7e76f81bdf5c4bdc315cd984`
BLAKE2b-256	`170e7aa7b2558d93b8e732c2992225c6dfe06e0690f58f80daa3928d42995dc6`
