Adversarial AI validation for autonomous systems - catch reasoning flaws before they become failures

MiniCrit-7B: Adversarial AI Validation for Autonomous Systems

🛡️ Catch AI reasoning flaws before they become failures

Quick Start • MCP Integration • API • Training • Benchmarks


🎯 The Problem

Autonomous AI systems fail silently. They produce confident-sounding outputs with hidden flaws: overconfidence, missing risks, logical errors, hallucinations. By the time you notice, it's too late.

Traditional testing catches bugs. MiniCrit catches bad reasoning.

💡 The Solution

MiniCrit is a specialized AI "devil's advocate" that validates reasoning before actions are taken. It integrates with any AI system via MCP (Model Context Protocol) to provide real-time adversarial critique.

Your AI Agent → MiniCrit Validation → Safer Decisions
     ↓                   ↓                    ↓
  "Buy AAPL,         "Overconfidence:      Execute with
   95% confident"     only 2 data points,   reduced size
                      missing earnings       or skip
                      risk"
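
The flow above can be sketched as a small gate that sits between the agent and whatever executes the decision. This is a minimal sketch, not MiniCrit's API: `gate_position` and the size multipliers are hypothetical choices made for illustration. Only the `valid` and `severity` fields and the five severity names come from this README.

```python
# Hypothetical gate: scale or skip a trade based on a MiniCrit-style verdict.
# The multipliers below are illustrative assumptions, not values from MiniCrit.
SIZE_MULTIPLIER = {
    "pass": 1.0,
    "low": 1.0,
    "medium": 0.5,
    "high": 0.25,
    "critical": 0.0,
}


def gate_position(base_size: float, valid: bool, severity: str) -> float:
    """Return the position size to execute; 0.0 means skip the trade."""
    if valid:
        return base_size
    if severity not in SIZE_MULTIPLIER:
        # Unknown severity: fail closed and skip.
        return 0.0
    return base_size * SIZE_MULTIPLIER[severity]


print(gate_position(100.0, valid=False, severity="high"))      # 25.0
print(gate_position(100.0, valid=False, severity="critical"))  # 0.0
print(gate_position(100.0, valid=True, severity="pass"))       # 100.0
```

Failing closed on an unrecognized severity keeps the gate conservative if the validator's output format changes.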

📊 Results

  • 35% Flawed Output Reduction
  • +0.28 Sharpe Ratio Improvement
  • 38,000+ Live Validations
  • <50ms Inference Latency

Metric                MiniCrit-7B   GPT-4   Claude-3
Flaw Detection F1     0.82          0.75    0.78
False Positive Rate   12%           18%     15%
Latency               45ms          850ms   620ms
Cost per 1K calls     $0.00         $30     $15

🚀 Quick Start

Option 1: pip install (Recommended)

pip install minicrit

from minicrit import MiniCrit

# Initialize (downloads model automatically)
critic = MiniCrit()

# Validate reasoning
result = critic.validate(
    "Stock will rise because it rose yesterday",
    domain="trading"
)

print(result.valid)      # False
print(result.severity)   # "high" 
print(result.critique)   # "This reasoning exhibits recency bias..."
print(result.flags)      # ["overconfidence", "insufficient_evidence"]

Option 2: CLI

# Single validation
minicrit "Buy AAPL, 95% confident" --domain trading

# From file
minicrit --file rationales.txt --output results.json

# JSON output
minicrit "MACD crossover signals buy" --json

Installation Extras

# Core only
pip install minicrit

# With MCP server support
pip install minicrit[mcp]

# With training utilities
pip install minicrit[training]

# Everything
pip install minicrit[all]

Option 3: Docker

cd docker
docker-compose up -d
curl http://localhost:8000/critique \
  -H "Content-Type: application/json" \
  -d '{"rationale": "Buy signal based on MACD crossover", "domain": "trading"}'
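
The same request can be issued from Python with only the standard library. This is a minimal sketch, assuming the container exposes the `/critique` endpoint on `localhost:8000` as in the `curl` call above. `build_critique_request` and `send` are hypothetical helpers, and because the response schema of this endpoint is not documented here, the body is returned as parsed JSON without further interpretation.

```python
import json
import urllib.request


def build_critique_request(rationale: str, domain: str,
                           base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = json.dumps({"rationale": rationale, "domain": domain}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/critique",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body; requires a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_critique_request("Buy signal based on MACD crossover", "trading")
print(req.get_method(), req.full_url)
# result = send(req)  # uncomment once `docker-compose up -d` has started the server
```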

Option 4: MCP (Claude Desktop / Claude Code)

Add this to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "minicrit": {
      "command": "python3",
      "args": ["/path/to/MiniCrit-7B/src/mcp/server.py"],
      "env": {
        "MINICRIT_ADAPTER": "wmaousley/MiniCrit-7B",
        "MINICRIT_BASE_MODEL": "Qwen/Qwen2-7B-Instruct"
      }
    }
  }
}

Then in Claude: "Use validate_reasoning to check: Buy AAPL, RSI shows oversold"


🔌 MCP Integration

MiniCrit implements the Model Context Protocol (MCP), the industry standard for AI tool integration, backed by Anthropic, OpenAI, Google, and Microsoft.

Any MCP-compatible AI can call MiniCrit:

┌─────────────────┐         ┌─────────────────┐
│  Claude / GPT   │         │    MiniCrit     │
│  Gemini / etc.  │◄───────►│   MCP Server    │
└─────────────────┘   MCP   └─────────────────┘
         │                           │
         │    Tool: validate_reasoning
         │    Input: {rationale, domain}
         │    Output: {valid, severity, critique, flags}
         │
         ▼
   Safer AI Decisions

MCP Tools

Tool                 Description
validate_reasoning   Validate AI reasoning, returns critique with severity
batch_validate       Validate multiple items efficiently
get_model_info       Get model status and configuration
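
On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for `validate_reasoning`, using the `rationale` and `domain` inputs listed in the diagram above. `make_tool_call` is a hypothetical helper; a real client would normally let an MCP SDK handle framing and the initialization handshake.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per connection


def make_tool_call(tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 message for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


message = make_tool_call(
    "validate_reasoning",
    {"rationale": "Buy AAPL, RSI shows oversold", "domain": "trading"},
)
print(json.dumps(message, indent=2))
```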

Supported Domains

trading • finance • defense • cybersecurity • medical • risk_assessment • planning • general

Output Format

{
  "valid": false,
  "severity": "high",
  "critique": "This reasoning exhibits recency bias. A single day's price movement has no predictive power...",
  "confidence": 0.87,
  "flags": ["overconfidence", "insufficient_evidence", "missing_consideration"],
  "latency_ms": 42.3
}

Severity Levels: pass → low → medium → high → critical
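
Because severity is an ordered scale, callers usually want a threshold check rather than string equality. The following is a minimal sketch of parsing the output format above into a typed record. The `Critique` dataclass, `parse_critique`, and `at_least` are hypothetical names for illustration; only the field names and the severity order are taken from this README.

```python
import json
from dataclasses import dataclass

# Order taken from the "Severity Levels" line above.
SEVERITY_ORDER = ["pass", "low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Critique:
    valid: bool
    severity: str
    critique: str
    confidence: float
    flags: tuple
    latency_ms: float

    def at_least(self, threshold: str) -> bool:
        """True if this critique is as severe as `threshold` or worse."""
        return SEVERITY_ORDER.index(self.severity) >= SEVERITY_ORDER.index(threshold)


def parse_critique(raw: str) -> Critique:
    """Parse one JSON result; raises KeyError if a documented field is missing."""
    data = json.loads(raw)
    return Critique(
        valid=data["valid"],
        severity=data["severity"],
        critique=data["critique"],
        confidence=data["confidence"],
        flags=tuple(data["flags"]),
        latency_ms=data["latency_ms"],
    )


raw = ('{"valid": false, "severity": "high", "critique": "Recency bias.", '
       '"confidence": 0.87, "flags": ["overconfidence"], "latency_ms": 42.3}')
result = parse_critique(raw)
print(result.at_least("medium"))    # True
print(result.at_least("critical"))  # False
```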


🔍 What MiniCrit Detects

Cognitive Biases

  • ⚠️ Overconfidence - Certainty without evidence
  • ⚠️ Survivorship Bias - Ignoring failures
  • ⚠️ Confirmation Bias - Cherry-picking data
  • ⚠️ Anchoring - Over-relying on first info
  • ⚠️ Recency Bias - Overweighting recent events

Logical Flaws

  • 🚫 False Causation - Correlation ≠ causation
  • 🚫 Hasty Generalization - Small sample size
  • 🚫 Missing Risks - Unaddressed threats
  • 🚫 Circular Reasoning - Assuming the conclusion
  • 🚫 False Dichotomy - Ignoring options

Example

Input:

"AAPL long: The stock has risen 3 days in a row, momentum is clearly bullish. 95% confident this continues."

MiniCrit Output:

⚠️ HIGH SEVERITY - Multiple reasoning flaws detected:

  1. Overconfidence: 95% confidence is not supported by the evidence provided
  2. Recency Bias: 3 days of price movement has minimal predictive value
  3. Missing Risk Factors: No consideration of upcoming earnings, macro events, or sector rotation

Flags: overconfidence, insufficient_evidence, unaddressed_risk


🏗️ Architecture

┌──────────────────────────────────────────────────────────────────┐
│                      MiniCrit-7B System                          │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌────────────┐    ┌─────────────┐    ┌───────────────────────┐  │
│  │ AI System  │───▶│   MiniCrit  │───▶│   Validated Output    │  │
│  │ (Any LLM)  │    │   Server    │    │                       │  │
│  └────────────┘    └──────┬──────┘    │ • valid: bool         │  │
│                           │           │ • severity: enum      │  │
│                    ┌──────▼──────┐    │ • critique: string    │  │
│                    │  Qwen2-7B   │    │ • flags: list         │  │
│                    │    Base     │    │ • confidence: float   │  │
│                    └──────┬──────┘    └───────────────────────┘  │
│                           │                                      │
│                    ┌──────▼──────┐                               │
│                    │   LoRA      │  40.4M trainable params       │
│                    │  Adapter    │  Trained on 11.7M critiques   │
│                    └─────────────┘                               │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

📈 Training

Model Specifications

Parameter          Value
Base Model         Qwen/Qwen2-7B-Instruct
Method             LoRA (r=16, α=32)
Trainable Params   40.4M / 7.6B (0.5%)
Dataset            CritiqueBank-11M
Hardware           TACC Vista GH200 / Lambda H100
Training Loss      3.19 → 0.44 (86% reduction)
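
The trainable-parameter figure can be sanity-checked by hand: a rank-r LoRA adapter on a linear layer with d_in inputs and d_out outputs adds r × (d_in + d_out) parameters. The sketch below assumes the adapter targets all seven attention and MLP projections in every layer, and uses Qwen2-7B dimensions as recalled from its published configuration (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128). Neither assumption is stated in this README, so treat the result as a plausibility check rather than a description of the actual training config.

```python
RANK = 16              # r from the table above
HIDDEN = 3584          # assumed Qwen2-7B hidden size
INTERMEDIATE = 18944   # assumed Qwen2-7B MLP size
KV_DIM = 4 * 128       # assumed 4 key/value heads x head dimension 128
LAYERS = 28            # assumed number of transformer layers

# (d_in, d_out) for each projection assumed to carry a LoRA adapter.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) per adapted layer."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in PROJECTIONS.values())
total = per_layer * LAYERS
print(f"{total:,} trainable parameters")             # 40,370,176
print(f"{total / 7.6e9:.2%} of 7.6B")                # 0.53%
print(f"{(3.19 - 0.44) / 3.19:.0%} loss reduction")  # 86%
```

Under these assumptions the count rounds to the 40.4M reported in the table, and the percentage and loss-reduction figures follow from the table's own numbers.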

Dataset: CritiqueBank-11M

Component             Content
LogicFlaw-2.4M        Logical reasoning errors
FactCheck-3.2M        Factual accuracy validation
BiasDetect-1.8M       Cognitive bias patterns
RiskMissing-2.1M      Unaddressed risk factors
DomainSpecific-2.2M   Trading, defense, medical

Published: DOI 10.5281/zenodo.18159342

Training Progress

Loss
3.2 │██
2.4 │  ████
1.6 │      ██████
0.8 │            ████████████
0.4 │                        ████████████████  ← Current
    └─────────────────────────────────────────
    0%        25%        50%        75%     100%

Run Training

# TACC Vista (GH200)
sbatch scripts/train_vista.slurm

# Lambda Labs (H100)
python train_minicrit_7b.py --config configs/7b_lora.yaml

🧪 Benchmarking

Compare MiniCrit models head-to-head:

python src/benchmark/benchmark_models.py \
  --eval-data data/eval_holdout.jsonl \
  --model-1 wmaousley/MiniCrit-1.5B \
  --model-2 wmaousley/MiniCrit-7B \
  --judge-sample 200  # Optional: LLM-as-judge comparison

Metrics Computed

Metric                  Description
False Positive Rate     Valid reasoning incorrectly flagged
Detection F1            Precision/recall on flaw detection
Latency (p50/p95/p99)   Inference speed percentiles
LLM Judge Score         Claude rates critique quality
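
For reference, the first three metrics can be computed from labeled predictions and timing samples as follows. This is a minimal sketch under stated assumptions: it is not the code in `benchmark_models.py`, the function names are hypothetical, and the nearest-rank percentile method is one common convention that the benchmark script may or may not use.

```python
import math


def detection_metrics(y_true: list, y_pred: list) -> dict:
    """y_true / y_pred hold True where reasoning is (or is predicted) flawed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # False positive rate: share of valid reasoning that was flagged.
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def percentile(values: list, q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


y_true = [True, True, True, False, False, False, False, True]
y_pred = [True, True, False, True, False, False, False, True]
metrics = detection_metrics(y_true, y_pred)
print(metrics)  # precision, recall and f1 are 0.75; fpr is 0.25

latencies_ms = [41, 43, 45, 44, 42, 47, 90, 46, 44, 45]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))  # 44 90
```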

🔧 Advanced: Improve Your Model

Generate Hard Training Examples

# Uses Claude Sonnet (~$30 for 5K examples)
export ANTHROPIC_API_KEY=your-key
python src/training/generate_hard_examples.py --count 5000

Direct Preference Optimization (DPO)

# Generate preference pairs
python src/training/generate_dpo_data.py \
  --input eval_holdout.jsonl \
  --model wmaousley/MiniCrit-7B \
  --output dpo_pairs.jsonl

# Run DPO training
python src/training/train_dpo.py \
  --model wmaousley/MiniCrit-7B \
  --data dpo_pairs.jsonl \
  --output minicrit-7b-dpo
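
For readers new to DPO, the per-pair objective is small enough to write out. The sketch below implements the standard DPO loss for a single preference pair in plain Python. It is not taken from `train_dpo.py`, whose implementation is not shown here. The function name and the `beta=0.1` default are assumptions, and the inputs are the summed log-probabilities of the chosen and rejected critiques under the policy and the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: margin is 0, so the loss is log(2).
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # 0.6931

# Policy now prefers the chosen critique more than the reference does.
print(dpo_loss(-10.0, -18.0, -12.0, -15.0) < math.log(2))  # True
```

The loss falls below log(2) only when the policy widens the gap between chosen and rejected responses relative to the reference model, which is the behavior the preference pairs are meant to reward.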

See docs/MODEL_EXCELLENCE_GUIDE.md for the full improvement roadmap.


📁 Repository Structure

MiniCrit-7B/
├── src/
│   ├── mcp/                    # MCP Server Implementation
│   │   ├── server.py           # Local stdio (Claude Desktop)
│   │   ├── server_prod.py      # Production HTTP + auth
│   │   └── server_http.py      # Basic HTTP server
│   ├── benchmark/              # Model Evaluation
│   │   └── benchmark_models.py
│   ├── training/               # Training Utilities
│   │   ├── generate_hard_examples.py
│   │   ├── generate_dpo_data.py
│   │   └── train_dpo.py
│   ├── config.py
│   ├── data.py
│   ├── model.py
│   ├── training.py
│   ├── evaluation.py
│   └── api.py
├── docker/
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── requirements.txt
├── configs/
│   ├── claude_desktop_config.json
│   ├── 7b_lora.yaml
│   └── deepspeed_gh200.json
├── scripts/
│   ├── train_vista.slurm
│   └── vista_setup.sh
├── tests/                      # 169 tests
├── docs/
│   ├── DEPLOYMENT_GUIDE.md
│   └── MODEL_EXCELLENCE_GUIDE.md
└── CHANGELOG.md

🐳 Deployment Options

Docker (Recommended)

cd docker
cp .env.example .env
# Edit .env with your settings
docker-compose up -d

Kubernetes

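apiVersion: apps/v1
kind: Deployment
metadata:
  name: minicrit
spec:
  replicas: 2
  # apps/v1 requires a selector that matches the pod template labels;
  # the "app: minicrit" label is an assumed name.
  selector:
    matchLabels:
      app: minicrit
  template:
    metadata:
      labels:
        app: minicrit
    spec:
      containers:
      - name: minicrit
        image: antagoninc/minicrit:latest
        resources:
          limits:
            nvidia.com/gpu: 1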

Production HTTP Server

# With authentication & rate limiting
export MINICRIT_API_KEYS="key1,key2,key3"
python src/mcp/server_prod.py
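
A comma-separated key list like the one above is typically split once at startup and compared against the key presented with each request. The sketch below shows one way to do that with a constant-time comparison. It is an illustration only: how `server_prod.py` actually parses and checks `MINICRIT_API_KEYS` is not shown in this README, and `load_api_keys` and `is_authorized` are hypothetical names.

```python
import hmac
import os


def load_api_keys(env_var: str = "MINICRIT_API_KEYS") -> list:
    """Split the comma-separated key list, ignoring blanks and stray spaces."""
    raw = os.environ.get(env_var, "")
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_authorized(presented: str, valid_keys: list) -> bool:
    """Compare against every configured key in constant time per key."""
    presented_bytes = presented.encode("utf-8")
    matches = [
        hmac.compare_digest(presented_bytes, key.encode("utf-8"))
        for key in valid_keys
    ]
    return any(matches)


os.environ["MINICRIT_API_KEYS"] = "key1,key2,key3"  # same format as the export above
keys = load_api_keys()
print(is_authorized("key2", keys))  # True
print(is_authorized("nope", keys))  # False
```

`hmac.compare_digest` avoids leaking how many leading characters of a key matched, which a plain `==` comparison can do through timing.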

See docs/DEPLOYMENT_GUIDE.md for complete instructions.


🎯 Use Cases

Domain                   Application
Quantitative Trading     Validate signals before execution
Defense / Intelligence   Audit AI threat assessments
Medical AI               Review diagnostic reasoning
Autonomous Vehicles      Validate planning decisions
Enterprise AI            Catch hallucinations before they propagate

📜 Citation

@software{minicrit2026,
  author = {Ousley, William Alexander and Ousley, Jacqueline Villamor},
  title = {MiniCrit: Adversarial AI Validation for Autonomous Systems},
  year = {2026},
  publisher = {Antagon Inc.},
  url = {https://github.com/antagoninc/MiniCrit-7B}
}

🙏 Acknowledgments

  • Lambda Labs - GPU compute grant for H100 training
  • TACC Vista - GH200 supercomputing via NAIRR Pilot
  • Anthropic - MCP standard development

📄 License

Apache 2.0 - See LICENSE


Antagon Inc.
Making AI Systems Safer Through Adversarial Testing

Website • Contact • CAGE: 17E75 • UEI: KBSGT7CZ4AH3

William Alexander Ousley - Co-Founder & CEO
Jacqueline Villamor Ousley - Co-Founder & CTO (TS/SCI)
