Pre-ship risk critic (CLI + Python library) — surfaces breaking risk scenarios before they reach production

Gremlin

AI critic for your codebase — surfaces breaking risk scenarios before they reach production

What is Gremlin?

Gremlin is a pre-ship risk critic (CLI + Python library) that answers: "What could break?"

Feed it a feature spec, PR diff, or plain English — Gremlin critiques it for blind spots using:

107 curated risk patterns across 14 domains (payments, auth, infra, serialization, distributed systems, and more)
LLM reasoning (applies patterns intelligently to your specific context)
Structured output (severity-ranked risk scenarios with confidence scores)
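
As a rough illustration of what "severity-ranked risk scenarios with confidence scores" could look like in practice, the sketch below filters findings by a confidence threshold and sorts them by severity. `Finding`, `SEVERITY_ORDER`, and `rank_findings` are hypothetical stand-ins, not Gremlin's own types, and the inclusive threshold comparison is an assumption.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a risk finding; not Gremlin's actual Risk class.
@dataclass(frozen=True)
class Finding:
    title: str
    severity: str      # "CRITICAL", "HIGH", "MEDIUM", or "LOW"
    confidence: int    # 0-100

# Lower rank sorts first.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}

def rank_findings(findings, threshold=80):
    """Keep findings at or above the threshold, most severe first."""
    kept = [f for f in findings if f.confidence >= threshold]
    # Within one severity level, higher confidence comes first.
    return sorted(kept, key=lambda f: (SEVERITY_ORDER[f.severity], -f.confidence))

findings = [
    Finding("Double Submit on Payment Button", "HIGH", 87),
    Finding("Webhook Race Condition", "CRITICAL", 95),
    Finding("Stale currency cache", "MEDIUM", 62),  # made-up example entry
]

for f in rank_findings(findings):
    print(f"[{f.severity}] {f.title} ({f.confidence}%)")
```

With the default threshold of 80 the made-up 62% finding is dropped; lowering the threshold to 60 would keep all three.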

Installation

# Install from PyPI
pip install gremlin-critic

# Set your Anthropic API key
export ANTHROPIC_API_KEY=sk-ant-...

For development: git clone https://github.com/abhi10/gremlin.git && cd gremlin && pip install -e ".[dev]"

Quick Start

CLI Usage

# Review a feature for risks
gremlin review "checkout flow with Stripe integration"

# Deep analysis with lower confidence threshold
gremlin review "auth system" --depth deep --threshold 60

# See available patterns
gremlin patterns list

# Show patterns for a specific domain
gremlin patterns show payments

Programmatic API (New in v0.2.0)

from gremlin import Gremlin

# Basic usage
gremlin = Gremlin()
result = gremlin.analyze("user authentication")

# Check for critical risks
if result.has_critical_risks():
    print(f"Found {result.critical_count} critical risks!")
    for risk in result.risks:
        print(f"- [{risk.severity}] {risk.scenario}")

# Multiple output formats
json_output = result.to_json()       # JSON string
junit_xml = result.to_junit()        # JUnit XML for CI
llm_format = result.format_for_llm() # Concise format for agents

# Async support for agent frameworks (call from inside an async function)
result = await gremlin.analyze_async("payment processing")

# With additional context
result = gremlin.analyze(
    scope="checkout flow",
    context="Using Stripe API with webhook handling",
    depth="deep"
)

See the Programmatic API section below for detailed usage.

Example Output

┌─────────────────────────────────────────────────────────────────────────────┐
│ Risk Scenarios for: checkout flow                                           │
└─────────────────────────────────────────────────────────────────────────────┘

🔴 CRITICAL (95% confidence)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Webhook Race Condition

  What if the Stripe webhook arrives before the order record is committed?

  Impact: Payment captured but order not created. Customer charged without record.
  Domain: payments


🟠 HIGH (87% confidence)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Double Submit on Payment Button

  What if the user clicks "Pay Now" twice rapidly?

  Impact: Potential duplicate charges.
  Domain: payments, concurrency

Risk Dashboard

Interactive visualization of Gremlin analysis results — live at abhi10.github.io/gremlin

Features:

Heatmap visualization — severity distribution across feature areas (CRITICAL / HIGH / MEDIUM / LOW)
Severity donut chart — at-a-glance risk breakdown
Domain bar chart — risk count per domain (concurrency, auth, payments...)
Interactive risk table — sortable, filterable, expandable rows with full scenario + impact
Multi-project — includes scans of celery, pydantic, openclaw

Commands

Command	Description
`gremlin review "scope"`	Analyze a feature for QA risks
`gremlin patterns list`	Show all available pattern categories
`gremlin patterns show <domain>`	Show patterns for a specific domain

Options for `review`

Option	Default	Description
`--depth`	`quick`	Analysis depth: `quick` or `deep`
`--threshold`	`80`	Confidence filter (0-100)
`--output`	`rich`	Output format: `rich`, `md`, `json`
`--patterns`	-	Custom patterns file (YAML)
`--context`	-	Additional context: string, `@file`, or `-` for stdin
`--validate`	`false`	Run second pass to filter hallucinations
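
The three input forms listed for `--context` (a literal string, `@file`, or `-` for stdin) can be resolved with a small dispatcher. The sketch below is one way to do it; `resolve_context` is a hypothetical helper written for illustration and is not taken from Gremlin's source.

```python
import io
import sys
from pathlib import Path

def resolve_context(value, stdin=None):
    """Resolve a --context style argument to its text (hypothetical helper).

    "-"      -> read everything from stdin
    "@path"  -> read the file at path
    other    -> use the string as-is
    """
    if value == "-":
        stream = stdin if stdin is not None else sys.stdin
        return stream.read()
    if value.startswith("@"):
        return Path(value[1:]).read_text(encoding="utf-8")
    return value

# A literal string is passed through unchanged.
print(resolve_context("Using Stripe API with webhook handling"))

# "-" reads from the given stream (sys.stdin by default).
print(resolve_context("-", stdin=io.StringIO("diff --git a/app.py b/app.py")))
```

Passing the stream as a parameter keeps the helper easy to test without touching the real stdin.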

Custom Patterns

Add domain-specific patterns for your codebase:

Project-level (auto-loaded)

# .gremlin/patterns.yaml
domain_specific:
  image_processing:
    keywords: [image, photo, upload, resize, cdn]
    patterns:
      - "What if EXIF rotation is ignored during resize?"
      - "What if CDN cache serves stale image after update?"
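
Before committing a patterns file it can help to sanity-check its shape. The sketch below mirrors the YAML above as a Python dict (so no YAML parser is needed) and runs two hypothetical checks: every domain has keywords, and every pattern is phrased as a "What if ...?" question. That phrasing rule is inferred from the examples in this README and is an assumption, not a documented requirement.

```python
# Same structure as the YAML example, written as a Python dict.
patterns_file = {
    "domain_specific": {
        "image_processing": {
            "keywords": ["image", "photo", "upload", "resize", "cdn"],
            "patterns": [
                "What if EXIF rotation is ignored during resize?",
                "What if CDN cache serves stale image after update?",
            ],
        }
    }
}

def check_patterns(data):
    """Return a list of problems found in a patterns mapping (hypothetical checks)."""
    problems = []
    for domain, entry in data.get("domain_specific", {}).items():
        if not entry.get("keywords"):
            problems.append(f"{domain}: no keywords")
        for text in entry.get("patterns", []):
            if not text.startswith("What if") or not text.endswith("?"):
                problems.append(f"{domain}: not a 'What if ...?' question: {text!r}")
    return problems

# An empty list means no problems were found.
print(check_patterns(patterns_file))
```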

Via `--patterns` flag

gremlin review "image upload" --patterns @my-patterns.yaml

Learn from incidents

gremlin learn "Portrait images displayed sideways" --domain files --source prod-incident

See docs/CUSTOM_PATTERNS.md for the full authoring guide.

Pattern Domains

Gremlin includes curated patterns for these domains:

auth - Authentication, sessions, tokens
payments - Checkout, billing, refunds
file_upload - File handling, validation
database - Queries, transactions, migrations
api - Rate limiting, endpoints
deployment - Config, containers, environments
infrastructure - Servers, certs, resources
And more...

How It Works

User: gremlin review "checkout flow"
         │
         ▼
    ┌─────────────┐
    │ Parse scope │
    └──────┬──────┘
           │
           ▼
    ┌─────────────────┐
    │ Infer domains   │  "checkout" → payments
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐
    │ Select patterns │  universal + payments
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐
    │ Build prompt    │  system.md + patterns + scope
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐
    │ Call Claude API │
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐
    │ Render output   │  Risk scenarios
    └─────────────────┘
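
The middle steps of the diagram (infer domains, select patterns, build prompt) can be sketched in a few lines. Everything below is illustrative: the keyword table, the pattern store, and the function names are assumptions made for this example, and Gremlin's real matching logic and prompt layout may differ.

```python
# Hypothetical data: domain -> keyword hints, and a tiny pattern store.
DOMAIN_KEYWORDS = {
    "payments": ["checkout", "billing", "refund"],
    "auth": ["login", "session", "token"],
}
PATTERNS = {
    "universal": ["What if the request is retried after a timeout?"],
    "payments": ["What if the webhook arrives before the order is committed?"],
    "auth": ["What if the token expires mid-request?"],
}

def infer_domains(scope):
    """Return domains whose keywords appear in the scope text."""
    text = scope.lower()
    return [d for d, words in DOMAIN_KEYWORDS.items() if any(w in text for w in words)]

def select_patterns(domains):
    """Universal patterns first, then those of each matched domain."""
    selected = list(PATTERNS["universal"])
    for domain in domains:
        selected.extend(PATTERNS.get(domain, []))
    return selected

def build_prompt(system_text, patterns, scope):
    """Join the system text, the pattern list, and the scope into one prompt."""
    bullet_list = "\n".join(f"- {p}" for p in patterns)
    return f"{system_text}\n\nPatterns:\n{bullet_list}\n\nScope: {scope}"

scope = "checkout flow"
domains = infer_domains(scope)
prompt = build_prompt("You are a risk critic.", select_patterns(domains), scope)
print(domains)
print(prompt)
```

The resulting prompt string is what would then be sent to the model in the "Call Claude API" step.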

Performance

Gremlin's pattern-based approach achieves a 90.7% tie rate with baseline Claude Sonnet 4 across 54 real-world test cases:

Metric	Result	Notes
Tie Rate	90.7%	Gremlin matches baseline Claude quality
Win/Tie Rate	98.1%	Combined wins + ties
Gremlin Wins	7.4%	Cases where patterns provide unique value
Claude Wins	1.9%	Minor category labeling differences
Pattern Count	107	Universal + domain-specific patterns

Key Achievement: 90% reduction in quality gaps (19% → 1.9%) through strategic pattern improvements.

See Phase 2 Tier 1 Results for detailed analysis.
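
The figures above are mutually consistent, which a few lines of arithmetic can confirm. The per-outcome counts used here (49 ties, 4 Gremlin wins, 1 Claude win) are inferred from the reported percentages over 54 cases; the README itself gives only the percentages.

```python
# Counts inferred from the reported percentages over 54 cases (an assumption;
# the source states percentages only).
total = 54
ties, gremlin_wins, claude_wins = 49, 4, 1
assert ties + gremlin_wins + claude_wins == total

def pct(n):
    """Share of the 54 cases, as a percentage rounded to one decimal."""
    return round(100 * n / total, 1)

print(pct(ties))                  # 90.7
print(pct(gremlin_wins))          # 7.4
print(pct(claude_wins))           # 1.9
print(pct(ties + gremlin_wins))   # 98.1

# Relative reduction in the quality gap: from 19% down to 1.9%.
reduction = (19 - 1.9) / 19
print(round(100 * reduction))     # 90
```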

Claude Code Integration

Gremlin also provides a Claude Code agent for code-focused risk critique during PR reviews. See docs/INTEGRATION_GUIDE.md for setup.

Programmatic API

Gremlin can be used as a Python library for integration with CI/CD pipelines, agent frameworks, and custom tools.

Installation

pip install gremlin-critic
# Or for development (run from a clone of the repository):
pip install -e ".[dev]"

Basic Usage

from gremlin import Gremlin, Risk, AnalysisResult

# Initialize analyzer
gremlin = Gremlin()

# Analyze a scope
result = gremlin.analyze("checkout flow")

# Access results
print(f"Found {len(result.risks)} risks")
print(f"Matched domains: {result.matched_domains}")
print(f"Pattern count: {result.pattern_count}")

Configuration

# Use different provider/model
gremlin = Gremlin(
    provider="anthropic",           # anthropic, openai, ollama
    model="claude-sonnet-4-20250514",
    threshold=80                     # Confidence threshold
)

# Analyze with context
result = gremlin.analyze(
    scope="user authentication",
    context="Using JWT with Redis session store",
    depth="deep"                     # quick or deep
)

Output Formats

# Dictionary (for JSON APIs)
data = result.to_dict()

# JSON string
json_str = result.to_json()

# JUnit XML (for CI/CD integration)
junit_xml = result.to_junit()

# LLM-friendly format (for agent consumption)
agent_input = result.format_for_llm()

Risk Analysis

# Check risk severity
if result.has_critical_risks():
    print(f"⚠️  {result.critical_count} critical risks found")

if result.has_high_severity_risks():
    print(f"Found {result.high_count} high + {result.critical_count} critical")

# Iterate through risks
for risk in result.risks:
    print(f"[{risk.severity}] ({risk.confidence}%)")
    print(f"  Scenario: {risk.scenario}")
    print(f"  Impact: {risk.impact}")
    print(f"  Domains: {', '.join(risk.domains)}")

Async Support

import asyncio
from gremlin import Gremlin

async def analyze_features():
    gremlin = Gremlin()

    # Run multiple analyses concurrently
    results = await asyncio.gather(
        gremlin.analyze_async("checkout flow"),
        gremlin.analyze_async("user authentication"),
        gremlin.analyze_async("file upload")
    )

    for result in results:
        print(f"{result.scope}: {len(result.risks)} risks")

asyncio.run(analyze_features())
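
When many scopes are analyzed at once, it may be worth capping how many requests are in flight so the provider's rate limits are not hit. The sketch below shows the general pattern with `asyncio.Semaphore`. `fake_analyze` is a stand-in that sleeps instead of calling any API; in real use it would be replaced by a call such as `gremlin.analyze_async(scope)`.

```python
import asyncio

async def fake_analyze(scope):
    """Stand-in for an async analysis call; sleeps instead of calling an API."""
    await asyncio.sleep(0.01)
    return f"{scope}: done"

async def analyze_bounded(scopes, limit=2):
    """Run analyses concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(scope):
        async with semaphore:
            return await fake_analyze(scope)

    # gather returns results in the same order as its inputs.
    return await asyncio.gather(*(guarded(s) for s in scopes))

results = asyncio.run(
    analyze_bounded(["checkout flow", "user authentication", "file upload"])
)
print(results)
```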

Use Cases

1. LLM Agent Tool

from gremlin import Gremlin

def analyze_code_risks(code: str, feature: str) -> str:
    """Tool for LLM agents to analyze code risks."""
    gremlin = Gremlin()
    result = gremlin.analyze(scope=feature, context=code)
    return result.format_for_llm()

# Use with LangChain, CrewAI, AutoGen, etc.

2. CI/CD Integration

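from gremlin import Gremlin
import sys

# Read the PR diff from stdin, e.g. `git diff main | python check_risks.py`
# (the script name is illustrative)
diff_content = sys.stdin.read()

gremlin = Gremlin(threshold=70)
result = gremlin.analyze("PR changes", context=diff_content)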

# Output JUnit XML
with open("gremlin-results.xml", "w") as f:
    f.write(result.to_junit())

# Exit with error if critical risks found
if result.has_critical_risks():
    print(f"❌ Found {result.critical_count} critical risks")
    sys.exit(1)
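
For readers unfamiliar with JUnit XML, the sketch below shows one plausible way a list of risks could be mapped onto it, using only the standard library. How `to_junit()` actually lays out its report is not documented in this README; the mapping here (one `<testcase>` per risk, a `<failure>` for CRITICAL and HIGH) and the `risks_to_junit` helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def risks_to_junit(scope, risks, failing=("CRITICAL", "HIGH")):
    """Render risks as a JUnit-style <testsuite>; one <testcase> per risk."""
    failures = [r for r in risks if r["severity"] in failing]
    suite = ET.Element(
        "testsuite",
        name=f"gremlin: {scope}",
        tests=str(len(risks)),
        failures=str(len(failures)),
    )
    for risk in risks:
        case = ET.SubElement(
            suite, "testcase", name=risk["scenario"], classname=risk["severity"]
        )
        # Only severities listed in `failing` are reported as failures.
        if risk["severity"] in failing:
            failure = ET.SubElement(case, "failure", message=risk["impact"])
            failure.text = f"confidence: {risk['confidence']}%"
    return ET.tostring(suite, encoding="unicode")

risks = [
    {"severity": "CRITICAL", "confidence": 95,
     "scenario": "Webhook arrives before the order record is committed",
     "impact": "Customer charged without record"},
    {"severity": "LOW", "confidence": 81,
     "scenario": "Made-up low-severity example",
     "impact": "Cosmetic"},
]
print(risks_to_junit("checkout flow", risks))
```

Most CI systems that ingest JUnit reports read the `failures` count and the `<failure>` elements, so a mapping like this lets critical risks show up as failed checks.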

3. Custom Validation Pipeline

from gremlin import Gremlin

def validate_feature_design(prd: str, feature_name: str) -> dict:
    """Validate a feature design for risks."""
    gremlin = Gremlin()
    # depth is passed to analyze(), as in the Configuration section
    result = gremlin.analyze(feature_name, context=prd, depth="deep")

    return {
        "feature": feature_name,
        "risk_count": len(result.risks),
        "critical": result.critical_count,
        "high": result.high_count,
        "requires_review": result.has_high_severity_risks(),
        "report": result.to_dict()
    }

API Reference

Classes:

Gremlin - Main analyzer class
Risk - Individual risk finding with severity, confidence, scenario, impact
AnalysisResult - Complete analysis with multiple output formats

Methods:

Gremlin.analyze(scope, context, depth) - Synchronous analysis
Gremlin.analyze_async(scope, context, depth) - Async analysis
AnalysisResult.to_dict() - Dictionary serialization
AnalysisResult.to_json() - JSON string
AnalysisResult.to_junit() - JUnit XML
AnalysisResult.format_for_llm() - LLM-friendly format
AnalysisResult.has_critical_risks() - Check for critical risks
AnalysisResult.has_high_severity_risks() - Check for high+ risks

Development

# Clone the repo
git clone https://github.com/abhi10/gremlin.git
cd gremlin

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check .

License

MIT

Contributing

Contributions welcome! Please open an issue first to discuss what you'd like to change.

Acknowledgments

Inspired by exploratory testing principles from James Bach and James Whittaker
Powered by Claude from Anthropic


Release history

0.3.0 - Feb 21, 2026
0.2.2 - Feb 20, 2026
0.2.1 - Feb 17, 2026 (this version)
0.2.0 - Feb 15, 2026

Download files

Source Distribution

gremlin_critic-0.2.1.tar.gz (3.3 MB)

Uploaded Feb 17, 2026 Source

Built Distribution

gremlin_critic-0.2.1-py3-none-any.whl (38.4 kB)

Uploaded Feb 17, 2026 Python 3

File details

Details for the file gremlin_critic-0.2.1.tar.gz.

File metadata

Download URL: gremlin_critic-0.2.1.tar.gz
Upload date: Feb 17, 2026
Size: 3.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for gremlin_critic-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`38cbecb6fc6ae52a53d74b4f6c1bcf830cd3788236811fab5e287a524ea96331`
MD5	`172f5a26bff7133cf58b97b925ec7f07`
BLAKE2b-256	`4894653c5838d79e8cd0a20f6b47165bbe65a71b1efeb52989064471552ae35e`


File details

Details for the file gremlin_critic-0.2.1-py3-none-any.whl.

File metadata

Download URL: gremlin_critic-0.2.1-py3-none-any.whl
Upload date: Feb 17, 2026
Size: 38.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for gremlin_critic-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e384748940dcd91b5316deb89f0be10cbaf8a8da85041f2907e8c19ca69ca890`
MD5	`8c656f621f89f5bdcb809f23c725a5fd`
BLAKE2b-256	`c0752e17212e70afe8406155afb713eb73de8b69a3804ba8f4b1cadf935a2754`

