Pre-ship risk critic (CLI + Python library) — surfaces breaking risk scenarios before they reach production

Gremlin

Pre-ship risk critic — surfaces what could break before it reaches production

Feed Gremlin a feature spec, PR diff, or plain English — it critiques it for blind spots using 107 curated "what if?" patterns across 14 domains, applied by Claude.

pip install gremlin-critic
gremlin review "checkout flow with Stripe"
🔴 CRITICAL (95%) — Webhook Race Condition
   What if the Stripe webhook arrives before the order record is committed?
   Impact: Payment captured but order not created.

🟠 HIGH (87%) — Double Submit on Payment Button
   What if the user clicks "Pay Now" twice rapidly?
   Impact: Potential duplicate charges.

Three ways to use it

1. CLI

# Review a feature
gremlin review "checkout flow"

# With context (diff, file, or string)
git diff | gremlin review "my changes" --context -
gremlin review "auth system" --context @src/auth/login.py

# Deep analysis, lower confidence threshold
gremlin review "payment refunds" --depth deep --threshold 60

# Learn from incidents
gremlin learn "Nav showed Login after auth" --domain auth --source prod
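The --threshold flag above is described as a confidence threshold. A minimal sketch of what such a filter could look like on already-parsed risks, assuming each risk is a dict with a 0-100 `confidence` field and that the comparison is inclusive (neither is confirmed by Gremlin's documentation; `filter_by_threshold` is a hypothetical helper, not part of the library):

```python
from typing import Any, Dict, List


def filter_by_threshold(risks: List[Dict[str, Any]], threshold: int) -> List[Dict[str, Any]]:
    """Keep risks whose confidence is at or above the threshold (assumed inclusive)."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    return [risk for risk in risks if risk["confidence"] >= threshold]


# The first two entries come from the sample output above; the third is made up
# purely to show a finding that a higher threshold would hide.
risks = [
    {"title": "Webhook Race Condition", "severity": "CRITICAL", "confidence": 95},
    {"title": "Double Submit on Payment Button", "severity": "HIGH", "confidence": 87},
    {"title": "Hypothetical low-confidence finding", "severity": "MEDIUM", "confidence": 65},
]

for risk in filter_by_threshold(risks, 80):
    print(f"{risk['severity']} ({risk['confidence']}%) {risk['title']}")
```

Lowering the threshold to 60, as in the deep-analysis command, would let the third entry through as well.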

Pipeline stage commands (v0.3)

Run each analysis stage independently — useful for caching, debugging, or building custom pipelines:

# Stage 1 — infer domains, write understanding.json (no LLM call)
gremlin understand "checkout flow"

# Stage 2 — select patterns, write scenarios.json (no LLM call)
gremlin ideate

# Stage 3 — call LLM, write results.json
gremlin rollout

# Stage 4 — parse + score risks, write scores.json
gremlin judge

# With optional validation pass
gremlin judge --validate

# Custom run directory (default: .gremlin/run/)
gremlin understand "auth" --run-dir /tmp/my-run
gremlin ideate --run-dir /tmp/my-run
gremlin rollout --run-dir /tmp/my-run
gremlin judge --run-dir /tmp/my-run

Each stage reads the previous stage's artifact and writes its own: understanding.json → scenarios.json → results.json → scores.json.
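Because each stage leaves an artifact behind, a wrapper script can work out where to resume. The sketch below checks a run directory for the four artifact names listed above and reports the first stage whose output is missing. `next_stage` is a hypothetical helper, not a Gremlin command, and it assumes that an existing artifact is still valid for the current scope.

```python
from pathlib import Path
from typing import Optional

# Stage command -> artifact it writes, in pipeline order (names from the section above).
STAGES = [
    ("understand", "understanding.json"),
    ("ideate", "scenarios.json"),
    ("rollout", "results.json"),
    ("judge", "scores.json"),
]


def next_stage(run_dir: str = ".gremlin/run") -> Optional[str]:
    """Return the first stage whose artifact is missing, or None if all exist."""
    run_path = Path(run_dir)
    for command, artifact in STAGES:
        if not (run_path / artifact).is_file():
            return command
    return None


if __name__ == "__main__":
    stage = next_stage()
    print(f"next: gremlin {stage}" if stage else "all stages complete")
```

A check like this is only a cache heuristic: if the scope changes, the old artifacts should be deleted or a fresh --run-dir used.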

2. GitHub Action

Add to any repo — Gremlin posts a risk report on every PR automatically.

# .github/workflows/gremlin-review.yml
name: Gremlin Risk Review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install gremlin-critic
      - run: git diff origin/${{ github.base_ref }}...HEAD > /tmp/pr-diff.txt
      - run: |
          python3 .github/scripts/gremlin_analyze.py \
            "${{ github.event.pull_request.title }}" \
            /tmp/pr-diff.txt /tmp/gremlin-report.json
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - uses: actions/github-script@v7
        with:
          script: |
            const data = JSON.parse(require('fs').readFileSync('/tmp/gremlin-report.json','utf8'));
            const risks = data.risks || [];
            const s = data.summary || {};
            const body = risks.length === 0
              ? '## Gremlin Risk Review\n\nNo risks above threshold.'
              : `## Gremlin Risk Review\n\n**${risks.length} risk(s)** — 🔴 ${s.critical||0} critical · 🟠 ${s.high||0} high · 🟡 ${s.medium||0} medium\n\n` +
                risks.map(r => `### ${r.severity}: ${r.title||r.scenario}\n**Confidence:** ${r.confidence}%\n\n${r.impact}`).join('\n\n---\n\n');
            github.rest.issues.createComment({issue_number: context.issue.number, owner: context.repo.owner, repo: context.repo.repo, body});

Set ANTHROPIC_API_KEY as a repository secret (Settings → Secrets → Actions). See the full script used in this repo.
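To preview the PR comment locally before wiring up the workflow, the formatting done in the github-script step can be sketched in Python. This closely follows the JavaScript above and uses the same report fields (`risks`, `summary`, `severity`, `title` or `scenario`, `confidence`, `impact`); `format_comment` is a hypothetical helper, and the exact report schema is whatever gremlin_analyze.py writes, which is not shown here.

```python
from typing import Any, Dict


def format_comment(report: Dict[str, Any]) -> str:
    """Build the Markdown comment body from a parsed gremlin-report.json."""
    risks = report.get("risks") or []
    summary = report.get("summary") or {}
    header = "## Gremlin Risk Review\n\n"
    if not risks:
        return header + "No risks above threshold."
    counts = (
        f"**{len(risks)} risk(s)**: "
        f"🔴 {summary.get('critical') or 0} critical · "
        f"🟠 {summary.get('high') or 0} high · "
        f"🟡 {summary.get('medium') or 0} medium\n\n"
    )
    sections = [
        f"### {r.get('severity')}: {r.get('title') or r.get('scenario')}\n"
        f"**Confidence:** {r.get('confidence')}%\n\n{r.get('impact')}"
        for r in risks
    ]
    return header + counts + "\n\n---\n\n".join(sections)


if __name__ == "__main__":
    sample = {
        "risks": [
            {
                "severity": "CRITICAL",
                "title": "Webhook Race Condition",
                "confidence": 95,
                "impact": "Payment captured but order not created.",
            }
        ],
        "summary": {"critical": 1},
    }
    print(format_comment(sample))
```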

3. Python API

import sys

from gremlin import Gremlin

g = Gremlin()
result = g.analyze("checkout flow", context="Using Stripe + Next.js")

# Check severity
if result.has_critical_risks():
    print(f"{result.critical_count} critical risks found")

# Output formats
result.to_json()         # JSON string
result.to_junit()        # JUnit XML for CI
result.format_for_llm()  # Concise format for agents

# Async (call from inside an async function)
result = await g.analyze_async("payment processing")

# Block CI on critical risks
if result.has_critical_risks():
    sys.exit(1)
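The async call and the CI gate can be combined to review several scopes in one run. The sketch below is duck-typed: it only relies on `analyze_async` and `has_critical_risks()` as shown above. `analyze_all` and `exit_code` are hypothetical helpers, and whether one `Gremlin` instance is safe to use for concurrent calls is an assumption worth verifying.

```python
import asyncio
from typing import Any, List, Sequence


async def analyze_all(gremlin: Any, scopes: Sequence[str]) -> List[Any]:
    """Run analyze_async for every scope concurrently and collect the results."""
    # Assumption: a single Gremlin instance tolerates concurrent analyze_async calls.
    return list(await asyncio.gather(*(gremlin.analyze_async(s) for s in scopes)))


def exit_code(results: Sequence[Any]) -> int:
    """Return 1 if any result reports critical risks, otherwise 0."""
    return 1 if any(r.has_critical_risks() for r in results) else 0


# Usage (requires gremlin-critic and a configured provider):
#
#   import sys
#   from gremlin import Gremlin
#
#   results = asyncio.run(analyze_all(Gremlin(), ["checkout flow", "payment refunds"]))
#   sys.exit(exit_code(results))
```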

Risk Dashboard

Live visualization of Gremlin results applied to open-source projects — abhi10.github.io/gremlin

  • Heatmap · severity donut · domain bar chart · filterable risk table
  • Applied to celery, pydantic, and more

Pattern Domains

107 patterns across 14 domains — universal patterns run on every analysis, domain patterns trigger by keyword match:

Domain          Keywords
payments        checkout, stripe, billing, refund
auth            login, session, token, oauth
database        query, migration, transaction
concurrency     async, queue, race, lock
infrastructure  deploy, config, cert, secret
file_upload     upload, image, file, cdn
api             endpoint, rate limit, webhook
+ 7 more ...
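To make the keyword trigger concrete, here is a rough sketch of how a scope string could be mapped to domains using the seven keyword lists shown above. This is not Gremlin's implementation: the whole-word, case-insensitive matching rule and the `infer_domains` name are assumptions for illustration.

```python
import re
from typing import Dict, List

# Keyword lists copied from the table above (the remaining 7 domains are omitted).
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "payments": ["checkout", "stripe", "billing", "refund"],
    "auth": ["login", "session", "token", "oauth"],
    "database": ["query", "migration", "transaction"],
    "concurrency": ["async", "queue", "race", "lock"],
    "infrastructure": ["deploy", "config", "cert", "secret"],
    "file_upload": ["upload", "image", "file", "cdn"],
    "api": ["endpoint", "rate limit", "webhook"],
}


def infer_domains(scope: str) -> List[str]:
    """Return domains with at least one keyword appearing as a whole word in scope."""
    lowered = scope.lower()
    matched = []
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords):
            matched.append(domain)
    return matched


print(infer_domains("checkout flow with Stripe"))  # ['payments']
```

Whole-word matching avoids false hits such as "file" inside "profile", at the cost of missing plurals like "tokens"; the real matcher may make a different trade-off.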

Custom patterns

# .gremlin/patterns.yaml — auto-loaded per project
domain_specific:
  image_processing:
    keywords: [image, resize, cdn]
    patterns:
      - "What if EXIF rotation is ignored during resize?"
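A malformed patterns file is easy to produce by hand, so a quick structural check can help before committing it. The sketch below validates the parsed form of the YAML above (for example, the dict returned by a YAML loader). It assumes every domain needs non-empty `keywords` and `patterns` lists of strings, which the example suggests but the source does not state; `validate_patterns` is a hypothetical helper.

```python
from typing import Any, Dict, List


def validate_patterns(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    domains = config.get("domain_specific")
    if not isinstance(domains, dict):
        return ["'domain_specific' must be a mapping of domain names"]
    errors = []
    for name, entry in domains.items():
        if not isinstance(entry, dict):
            errors.append(f"{name} must be a mapping with 'keywords' and 'patterns'")
            continue
        for key in ("keywords", "patterns"):
            value = entry.get(key)
            is_string_list = isinstance(value, list) and all(isinstance(v, str) for v in value)
            if not is_string_list or not value:
                errors.append(f"{name}.{key} must be a non-empty list of strings")
    return errors


# Parsed equivalent of the .gremlin/patterns.yaml example above.
example = {
    "domain_specific": {
        "image_processing": {
            "keywords": ["image", "resize", "cdn"],
            "patterns": ["What if EXIF rotation is ignored during resize?"],
        }
    }
}
print(validate_patterns(example))  # []
```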

Performance

90.7% tie rate vs. baseline Claude Sonnet across 54 real-world test cases — patterns match raw LLM quality while adding domain-specific coverage.

Metric          Result
Win / Tie Rate  98.1% (7.4% wins + 90.7% ties)
Gremlin Wins    7.4% (patterns caught risks Claude missed)
Pattern Count   107 across 14 domains
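The percentages above are consistent with whole-number counts out of 54 cases. The counts below are inferred from the published percentages, not reported by the project: 4 wins and 49 ties are the only integers that round to 7.4% and 90.7%.

```python
TOTAL_CASES = 54  # stated above

# Inferred counts (assumption): the only integers out of 54 that round to the
# published 7.4% win rate and 90.7% tie rate.
WINS = 4
TIES = 49
LOSSES = TOTAL_CASES - WINS - TIES


def pct(count: int, total: int = TOTAL_CASES) -> float:
    """Percentage rounded to one decimal place, as in the table."""
    return round(100 * count / total, 1)


print(f"wins:     {pct(WINS)}%")
print(f"ties:     {pct(TIES)}%")
print(f"win/tie:  {pct(WINS + TIES)}%")
print(f"losses:   {pct(LOSSES)}% ({LOSSES} case)")
```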

Installation

pip install gremlin-critic
export ANTHROPIC_API_KEY=sk-ant-...

Supports: Anthropic (default) · OpenAI · Ollama (local, no API key needed)

g = Gremlin(provider="ollama", model="llama3")  # fully local

For development:

git clone https://github.com/abhi10/gremlin.git
cd gremlin
pip install -e ".[dev]"
pytest

Commands

Command                                          Description
gremlin review "scope"                           Full pipeline in one command
gremlin review "scope" --context @file           With file context
git diff | gremlin review "changes" --context -  With diff via stdin
gremlin patterns list                            Show all pattern domains
gremlin patterns show payments                   Show patterns for a domain
gremlin learn "incident" --domain auth           Learn from incidents
gremlin understand "scope"                       Stage 1: infer domains (no LLM)
gremlin ideate                                   Stage 2: select patterns (no LLM)
gremlin rollout                                  Stage 3: call LLM
gremlin judge                                    Stage 4: parse and score risks

review options: --depth quick|deep · --threshold 0-100 · --output rich|md|json · --validate

understand options: --depth quick|deep · --threshold 0-100 · --run-dir PATH

judge options: --validate · --run-dir PATH
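When calling the CLI from a script, it can help to build the argument list in one place and reject values outside the documented ranges before spawning a process. A small sketch using only the review flags listed above; `build_review_command` is a hypothetical helper, and the command it returns still needs gremlin-critic installed to run.

```python
from typing import List, Optional


def build_review_command(
    scope: str,
    context: Optional[str] = None,
    depth: Optional[str] = None,
    threshold: Optional[int] = None,
    output: Optional[str] = None,
    validate: bool = False,
) -> List[str]:
    """Assemble an argv list for `gremlin review`, checking documented option values."""
    cmd = ["gremlin", "review", scope]
    if context is not None:
        cmd += ["--context", context]  # "-" for stdin, "@path" for a file, or a string
    if depth is not None:
        if depth not in ("quick", "deep"):
            raise ValueError("depth must be 'quick' or 'deep'")
        cmd += ["--depth", depth]
    if threshold is not None:
        if not 0 <= threshold <= 100:
            raise ValueError("threshold must be between 0 and 100")
        cmd += ["--threshold", str(threshold)]
    if output is not None:
        if output not in ("rich", "md", "json"):
            raise ValueError("output must be 'rich', 'md', or 'json'")
        cmd += ["--output", output]
    if validate:
        cmd.append("--validate")
    return cmd


print(build_review_command("payment refunds", depth="deep", threshold=60))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with scopes that contain spaces.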


License

MIT · Powered by Claude · Inspired by exploratory testing principles from James Bach and James Whittaker
