
Autonomous change tester with cognitive synthetic users (Node.js CLI, installable via pip)


Quickstart (60 seconds)

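npm install gmirror

// Run as an ES module (for example, a .mjs file): the snippet uses import and top-level await.
import { MirrorSDK } from 'gmirror';

// The API key is read from the ANTHROPIC_API_KEY environment variable.
const mirror = new MirrorSDK({ apiKey: process.env.ANTHROPIC_API_KEY });
const verdict = await mirror.score({ task: 'write tests', output: 'describe(...){}' });
console.log(verdict.overall); // 'pass' or 'fail'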

No Docker. No services. Score any LLM output against a synthetic user panel.
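
Because verdict.overall is either 'pass' or 'fail', it lends itself to a simple release gate. The sketch below is illustrative only: the VerdictLike type and the gateExitCode helper are hypothetical names, not part of the gmirror API, and the only thing assumed about a verdict is the overall field shown in the snippet above.

```typescript
// Hypothetical shape: only `overall` is taken from the quickstart snippet.
type VerdictLike = { overall: "pass" | "fail" };

// Map a verdict to a process exit code so a CI job can gate on it.
function gateExitCode(verdict: VerdictLike): number {
  return verdict.overall === "pass" ? 0 : 1;
}
```

In a CI script, the returned value could be assigned to process.exitCode after scoring.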


GMirror

GMirror is the synthetic-user verification layer for the G-Stack. It scores changes with cognitive user populations, records verdicts, detects failure modes, calibrates scoring thresholds, and keeps regression evidence available for release gates.

What It Does

  • Scores diffs, prompts, flows, and implementation outputs against synthetic user panels.
  • Aggregates correctness, user outcome, risk, cost, and failure-mode dimensions into verdicts.
  • Maintains a failure-mode library and cluster view.
  • Calibrates scoring thresholds and model tiers.
  • Persists verdicts, receipts, cost entries, trend/drift data, and audit logs.
  • Exposes CLI and MCP surfaces for agents and operators.
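
To make the aggregation step concrete, here is a minimal sketch of folding per-dimension scores into a single verdict. Everything in it is an assumption for illustration: the field names, the [0, 1] score range, the weights, and the 0.7 threshold are hypothetical and do not describe GMirror's actual scoring.

```typescript
// Hypothetical dimension names mirroring the list above; scores assumed in [0, 1].
type Dimension = "correctness" | "userOutcome" | "risk" | "cost" | "failureModes";
type DimensionScores = Record<Dimension, number>;

// Assumed weights (summing to 1) and threshold, for illustration only.
const WEIGHTS: DimensionScores = {
  correctness: 0.4,
  userOutcome: 0.3,
  risk: 0.15,
  cost: 0.05,
  failureModes: 0.1,
};
const PASS_THRESHOLD = 0.7;

// Weighted sum of the dimension scores, compared against the threshold.
function aggregate(scores: DimensionScores): { score: number; overall: "pass" | "fail" } {
  let total = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    total += WEIGHTS[dim] * scores[dim];
  }
  return { score: total, overall: total >= PASS_THRESHOLD ? "pass" : "fail" };
}
```

A weighted sum is only one possible design; a real gate might instead require every dimension to clear its own threshold.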

Install from PyPI (pip install gmirror)

GMirror is also distributed as a Python wheel that bundles the compiled JavaScript CLI and launches it via your local Node.js:

pip install gmirror
gmirror --version
gmirror --help

Prerequisite: Node.js >= 18 must be on your PATH (the wheel does not embed a Node runtime). Install it from https://nodejs.org/. The launcher checks for Node and prints a clear error if it is missing or too old.
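
The version check described above can be sketched as follows. This is not the wheel's launcher code; parseNodeMajor and checkNode are hypothetical helpers that only illustrate the ">= 18" rule, taking the version string (as printed by node --version) as input.

```typescript
// Minimum major version stated in the prerequisite above.
const MIN_NODE_MAJOR = 18;

// Parse the major version from a string such as "v18.17.0"; null if unparseable.
function parseNodeMajor(version: string): number | null {
  const match = /^v?(\d+)\./.exec(version.trim());
  return match ? Number(match[1]) : null;
}

// Return an error message, or null when the version is acceptable.
// `undefined` stands for "node was not found on PATH".
function checkNode(version: string | undefined): string | null {
  if (version === undefined) {
    return "Node.js was not found on PATH. Install Node.js >= 18 from https://nodejs.org/.";
  }
  const major = parseNodeMajor(version);
  if (major === null) {
    return `Could not parse Node.js version: ${version}`;
  }
  if (major < MIN_NODE_MAJOR) {
    return `Node.js ${version} is too old; >= ${MIN_NODE_MAJOR} is required.`;
  }
  return null;
}
```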

Optional durable persistence: verdicts and receipts persist to SQLite through the native better-sqlite3 addon, which is not shipped in the wheel. If the addon is not installed, GMirror automatically degrades to an in-memory store; set GMIRROR_REQUIRE_PERSISTENCE=true to fail instead. To enable durable storage, install the addon into a Node project on your machine (npm install better-sqlite3) so that Node can resolve it, or run the npm package directly.
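
The fallback rule can be summarized in a few lines. The sketch below is an assumption-laden illustration, not GMirror's implementation: chooseStore is a hypothetical name, addonAvailable stands in for "better-sqlite3 could be resolved", and env stands in for process.env so the function stays pure.

```typescript
type StoreKind = "sqlite" | "memory";

// Decide which store to use, following the rule described above.
function chooseStore(
  addonAvailable: boolean,
  env: Record<string, string | undefined>,
): StoreKind {
  if (addonAvailable) {
    return "sqlite";
  }
  // Without the addon, fail only when persistence is explicitly required.
  if (env["GMIRROR_REQUIRE_PERSISTENCE"] === "true") {
    throw new Error("better-sqlite3 is not installed and GMIRROR_REQUIRE_PERSISTENCE=true");
  }
  return "memory";
}
```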

Quick Start (from source)

npm install
npm run build
node dist/cli.js health
node dist/cli.js score --diff ./change.patch --panel-size 10

Development checks:

npm run typecheck
npm test
npm run verify
npm run docs:api

Command Surface

Command                               Purpose
gmirror score                         Score a change with a synthetic user panel.
gmirror calibrate                     Calibrate thresholds and scoring settings.
gmirror health                        Check local and stack health.
gmirror sync                          Register stack tool sources with GBrain using incremental, full, and dry-run modes.
gmirror replay                        Replay a previous scoring run.
gmirror failure-modes, clusters       Inspect failure-mode library and clusters.
gmirror secrets                       Rotate and list local secrets without printing secret values.
gmirror eval                          Run evaluation corpora.
gmirror receipts, diff                Inspect and compare receipts.
gmirror stats, trend, drift, regress  Analyze verdict history and quality drift.
gmirror cost, sandbox-stats, metrics  Inspect spend, sandbox usage, and observability.
gmirror backup, restore, export       Manage durable state.

gmirror sync supports three modes:

  • gmirror sync --incremental emits gstack-compatible stage results, registers each stack tool as a federated GBrain source with a pathhash8 ID, and writes a .gbrain-source attachment into each tool path.
  • gmirror sync --full also removes legacy source IDs from the prior sync state.
  • gmirror sync --dry-run --json shows planned commands without acquiring a lock, writing source dotfiles, or updating state.
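
The difference between the sync modes can be pictured as a planning function. This is a hedged sketch, not the real command: planSync, SyncPlan, and the step strings are hypothetical, and treating dry-run as a flag that can be combined with either mode is an assumption.

```typescript
type SyncMode = "incremental" | "full";

interface SyncPlan {
  steps: string[];         // human-readable planned steps
  acquiresLock: boolean;   // whether the sync lock would be taken
  writesDotfiles: boolean; // whether .gbrain-source files would be written
  updatesState: boolean;   // whether the sync state would be updated
}

// Build a plan for the given mode; a dry run keeps the steps but has no side effects.
function planSync(mode: SyncMode, dryRun: boolean, tools: string[]): SyncPlan {
  const steps = tools.map((tool) => `register source for ${tool}`);
  if (mode === "full") {
    steps.push("remove legacy source IDs from prior sync state");
  }
  const sideEffects = !dryRun;
  return {
    steps,
    acquiresLock: sideEffects,
    writesDotfiles: sideEffects,
    updatesState: sideEffects,
  };
}
```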

Verdict Flow

flowchart LR
  Client["CLI or MCP client"] --> Score["Score request"]
  Score --> Population["Synthetic user population"]
  Score --> Scenarios["Scenario and failure-mode selection"]
  Population --> Runner["Runner"]
  Scenarios --> Runner
  Runner --> Verdict["Verdict aggregation"]
  Verdict --> FailureModes["Failure-mode library"]
  Verdict --> Receipts["Receipt JSONL"]
  Verdict --> SQLite["SQLite verdict persistence"]
  Verdict --> Metrics["Metrics, traces, audit logs"]
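
The flow in the diagram can be sketched in code as follows. All names here (scoreRequest, Runner, Sink) are hypothetical, the runner is injected as a plain function, and the "every run must pass" rule is an assumption standing in for the real verdict aggregation, which this README does not specify.

```typescript
type Overall = "pass" | "fail";

interface Verdict {
  overall: Overall;
  passed: number;
  total: number;
}

// A sink receives each aggregated verdict (receipts, SQLite, metrics in the diagram).
type Sink = (verdict: Verdict) => void;

// The runner reports whether one synthetic user succeeded on one scenario.
type Runner = (user: string, scenario: string) => boolean;

function scoreRequest(
  users: string[],
  scenarios: string[],
  run: Runner,
  sinks: Sink[],
): Verdict {
  let passed = 0;
  let total = 0;
  // Population and scenarios both feed the runner.
  for (const user of users) {
    for (const scenario of scenarios) {
      total += 1;
      if (run(user, scenario)) {
        passed += 1;
      }
    }
  }
  // Assumed rule: the verdict passes only if every run passed.
  const overall: Overall = total > 0 && passed === total ? "pass" : "fail";
  const verdict: Verdict = { overall, passed, total };
  // Fan the verdict out to every persistence and observability sink.
  for (const sink of sinks) {
    sink(verdict);
  }
  return verdict;
}
```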

MCP Integration

{
  "mcpServers": {
    "gmirror": {
      "command": "gmirror",
      "args": ["serve"]
    }
  }
}

Primary MCP tools are gmirror_score, gmirror_health, gmirror_failure_modes, gmirror_get_failure_modes, gmirror_calibrate, gmirror_get_receipts, gmirror_get_trend, gmirror_get_drift, and gmirror_get_cost_stats.
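
For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a request for gmirror_score. The buildToolCall helper is a hypothetical name, and the argument names (diff, panelSize) are assumptions; the real input schema is defined in the MCP contract document.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical arguments; consult the MCP contract for the actual schema.
const scoreCall = buildToolCall(1, "gmirror_score", { diff: "./change.patch", panelSize: 10 });
const scoreCallJson = JSON.stringify(scoreCall);
```

Most agent clients build this message themselves once the server is registered, so hand-written requests are mainly useful for debugging.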

Configuration

Common environment variables:

Variable                           Purpose
GMIRROR_DB_PATH                    Override SQLite verdict database path.
GMIRROR_AUDIT_DIR                  Override audit JSONL directory.
GMIRROR_METRICS_PATH               Override persisted LLM metrics path.
GMIRROR_HEALTH_WEBHOOK_URL         Send health-drop webhook notifications.
GMIRROR_LLM_CALL_RESERVE_USD       Per-call budget reservation.
GMIRROR_BUDGET_RESERVATION_TTL_MS  Budget reservation expiration.
GMIRROR_SYNC_ROOT                  Override the gstack-gbrain-sync lock and state directory.
GMIRROR_SECRET_DIR                 Override the local file-backed secret manager directory.
GMIRROR_PERMISSIONS_FILE           JSON file of SHA-256 token hashes mapped to allowed scopes.
GMIRROR_RATE_LIMIT_RPM             Per-token MCP calls per minute.
GMIRROR_RATE_LIMIT_RPH             Per-token MCP calls per hour.
GMIRROR_HEALTH_RATE_LIMIT_RPM      Per-client health endpoint calls per minute.
GMIRROR_HEALTH_SHUTDOWN_TOKEN      Legacy fallback for the health_shutdown_token secret.
GMIRROR_TOOL_<NAME>_PATH           Override a source path for gbrain, gstack, gorchestrator, gmirror, gtom, or glearn.
GBRAIN_ENDPOINT                    GBrain endpoint for receipt storage and retrieval.
GBRAIN_INTEGRATION_MODE            http or mcp transport for GBrain context, scenario corpus, replay, and QC writes.
GBRAIN_MCP_ENDPOINT                GBrain MCP endpoint when MCP mode is enabled.
GBRAIN_AUTH_TOKEN                  Legacy fallback for the gbrain_auth_token secret.
GBRAIN_TIMEOUT_MS                  Per-call timeout for GBrain calls.
GBRAIN_MAX_RETRIES                 Retry count for transient GBrain failures.
GBRAIN_BACKOFF_MS                  Initial retry backoff for GBrain calls.
GBRAIN_CIRCUIT_FAILURES            Consecutive transient failures before opening the GBrain circuit.
GBRAIN_CIRCUIT_COOLDOWN_MS         GBrain circuit breaker cooldown.
DEFAULT_PANEL_SIZE                 Default synthetic user panel size.
ADVERSARIAL_RATIO                  Default adversarial scenario ratio.
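
The two GBRAIN_CIRCUIT_* settings describe a conventional circuit breaker: after a number of consecutive failures the circuit opens, and calls are skipped until a cooldown has elapsed. The class below is a generic sketch of that pattern under stated assumptions; it is not GMirror's implementation, the CircuitBreaker name is hypothetical, and time is passed in explicitly (in milliseconds) to keep the example deterministic.

```typescript
class CircuitBreaker {
  private readonly maxFailures: number; // cf. GBRAIN_CIRCUIT_FAILURES
  private readonly cooldownMs: number;  // cf. GBRAIN_CIRCUIT_COOLDOWN_MS
  private consecutiveFailures = 0;
  private openedAt: number | null = null;

  constructor(maxFailures: number, cooldownMs: number) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
  }

  // True when a call may proceed at time `now`.
  canCall(now: number): boolean {
    if (this.openedAt === null) {
      return true;
    }
    // Once the cooldown has elapsed, allow a trial call.
    return now - this.openedAt >= this.cooldownMs;
  }

  // A success closes the circuit and resets the failure count.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  // A failure opens (or re-opens) the circuit once the limit is reached.
  recordFailure(now: number): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.maxFailures) {
      this.openedAt = now;
    }
  }
}
```

With new CircuitBreaker(2, 1000), two failures in a row block calls for the next second, after which a single trial call is allowed through.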

Documentation

Document             Scope
API overview         CLI, MCP, and TypeScript surfaces.
Generated API docs   TypeDoc output generated by npm run docs:api.
MCP contract         Tool schemas, scopes, and compatibility rules.
Evaluation baseline  Verdict corpus and acceptance thresholds.
Runbook              Operator workflows and incident response.
Troubleshooting      Known failure modes and fixes.
Security model       Trust boundaries, secrets, and audit posture.
Data flow            Mermaid data-flow diagram and persistence map.
Integration guide    Embedding GMirror in projects and agent clients.
Migrations           SQLite migration process.
Operations           Deployment and release operations.
Testing              Test layers and quality gates.
ADR 0001             Verdict evidence architecture decision.

Verification

npm run verify
git diff --check

License

MIT
