Online deadlock-breaker — injects targeted engineering heuristics from rosclaw-know assets into stalling agents.

ROSClaw-How

Online deadlock-breaker for agents stuck on engineering-optimization tasks, with a feedback loop that lets the assets refine themselves over time.

Sister project: rosclaw-know (offline refinery that produces the bridge_index.json + code_patterns/ assets this service serves at runtime, and consumes the outcome JSONL this service exports to drive the next publish cycle).

What it does

When an agent's verifier score plateaus or a physical safety symptom appears in its error log, this service injects a small, targeted hint into its next prompt. Three strategies, decided server-side:

Strategy          Trigger                                  Returned payload
────────────────  ───────────────────────────────────────  ──────────────────────────────────────────
SAFETY            Error log mentions a safety symptom      Hard-coded constraint (~50–100 tokens)
FREE_EXPLORATION  First 3 iterations, or score improving   Empty string (keep exploring)
CATALYST          Score plateau / regression               Cross-domain analogy + diff (≤ 400 tokens)

Runtime is pure rules + a single vector lookup — zero LLM calls. The CATALYST path returns an injection_id so the agent can later report whether the hint helped (POST /wiki/v1/prompt/feedback); the resulting outcomes drive per-pattern uplift statistics and soft-deprecation of under-performing patterns.
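
The rule-only decision in the table above can be sketched roughly as follows. This is a minimal illustration, not the service's actual classifier: the keyword list, the 1-indexed reading of "first 3 iterations", and the "last score beats the one before it" test for improvement are all assumptions made for the example.

```python
from typing import Sequence

# Hypothetical keyword list; the real safety symptoms are not enumerated in this README.
SAFETY_KEYWORDS = ("collision", "emergency stop", "e-stop")
WARMUP_ITERATIONS = 3    # "first 3 iterations", assumed to be 1-indexed
IMPROVEMENT_EPS = 1e-9   # assumed tolerance for "score improving"


def choose_strategy(
    error_log: str,
    previous_scores: Sequence[float],
    current_iteration: int,
) -> str:
    """Return SAFETY, FREE_EXPLORATION, or CATALYST using rules only (no LLM call)."""
    lowered = error_log.lower()
    # Safety symptoms take precedence over everything else.
    if any(keyword in lowered for keyword in SAFETY_KEYWORDS):
        return "SAFETY"
    # Early iterations are left alone.
    if current_iteration <= WARMUP_ITERATIONS:
        return "FREE_EXPLORATION"
    # Without at least two scores there is no plateau to detect.
    if len(previous_scores) < 2:
        return "FREE_EXPLORATION"
    # A score that is still improving also means "keep exploring".
    if previous_scores[-1] > previous_scores[-2] + IMPROVEMENT_EPS:
        return "FREE_EXPLORATION"
    # Plateau or regression: hand over to the vector lookup.
    return "CATALYST"
```

With the request shown in the API section (scores 0.42, 0.47, 0.47, 0.46 at iteration 4), this sketch lands on CATALYST because the last score regressed.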

Feedback loop (the “push + learn” cycle)

                  ┌─────────────────────────────────────────┐
                  │ rosclaw-know  (offline refinery)        │
                  │   awesome_fetcher → new raw corpus      │
                  │   active_learning → autodraft → ingest  │
                  │   feedback_distill.py → pattern_metrics │
                  │   bridge_reweighter   → priority=-1     │
                  │                                         │
                  │   writes bridge_index.json (+priority)  │
                  └───────────────────┬─────────────────────┘
                                      │ asset publish
                                      ▼
   ┌─────────────────────────── rosclaw-how ───────────────────────────┐
   │ asset_loader        delta-sync bridge_index → SeekDB               │
   │ SemanticRouter      skips clusters with priority < 0               │
   │                                                                    │
   │ POST /wiki/v1/prompt/build       → snippet + injection_id          │
   │ POST /wiki/v1/prompt/feedback    → post_score, delta_score         │
   │ GET  /wiki/v1/stats              → bucketed uplift / win_rate      │
   │ GET  /wiki/v1/blind_spots        → recurring Unknown_Error gaps    │
   │ GET  /wiki/v1/outcomes/export    → NDJSON stream for offline pipe  │
   │ POST /wiki/v1/admin/reload       → hot-reload assets               │
   │ POST /wiki/v1/admin/promote      → maturity gate (staging→prod)    │
   └────────────────────────────────────────────────────────────────────┘
                                      │ NDJSON export
                                      ▼
                  ┌─────────────────────────────────────────┐
                  │ rosclaw-know   data/exports/*.jsonl     │
                  │   distill_feedback.py → re-publish ↻    │
                  └─────────────────────────────────────────┘

Closed-loop validation: 6/6 stuck-rollout scenarios pass the replay benchmark (scripts/replay_benchmark.py on the rosclaw-know side) — bad patterns get priority=-1, vanish from the next CATALYST lookup, and good patterns keep their slot.

Quick start

cd rosclaw-how
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

cp .env.example .env
# (optional) symlink the assets from a finished rosclaw-know run:
ln -s ../../rosclaw-know/data/assets data/assets

# Run tests
pytest -q

# Start the server
python scripts/run_server.py
# → POST http://localhost:47820/wiki/v1/prompt/build

Deploying on a memory-constrained host

pyseekdb embedded mode boots a small OceanBase-like observer in-process (~1.5–2 GB RAM after warmup, plus a 4 GB datafile reservation on disk). If you are running on a host where another embedded SeekDB instance is already up (e.g., the legacy rosclaw-wiki service), or where the box is too small to fit a second observer, switch to server mode:

# .env: point at an external SeekDB server (local or a remote cluster) instead of embedded mode
SEEKDB_MODE=server
SEEKDB_HOST=10.0.0.5
SEEKDB_PORT=2881
SEEKDB_TENANT=sys           # OceanBase tenant name
SEEKDB_USER=root
SEEKDB_PASSWORD=

The SAFETY and FREE_EXPLORATION paths never touch SeekDB, so they remain available even when the database is unreachable. The CATALYST path falls back to InMemoryRouter (numpy cosine over bridge_index.json) when ROSCLAW_HOW_ROUTER_BACKEND=inmemory is set — useful when SeekDB is down or absent.

Auto-create database

On first boot, seekdb_client._ensure_database_exists runs an idempotent CREATE DATABASE IF NOT EXISTS <SEEKDB_DATABASE> via pyseekdb.AdminClient before opening the data-plane client. This means a fresh embedded SeekDB (which only ships the test database by default) bootstraps cleanly with SEEKDB_DATABASE=rosclaw_how without manual setup.

API

All API endpoints share the /wiki/v1 prefix, kept for compatibility with the legacy rosclaw-wiki API. The exceptions are the operational routes /healthz and /ui, which live at the root.

POST /wiki/v1/prompt/build

Auth: X-API-Key.

Request:

{
  "error_log": "ERROR: torque overflow on joint 2",
  "previous_scores": [0.42, 0.47, 0.47, 0.46],
  "current_iteration": 4
}

Response on a CATALYST hit (the injection_id is the handle for the follow-up feedback call):

{
  "prompt_snippet": "## 🔧 Engineering Heuristics from ROSClaw-How ...",
  "injected": true,
  "strategy": "CATALYST",
  "symptom": "Oscillation_Divergence",
  "matched_symptom": "Commanded velocity diverges to ±∞ ...",
  "similarity": 0.5864,
  "injection_id": "0fd3eb2bd37c461490c4f43def243512",
  "pattern_id": "pattern_output_saturation_clamp",
  "latency_ms": 199
}

When the matched cluster is in staging (priority=0), the response includes is_staging: true so the agent knows the pattern has not yet been promoted to production:

{
  "strategy": "CATALYST",
  "is_staging": true,
  "pattern_id": "pattern_20260518_1bfb99e13c"
}

Production clusters (priority=1 or unset) omit the key entirely for backward compatibility.

SAFETY and FREE_EXPLORATION responses omit injection_id / pattern_id.

POST /wiki/v1/prompt/feedback

Auth: X-API-Key. Returns 204 No Content on success, 404 if the id is unknown.

{
  "injection_id": "0fd3eb2bd37c461490c4f43def243512",
  "post_score": 0.83,
  "iterations_to_resolve": 3,
  "agent_notes": "anti-windup clamp fixed it"
}

The server computes delta_score = post_score - pre_score (where pre_score is the last entry from the original previous_scores).
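
As a small worked example of that formula, the sketch below recomputes delta_score from the request and feedback payloads shown above. The function name is hypothetical; only the arithmetic comes from the description.

```python
from typing import Sequence


def compute_delta_score(previous_scores: Sequence[float], post_score: float) -> float:
    """delta_score = post_score - pre_score, where pre_score is the last prior score."""
    if not previous_scores:
        raise ValueError("previous_scores must contain at least one entry")
    pre_score = previous_scores[-1]
    return post_score - pre_score


# Scores from the /prompt/build request, post_score from the feedback payload.
delta = compute_delta_score([0.42, 0.47, 0.47, 0.46], 0.83)
print(round(delta, 2))  # 0.37
```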

GET /wiki/v1/stats

Public, no auth. Aggregates finalised outcomes (those that have received feedback) per pattern_id, grouped by maturity bucket:

{
  "staging": {
    "pattern_20260518_1bfb99e13c": {
      "n": 5,
      "avg_uplift": 0.142,
      "win_rate": 0.8,
      "last_seen_iso": "2026-05-18T19:14:22+00:00"
    }
  },
  "production": {
    "pattern_output_saturation_clamp": {
      "n": 8,
      "avg_uplift": 0.157,
      "win_rate": 0.875,
      "last_seen_iso": "2026-05-18T19:14:22+00:00"
    }
  },
  "demoted": {
    "pattern_bad_habit": {
      "n": 12,
      "avg_uplift": -0.03,
      "win_rate": 0.25,
      "last_seen_iso": "2026-05-18T19:14:22+00:00"
    }
  },
  "unbucketed": {}
}

win_rate = sum(delta_score > 0.05) / n.

The unbucketed catch-all holds pattern_ids whose owning cluster was deleted or renamed since the outcome was recorded.
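
The per-pattern numbers can be reproduced from a list of delta_score values. The sketch below is illustrative only: the function name and the zero-row behaviour are assumptions, while the 0.05 win threshold and the field names come from the response above.

```python
from statistics import fmean
from typing import Dict, Sequence

WIN_THRESHOLD = 0.05  # a delta_score must be strictly above this to count as a win


def summarize_pattern(delta_scores: Sequence[float]) -> Dict[str, float]:
    """Aggregate finalised outcomes for one pattern_id into n / avg_uplift / win_rate."""
    n = len(delta_scores)
    if n == 0:
        # Assumed behaviour for a pattern with no finalised outcomes.
        return {"n": 0, "avg_uplift": 0.0, "win_rate": 0.0}
    wins = sum(1 for delta in delta_scores if delta > WIN_THRESHOLD)
    return {"n": n, "avg_uplift": fmean(delta_scores), "win_rate": wins / n}
```

Seven wins out of eight outcomes gives the win_rate of 0.875 shown for pattern_output_saturation_clamp.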

GET /wiki/v1/outcomes/export

Auth: X-API-Key. Streams every outcome (including still-pending ones) as newline-delimited JSON. Query params:

  • since: ISO 8601 timestamp; only rows with ts >= since are emitted.
  • limit: optional row cap (max 100 000).

curl -H "X-API-Key: $ROSCLAW_HOW_API_KEY" \
     "http://127.0.0.1:47820/wiki/v1/outcomes/export?since=2026-05-17T00:00:00+00:00" \
     -o outcomes.jsonl

The same content is also produced by scripts/export_outcomes.py, which is a thin CLI wrapper around this endpoint (it used to read SeekDB directly but deadlocked against the embedded server's process-exclusive lock — fixed in Phase 4).
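
On the consuming side, the exported file can be read one line at a time. The following is a minimal sketch that assumes each row carries the ts field described above as an ISO 8601 string; the helper name and the client-side re-filtering are illustrative and not part of the service.

```python
import json
from datetime import datetime
from typing import Iterable, Iterator, Optional


def iter_outcomes(lines: Iterable[str], since: Optional[datetime] = None) -> Iterator[dict]:
    """Yield one dict per NDJSON line, optionally keeping only rows with ts >= since."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        row = json.loads(line)
        if since is not None and datetime.fromisoformat(row["ts"]) < since:
            continue
        yield row


# Usage against the file written by the curl command above:
#   with open("outcomes.jsonl", encoding="utf-8") as handle:
#       for row in iter_outcomes(handle):
#           ...
```

Because the input is consumed lazily, the helper works equally well on an open file handle or on a streamed HTTP response body split into lines.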

GET /healthz

Public, no auth. Operational snapshot:

{
  "status": "ok",
  "version": "0.1.0",
  "auth_enabled": true,
  "seekdb_mode": "embedded",
  "router_backend": "seekdb",
  "cluster_count": 349,
  "embedding_dim": 384,
  "bridge_index_mtime": "2026-05-18T18:28:04+00:00",
  "similarity_floor": 0.5,
  "blind_spot_count": 0
}

blind_spot_count is the number of Unknown_Error prefix buckets that have crossed the recurrence threshold within the active sliding window (see GET /wiki/v1/blind_spots below).

POST /wiki/v1/admin/reload

Auth required (X-API-Key). Re-reads bridge_index.json and code_patterns/ into SeekDB without bouncing the server. Body is optional:

curl -X POST http://127.0.0.1:47820/wiki/v1/admin/reload \
     -H "X-API-Key: $ROSCLAW_HOW_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{}'             # incremental (delta) reload

{"rebuild": true} drops both SeekDB collections first; default is an idempotent incremental upsert. The loader fingerprints each cluster (standard_name + sorted patterns + sorted keywords + canonical-JSON analogies + priority) with SHA-256 — unchanged rows skip the sentence-transformer encode call entirely. On a 350-cluster bundle this turns a ~4-minute full reload into a ~20-second no-op when nothing has changed.

Rows whose IDs disappeared from the bridge — or whose priority flipped to -1 (soft-deprecated) — are deleted from SeekDB. The response exposes both the alive totals and the per-bucket counters so dashboards can show "what just happened":

{
  "symptoms": 349,
  "patterns": 352,
  "demoted_skipped": 3,
  "symptoms_detail": {"added": 16, "updated": 0, "unchanged": 333, "deleted": 0},
  "patterns_detail": {"added": 0, "updated": 0, "unchanged": 352, "deleted": 0},
  "rebuild": false,
  "duration_ms": 23900
}
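
The fingerprint-and-diff idea behind the incremental reload can be sketched as below. This is an illustration under assumptions: the keywords and analogies key names, the field separator, and the shape of the stored fingerprint map are hypothetical, and the real loader also re-encodes and upserts rows, which is omitted here.

```python
import hashlib
import json


def cluster_fingerprint(cluster: dict) -> str:
    """SHA-256 over the fields the loader is described as hashing."""
    parts = [
        cluster.get("standard_name", ""),
        json.dumps(sorted(cluster.get("associated_patterns", []))),
        json.dumps(sorted(cluster.get("keywords", []))),  # assumed key name
        # Canonical JSON: sorted keys, no insignificant whitespace.
        json.dumps(cluster.get("analogies", []), sort_keys=True, separators=(",", ":")),
        json.dumps(cluster.get("priority")),
    ]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def plan_delta(stored: dict[str, str], bridge: dict[str, dict]) -> dict[str, list[str]]:
    """Classify cluster ids as added / updated / unchanged / deleted.

    `stored` maps cluster id -> fingerprint already in the database;
    `bridge` maps cluster id -> cluster entry from bridge_index.json.
    """
    plan: dict[str, list[str]] = {"added": [], "updated": [], "unchanged": [], "deleted": []}
    alive: set[str] = set()
    for cluster_id, cluster in bridge.items():
        priority = cluster.get("priority")
        if priority is not None and priority < 0:
            continue  # soft-deprecated: treated as absent, so it ends up deleted
        alive.add(cluster_id)
        if cluster_id not in stored:
            plan["added"].append(cluster_id)
        elif stored[cluster_id] != cluster_fingerprint(cluster):
            plan["updated"].append(cluster_id)
        else:
            plan["unchanged"].append(cluster_id)  # no re-encode needed
    plan["deleted"] = sorted(set(stored) - alive)
    return plan
```

Only the added and updated buckets would need the sentence-transformer encode call, which is where the saving on an unchanged bundle comes from.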

After loading, the cached SemanticRouter is rebuilt synchronously so /healthz immediately reports the fresh cluster_count / router_backend (rather than a null window until the next CATALYST request).

POST /wiki/v1/admin/promote

Auth required (X-API-Key). Bump or set a cluster's maturity priority.

Body accepts exactly one of delta (relative change) or priority (absolute set). The lookup key is the pattern_id (one of the *.md files in code_patterns/); the endpoint walks bridge_index.json to find the owning cluster whose associated_patterns list contains the given pattern_id.

# Relative bump (capped to [-1, +1])
curl -X POST http://127.0.0.1:47820/wiki/v1/admin/promote \
     -H "X-API-Key: $ROSCLAW_HOW_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"pattern_id": "pattern_20260518_1bfb99e13c", "delta": 1}'

# Absolute set (also capped)
curl -X POST http://127.0.0.1:47820/wiki/v1/admin/promote \
     -H "X-API-Key: $ROSCLAW_HOW_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"pattern_id": "pattern_20260518_1bfb99e13c", "priority": 1}'

Response:

{
  "pattern_id": "pattern_20260518_1bfb99e13c",
  "cluster_id": "20260518_1bfb99e13c",
  "old_priority": 0,
  "new_priority": 1
}

On success, the endpoint:

  1. Atomically updates bridge_index.json
  2. Appends one JSONL row to data/audit_log.jsonl
  3. Re-upserts the cluster's metadata in SeekDB so the router sees the change immediately
  4. Invalidates the cached router

Priority semantics:

Value  Meaning
─────  ──────────────────────────────────────────────
-1     Demoted: runtime skips (soft-deprecated)
0      Staging: runtime injects with is_staging=true
+1     Production: normal, no flag
unset  Backward compat: treated as production

Returns 404 when pattern_id is not found in any cluster, and 422 when both or neither of delta and priority are provided.
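
The "exactly one of delta or priority, capped to [-1, +1]" rule can be sketched as a small pure function. The function name is hypothetical, and treating an unset old priority as +1 for relative bumps is an assumption derived from the "unset is treated as production" row in the table above.

```python
from typing import Optional

MIN_PRIORITY, MAX_PRIORITY = -1, 1
DEFAULT_PRIORITY = 1  # assumption: an unset priority behaves like production


def resolve_priority(
    old_priority: Optional[int],
    delta: Optional[int] = None,
    priority: Optional[int] = None,
) -> int:
    """Return the new priority for a cluster, clamped to [-1, +1]."""
    # Both or neither supplied corresponds to the 422 case.
    if (delta is None) == (priority is None):
        raise ValueError("provide exactly one of delta or priority")
    if delta is not None:
        base = DEFAULT_PRIORITY if old_priority is None else old_priority
        target = base + delta
    else:
        target = priority
    return max(MIN_PRIORITY, min(MAX_PRIORITY, target))
```

For example, resolve_priority(0, delta=1) promotes a staging cluster to production, matching the response shown above.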

GET /wiki/v1/blind_spots

Public, no auth. Sliding-window summary of recurring Unknown_Error prefixes — i.e. errors the catalyst layer keeps seeing but has no matching cluster for. This is the work-list for the rosclaw-know triage queue: each entry corresponds to a pattern we should be teaching.

{
  "window_seconds": 3600,
  "threshold": 3,
  "active": [
    {
      "prefix_hash": "1a3c…",
      "count": 7,
      "first_seen": "2026-05-18T19:18:00+00:00",
      "last_seen": "2026-05-18T19:54:12+00:00",
      "sample_excerpt": "RuntimeError: undocumented quirk in controller stage",
      "is_blind_spot": true
    }
  ],
  "total_unique_prefixes": 4,
  "total_events": 13
}

Each crossing event also appends one JSONL row to data/blind_spots.jsonl (configurable via ROSCLAW_HOW_BLIND_SPOTS_PATH). A prefix is only emitted once per window — if it goes quiet for the window length and recurs later, the next crossing produces a fresh row.

Tuning knobs (env vars, defaults shown):

Variable                          Default                 Purpose
────────────────────────────────  ──────────────────────  ────────────────────────────────
ROSCLAW_HOW_BLIND_SPOT_WINDOW     3600                    sliding-window length in seconds
ROSCLAW_HOW_BLIND_SPOT_THRESHOLD  3                       events needed to flag a prefix
ROSCLAW_HOW_BLIND_SPOTS_PATH      data/blind_spots.jsonl  persistent log
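
The sliding-window behaviour described above can be approximated with a deque of timestamps per prefix. This is a sketch, not the contents of blind_spots.py: the class and method names are hypothetical, and resetting the flag only once a prefix has no events left in the window is one reading of "emitted once per window".

```python
from collections import defaultdict, deque


class BlindSpotTracker:
    """Count Unknown_Error events per prefix hash inside a sliding time window."""

    def __init__(self, window_seconds: float = 3600, threshold: int = 3) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events: dict[str, deque] = defaultdict(deque)
        self._flagged: set[str] = set()

    def record(self, prefix_hash: str, now: float) -> bool:
        """Record one event; return True only on a fresh threshold crossing."""
        events = self._events[prefix_hash]
        cutoff = now - self.window_seconds
        # Drop events that have slid out of the window.
        while events and events[0] <= cutoff:
            events.popleft()
        if not events:
            # The prefix went quiet for a full window, so it may be reported again.
            self._flagged.discard(prefix_hash)
        events.append(now)
        if len(events) >= self.threshold and prefix_hash not in self._flagged:
            self._flagged.add(prefix_hash)
            return True  # this is where one JSONL row would be appended
        return False
```

Passing the clock in as an argument keeps the tracker deterministic, which makes the window logic easy to exercise in tests.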

GET /ui

Public, no auth. Single-page operator dashboard. Vanilla HTML + JS, no external CDN; polls /healthz, /wiki/v1/stats, and /wiki/v1/blind_spots every 5 seconds and renders:

  • Health KPIs — version, router backend, cluster count, embedding dim, similarity floor, bridge mtime, live blind-spot count.
  • Pattern uplift table — sortable by bucket (staging / production / demoted / unbucketed), per-pattern n / avg_uplift / win_rate / last_seen with an inline bar for the uplift magnitude.
  • Blind spots — current recurring Unknown_Error prefixes (only those past threshold), with their hash, last-seen timestamp, and a truncated sample excerpt for triage.

Useful as a smoke test during deployments and as a low-friction view into the feedback loop without spinning up a full Grafana stack.

Architecture

rosclaw-know (offline)                rosclaw-how (online, this repo)
─────────────────                     ───────────────────────────────
Reads  6,097 wiki/*.md                Reads SeekDB at runtime
Writes data/assets/bridge_index.json  Loads assets at startup
       data/assets/code_patterns/*    Serves build / feedback / stats / export
Reads  data/exports/*.jsonl           Writes outcome rows on feedback
       (closing the loop)
─────────────────                     ───────────────────────────────
                          ▶ ▶ ▶  assets travel from know → how
                          ◀ ◀ ◀  outcomes travel from how → know

Source layout:

src/rosclaw_how/
  __init__.py
  api.py              FastAPI app: 9 endpoints
  asset_loader.py     Startup load + --rebuild; delta-sync with content-hash
  auth.py             API-key header check (single-tenant in v0.1)
  blind_spots.py      Sliding-window tracker for Unknown_Error prefixes
  config.py           Typed wrapper around .env
  error_normalizer.py Pure regex: error_log → 10 standardized symptom labels
  inmemory_router.py  RAM-frugal numpy cosine fallback (no SeekDB needed)
  outcomes.py         injection_outcomes persistence + per-pattern aggregation
  semantic_router.py  SeekDB vector search + inspiration assembly + priority gate
  seekdb_client.py    pyseekdb wrapper; auto-creates database; embedded+server
  state_router.py     SAFETY / FREE_EXPLORATION / CATALYST classifier
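
To make the error_normalizer.py entry above more concrete, here is a minimal sketch of a pure-regex normalizer. The regular expression is invented for illustration and the rule list is deliberately incomplete; only the two label names, Oscillation_Divergence and Unknown_Error, appear elsewhere in this document.

```python
import re

# Hypothetical rules: (compiled pattern, symptom label). The real module is
# described as mapping error logs onto 10 standardized labels.
SYMPTOM_RULES = [
    (re.compile(r"diverg|oscillat", re.IGNORECASE), "Oscillation_Divergence"),
]
FALLBACK_LABEL = "Unknown_Error"


def normalize_error(error_log: str) -> str:
    """Map a raw error log to a symptom label using regex rules only."""
    for pattern, label in SYMPTOM_RULES:
        if pattern.search(error_log):
            return label
    # Anything unmatched falls through to the label tracked by the blind-spot window.
    return FALLBACK_LABEL
```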

Router backends

ROSCLAW_HOW_ROUTER_BACKEND chooses:

  • auto (default) — seekdb when datafile exists, else inmemory
  • seekdb — explicit production path; raises on init failure
  • inmemory — explicit RAM fallback; reads bridge_index.json directly

Both routers expose the same find_nearest() contract plus cluster_count and embedding_dim properties (surfaced on /healthz).
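
The backend choice listed above reduces to a few lines. The function name and the boolean datafile_exists argument are assumptions made for this sketch; how the service actually detects the datafile is not described here.

```python
def resolve_backend(setting: str, datafile_exists: bool) -> str:
    """Resolve ROSCLAW_HOW_ROUTER_BACKEND to a concrete backend name."""
    choice = (setting or "auto").strip().lower()
    if choice == "auto":
        # auto: prefer SeekDB when its datafile is present, else fall back to RAM.
        return "seekdb" if datafile_exists else "inmemory"
    if choice in ("seekdb", "inmemory"):
        return choice
    raise ValueError(f"unknown router backend: {setting!r}")
```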

Runtime priority gate

When rosclaw-know's bridge_reweighter decides a cluster has been hurting agents (negative aggregate uplift with sufficient n), it writes "priority": -1 into the cluster entry of bridge_index.json. After the next asset publish:

  • asset_loader carries the field into symptom_index metadata.
  • SemanticRouter.find_nearest over-fetches top-3K results and walks them in similarity order, skipping any cluster with priority < 0.
  • InMemoryRouter.find_nearest applies the same filter against its in-RAM matrix.

So a soft-deprecated cluster vanishes from CATALYST hits on the next asset-loader cycle without any agent code change.
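
The gate itself can be illustrated with a pure-Python version of the in-memory lookup. This is a sketch under assumptions: "top-3K" is read as three times the requested k, the cluster dict layout (id, embedding, priority) is hypothetical, and plain lists stand in for the numpy matrix the real InMemoryRouter uses.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_nearest(
    query: Sequence[float],
    clusters: Sequence[dict],
    k: int = 1,
    overfetch: int = 3,
    similarity_floor: float = 0.5,
) -> Optional[dict]:
    """Return the best non-demoted cluster above the similarity floor, or None."""
    scored = sorted(
        ((cosine(query, cluster["embedding"]), cluster) for cluster in clusters),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Over-fetch so demoted clusters at the top do not starve the result.
    for similarity, cluster in scored[: overfetch * k]:
        if similarity < similarity_floor:
            break  # everything after this is even less similar
        priority = cluster.get("priority")
        if priority is not None and priority < 0:
            continue  # soft-deprecated: skip and try the next candidate
        return {
            "cluster_id": cluster["id"],
            "similarity": similarity,
            "is_staging": priority == 0,
        }
    return None
```

A demoted cluster that is the closest match is passed over in favour of the next candidate, so the agent still receives a hint as long as one healthy cluster clears the floor.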

What this replaces

The previous rosclaw-wiki cloud API hosted 17 declarative-knowledge endpoints (search, judgments, code generation, etc.). Empirically, agents in Frontier-Engineering's optimization loop regressed ~20% when they pulled from those endpoints — they got encyclopedic context when they needed a poke.

rosclaw-how is the focused replacement: nine endpoints, three strategies, ≤400 tokens per CATALYST snippet, no LLM in the hot path, and a feedback loop that keeps the asset bundle honest.

Closed-loop verification

Two lightweight harnesses, each tuned to a different cost/coverage budget (a third, heavier script, verify_how.py, runs the real Frontier-Engineering eval):

  • scripts/verify_how_seekdb.py — strict 4-case verifier that pre-flights /healthz (refuses non-seekdb backends), then asserts each case is CATALYST with similarity ≥ similarity_floor and latency_ms < 1500. Writes data/benchmarks/how_ab_seekdb/summary.json.

  • scripts/verify_how_lite.py — A/B against DeepSeek for 4 stuck cases (control: FREE_EXPLORATION; treatment: CATALYST). Used by the rosclaw-know side's replay_benchmark.py to drive 50+ synthetic rollouts end-to-end through build → inject → feedback → distill → re-publish.

export ROSCLAW_HOW_API_KEY=rw_sk_dev_local

# Strict 4-case verifier against the deployed service (seekdb backend only)
python scripts/verify_how_seekdb.py

# Faster smoke against DeepSeek, no Frontier-Engineering setup needed
python scripts/verify_how_lite.py --no-agent

# Heavy: hits the real Frontier-Engineering eval (needs that repo)
python scripts/verify_how.py --iterations 500
