Deterministic safety hooks for Claude Code

Sensorium 🧠🛡️

A few months ago, a developer asked Claude to clean up an old repo.

Claude ran rm -rf tests/ patches/ plan/ ~/.

That trailing ~/ expands to the home directory. It wiped years of files on their Mac. The post hit 1,500+ upvotes on r/ClaudeAI within hours — because everyone building with agents recognized exactly how it happened: not malice, just confidence with no checkpoint.

It's not an isolated story:

  • A founder watched a Cursor agent find an unrelated API token, decide it had permission, and delete an entire production database and its backups — in 9 seconds. (Railway CEO personally helped restore it.)
  • Developers on GitHub have documented Claude Code running git reset --hard, wiping hours of uncommitted work, right after telling the user the operation was "safe."
  • A benchmark on 45 failing test suites found agents reporting "45/45 pass" when only 26 actually did — the other 19 quietly never ran the tests that would've said otherwise.

Same root cause every time: the model's confidence and the actual safety of the action are two different variables, and nothing was checking the second one.

So I built the boundary myself. Excited to share Sensorium — deterministic safety hooks for Claude Code. 👇


The problem nobody puts in the demo video

LLM agents are incredible at writing code. They are also, occasionally, incredible at:

  • 🔥 running rm -rf with a trailing ~/ nobody meant to include
  • 🔥 running git reset --hard on your uncommitted work, confidently
  • 🔥 curl -X DELETE-ing a resource because the docs made it sound safe
  • 🔥 declaring "all tests pass" without running the ones that don't
  • 🔥 reapplying a stale cached manifest as if it were current state

None of this is malice. It's confidence without a checkpoint. And "just review every diff" doesn't scale when the agent is running fifty tool calls a session.

The insight

You don't need a second LLM watching the first one. You need sensors — small, deterministic, boring rules that wake up on a specific kind of change, check exactly what they care about, and say yes/no/wait.

No vibes. No judgment calls. Pattern matching and policy, all the way down.

Claude wants to use a tool
  → PreToolUse hook
  → Sensorium reads the tool input
  → sensors match
  → allow / block / warn
  → tool executes (or doesn't)
  → PostToolUse hook
  → Sensorium checks the resulting file content
  → writes an audit ledger

What this actually catches, out of the box

| Protects against | How | Gate |
| --- | --- | --- |
| rm -rf /, rm -rf ~, dd ... of=/dev/sd* | filesystem.wipe | unconditional block |
| Bulk delete without a backup | filesystem.bulk_delete | needs backup_exists + dry_run_passed |
| git reset --hard, git clean -fxd | git.destructive | needs backup_exists |
| curl -X POST/PUT/DELETE/PATCH | external_api.write | needs a snapshot + rollback plan |
| kubectl apply/delete, terraform apply/destroy, aws ... delete | infra.mutation | needs snapshot + dry-run + rollback |
| Direct psql/mysql writes | db.write | needs snapshot + rollback plan |
| Reapplying a stale archive/cache as truth | data.apply_from_archive | unconditional block |
| Skipped tests slipped in quietly | test_skip_introduced | warn, shows up in the audit report |

--dry-run on the command bypasses the gate — sensors check for it explicitly.
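
A gated sensor can be thought of as "pattern plus required proofs". The sketch below shows one way that logic could look. It is an assumption-laden illustration: the `Gate` class, the `check` function, and the idea of passing proofs in as a plain set of strings are all hypothetical, and the regex is a simplified stand-in for whatever git.destructive really matches. Only the sensor name and the `backup_exists` proof name come from the table above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    sensor: str
    pattern: re.Pattern
    required_proofs: frozenset


# Simplified stand-in for the git.destructive row of the table.
GIT_DESTRUCTIVE = Gate(
    sensor="git.destructive",
    pattern=re.compile(r"\bgit\s+(reset\s+--hard|clean\s+-\w*f)"),
    required_proofs=frozenset({"backup_exists"}),
)


def check(gate: Gate, command: str, proofs: set) -> str:
    """Return "allow" or a block message naming the missing proofs."""
    if not gate.pattern.search(command):
        return "allow"                      # sensor does not apply
    if "--dry-run" in command.split():
        return "allow"                      # explicit dry run bypasses the gate
    missing = gate.required_proofs - proofs
    if missing:
        return "block: missing " + ", ".join(sorted(missing))
    return "allow"
```

For example, `check(GIT_DESTRUCTIVE, "git reset --hard origin/main", set())` reports the missing `backup_exists` proof, while the same command with `{"backup_exists"}` supplied is allowed.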

Bring your own rules 🔧

This is the part I actually wanted to ship. Your project has opinions Sensorium can't guess — so tell it:

# .sensorium/sensors.yaml
sensors:
  - name: no_force_push
    description: Block git push --force (plain push still allowed)
    tools: [Bash]
    action: block
    patterns:
      # --force / -f must be a standalone flag, so --force-with-lease is not blocked
      - 'git\s+push\b.*\s(--force|-f)(\s|$)'
    unless:
      - '--dry-run'
    message: |
      Blocked: force-push detected. Use --force-with-lease and confirm
      with a human first.

Drop it in, no restart, no config reload — the next tool call picks it up.

Sensor fields:

  • tools — which Claude Code tools trigger this sensor (Bash, Edit, Write, MultiEdit)
  • on_file_change — glob patterns for file paths (PostToolUse, checks content)
  • action — block (exit 2, Claude sees the message) or warn (logged, shown in report)
  • patterns — regex list matched against the Bash command or file path
  • unless — if any of these match, the sensor does not trigger
  • block_if_contains — regex matched against file content after an edit
  • require_contains — regex that must be present in file content (absence triggers the sensor)
  • message — shown to Claude when blocked or warned
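
To make the field semantics concrete, here is a rough Python sketch of how a sensor definition could be evaluated. It reflects one reading of the field list above, not Sensorium's implementation: the function names, the `no_npm_publish` sensor, and the glob and regex values in `FULL_OVERWRITE` are hypothetical, and sensors are written as plain dicts so the example does not need a YAML parser.

```python
import fnmatch
import re

NO_PUBLISH = {
    "name": "no_npm_publish",            # hypothetical example sensor
    "tools": ["Bash"],
    "action": "block",
    "patterns": [r"npm\s+publish\b"],
    "unless": [r"--dry-run"],
}

FULL_OVERWRITE = {
    "name": "full_object_overwrite",     # definition assumed for illustration
    "on_file_change": ["src/*.js"],
    "action": "warn",
    "require_contains": [r"delta|changed_fields", r"precondition|current_hash"],
}


def command_triggers(sensor: dict, tool: str, command: str) -> bool:
    """tools + patterns must match, and no `unless` pattern may match."""
    if tool not in sensor.get("tools", []):
        return False
    if not any(re.search(p, command) for p in sensor.get("patterns", [])):
        return False
    return not any(re.search(u, command) for u in sensor.get("unless", []))


def file_triggers(sensor: dict, path: str, content: str) -> bool:
    """Path must match a glob; then forbidden content or a missing requirement triggers."""
    globs = sensor.get("on_file_change", [])
    if not any(fnmatch.fnmatchcase(path, g) for g in globs):
        return False
    forbidden = any(re.search(p, content) for p in sensor.get("block_if_contains", []))
    missing = any(not re.search(p, content) for p in sensor.get("require_contains", []))
    return forbidden or missing
```

Under these assumptions, `command_triggers(NO_PUBLISH, "Bash", "npm publish")` is true, and adding `--dry-run` to the command makes it false.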

Install (2 minutes, I promise)

Step 1 — the CLI

pipx install agent-sensorium
# or: pip install agent-sensorium

Step 2 — wire up Claude Code

cd my-project
sensorium init claude-code            # this project only
sensorium init claude-code --global   # every project

This writes .claude/settings.json with the absolute path to the sensorium binary. Claude Code picks it up immediately.

Step 3 (optional) — your own rules

mkdir -p .sensorium
# add .sensorium/sensors.yaml, see above

That's it. Claude Code works exactly as before — Sensorium just quietly rides along on every tool call.

Receipts, not vibes

sensorium report          # show session audit log
sensorium report --clear  # show and reset

Sample output:

=== Sensorium Audit Report ===

Tools used:         12
Sensors triggered:  3
Blocked actions:    2
File violations:    1

--- Blocked Actions ---
  2026-07-03T10:14:22  [filesystem.wipe]  Bash: rm -rf /tmp/old-data
  2026-07-03T10:17:05  [git.destructive]  Bash: git reset --hard origin/main

--- File Sensor Violations ---
  2026-07-03T10:19:11  [full_object_overwrite]  src/apply.js
    required: ['delta|changed_fields', 'precondition|current_hash']

--- Sensor Activity ---
  external_api.write: 3x
  filesystem.wipe: 2x

Every block, every warning, every proof registered — append-only, in .sensorium/state.jsonl. Nothing silently disappears.
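
An append-only JSONL ledger is simple enough to sketch. The version below is illustrative only: the entry fields (`ts`, `sensor`, `action`, `detail`) and the `record` and `summarize` helpers are assumptions for this example, not the actual schema of state.jsonl.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def record(ledger: Path, sensor: str, action: str, detail: str) -> None:
    """Append one event as a single JSON line; existing lines are never rewritten."""
    ledger.parent.mkdir(parents=True, exist_ok=True)
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "sensor": sensor,
        "action": action,
        "detail": detail,
    }
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def summarize(ledger: Path) -> Counter:
    """Count events per sensor, like the "Sensor Activity" section of a report."""
    counts = Counter()
    if not ledger.exists():
        return counts
    with ledger.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                counts[json.loads(line)["sensor"]] += 1
    return counts
```

Opening the file in append mode and writing one complete line per event keeps the history intact: a report is just a fold over the lines.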

The architecture, for the nerds (me too)

Sensorium follows the State-Delta pattern:

world change (Claude tool use)
  → typed delta/event (PreToolUse / PostToolUse)
  → matching sensors wake up
  → each sensor selects the narrow context it needs
  → deterministic policy evaluates
  → allow / block / warn
  → ledger records the fact

No broad rescans. No LLM judge. No polling. A sensor declares exactly what it listens for and what invariant it protects — that's the whole contract.
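
As a rough sketch of that contract, the dispatcher below indexes sensors by what they listen for, so only matching sensors run on a given event. Every name here (`Event`, `Sensor`, `Dispatcher`, `SEVERITY`) is hypothetical and chosen for illustration; the real internals may look quite different.

```python
from dataclasses import dataclass
from typing import Callable

SEVERITY = {"allow": 0, "warn": 1, "block": 2}


@dataclass(frozen=True)
class Event:
    phase: str      # "PreToolUse" or "PostToolUse"
    tool: str       # e.g. "Bash"
    payload: dict


@dataclass(frozen=True)
class Sensor:
    name: str
    listens_for: tuple                   # (phase, tool) this sensor wakes up on
    select: Callable[[Event], str]       # picks the narrow context it needs
    policy: Callable[[str], str]         # returns "allow", "warn" or "block"


class Dispatcher:
    def __init__(self) -> None:
        self._index: dict = {}
        self.ledger: list = []

    def register(self, sensor: Sensor) -> None:
        self._index.setdefault(sensor.listens_for, []).append(sensor)

    def handle(self, event: Event) -> str:
        verdict = "allow"
        # Only sensors registered for this exact (phase, tool) pair run.
        for sensor in self._index.get((event.phase, event.tool), []):
            result = sensor.policy(sensor.select(event))
            self.ledger.append((sensor.name, result))
            verdict = max(verdict, result, key=SEVERITY.__getitem__)
        return verdict
```

The strictest result wins, so a single `block` overrides any number of `allow` or `warn` results for the same event.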

.sensorium/
  state.jsonl      # append-only event ledger
  snapshots/       # file snapshots before edits (content-addressed)
  sensors.yaml     # your project-specific rules (optional)
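
"Content-addressed" here presumably means a snapshot is stored under a name derived from a hash of its bytes. A minimal sketch of that idea follows. The SHA-256 choice, the flat directory layout, and the `snapshot` and `restore` helpers are assumptions for illustration, not a description of how Sensorium stores its snapshots.

```python
import hashlib
from pathlib import Path


def snapshot(file_path: Path, snapshot_dir: Path) -> str:
    """Store the file's current bytes under their SHA-256 digest and return it."""
    data = file_path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    target = snapshot_dir / digest
    if not target.exists():          # identical content is stored only once
        target.write_bytes(data)
    return digest


def restore(digest: str, snapshot_dir: Path, file_path: Path) -> None:
    """Write a previously stored snapshot back to file_path."""
    file_path.write_bytes((snapshot_dir / digest).read_bytes())
```

Because the name is the hash, snapshotting an unchanged file twice costs nothing extra, and a ledger entry only needs to record the digest to point at the exact pre-edit content.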

The honest part

This is regex and policy, not a sandbox. It catches the direct, literal case — an agent typing a dangerous command in the open. It is not a defense against something actively trying to route around it (a script file, an interpreter, a clever quote). Self-reported proofs are exactly that — self-reported. Treat it as a seatbelt, not a cage, and you'll use it correctly.

Hook format reference

sensorium init claude-code writes this to .claude/settings.json:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|Edit|Write|MultiEdit",
        "hooks": [{ "type": "command", "command": "/path/to/sensorium hook pretool" }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Bash|Edit|Write|MultiEdit",
        "hooks": [{ "type": "command", "command": "/path/to/sensorium hook posttool" }]
      }
    ],
    "Stop": [
      {
        "hooks": [{ "type": "command", "command": "/path/to/sensorium hook stop" }]
      }
    ]
  }
}

Exit code 2 from pretool blocks the tool. Exit 0 allows it.

License

AGPL-3.0. See LICENSE.


The incidents this is built around

Not hypotheticals: the incidents described at the top of this page happened, and each one maps directly to a sensor above.

If you're shipping agents with real filesystem/shell/API access and you're not doing this yet — you're one confidently-wrong tool call away from a bad afternoon.

Would love thoughts from anyone else building guardrails for agentic coding tools. 🙏

#AIagents #ClaudeCode #DeveloperTools #AgentSafety #OpenSource
