Smart-home sandbox for AI agents. Bundles Home Assistant + MQTT + a fully simulated home + MCP server + a built-in CLI agent. Runs on a laptop or a Raspberry Pi.

sandcastle-sim

Sandcastle Sim is a sandbox for smart-home AI agents. Real Home Assistant (HA) and Mosquitto run in Docker; the devices are simulated and publish via standard MQTT discovery. From HA's perspective there's no difference between a simulated bulb and a real one, so an agent that works here works against a real home unchanged.
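As a rough illustration of what "publish via standard MQTT discovery" means, the sketch below builds the config message a simulated light could announce itself with. The `homeassistant/<component>/<object_id>/config` topic layout and the `command_topic` / `state_topic` keys follow Home Assistant's MQTT discovery convention. The `sandbox/light/...` topic root, the `sim_` ID prefix, and the `discovery_message` helper are hypothetical names chosen for this example, not taken from the project.

```python
import json


def discovery_message(object_id: str, name: str, prefix: str = "homeassistant"):
    """Build the (topic, payload) pair that announces one light to Home Assistant."""
    # Hypothetical topic root for the simulated device; the real project may differ.
    base = f"sandbox/light/{object_id}"
    # Discovery topic layout: <prefix>/<component>/<object_id>/config
    topic = f"{prefix}/light/{object_id}/config"
    payload = {
        "name": name,
        "unique_id": f"sim_{object_id}",   # assumed ID scheme
        "command_topic": f"{base}/set",    # HA publishes ON/OFF commands here
        "state_topic": f"{base}/state",    # the device reports its state here
        "payload_on": "ON",
        "payload_off": "OFF",
    }
    return topic, json.dumps(payload)


topic, payload = discovery_message("kitchen_counter", "Kitchen Counter Light")
print(topic)
print(payload)
```

A real bulb that speaks MQTT discovery sends a message of the same shape, so Home Assistant has no way to tell the two apart.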

Sandcastle Sim is aimed at developers building smart-home agents. One command brings up the full stack. The built-in CLI agent gets you to a working demo in minutes; when you're ready to iterate on prompts, UX, and edge cases, drop in your own agent.

[Figure: architecture diagram]

Install

Prerequisites

  • Docker with Compose v2
  • Python >= 3.10
  • Ollama for the built-in CLI agent. Optional if you're connecting your own MCP agent.
  • Tested on Mac (Apple Silicon), Linux, and Raspberry Pi 4/5. Windows not yet tested.
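If you want a quick pre-flight check before installing, something like the following sketch can help. It is not part of sandcastle-sim: `check_prerequisites` is a hypothetical helper, and it only tests whether the interpreter is new enough and whether `docker` and `ollama` are on your PATH. It does not verify that Docker ships Compose v2.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report which prerequisites appear to be present on this machine."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        # Only checks that the binary exists, not that Compose v2 is available.
        "docker": shutil.which("docker") is not None,
        # Only needed for the built-in CLI agent.
        "ollama (optional)": shutil.which("ollama") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{'ok' if ok else 'missing':8} {name}")
```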

Setup

Create and activate a virtual environment so the install stays isolated from your system Python:

python -m venv .venv
source .venv/bin/activate

Then install:

pip install sandcastle-sim

Planning to make code changes? Install editable from a checkout instead (pip install -e .). See CONTRIBUTING.md for the full dev setup.

Quickstart

Start the stack

sandcastle-sim start

When the castle banner prints in the terminal, the stack is up. Open http://localhost:8766 and you should see the floor plan with every device laid out across six rooms. Click any device to flip it on or off, dim a light, or open a blind. That's the simulated home.
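If you are scripting the startup rather than watching for the banner, one option is to poll the floor-plan URL until it answers. The sketch below assumes only the URL given above; `http_probe` and `wait_until_up` are hypothetical helpers, and the attempt count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str = "http://localhost:8766", timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_up(probe=http_probe, attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe() until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # back off before the next try
    return False
```

Calling `wait_until_up()` with the defaults waits up to roughly 30 seconds for the UI. Passing a different `probe` makes the loop reusable for other endpoints.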

Drive it with natural language

In a separate terminal, pull the model and start Ollama:

ollama pull gemma4:e4b
ollama serve

Then back in your first terminal:

sandcastle-sim chat

A chat panel shows up listing the model and the available tools. For your first prompt, try:

set up welcome guest

From there, riff off the tool list the chat panel shows.

Run sandcastle-sim --help for the full command list.

Using an AI coding agent (Claude Code, Codex, Copilot)? Read AGENTS.md first.

Eval suite

AI agents aren't deterministic. The same prompt can produce different outputs as you change the model, the system prompt, or the tool config. Small changes break things in non-obvious ways. The eval suite is how you catch that, and a quick way to see how performance looks on your hardware.

Baseline (per host)

End-to-end latency on the bundled quick.yaml suite against the live stack (HA + MQTT + simulator + MCP) with gemma4:e4b (~4 B params, q4_K_M).

Avg/case is the full round trip for one prompt — the model reads it, decides which tool to call, the MCP server dispatches the call, Home Assistant executes it and updates state, and the model writes its reply back. The eval pre-warms the model with a single token so per-case timings reflect steady-state cost only — the cold model-load you see once at the start of sandcastle-sim chat is excluded.
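The pre-warm idea can be sketched as follows, under the assumption that your agent is reachable through a single callable. `time_cases`, `run_prompt`, and the `"hi"` warm-up prompt are illustrative stand-ins, not the eval suite's actual implementation.

```python
import time
from typing import Callable


def time_cases(
    run_prompt: Callable[[str], str],
    prompts: dict[str, str],
    warmup_prompt: str = "hi",
) -> dict[str, float]:
    """Time each prompt end to end after one untimed warm-up call."""
    # The warm-up absorbs the one-off model load so it is not billed to any case.
    run_prompt(warmup_prompt)
    timings: dict[str, float] = {}
    for name, prompt in prompts.items():
        start = time.perf_counter()
        run_prompt(prompt)
        timings[name] = time.perf_counter() - start
    return timings


# Stand-in agent so the sketch runs without a model; replace with a real call.
seen: list[str] = []


def fake_agent(prompt: str) -> str:
    seen.append(prompt)
    return "ok"


timings = time_cases(fake_agent, {"lock_door": "lock the front door"})
print(timings)
```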

Host                                       Pass   Avg/case   Slowest
DGX Spark (NVIDIA GB10, 128 GB unified)    5/5    3.7 s      state_query (9.0 s)
MacBook Pro M3 Max (36 GB unified)         5/5    3.7 s      state_query (6.9 s)
MacBook Pro M3 Pro (18 GB unified)         5/5    6.3 s      state_query (15.7 s)

Numbers are the median of 3 repeats per case (--repeat 3, the default), so single-shot noise on bandwidth-bound laptops doesn't register as a performance change.
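To see why the median is a reasonable choice here, consider a small sketch with made-up timings (these are not measured results). One slow outlier in three repeats would drag a mean upward but leaves the median untouched. `summarize` is a hypothetical helper.

```python
from statistics import median


def summarize(repeats: dict[str, list[float]]) -> dict[str, float]:
    """Collapse the repeated timings for each case into their median."""
    return {case: median(times) for case, times in repeats.items()}


# Invented sample timings in seconds, three repeats per case.
samples = {
    "state_query": [6.9, 7.4, 15.2],  # the third run hit a noisy moment
    "lock_door": [2.1, 2.0, 2.3],
}
summary = summarize(samples)
print(summary)  # {'state_query': 7.4, 'lock_door': 2.1}
```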

The five cases in quick.yaml:

  • light_off — "turn off the kitchen counter light"
  • scene_named — "set up movie night"
  • lock_door — "lock the front door"
  • climate_setpoint — "set the temperature to 22"
  • state_query — "what lights are on right now?" (the heaviest case — the agent has to list devices, then answer)

Try it

1. Save a baseline snapshot of how the agent behaves right now:

sandcastle-sim eval --save-baseline

2. First go — see the diff workflow without writing any code. Toggle off the agent's tool-routing optimisation for one run:

sandcastle-sim eval --no-routing --diff

Every case lands a bit slower (no failures), and the diff surfaces clean latency regressions against the baseline you just saved. The flag scopes to that one command; the next eval reverts to defaults automatically.

3. Normal use — after you change your agent. Make any change, then:

sandcastle-sim eval --diff

The report leads with cases that used to pass and now fail. Cases that got noticeably slower show up too. Exit code is non-zero if anything regressed, so a coding agent running this in a loop can tell when its own changes broke something.
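The comparison described above might look roughly like this. It is a minimal sketch, not the tool's actual logic: `CaseResult`, `diff_runs`, `exit_code`, and the 1.5x slowdown threshold are all assumptions made for illustration, and the timings are invented.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    passed: bool
    seconds: float


def diff_runs(
    baseline: dict[str, CaseResult],
    current: dict[str, CaseResult],
    slowdown: float = 1.5,  # assumed threshold for "noticeably slower"
) -> tuple[list[str], list[str]]:
    """Return (newly_failing, slower) case names relative to the baseline."""
    newly_failing = [
        name for name, base in baseline.items()
        if base.passed and name in current and not current[name].passed
    ]
    slower = [
        name for name, base in baseline.items()
        if name in current and current[name].passed
        and current[name].seconds > base.seconds * slowdown
    ]
    return newly_failing, slower


def exit_code(newly_failing: list[str], slower: list[str]) -> int:
    """Non-zero when anything regressed, so a calling script can react."""
    return 1 if (newly_failing or slower) else 0


baseline = {"light_off": CaseResult(True, 2.0), "state_query": CaseResult(True, 6.0)}
current = {"light_off": CaseResult(False, 2.1), "state_query": CaseResult(True, 10.0)}
failing, slower = diff_runs(baseline, current)
print("now failing:", failing)
print("slower:", slower)
print("exit code:", exit_code(failing, slower))
```

Listing the newly failing cases before the slower ones mirrors the report order described above.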

evals/quick.yaml is the starter suite. Write your own to match your agent's acceptance bar.

Next steps

Extend the simulator

Add devices, tools, and scenes to make the sandbox match your target home. See docs/extending-the-simulator.md.

Connect cloud models

Coming soon.

License: Apache-2.0
