Observable trajectory infrastructure — trading domain reference implementation
Executable World Models
Agent runtimes that execute, evaluate, and learn — on AWS
This repository accompanies the Executable World Models essay series published on Substack. It is the companion to the Beyond Tokens series, which established why world models matter. This series builds them.
What this repository builds
Most agent frameworks focus on what agents can do — tools, skills, orchestration. This project focuses on something different: what it takes for agent behavior to be observable, reproducible, and improvable over time.
The architecture is built in layers, each release adding one:
- Execution — a deterministic agent runtime deployed on AWS, executing decisions under explicit budget constraints and emitting structured artifacts for every run
- Evaluation — a structural validity layer that verifies artifact integrity before runs are treated as evidence
- Environments — a stateful world that evolves step by step, so agents interact with something that changes rather than just querying APIs
- Learning — an evidence policy that converts validated experiment results into decision guidance for future runs, without retraining the model
The result is a system that produces trajectories, not just outputs. Trajectories can be replayed, validated, compared across experiments, and used as the dataset from which intelligent systems can begin to improve.
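To make the execution layer concrete, here is a minimal sketch of a run that stops when its budget is exhausted and records every step as a trajectory. It is not the repository's runtime: `Budget`, `Trajectory`, `run_agent`, and `threshold_decision` are hypothetical names, and the field layout is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Budget:
    """Hypothetical budget: hard caps on what a single run may consume."""
    max_steps: int
    max_tool_calls: int


@dataclass
class Trajectory:
    """Hypothetical trajectory record: one entry per executed step."""
    run_id: str
    steps: list = field(default_factory=list)
    stopped_reason: str = ""


def run_agent(run_id, budget, decide, observations):
    """Execute a run until observations are exhausted or a budget cap is hit."""
    trajectory = Trajectory(run_id=run_id)
    tool_calls = 0
    for step_index, observation in enumerate(observations):
        if step_index >= budget.max_steps:
            trajectory.stopped_reason = "step_budget_exhausted"
            break
        if tool_calls >= budget.max_tool_calls:
            trajectory.stopped_reason = "tool_call_budget_exhausted"
            break
        action = decide(observation)  # counted as one tool call in this sketch
        tool_calls += 1
        trajectory.steps.append(
            {"step": step_index, "observation": observation, "action": action}
        )
    else:
        # The loop finished without hitting any cap.
        trajectory.stopped_reason = "completed"
    return trajectory


def threshold_decision(price):
    """Deterministic toy decision rule: buy below 100, otherwise hold."""
    return "buy" if price < 100 else "hold"


trajectory = run_agent(
    run_id="run-001",
    budget=Budget(max_steps=3, max_tool_calls=10),
    decide=threshold_decision,
    observations=[99.0, 101.0, 98.5, 102.0],
)
trajectory_json = json.dumps(asdict(trajectory), indent=2, sort_keys=True)
print(trajectory_json)
```

Because the decision rule is deterministic and the stop reason is recorded, the same inputs always yield the same trajectory, which is what makes replay and comparison possible.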
Releases
Each release adds one layer to the architecture. The system compounds — it does not pivot.
| Release | Layer added | Essay |
|---|---|---|
| v0.8.1-Agent-Runtime | Execution: agent runtime deployed on AWS | From Theory to Runtime |
| v0.8.3 | Evaluation + environments: structural validity and deterministic world | Evaluation Is a Primitive, Not a Report · Tools Return Results. Environments Change the World. |
| v0.8.5 | Evidence policy: learning without retraining | Learning Without Retraining |
| v0.8.5.1 | Policy-guided agent: evidence feeding future decisions | Learning Without Retraining |
Essays
This series follows directly from Beyond Tokens. That series made the case for why world models matter. This series runs that argument as infrastructure.
| Essay | What it covers |
|---|---|
| From Theory to Runtime | The Agent Runtime goes live on AWS — execution, artifacts, persistence, telemetry |
| Evaluation Is a Primitive, Not a Report | Structural validation that turns runs into trusted evidence |
| Tools Return Results. Environments Change the World. | Why environments are the missing layer in most agent architectures |
| Learning Without Retraining | How agent systems improve decisions without changing the model |
| The Architecture of Intelligent Systems | What both series, taken together, mean for how intelligent systems are built |
Recommended entry point: start with From Theory to Runtime, which introduces the runtime and links directly to the code. Read the earlier Beyond Tokens series for the architectural argument that precedes it.
Architecture
The system is a layered experimental stack. Each layer makes agent behavior more observable and improvable.
agents
↓
constraints — budget: steps · tool calls · model calls · memory
↓
artifacts — decision.json · trajectory.json · deltas.json
↓
evaluation — structural validity · integrity checks
↓
experiments — aggregate across runs · integrity rate · success rate
↓
environments — stateful · deterministic · step-by-step
↓
evidence policy — patterns → decision guidance · no retraining required
The upper layers generate behavior. The lower layers make that behavior observable, trustworthy, and usable as evidence for learning.
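The evaluation and experiments layers can be sketched as a validity gate followed by an aggregation. The checks below are assumptions made for illustration (the repository's actual integrity rules are not reproduced here), and `is_structurally_valid` and `summarize_experiment` are hypothetical names.

```python
REQUIRED_ARTIFACTS = ("decision", "trajectory", "deltas")


def is_structurally_valid(run):
    """Return True when every required artifact is present and well formed.

    Assumed rules: all three artifacts exist, the decision is a mapping,
    the trajectory is a non-empty list, and there is one delta per step.
    """
    if any(name not in run for name in REQUIRED_ARTIFACTS):
        return False
    decision, trajectory, deltas = run["decision"], run["trajectory"], run["deltas"]
    if not isinstance(decision, dict):
        return False
    if not isinstance(trajectory, list) or not trajectory:
        return False
    return isinstance(deltas, list) and len(deltas) == len(trajectory)


def summarize_experiment(runs):
    """Aggregate runs into an integrity rate and a success rate.

    Success is measured over structurally valid runs only, so an invalid
    run never counts as evidence in either direction.
    """
    valid = [run for run in runs if is_structurally_valid(run)]
    successes = [run for run in valid if run["decision"].get("success")]
    return {
        "total_runs": len(runs),
        "valid_runs": len(valid),
        "integrity_rate": len(valid) / len(runs) if runs else 0.0,
        "success_rate": len(successes) / len(valid) if valid else 0.0,
    }


# Placeholder runs, invented for this example only.
runs = [
    {"decision": {"success": True}, "trajectory": [{}, {}], "deltas": [{}, {}]},
    {"decision": {"success": False}, "trajectory": [{}], "deltas": [{}]},
    {"decision": {"success": True}, "trajectory": [{}], "deltas": []},  # invalid
    {"decision": {"success": True}, "trajectory": [{}, {}], "deltas": [{}, {}]},
]
summary = summarize_experiment(runs)
print(summary)
```

The third run reports success but is missing a delta, so it is excluded before the success rate is computed.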
The learning loop
Version v0.8.5 completes the learning loop. The architecture now runs end to end:
environment → trajectories → artifacts → evaluation
→ experiments → evidence dataset → evidence policy → future decisions
This is not reinforcement learning. No model weights are updated. No gradient descent occurs. What changes is the decision architecture — past experiment evidence informs future choices, without touching the model.
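A minimal sketch of how validated results could become decision guidance with no weight updates: count outcomes per symbol and action, then record the best-performing action as a preference. The record fields, the `min_samples` threshold, and the output layout are assumptions, not the format written by `build_evidence_policy.py`.

```python
from collections import defaultdict


def best_action(records, min_samples):
    """Pick the action with the highest success rate, given enough samples."""
    counts = defaultdict(lambda: [0, 0])  # action -> [successes, attempts]
    for record in records:
        counts[record["action"]][0] += int(record["success"])
        counts[record["action"]][1] += 1
    eligible = {
        action: wins / attempts
        for action, (wins, attempts) in counts.items()
        if attempts >= min_samples
    }
    if not eligible:
        return None
    # Iterate in name order so ties resolve deterministically.
    return max(sorted(eligible), key=eligible.get)


def build_policy(records, default_action="hold", min_samples=2):
    """Turn validated records into per-symbol preferences plus a default."""
    by_symbol = defaultdict(list)
    for record in records:
        by_symbol[record["symbol"]].append(record)
    preferences = {}
    for symbol in sorted(by_symbol):
        action = best_action(by_symbol[symbol], min_samples)
        if action is not None:
            preferences[symbol] = action
    return {"symbol_preferences": preferences, "default_action": default_action}


# Placeholder evidence with made-up symbols, invented for this example only.
records = [
    {"symbol": "AAA", "action": "buy", "success": True},
    {"symbol": "AAA", "action": "buy", "success": True},
    {"symbol": "AAA", "action": "sell", "success": False},
    {"symbol": "AAA", "action": "sell", "success": True},
    {"symbol": "BBB", "action": "buy", "success": False},
]
policy = build_policy(records)
print(policy)
```

Symbol `BBB` has only one sample, so it gets no preference and falls back to the default action. Nothing here touches a model: the only thing that changes between runs is a small, inspectable data structure.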
Setup
make setup
make lint
pytest
Local demo
python3 scripts/demo_learning_loop.py
What you should see
- Agent runs interact with the MarketPathEnvironment step by step
- Structured artifacts are written for each run: decision.json, trajectory.json, deltas.json
- Evaluation verifies structural integrity: valid runs proceed, invalid runs are excluded
- Experiments aggregate results across runs
- A learning dataset is exported from validated trajectories
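To illustrate what "step by step" means for a stateful, deterministic environment, here is a toy stand-in. It is not the repository's `MarketPathEnvironment`: the class name, the `reset`/`step` interface, the trading rules, and the delta layout are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketState:
    step: int
    price: float
    cash: float
    position: int


class ToyMarketPathEnvironment:
    """Hypothetical stand-in: replays a fixed price path, one step per action."""

    def __init__(self, prices, starting_cash):
        if len(prices) < 2:
            raise ValueError("a price path needs at least two prices")
        self._prices = list(prices)
        self._starting_cash = starting_cash
        self.state = None

    def reset(self):
        self.state = MarketState(0, self._prices[0], self._starting_cash, 0)
        return self.state

    def step(self, action):
        """Apply an action at the current price, then advance the path."""
        before = self.state
        if before is None or before.step >= len(self._prices) - 1:
            raise RuntimeError("call reset() first, and stop stepping once done")
        cash, position = before.cash, before.position
        if action == "buy" and cash >= before.price:
            cash, position = cash - before.price, position + 1
        elif action == "sell" and position > 0:
            cash, position = cash + before.price, position - 1
        index = before.step + 1
        self.state = MarketState(index, self._prices[index], cash, position)
        delta = {
            "price": self.state.price - before.price,
            "cash": cash - before.cash,
            "position": position - before.position,
        }
        return self.state, delta, index == len(self._prices) - 1


def run_path(actions):
    """Run a fixed action sequence and collect the state delta of each step."""
    environment = ToyMarketPathEnvironment([100.0, 102.0, 101.0], starting_cash=150.0)
    environment.reset()
    deltas = []
    for action in actions:
        _, delta, done = environment.step(action)
        deltas.append(delta)
        if done:
            break
    return deltas, environment.state


deltas, final_state = run_path(["buy", "sell"])
print(deltas, final_state)
```

The agent's actions change the state it sees next, and the same action sequence always produces the same deltas, which is the property that makes runs replayable.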
Evidence policy demo (v0.8.5)
# Export learning dataset from validated experiments
python3 scripts/export_learning_dataset.py
# Run the learner stub to produce a learning report
python3 scripts/run_learning_stub.py
# Build the evidence policy from the learning report
python3 scripts/build_evidence_policy.py \
--learning-report outputs/learning/demo_learning_report.json \
--output outputs/learning/evidence_policy.json
# Run the policy feedback loop demo
python3 scripts/demo_policy_feedback_loop.py
Policy-guided agent demo (v0.8.5.1)
# Run the policy-guided agent
python3 scripts/demo_policy_guided_trading_agent.py
# Run the full end-to-end learning loop
python3 scripts/demo_end_to_end_learning_loop.py
What you should see
- Agent loads an evidence policy and consults it for each decision
- Symbol-level preferences take priority, then step-level preferences, then the default action
- Each decision includes an explanation of which policy source was used
- The complete loop from experiments to decisions runs end to end
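The precedence described above (symbol first, then step, then default) can be sketched as a small lookup that also reports which source it used. The policy layout and the `choose_action` name are assumptions for illustration. Step keys are strings here because JSON object keys are always strings.

```python
def choose_action(policy, symbol, step):
    """Resolve an action and explain which policy source supplied it."""
    symbol_preferences = policy.get("symbol_preferences", {})
    if symbol in symbol_preferences:
        return {
            "action": symbol_preferences[symbol],
            "source": "symbol",
            "explanation": f"symbol-level preference for {symbol}",
        }
    step_preferences = policy.get("step_preferences", {})
    step_key = str(step)
    if step_key in step_preferences:
        return {
            "action": step_preferences[step_key],
            "source": "step",
            "explanation": f"step-level preference for step {step}",
        }
    return {
        "action": policy["default_action"],
        "source": "default",
        "explanation": "no matching preference, using the default action",
    }


# Placeholder policy with made-up symbols, invented for this example only.
example_policy = {
    "symbol_preferences": {"AAA": "buy"},
    "step_preferences": {"3": "sell"},
    "default_action": "hold",
}

by_symbol = choose_action(example_policy, "AAA", 3)
by_step = choose_action(example_policy, "BBB", 3)
by_default = choose_action(example_policy, "BBB", 1)
for decision in (by_symbol, by_step, by_default):
    print(decision)
```

In the first call both a symbol preference and a step preference match, and the symbol preference wins.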
AWS deployment
make deploy-agentcore-loop
Verify health:
curl https://<your-api-gateway-url>/health
Run integration tests:
pytest tests/integration
Repository structure
services/core/environment/ world environments — MarketPathEnvironment
services/core/eval/ structural evaluation layer
services/core/learning/ evidence policy and learning scaffold
services/cli/ operational CLI
scripts/ demos, export tools, and policy builders
tests/ unit and integration tests
infra/cdk/ AWS infrastructure — API Gateway, Lambda, DynamoDB, S3
docs/ architecture diagrams
outputs/learning/ experiment datasets and policy outputs
How to evaluate this repository in 10 minutes
- Run python3 scripts/demo_learning_loop.py and observe that every run produces structured artifacts
- Check outputs/learning/, where trajectories are exported as a clean dataset
- Run python3 scripts/demo_policy_guided_trading_agent.py and observe decisions being guided by prior evidence
- Open any decision.json artifact: the decision, trajectory, and state deltas are all explicit and inspectable
- The model does not change. The system improves through architecture.
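For a quick look inside an artifact without assuming its schema, a small helper can report only the top-level keys and their types. `summarize_artifact` is a hypothetical name, and the file written below is a placeholder created for this example, not real output from the demos.

```python
import json
import tempfile
from pathlib import Path


def summarize_artifact(path):
    """Return the top-level keys of a JSON artifact and the type of each value."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    return {"<root>": type(data).__name__}


# Write a placeholder artifact so the example runs anywhere.
with tempfile.TemporaryDirectory() as directory:
    artifact_path = Path(directory) / "decision.json"
    placeholder = {"decision": {"action": "hold"}, "trajectory": [], "deltas": []}
    artifact_path.write_text(json.dumps(placeholder), encoding="utf-8")
    artifact_summary = summarize_artifact(artifact_path)

print(artifact_summary)
```

Pointing `summarize_artifact` at a real `decision.json` produced by a demo run shows the actual structure.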
Project status
Current milestone: v0.8.5.1 — Policy-Guided Agent
The learning loop is complete. Experiments produce evidence. Evidence becomes policy. Policy informs future decisions.
The next step is replacing the deterministic MarketPathEnvironment with a learned world model — at which point the environment itself becomes a predictive system rather than a replay. The experimental architecture remains unchanged.
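One way to read "the experimental architecture remains unchanged" is that replayed and predictive environments share an interface, so the code that collects trajectories does not care which one it receives. The sketch below assumes a hypothetical `PriceEnvironment` protocol. `MeanDriftEnvironment` is a deliberately trivial extrapolation used as a stand-in, not a learned world model.

```python
from typing import Protocol


class PriceEnvironment(Protocol):
    """Hypothetical interface shared by replayed and predictive environments."""

    def reset(self) -> float: ...

    def step(self) -> tuple[float, bool]: ...


class ReplayEnvironment:
    """Deterministic: replays a recorded path of at least two prices."""

    def __init__(self, prices):
        self._prices = list(prices)
        self._index = 0

    def reset(self):
        self._index = 0
        return self._prices[0]

    def step(self):
        self._index += 1
        return self._prices[self._index], self._index == len(self._prices) - 1


class MeanDriftEnvironment:
    """Predictive stand-in: extrapolates the average change seen in history."""

    def __init__(self, history, horizon):
        changes = [later - earlier for earlier, later in zip(history, history[1:])]
        self._drift = sum(changes) / len(changes)
        self._start = history[-1]
        self._horizon = horizon
        self._price = self._start
        self._steps = 0

    def reset(self):
        self._price, self._steps = self._start, 0
        return self._price

    def step(self):
        self._price += self._drift
        self._steps += 1
        return self._price, self._steps >= self._horizon


def collect_prices(environment: PriceEnvironment):
    """Roll any environment forward until it reports that it is done."""
    prices = [environment.reset()]
    done = False
    while not done:
        price, done = environment.step()
        prices.append(price)
    return prices


replayed = collect_prices(ReplayEnvironment([100.0, 102.0, 104.0]))
predicted = collect_prices(MeanDriftEnvironment([100.0, 102.0, 104.0], horizon=2))
print(replayed, predicted)
```

`collect_prices` is identical for both environments. Only the source of the next state differs: a recording in one case, a prediction in the other.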