VEI — programmable replica of an enterprise software stack for agent testing, RL training, and operational simulation
VEI
VEI is a programmable replica of an entire company's operational software stack. You give it a company description — or connect it to real Slack, Gmail, Jira, and Teams data — and it builds a fully functioning simulated copy with working channels, email threads, ticket queues, CRM pipelines, document stores, and identity systems that an agent or a human can operate inside.
Pick a company, pick a crisis, define what success looks like, then play moves or let an agent play them — every tool, every person, every process reacts as one connected system.
Full overview: what this is, who it's for, and how to connect your own data → docs/OVERVIEW.md
What VEI Simulates
VEI simulates a complete enterprise environment — every software system, every person, every process — as one deterministic, branchable world. An agent (or a human) discovers what systems exist, inspects state, takes actions that ripple across all tools simultaneously, and is evaluated against business constraints.
What is simulated:
- Software surfaces — Slack, Email, Browser, Docs, Spreadsheet, Tickets, CRM, ERP, Okta-style identity, ServiceDesk, Google Admin, SIEM, Datadog, PagerDuty, feature flags, HRIS, and Jira-style issues. One move in one system can trigger visible changes across all the others.
- Vertical company worlds — Each vertical is a complete company with realistic seed data across all surfaces:
- Pinnacle Analytics (B2B SaaS) — $480K enterprise renewal at risk, broken integration, departed champion, competitor circling
- Harbor Point Management (Real Estate) — Flagship tenant opening with lease, vendor, and property-readiness pressure
- Northstar Growth (Marketing Agency) — Campaign launch with approval, pacing, and reporting risk
- Atlas Storage Systems (Storage/Logistics) — Strategic customer quote with fragmented capacity
- Time and state — Virtual time, scheduled events, snapshots, branches, replay, and restore
- Policies and outcomes — Success predicates, forbidden states, policy invariants, observation boundaries, deadlines, and contract-graded outcomes
- Long-horizon work — Multi-step tasks that cross systems, have hidden state, require follow-through, and can fail midway
How the simulation works:
- A `BlueprintAsset` declares the company: its org structure, tool data (Slack channels, email threads, tickets, docs, CRM deals), and domain objects (leases, campaigns, capacity pools, etc.)
- The blueprint compiles into a `WorldSession` — a deterministic kernel that owns all state, event queues, and tool dispatch
- A `Scenario` overlays pressure on the world (a crisis, a deadline, a fault injection)
- A `Contract` defines what success looks like (predicates, invariants, reward terms)
- Actions flow through MCP tools, resolve to capability-graph mutations, and produce observable side effects across every surface simultaneously
- The entire run is recorded as an append-only event spine — replayable, branchable, and gradeable
Each world pack supports multiple scenario variants and contract variants, so the same company can be placed under different pressures with different success criteria. The same packs also ship as playable missions for human step-through.
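The append-only event spine described above can be illustrated with a minimal stdlib-only sketch. Everything here (class name, event shape, method names) is illustrative, not the actual VEI API; the point is only the record-once, branch-by-prefix, replay-to-state pattern:

```python
import copy


class EventSpine:
    """Illustrative append-only event log with branch and replay."""

    def __init__(self, events=None):
        self.events = list(events or [])

    def append(self, kind, payload):
        # Events are only ever appended, never mutated in place.
        self.events.append({"seq": len(self.events), "kind": kind, "payload": payload})

    def branch(self, at_seq):
        # A branch is a new spine that shares the prefix up to at_seq.
        return EventSpine(copy.deepcopy(self.events[: at_seq + 1]))

    def replay(self):
        # Deterministic replay: fold events into a state dict, newest last.
        state = {}
        for ev in self.events:
            state.setdefault(ev["kind"], []).append(ev["payload"])
        return state


spine = EventSpine()
spine.append("slack.message", {"channel": "#renewals", "text": "Champion left"})
spine.append("crm.update", {"deal": "D-480K", "stage": "at_risk"})

# Branch the world before the CRM update and play a different future.
fork = spine.branch(at_seq=0)
fork.append("crm.update", {"deal": "D-480K", "stage": "recovered"})
```

Because state is always derived from the log, the original spine and the fork can be replayed, diffed, and graded independently while sharing their common prefix.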
Core Primitives
VEI now exposes one coherent product shape:
- `Blueprint`: typed composition of scenario, facades, workflow, and contract
- `BlueprintAsset`: authored blueprint root that declares a scenario template, capability-graph or environment seed, requested facades, workflow, and metadata
- `CompiledBlueprint`: compiled blueprint with resolved facades, state roots, workflow defaults, contract defaults, and run defaults
- `GroundingBundle`: typed imported org/policy/incident input that compiles into a `BlueprintAsset`
- `ImportPackage`: raw CSV/JSON enterprise export pack plus mapping profiles, redaction state, and provenance anchors
- `Workspace`: file-backed environment root that stores blueprint, contracts, scenarios, imports, runs, and artifacts
- `Scenario`: seeded enterprise world and difficulty/tool manifest
- `Facade`: typed enterprise surface grouped by capability domain
- `Contract`: success predicates, forbidden predicates, observation boundary, policy invariants, reward terms, and intervention rules
- `Run`: workflow, benchmark, demo, and suite executions over the same world kernel
- `Snapshot`: branchable world-state checkpoint with replay and receipts
The older per-app router twins are still used, but they are now wrapped as a typed facade catalog rather than presented as the product ontology by themselves.
VEI is semantic-first today. VM-backed desktop or OS-level facades can come later as plugins, but the current engine is intentionally focused on compiling organization state and policies into a deterministic world before adding heavier substrates.
License
This repository is licensed under the Business Source License 1.1 in LICENSE.
- Additional Use Grant: None
- Change Date: 2030-03-10
- Change License: GPL-2.0-or-later
Quick Start
Install
pip install -e ".[llm,sse,ui]"
Configure .env
OPENAI_API_KEY=sk-your-key
VEI_SEED=42042
VEI_ARTIFACTS_DIR=./_vei_out
Verify the repo
make setup
make check
make test
make llm-live
vei smoke --transport stdio --timeout-s 30
make llm-live auto-loads .env when present and writes summary.json next to the other live-run artifacts under _vei_out/llm_live/latest.
Run a live episode
vei llm-test run \
--provider openai \
--model gpt-5 \
--task "Research price, get Slack approval under budget, and email vendor for quote."
Workspace and UI flow
vei project init --root _vei_out/workspaces/acquired_cutover --example acquired_user_cutover
vei contract validate --root _vei_out/workspaces/acquired_cutover
vei run start --root _vei_out/workspaces/acquired_cutover --runner workflow
vei ui serve --root _vei_out/workspaces/acquired_cutover
The unified root CLI exposes the same lifecycle:
vei project show --root _vei_out/workspaces/acquired_cutover
vei scenario preview --root _vei_out/workspaces/acquired_cutover
vei inspect events --root _vei_out/workspaces/acquired_cutover
vei inspect graphs --root _vei_out/workspaces/acquired_cutover --domain identity_graph
The vertical demos now support the same company world under multiple futures and objective functions:
vei project init --root _vei_out/workspaces/harbor_point --vertical real_estate_management
vei scenario variants --root _vei_out/workspaces/harbor_point
vei scenario activate --root _vei_out/workspaces/harbor_point --variant vendor_no_show
vei contract variants --root _vei_out/workspaces/harbor_point
vei contract activate --root _vei_out/workspaces/harbor_point --variant safety_over_speed
vei run start --root _vei_out/workspaces/harbor_point --runner workflow
vei ui serve --root _vei_out/workspaces/harbor_point
That is the cleanest proof of the kernel thesis: the base company world stays fixed while VEI swaps the problem setup and success criteria on top of the same runtime, event spine, contract engine, and playback UI.
For the presentation path, VEI now ships a narrative-first Studio showcase:
vei showcase story \
--root _vei_out/vertical_showcase \
--run-id story_presentation \
--vertical real_estate_management \
--scenario-variant vendor_no_show \
--contract-variant safety_over_speed
vei ui serve --root _vei_out/vertical_showcase/story_presentation/real_estate_management
That path writes:
- `story_manifest.json`
- `story_overview.md`
- `exports_preview.json`
- `presentation_manifest.json`
- `presentation_guide.md`
The point is product legibility: VEI now presents the demo as Presentation → Company → Situation → Objective → Run → Branch → Outcome → Exports, while the underlying kernel stays the same. The new presentation artifacts give you a clean live-demo flow on top of the same Studio workspace.
For the publishable local-product path, VEI now ships a mission-driven playable mode:
vei studio play \
--root _vei_out/playable/harbor_point \
--world real_estate_management \
--mission tenant_opening_conflict
That command prepares the world, activates the mission and objective, records the baseline/comparison context, generates a twin-fidelity report, and serves Studio in Mission Mode. If you only want the bundle on disk, add --no-serve.
The default Studio front door is now the Living Company View. Instead of opening on a debug dashboard, it opens on a compact software wall with Slack, email, tickets, docs, approvals, and the vertical business system side by side. The seeded worlds are intentionally denser now, so each company feels like a real operating business before you even play a move, and visible tool panels update when moves land.
To build the wider local playable release:
vei showcase playable \
--root _vei_out/playable_showcase \
--run-id playable_release
That bundle writes:
- `fidelity_report.json`
- `playable_manifest.json`
- `playable_overview.md`
The new product-facing helpers are:
vei inspect fidelity --root _vei_out/playable/harbor_point
vei export mission-run --root _vei_out/playable/harbor_point --run-id human_play_... --format rl
Customer-shaped agent twins
VEI can now turn captured company context into a customer-shaped twin and expose provider-style routes that an external agent can talk to directly.
Build a twin from a saved context snapshot:
vei twin build \
--root _vei_out/customer_twins/acme_cloud \
--snapshot _vei_out/context/acme_snapshot.json \
--organization-domain acme.ai
Serve the compatibility gateway:
vei twin serve \
--root _vei_out/customer_twins/acme_cloud \
--host 127.0.0.1 \
--port 3020
That workspace keeps the normal VEI run history, surfaces, scoring, and replay, while the gateway exposes provider-shaped routes for:
- Slack-style chat
- Jira-style issues
- Microsoft Graph-style mail and calendar
- Salesforce-style CRM
The fastest way to inspect what was built is:
vei twin status --root _vei_out/customer_twins/acme_cloud
Pilot stack
VEI also ships a higher-level pilot flow for local agent demos. It starts the customer twin gateway, Studio, and a separate Pilot Console sidecar, then writes a launch manifest and short handoff guide for the person running the exercise.
vei pilot up --root _vei_out/pilots/pinnacle
vei pilot status --root _vei_out/pilots/pinnacle
That flow writes:
- `pilot_manifest.json`
- `pilot_guide.md`
- `pilot_runtime.json`
The Pilot Console lives beside Studio on the same UI server and gives the operator one place to check launch details, copy connection snippets, follow external-agent activity, and reset or finalize the run.
You can also use the bundled quick-start client:
python examples/pilot_client.py \
--base-url http://127.0.0.1:3020 \
--token YOUR_PILOT_TOKEN \
--post-message "Customer-safe update is ready for review."
When you are done:
vei pilot down --root _vei_out/pilots/pinnacle
Grounded import flow
VEI can now ingest realistic offline enterprise export packs and turn them into a runnable workspace. The import path is:
raw CSV/JSON exports -> import package -> review/override -> normalized grounding bundle -> compiled workspace
Canonical fixture demo:
If you are running from a source checkout, the bundled fixture lives under vei/imports/fixtures/. In an installed environment, resolve its packaged path with python -c "from vei.imports.api import get_import_package_example_path; print(get_import_package_example_path('macrocompute_identity_export'))".
cp -R vei/imports/fixtures/macrocompute_identity_export _vei_out/import_packages/macrocompute_identity_export
vei project validate-import --package _vei_out/import_packages/macrocompute_identity_export
vei project review-import --package _vei_out/import_packages/macrocompute_identity_export
vei project scaffold-overrides --package _vei_out/import_packages/macrocompute_identity_export --source-id okta_users
vei project normalize --package _vei_out/import_packages/macrocompute_identity_export
vei project import --root _vei_out/workspaces/macrocompute_import --package _vei_out/import_packages/macrocompute_identity_export
vei scenario generate --root _vei_out/workspaces/macrocompute_import
vei scenario activate --root _vei_out/workspaces/macrocompute_import --scenario-name oversharing_remediation --bootstrap-contract
vei run start --root _vei_out/workspaces/macrocompute_import --runner workflow --scenario-name oversharing_remediation
vei inspect provenance --root _vei_out/workspaces/macrocompute_import --object-ref drive_share:DOC-ACQ-1
vei ui serve --root _vei_out/workspaces/macrocompute_import
If you want the shortest end-to-end grounded identity flow, VEI now ships a single command that prepares the workspace, generates/activates the right scenario, bootstraps the contract, and can launch the baseline plus scripted comparison runs:
vei project identity-demo --root _vei_out/workspaces/identity_demo --overwrite
vei ui serve --root _vei_out/workspaces/identity_demo
Live source sync uses the same persisted import-package model. For the first connector-backed path, point VEI at a read-only Okta config JSON:
{
"base_url": "https://your-org.okta.com",
"token_env": "OKTA_API_TOKEN",
"organization_name": "Your Organization",
"organization_domain": "example.com"
}
Then sync it into an existing workspace:
vei project sync-source --root _vei_out/workspaces/macrocompute_import --connector okta --config _vei_out/okta.json
vei project review-import --root _vei_out/workspaces/macrocompute_import
vei project compile --root _vei_out/workspaces/macrocompute_import
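The connector config above is small enough to validate by hand before syncing. A hedged sketch of what loading it might look like, assuming only the `token_env` indirection shown in the JSON (the function name is illustrative, not part of the VEI API):

```python
import json
import os


def load_okta_config(path):
    """Load a read-only connector config and resolve the API token.

    The token itself never lives in the file; the file only names the
    environment variable (token_env) that holds it.
    """
    with open(path) as fh:
        cfg = json.load(fh)
    # All four fields from the documented config shape are required.
    for key in ("base_url", "token_env", "organization_name", "organization_domain"):
        if key not in cfg:
            raise ValueError(f"missing required key: {key}")
    token = os.environ.get(cfg["token_env"])
    if not token:
        raise RuntimeError(f"set {cfg['token_env']} before syncing")
    return cfg, token
```

Keeping the token in the environment rather than the config file means the JSON can be committed or shared without leaking credentials.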
The import UI now shows:
- package/source summary
- connected source registry and sync history
- mapping diagnostics
- identity reconciliation across imported users, employees, managers, and share principals
- suggested override locations and applied source overrides
- generated scenario candidates
- imported vs derived vs simulated counts
- contract rule provenance, including which rules were imported vs inferred
- active generated-scenario promotion into the workspace run path
- provenance drilldown from selected run events
What You Get
- Deterministic simulator with replayable traces
- Stable world-kernel API with snapshot, branch, restore, replay, inject, and event inspection
- File-backed workspaces that keep blueprint assets, contracts, scenarios, runs, and artifacts together
- Typed blueprint and facade catalog over the existing enterprise twins
- Blueprint compiler with explicit facade plugins and an authored `GroundingBundle -> BlueprintAsset -> CompiledBlueprint` flow
- Environment-builder path that can compile typed capability graphs, policies, and workflow seeds into a runnable world session
- Grounded import pipeline that can validate file-based identity exports, normalize them into a `GroundingBundle`, generate scenario candidates, bootstrap contracts, and preserve provenance/redaction artifacts inside a workspace
- Multi-source identity reconciliation that explains how Okta-style users, HRIS employees, manager references, and share/request principals were resolved, left unmatched, or marked external
- Connector-backed import pipeline that can sync a live read-only Okta snapshot into the same canonical `ImportPackage -> GroundingBundle -> Workspace` ladder used by file exports
- Runtime capability-graph layer that lets world sessions and snapshots expose shared domain graphs such as identity, docs, work, comms, and revenue
- Graph-native planning and mutation layer that lets agents ask for suggested next actions and apply graph actions without dropping down to raw app tools first
- Graph-native workflow execution, so benchmark/playbook steps can compile to `vei.graph_action` instead of only raw app-shaped tool calls
- Vertical world packs for B2B SaaS, real estate management, digital marketing agencies, and storage-solutions companies with built-in scenario variants, contract variants, and curated “same world, many futures” demo paths
- Context capture layer that pulls live enterprise data from Slack, Jira, Google Workspace, and Okta into a structured `ContextSnapshot`, then hydrates a `BlueprintAsset` from it
- Synthesis layer that extracts runbooks, training data (conversations, trajectories, demonstrations), and agent configurations from completed world runs
- Agent-orientation layer that lets sessions and snapshots expose agent-facing summaries of visible surfaces, active policies, key objects, and suggested next questions
- Enterprise twins for Slack, Mail, Browser, Docs, Spreadsheet, Tickets, DB, ERP/CRM, Okta-style identity, ServiceDesk, Google Admin, SIEM, Datadog, PagerDuty, feature flags, HRIS, and Jira-style issue flows
- Scenario compilation, dataset rollout, BC training, benchmark execution, and release packaging
- Reusable benchmark families for security containment, enterprise onboarding/migration, and revenue incident response
- Curated complex-example showcase bundles for security incidents, acquired-user cutovers, and revenue-critical mixed-stack mitigations
- Local playback UI for completed and in-flight workspace runs, including timeline, orientation, capability graphs, snapshots, diffs, and contract outcome panels
- Canonical append-only run event stream that drives playback, `vei inspect events`, receipts, contract status, and snapshot markers across workflow, scripted, BC, and LLM runs
- Variant-aware workspace activation so previews, run manifests, showcase bundles, and the UI all explain which scenario overlay and contract overlay are active on top of the base world
- VEI Studio narrative mode, so the same kernel can be shown as a world studio for enterprises with company briefings, situation/objective selection, branch/outcome explanation, and export previews for future RL/eval/agent-ops layers
- Mission-driven playable Studio mode, where the same kernel now acts like a work-game runtime with human moves, scorecards, branch points, and twin-fidelity checks
Architecture
Agent ──MCP──► VEI Router
└─ transport + tool dispatch
│
▼
WorldSession Kernel
├─ unified world state
├─ snapshots / branch / replay / inject
├─ actor state + receipts
└─ enterprise twins and control planes
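The router-to-kernel shape in the diagram can be reduced to a toy sketch: one dispatch point for every tool call, plus named snapshots for branch and restore. All names here are illustrative; the real transport is MCP and the real kernel owns far more state:

```python
class WorldSessionKernel:
    """Toy kernel: one state dict plus named snapshots."""

    def __init__(self):
        self.state = {"tickets": []}
        self.snapshots = {}

    def call_tool(self, tool, args):
        # Every mutation funnels through one dispatch point, so each
        # side effect lands in one observable place.
        if tool == "tickets.create":
            self.state["tickets"].append(args["title"])
            return {"ok": True, "count": len(self.state["tickets"])}
        raise KeyError(f"unknown tool: {tool}")

    def snapshot(self, name):
        # Checkpoint the world so a run can branch from here later.
        self.snapshots[name] = list(self.state["tickets"])

    def restore(self, name):
        self.state["tickets"] = list(self.snapshots[name])


kernel = WorldSessionKernel()
kernel.snapshot("before")
kernel.call_tool("tickets.create", {"title": "Renewal at risk"})
kernel.restore("before")  # branch back to the pre-move world
```

Because all tools route through `call_tool`, recording, receipts, and contract grading can sit at that single chokepoint rather than inside every twin.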
Next Phase
The current execution-ready roadmap lives in docs/NEXT_PHASE_PLAN.md.
In one line: the next phase is about making vei.run the canonical execution spine and making VEI much stronger at turning messy enterprise exports into runnable, inspectable, contract-graded identity environments.
Use It As A Library
Install directly from GitHub:
pip install "git+https://github.com/strangeloopcanon/digital-enterprise-twin.git@main"
For the full product workflow, including the local UI and live LLM runs:
pip install -e ".[llm,sse,ui]"
SDK embedding:
from vei.sdk import create_session
session = create_session(seed=42042, scenario_name="multi_channel")
obs = session.observe()
page = session.call_tool("browser.read", {})
World-kernel embedding:
from vei.world.api import create_world_session, get_catalog_scenario
world = create_world_session(
seed=42042,
scenario=get_catalog_scenario("multi_channel"),
)
obs = world.observe()
snapshot = world.snapshot("before-run")
events = world.list_events()
Useful helpers:
- Scenario manifests: `list_scenario_manifest()`, `get_scenario_manifest(name)`
- Facade catalog: `list_facade_manifest_entries()`, `get_facade_manifest_entry(name)`
- Blueprint catalog: `list_blueprint_entries()`, `build_blueprint_asset_for_family_entry(name)`, `build_blueprint_for_family_entry(name)`, `compile_blueprint_entry(asset)`
- Environment builder: `list_blueprint_builder_examples_entries()`, `build_blueprint_asset_for_example_entry(name)`, `create_world_session_from_blueprint_entry(asset)`
- Workspace lifecycle: `create_workspace_from_template_entry(...)`, `import_workspace_entry(...)`, `compile_workspace_entry(...)`, `show_workspace_entry(...)`
- Import helpers: `list_import_package_example_entries()`, `validate_import_package_entry(path)`, `review_import_package_entry(path)`, `scaffold_mapping_override_entry(path, source_id=...)`, `normalize_import_package_entry(path)`, `load_workspace_import_review_entry(root)`, `load_workspace_provenance_entry(root, object_ref)`
- Run lifecycle: `launch_workspace_run_entry(...)`, `list_run_manifests_entry(...)`, `get_run_orientation_entry(...)`, `get_run_capability_graphs_entry(...)`
- Benchmark families: `list_benchmark_family_manifest_entries()`, `get_benchmark_family_manifest_entry(name)`
- Release packaging: `build_release_version()`, `export_release_dataset(...)`, `export_release_benchmark(...)`, `run_release_nightly(...)`
Primary Commands
make setup
make check
make test
make llm-live
make deps-audit
make all
If you do not have LLM credentials:
VEI_LLM_LIVE_BYPASS=1 make llm-live
Supported CLI Surface
- Start here: `vei project|contract|scenario|run|inspect|showcase|ui`, `vei ui serve`, `vei studio play` (mission-driven playable mode)
- Context and synthesis: `vei context capture|hydrate|diff`, `vei synthesize runbook|training-set|agent-config`
- Expert tools: `vei world`, `vei blueprint bundle|bundles|asset|compile|show|observe|orient|examples|facades`, `vei visualize replay|flow|dashboard|export`
- Evaluation and release: `vei eval`, `vei eval-frontier`, `vei rollout`, `vei train`, `vei score`, `vei release`
- Catalog/debug surfaces: `vei scenarios list|manifest|dump`, `vei smoke`, `vei demo`, `vei det sample-workflow|compile-workflow|run-workflow|generate-corpus|filter-corpus`
vei inspect graphs is now the broadest product/workspace graph surface. It can inspect identity_graph, doc_graph, work_graph, comm_graph, revenue_graph, ops_graph, obs_graph, and data_graph from a recorded run. vei world graphs remains the expert snapshot-level surface and currently focuses on comm_graph, doc_graph, work_graph, identity_graph, and revenue_graph. vei world orient and vei blueprint orient add the agent-facing layer on top: visible surfaces, active policy hints, key objects, and suggested next questions.
The product CLI also now supports built-in vertical demo worlds:
vei project init --root _vei_out/workspaces/pinnacle --vertical b2b_saas
vei project init --root _vei_out/workspaces/harbor_point --vertical real_estate_management
vei project init --root _vei_out/workspaces/northstar_growth --vertical digital_marketing_agency
vei project init --root _vei_out/workspaces/atlas_storage --vertical storage_solutions
Inside live MCP sessions, agents can now call the same discoverability surfaces directly with vei.orientation, vei.capability_graphs, vei.graph_plan, and vei.graph_action.
Graph-native agent ladder:
vei.orientation
-> what kind of world is this?
vei.capability_graphs
-> what shared domain state exists?
vei.graph_plan
-> what graph-native actions make sense next?
vei.graph_action
-> apply one of those actions through the real twins
The workflow layer now uses the same abstraction too: flagship onboarding and revenue/ops workflows execute graph-native steps internally and only resolve down to concrete twins at runtime.
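The four-step ladder above can be mimicked with a toy dispatcher. The tool names match the ladder; the dict-based world and the planning heuristic are purely illustrative assumptions, not VEI internals:

```python
WORLD = {
    "kind": "identity_wedge",
    "identity_graph": {"users": ["alice@example.com"], "apps": ["crm"]},
}


def call_tool(name, args=None):
    """Toy stand-in for the MCP call surface."""
    args = args or {}
    if name == "vei.orientation":
        # Step 1: what kind of world is this?
        return {"world_kind": WORLD["kind"], "graphs": ["identity_graph"]}
    if name == "vei.capability_graphs":
        # Step 2: what shared domain state exists?
        return {"identity_graph": WORLD["identity_graph"]}
    if name == "vei.graph_plan":
        # Step 3: suggest assigning every user to every app (toy heuristic).
        g = WORLD["identity_graph"]
        return [{"action": "identity_graph.assign_application", "user": u, "app": a}
                for u in g["users"] for a in g["apps"]]
    if name == "vei.graph_action":
        # Step 4: apply one suggested action, mutating shared state.
        WORLD.setdefault("assignments", []).append((args["user"], args["app"]))
        return {"applied": True}
    raise KeyError(name)


orient = call_tool("vei.orientation")
plan = call_tool("vei.graph_plan")
call_tool("vei.graph_action", plan[0])
```

The agent-facing value is that each rung narrows the next decision: orientation names the graphs, the graphs name the objects, and the plan names concrete actions to apply.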
Workspace And Playback UI
The default product-shaped loop is now:
- `vei project init` or `vei project import`
- `vei project compile` when you want to refresh compiled artifacts after editing the workspace; `init`, `import`, and `run start` already compile for you
- `vei contract validate` and `vei scenario preview`
- `vei run start --runner workflow|scripted|bc|llm`
- `vei inspect orient|graphs|events|snapshots|diff|receipts`
- `vei ui serve`
The local UI stays intentionally lightweight and Python-first. It opens one workspace, shows compiled scenario and contract context, launches runs with scenario/runner/provider/model/task/max-step controls, and renders a playback control room with animated channel lanes, run scorecards, capability-graph summaries, orientation cards, snapshot diffs, and raw developer drawers over the same canonical run artifacts.
Run playback is now driven by the canonical append-only event spine, so live and completed runs share the same source of truth for contract updates, snapshot markers, resolved tools, and graph-native intents like identity_graph.assign_application or doc_graph.restrict_drive_share.
The Studio front door is the Living Company view: Slack, email, tickets, docs, approvals, and the vertical business system displayed side by side as a software wall. Moves land visibly across all surfaces. The three-tab navigation (Company, Crisis, Outcome) keeps the audience focused while a developer toggle exposes the full engine underneath.
Imported workspaces add a grounded-intake layer on top of that same UI: source-package health, normalization diagnostics, scenario candidates, imported/derived/simulated object counts, and provenance drilldown from timeline events to raw-source lineage.
Benchmarking
Baseline run:
export VEI_ARTIFACTS_DIR=_vei_out/llmtest
VEI_SEED=42042 vei llm-test run \
--provider openai \
--model gpt-5 \
--max-steps 32 \
--task "Open product page, cite specs, post approval under $3200, email sales@macrocompute.example for a quote, wait for reply."
vei score --artifacts-dir _vei_out/llmtest --success-mode full
Kernel-backed benchmark run:
vei eval benchmark \
--runner scripted \
--scenario multi_channel \
--artifacts-root _vei_out/benchmark \
--run-id scripted_multi
Family-level benchmark run:
vei eval benchmark \
--runner workflow \
--family security_containment \
--artifacts-root _vei_out/benchmark \
--run-id security_workflow
Explicit workflow selection for a single scenario:
vei eval benchmark \
--runner workflow \
--scenario oauth_app_containment \
--workflow-name security_containment \
--workflow-variant internal_only_review \
--artifacts-root _vei_out/benchmark \
--run-id security_named_workflow
Scripted or LLM family runs stay on the same pipeline:
vei eval benchmark \
--runner scripted \
--family security_containment \
--artifacts-root _vei_out/benchmark \
--run-id security_family
Canonical family demo flow:
vei eval demo \
--family security_containment \
--artifacts-root _vei_out/demo \
--run-id security_demo
That command runs the deterministic family workflow baseline plus a comparison runner, writes leaderboard.md / leaderboard.csv / leaderboard.json, stores inspectable world state under _vei_out/demo/security_demo/state for follow-up vei world inspection, and records explicit contract.json artifacts for both the baseline and comparison paths. Contract evaluation now separates oracle state from agent-visible observation so hidden state can be graded without making the demo omniscient.
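The oracle/observation split mentioned above can be sketched as two views over one state, with each predicate graded only against the view it is allowed to read. Predicate and field names here are illustrative assumptions, not the real contract schema:

```python
def evaluate_contract(oracle_state, observation_keys):
    """Grade hidden state without leaking it to the agent view.

    oracle_state: full world truth, available only to the grader.
    observation_keys: the observation boundary, i.e. what the agent may see.
    """
    observed = {k: oracle_state[k] for k in observation_keys if k in oracle_state}
    result = {
        # Success predicate graded on full oracle truth, even if hidden.
        "success": oracle_state.get("attacker_token_revoked", False),
        # Forbidden predicate graded on the agent-visible view only.
        "forbidden_hit": observed.get("users_suspended", 0) > 5,
    }
    result["passed"] = result["success"] and not result["forbidden_hit"]
    return result


outcome = evaluate_contract(
    oracle_state={"attacker_token_revoked": True, "users_suspended": 2,
                  "hidden_backdoor_removed": False},
    observation_keys=["users_suspended"],
)
```

The design point is the asymmetry: the grader sees everything, the agent does not, so hidden state can count toward the score without making the demo omniscient.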
Complex-example showcase bundle:
vei eval showcase \
--artifacts-root _vei_out/showcase \
--run-id flagship_examples
That command runs three curated complex examples and writes one top-level showcase_overview.md bundle plus per-example demo artifacts:
- `oauth_incident_chain`: Google Admin + SIEM + Jira + Docs + Slack
- `acquired_seller_cutover`: HRIS + Okta + Google Admin + Salesforce + Jira + Docs + Slack
- `checkout_revenue_flightdeck`: Datadog + PagerDuty + feature flags + Spreadsheet + Docs + CRM + Tickets + Slack
It is the cleanest supported way to show that VEI can execute long-horizon, cross-surface enterprise tasks rather than only single-family demos.
Vertical world-pack showcase bundle:
vei showcase verticals \
--root _vei_out/vertical_showcase \
--run-id world_showcase
That command creates four separate workspace-backed companies, runs the deterministic workflow baseline plus a freer comparison runner for each, and writes one vertical_showcase_overview.md bundle alongside ready-to-open workspace roots:
- `b2b_saas`: Pinnacle Analytics / `enterprise_renewal_risk`
- `real_estate_management`: Harbor Point Management / `tenant_opening_conflict`
- `digital_marketing_agency`: Northstar Growth / `campaign_launch_guardrail`
- `storage_solutions`: Atlas Storage Systems / `capacity_quote_commitment`
The point of that showcase is not just four flashy demos. It is one proof repeated four times:
- the same world kernel compiles four different businesses into runnable environments
- the same event spine records every run, graph action, tool resolution, and snapshot
- the same contract engine judges deterministic baselines and freer agent runs
- the same playback UI makes the result inspectable
That is why VEI can later become an RL environment, a continuous eval system, and an AI-agent operations platform on top of the same kernel.
Flagship blueprint-driven revenue/ops demo:
vei blueprint asset \
--family revenue_incident_mitigation \
--workflow-variant revenue_ops_flightdeck
vei blueprint compile \
--family revenue_incident_mitigation \
--workflow-variant revenue_ops_flightdeck
vei eval demo \
--family revenue_incident_mitigation \
--artifacts-root _vei_out/demo \
--run-id revenue_ops_demo
That flow shows the full engine shape: authored BlueprintAsset, compiled blueprint, the deterministic workflow baseline, a freer comparison run, contract.json, and inspectable state/snapshot artifacts. The flagship revenue workflow now spans Spreadsheet, Docs, CRM, feature flags, Datadog, PagerDuty, Tickets, and Slack in one mixed-stack run.
Flagship environment-builder example for the identity/access-governance wedge:
vei blueprint examples
vei blueprint bundle \
--example acquired_user_cutover
vei blueprint asset \
--example acquired_user_cutover
vei blueprint compile \
--example acquired_user_cutover
vei blueprint observe \
--example acquired_user_cutover \
--focus slack
That flow shows the full builder ladder: raw grounding bundle, authored blueprint asset, compiled blueprint, and then a live world observation. The current built-in identity wedge compiles capability graphs for HRIS, Okta-style identity, Google Drive sharing state, Jira tracking, docs, Slack, and CRM handoff.
Agent-facing builder orientation:
vei blueprint orient \
--example acquired_user_cutover
That command renders the compiled blueprint, runtime capability graphs, and a concise orientation payload for the live world. It is the cleanest single command for showing what an LLM can discover about the environment before acting.
Canonical multi-family workflow suite:
vei eval suite \
--artifacts-root _vei_out/suite \
--run-id nightly_suite
That command runs each family's primary workflow variant and writes stable leaderboard.* artifacts plus suite_result.json, which makes it a good fit for CI or nightly publishing. Each family case also writes a contract.json artifact so the suite has an explicit contract layer, not just score files.
Frontier batch for one model:
vei eval-frontier run \
--runner llm \
--model gpt-5 \
--scenario-set reasoning \
--artifacts-root _vei_out/frontier_eval
Artifacts from batch evaluation include:
- `aggregate_results.json`
- per-scenario `benchmark_result.json`
- benchmark runs also write `blueprint_asset.json`
- benchmark runs also write `blueprint.json` and `benchmark_summary.json`
- benchmark-family runs also write `contract.json`
- demo runs also write `leaderboard.md`, `leaderboard.csv`, `leaderboard.json`, and `demo_result.json`
- suite runs also write `leaderboard.md`, `leaderboard.csv`, `leaderboard.json`, and `suite_result.json`
- family-level dimension scores such as evidence preservation, blast radius, least privilege, oversharing avoidance, deadline compliance, revenue impact handling, artifact follow-through, comms correctness, and safe rollback
Render a report from any benchmark or frontier batch:
vei report generate \
--root _vei_out/frontier_eval/<run-id> \
--format markdown \
--output LEADERBOARD.md
Release Bundles
vei release dataset \
--input-path _vei_out/rollout.json \
--label rollout \
--version v20260310
vei release benchmark \
--benchmark-dir _vei_out/benchmark/scripted_multi \
--label scripted-benchmark \
--version v20260310
vei release nightly \
--release-root _vei_out/releases \
--workspace-root _vei_out/nightly \
--version nightly-20260310 \
--environments 5 \
--scenarios-per-environment 5 \
--rollout-episodes 2 \
--benchmark-scenario multi_channel
One-Command Demo
The fastest way to see VEI in action:
vei quickstart run
This creates a workspace from a built-in vertical, starts both the Studio UI
(:3011) and the Twin Gateway (:3012), runs a scripted baseline so you
immediately see events flowing, and prints connection details including mock
API URLs and an auth token. Press Ctrl-C to stop.
Options: --world digital_marketing_agency, --studio-port, --gateway-port,
--seed, --no-baseline.
Test Your Agent Against VEI
┌─────────────┐ HTTP / MCP ┌──────────────────┐ call_tool ┌──────────────┐
│ Your Agent │ ──────────────────► │ Twin Gateway │ ────────────────► │ WorldSession │
│ (any lang) │ ◄────────────────── │ :3012 │ ◄──────────────── │ Kernel │
└─────────────┘ Slack/Jira/SFDC └──────────────────┘ state + events └──────────────┘
shaped responses │ │
▼ ▼
Contract Evaluation Event Spine
(pass/fail/score) (events.jsonl)
- Start VEI: `vei quickstart run` (or `vei twin serve --root workspace`)
- Connect your agent to the mock API endpoints printed on startup — Slack, Jira, MS Graph, Salesforce — using the bearer token shown
- Your agent takes actions (sends Slack messages, transitions Jira tickets, queries Salesforce) and VEI responds with coherent, stateful results
- VEI evaluates against the contract (success predicates, forbidden predicates, policy invariants) and produces a scorecard
- Inspect results in the Studio UI timeline view, or read the run artifacts (`events.jsonl`, contract evaluation, snapshots)
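Run logs like events.jsonl are line-delimited JSON, so a post-run inspection script can stay trivial. The `tool` field name below is an assumption about the event shape for illustration, not a documented schema:

```python
import json
from collections import Counter


def summarize_events(path):
    """Count events per tool across an append-only run log."""
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            event = json.loads(line)
            counts[event.get("tool", "unknown")] += 1
    return counts
```

One event per line is what makes the spine cheap to tail during a live run and cheap to re-read after it: no framing beyond the newline is needed.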
For MCP-native agents, connect directly:
python -m vei.router --root workspace
Examples
- `examples/sdk_playground_min.py`
- `examples/mcp_client_stdio_min.py`
- `examples/rl_train.py`
Docs
- `docs/OVERVIEW.md` — What VEI is, who it's for, how to connect your data, and strategic context
- `docs/ARCHITECTURE.md` — Module structure and data flow
- `docs/BENCHMARKS.md` — Benchmark families, difficulty tiers, and evaluation
Contributor Notes
bd state is local-only under .beads/ and should stay out of Git.
Workspace Hygiene
The repo source of truth is:
- `vei/`
- `tests/`
- `docs/`
- `tools/`
- top-level config such as `pyproject.toml`, `Makefile`, `README.md`, and `.agents.yml`
Local-only generated folders such as _vei_out/, .artifacts/, .mypy_cache/, .pytest_cache/, .ruff_cache/, and vei.egg-info/ are disposable.
To prune local clutter while keeping the current canonical demo, latest live artifact, reusable datasets, your virtualenv, local bd state, and local Codex state:
make clean-workspace
archive_data/ is intentionally left alone by that target because it may contain local imported source data rather than regenerated outputs.