OpenShard
The control layer for AI coding agents.
AI coding agents can write code, but engineering teams still need to understand what ran, what changed, what context was used, whether checks passed, and what it cost, and they need a way to prove it. OpenShard wraps AI coding agent runs with routing, review boundaries, checks, evidence, cost tracking, evals, feedback, and durable Shard receipts.
Agents write code. OpenShard controls the run and proves what happened.
Why OpenShard exists
AI coding agents are becoming good enough to work on real repos, infrastructure, and production-shaped systems.
That creates a new problem. Not “can the model write code?” but:
- Which model or workflow handled the task?
- What files did it inspect?
- What did it change?
- Did checks pass, fail, skip, or not run?
- What did the run cost?
- Was anything risky gated or reviewed?
- Is there a durable receipt of what happened?
OpenShard is built for the work around the agent: routing, verification, policy, evidence, cost awareness, and auditability. The valuable unit is not a single model call. It is a completed engineering task with evidence, checks, cost, and a receipt.
What OpenShard does
OpenShard is a CLI tool for controlling and recording AI coding agent runs.
It can:
- Run real repo tasks through a controlled execution path
- Route work across models and workflows where available
- Classify task risk
- Gate risky writes and commands
- Record model used, risk, checks, changed files, evidence, cost, and result
- Produce durable Shard receipts for every run
- Support read-only review flows that preserve Changed 0 files
- Provide workflow packs for repeatable engineering reviews
- Compare models and workflows through local evals
- Track feedback and session signals around runs
OpenShard is not trying to replace Claude Code, Codex, Cursor, OpenCode, or other coding agents.
Those tools do the coding work.
OpenShard sits around them as the control and audit layer.
Current developer loop
The current local developer loop is:
Ask -> Plan -> Run -> Inspect -> Feedback
Ask
Ask OpenShard product, model, command, and policy questions.
Plan
Generate a local execution plan. Plan Mode v1 is deterministic and local: it does not scan the repo, call a provider, or write files.
Run
Send a real repo task through OpenShard’s controlled execution path.
Inspect
Review the result, actions taken, evidence, checks, changed files, cost estimate, model choice, and Shard receipt.
Feedback
Record whether the result was accepted, partial, rejected, or needs more work.
Quick install
Recommended: pipx
pipx install openshard
openshard tui
Alternative: uv
uv tool install openshard
openshard tui
Upgrade later:
pipx upgrade openshard
See docs/install.md for upgrade instructions and notes.
Quick demo
Launch the TUI:
openshard tui
Inside the TUI:
/ask what models do you support?
/plan review this repo for production readiness
/packs
/pack production-iac-hardening
Run a real repo task:
Review and harden this deliberately flawed Terraform codebase. Assess it through security/compliance posture, 2am operability, and developer experience for a 5-10 person engineering team. Identify critical, high, and medium risks. Explain trade-offs. Do not apply changes directly without review.
Inspect the latest run:
/last more
Or from the shell:
openshard last --more
The --more view includes a PROOF SUMMARY block when OSN proof metadata is present, showing observation, progress, verification, loop, retry, and PR comment status.
Optional local follow-up commands after a run:
openshard reflect last # advisory reflection on the run (local, no model calls)
openshard pr comment # generate a GitHub-ready PR comment from the run
openshard pr comment --output pr-comment.md # write the PR comment to a file
Leave feedback:
openshard feedback --outcome accepted --reason "Useful review"
See the demo scripts for a recorded walkthrough.
Production IaC demo
The examples/production-infra-demo/ directory contains a fictional GCP workload called DocuVault — a sanitised demo scenario for OpenShard.
The infrastructure is intentionally production-shaped: networking, IAM, Cloud SQL, Cloud Run, storage, secrets, monitoring, and logging.
It is deliberately flawed to serve as the input for an infrastructure-as-code hardening review.
All names, project IDs, resource IDs, CIDRs, and accounts are fake and public-safe, and the demo contains no employer or customer details. It is designed to show a serious IaC review, not a toy example.
A typical production IaC review can show:
- Critical, high, and medium findings
- File-level evidence such as iam.tf, secrets.tf, database.tf, network.tf, and storage.tf
- Verification output from tools like terraform fmt, terraform validate, and tflint when available
- A clear Changed 0 files receipt for read-only reviews
- Model selection and cost tracking
- A /last more view with the full Shard, findings, checks, evidence, and cost comparison
This is the core OpenShard use case: let AI help with serious engineering work, but keep the control, evidence, and receipt layer visible.
Shard receipts
A Shard is the durable receipt for an AI engineering run.
It can show:
- Task and agent
- Model used
- Strategy
- Risk level
- Context provenance
- Inspected files
- Changed and touched files
- Checks and their outcomes
- Findings, when structured findings exist
- Cost
- Actions timeline
- Result
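As a rough illustration of the fields above, a receipt could be modelled along these lines. This is a hypothetical sketch, not OpenShard's actual schema or storage format: the ShardReceipt class, its field names, and the example values (including the placeholder model name) are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ShardReceipt:
    """Hypothetical receipt shape; not OpenShard's real storage format."""

    task: str
    model: str
    risk: str  # e.g. "low", "medium", "high"
    inspected_files: list[str] = field(default_factory=list)
    changed_files: list[str] = field(default_factory=list)
    checks: dict[str, str] = field(default_factory=dict)  # check name -> outcome
    cost_usd: float = 0.0
    result: str = "unknown"

    def summary(self) -> str:
        """One-line summary in the spirit of a 'Changed 0 files' receipt."""
        return f"Changed {len(self.changed_files)} files"


receipt = ShardReceipt(
    task="Review Terraform for risks",
    model="example-model",  # placeholder, not a real model identifier
    risk="high",
    inspected_files=["iam.tf", "network.tf"],
    checks={"terraform validate": "passed", "tflint": "skipped"},
    cost_usd=0.04,
    result="completed",
)
print(receipt.summary())
print(json.dumps(asdict(receipt), indent=2))
```

A read-only review leaves changed_files empty, so the summary reads "Changed 0 files" while the inspected files and check outcomes are still recorded.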
OpenShard can also record feedback and infer session signals around a run.
openshard last --more # expanded receipt for the latest run
openshard last --full # full stored details
Every Shard receipt can power two local follow-up commands:
openshard reflect last # local advisory reflection on the run
openshard pr comment # generate a GitHub-ready PR comment
openshard pr comment --output pr-comment.md # write the PR comment to a file instead
Both commands are local and deterministic. They do not make additional model calls.
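If you want to script these follow-ups, for example in a local post-run hook, a thin wrapper is enough. The sketch below uses only the commands and the --output flag shown above. The helper names pr_comment_command and run_follow_ups are hypothetical, and it assumes the CLI exits non-zero on failure, which is not confirmed here.

```python
import shutil
import subprocess
from typing import Optional


def pr_comment_command(output: Optional[str] = None) -> list[str]:
    """Build the argv for `openshard pr comment`, optionally with --output."""
    argv = ["openshard", "pr", "comment"]
    if output is not None:
        argv += ["--output", output]
    return argv


def run_follow_ups(output: str = "pr-comment.md") -> None:
    """Run both documented follow-up commands if the CLI is installed."""
    if shutil.which("openshard") is None:
        print("openshard not found on PATH; skipping")
        return
    # check=True raises CalledProcessError on a non-zero exit (assumed behaviour).
    subprocess.run(["openshard", "reflect", "last"], check=True)
    subprocess.run(pr_comment_command(output), check=True)


print(pr_comment_command("pr-comment.md"))
```

Call run_follow_ups() after a run completes. Nothing is posted anywhere: the second command only writes the markdown file.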
Raw developer content is not stored by default.
One run, end to end
A normal OpenShard run can capture:
- Task - the user request or workflow pack prompt.
- Routing - which model or workflow was selected.
- Risk - whether the task is low, medium, high, or requires stronger review.
- Execution - what the agent did during the run.
- Checks - verification results, including passed, failed, skipped, or not run.
- Evidence - files inspected, findings, and relevant source references.
- Changes - files changed, touched, or left untouched.
- Cost - estimated spend for the run.
- Receipt - a durable Shard record that can be inspected later.
The point is simple: every AI coding run should leave behind enough evidence for a developer or team to understand what happened.
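The four check outcomes matter because "skipped" and "not run" are not the same as "passed". A minimal sketch of that distinction follows. The outcome labels, the function names, and the rule used in verified are assumptions for illustration, not OpenShard's internal logic.

```python
from collections import Counter

OUTCOMES = ("passed", "failed", "skipped", "not_run")


def summarize_checks(checks: dict[str, str]) -> dict[str, int]:
    """Count checks per outcome, rejecting unknown outcome labels."""
    unknown = set(checks.values()) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")
    counts = Counter(checks.values())
    return {outcome: counts.get(outcome, 0) for outcome in OUTCOMES}


def verified(checks: dict[str, str]) -> bool:
    """Treat a run as verified only if something passed and nothing failed."""
    counts = summarize_checks(checks)
    return counts["passed"] > 0 and counts["failed"] == 0


checks = {
    "terraform fmt": "passed",
    "terraform validate": "passed",
    "tflint": "skipped",
}
print(summarize_checks(checks))
print(verified(checks))
```

Under this rule, a run where every check was skipped or not run is reported as unverified rather than silently counted as a success.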
How OpenShard is different
OpenShard is not a chatbot, IDE, or even a generic agent framework. It's the layer around agentic coding work.
| Layer | What it does |
|---|---|
| Coding agent | Generates code, edits files, answers task prompts |
| Model router | Chooses which model or workflow should handle the job |
| Verification layer | Runs checks and records whether they passed, failed, skipped, or were not run |
| Policy layer | Gates risky writes, commands, and high-risk work |
| Receipt layer | Records model, cost, evidence, checks, changed files, and result |
| Eval layer | Compares models and workflows by outcome, cost, speed, and safety |
OpenShard can work alongside tools like Claude Code, Codex, Cursor, OpenCode, LangChain, LangGraph, OpenRouter, and provider APIs.
The goal is not to replace every coding agent. The goal is to make AI coding work controllable, inspectable, and measurable.
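To make the policy layer concrete, here is a toy sketch of what gating a risky command can look like. Everything in it is hypothetical: the Risk levels, the marker strings, and the classify_command and is_allowed functions are illustrative assumptions, not OpenShard's policy engine, and a real classifier would need to be far more careful than substring matching.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical markers chosen for the example only.
HIGH_RISK_MARKERS = ("terraform apply", "rm -rf", "drop table")
MEDIUM_RISK_MARKERS = ("git push", "pip install")


def classify_command(command: str) -> Risk:
    """Assign a risk level by naive substring matching."""
    lowered = command.lower()
    if any(marker in lowered for marker in HIGH_RISK_MARKERS):
        return Risk.HIGH
    if any(marker in lowered for marker in MEDIUM_RISK_MARKERS):
        return Risk.MEDIUM
    return Risk.LOW


def is_allowed(command: str, approved: bool = False,
               threshold: Risk = Risk.HIGH) -> bool:
    """Allow a command unless it reaches the threshold without approval."""
    return classify_command(command) < threshold or approved


for cmd in ("terraform validate", "terraform apply -auto-approve"):
    print(cmd, "->", classify_command(cmd).name, "allowed:", is_allowed(cmd))
```

The design point is that the gate sits outside the agent: the agent can propose any command, but a high-risk one only runs once a reviewer has approved it.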
Workflow packs
Workflow packs are pre-built prompts for repeatable engineering reviews.
openshard packs list
openshard packs show production-iac-hardening
openshard packs prompt production-iac-hardening
Built-in packs include:
- repo-explanation
- production-iac-hardening
- terraform-networking-review
- iam-security-review
- cicd-safety-review
- powershell-automation-review
Workflow packs make common review patterns repeatable without forcing users to rewrite long prompts every time.
Command reference
Most developers should start with the TUI:
openshard tui # Launch the OpenShard terminal UI
Run tasks:
openshard run "Review this repo for risks" # Run a task through OpenShard from the shell
openshard run --workflow native "Fix this bug" # Run using the native workflow path
Inspect the latest run:
openshard last # Show the latest run summary
openshard last --more # Show the expanded Shard receipt
openshard last --full # Show full stored/debug details
Reflect and export:
openshard reflect last # Advisory reflection on the last run (local, no model calls)
openshard pr comment # Generate a GitHub-ready PR comment from the last run
openshard pr comment --output pr-comment.md # Write the PR comment to a file
Record feedback:
openshard feedback --outcome accepted # Mark the latest run as accepted
openshard feedback --outcome partial # Mark the latest run as partly useful
openshard feedback --outcome rejected # Mark the latest run as not useful
openshard feedback --outcome abandoned # Mark the latest run as abandoned
openshard feedback --outcome accepted --reason "kept as-is" # Optionally include a free-text reason
Infer local session signals:
openshard session infer # Infer local behavioural/session signals from run history
Workflow packs:
openshard packs list # List available workflow packs
openshard packs show production-iac-hardening # Show details for a workflow pack
openshard packs prompt production-iac-hardening # Print the pack prompt
Model registry and policy:
openshard models list # List registered models
openshard models role reasoning # Show reasoning-capable models
openshard models role cheap_control # Show low-cost/control models
openshard models mode ask # Show Ask Mode model policy
openshard models mode plan # Show Plan Mode model policy
Local evals:
openshard eval list # List eval suites
openshard eval validate --suite basic # Validate an eval suite
openshard eval run --suite basic # Run an eval suite
openshard eval report # Show latest eval report
openshard eval compare # Compare models by eval results
openshard eval stats # Show eval stats
Useful TUI commands:
/ask what models do you support? # Ask OpenShard product/model questions
/plan review this repo for production readiness # Generate a local plan without writing files
/packs # List workflow packs inside the TUI
/pack production-iac-hardening # Load a workflow pack inside the TUI
/last # Show the latest run
/last more # Show expanded run details
/last full # Show full debug/audit details
/feedback accepted # Record feedback for the latest run
/clear # Clear the output panel
/quit # Exit the TUI
After a run completes, the TUI shows command hints for openshard reflect last and openshard pr comment.
What works today
OpenShard is still alpha, but the core local loop is working.
Current features include:
- Local CLI and TUI (openshard tui)
- Ask Mode for local product/model/command Q&A
- Plan Mode v1 for deterministic local plans
- Controlled run path for real repo tasks
- OpenShard Native execution harness
- Task classification and risk handling
- Model registry and model policy inspection
- Routing across models/workflows where available
- Shard receipts with model, risk, files, checks, cost, evidence, and result
- /last, /last more, and /last --full
- Read-only review handling that preserves Changed 0 files
- Intent-specific review handling for Terraform/IaC, CI/CD, auth/security, tests, and docs/onboarding
- Workflow packs for repeatable engineering reviews
- Feedback signals
- Session signal inference
- Local run history
- Local eval harness
- Eval comparison by pass rate and cost-per-pass
- Cost comparison in /last more
- OSN proof pipeline with PROOF SUMMARY in openshard last --more (Observation, Progress, Verification, Loop, Retry)
- openshard reflect last for local advisory run reflection (deterministic, no model calls)
- openshard pr comment for local GitHub PR comment generation
- TUI post-run command hints for reflect and pr comment
- Production-shaped Terraform demo
- 5,500+ passing tests and green CI
What is not built yet
OpenShard is early and intentionally local-first.
Not built yet:
- No hosted team platform yet
- No cloud sync yet
- No hosted dashboard for teams yet
- No IDE integration yet
- No Homebrew, winget, or one-line shell installer yet
- Plan Mode is not repo-aware yet
- Ask Mode and Plan Mode are local deterministic v1 flows
- Feedback advisory does not automatically change routing yet
- External harness adapters are experimental and not guaranteed
- Not a full Claude Code, Codex, or Cursor replacement
Local data and privacy
All run receipts, history, and proof metadata are stored locally in ~/.openshard/.
No run data, file contents, or task metadata is sent to OpenShard servers. There are none.
Model calls go directly to the provider you configure (Anthropic, OpenRouter, etc.) under your own API key.
openshard pr comment generates markdown locally and outputs to stdout or a local file. Nothing is posted to GitHub automatically.
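Because everything lives under ~/.openshard/, you can check what is stored with ordinary file tools. The sketch below only counts files and bytes. It makes no assumptions about the layout or file formats inside that directory, which are not described here, and summarize_dir is a hypothetical helper name.

```python
from pathlib import Path


def summarize_dir(root: Path) -> tuple[int, int]:
    """Return (file_count, total_bytes) for all regular files under root."""
    if not root.is_dir():
        return (0, 0)
    files = [p for p in root.rglob("*") if p.is_file()]
    return (len(files), sum(p.stat().st_size for p in files))


count, size = summarize_dir(Path.home() / ".openshard")
print(f"{count} files, {size} bytes under ~/.openshard")
```

If the directory does not exist yet, for example before the first run, the helper reports zero files rather than raising.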
Developer setup
Use this if you want to modify OpenShard locally or contribute code:
git clone https://github.com/MichaelObasa/openshard.git
cd openshard
pip install -e ".[dev]"
python -m pytest -q
python -m ruff check .
Advanced: evals
OpenShard includes a local eval harness for checking routing and workflow behaviour.
openshard eval list
openshard eval validate --suite basic
openshard eval run --suite basic
openshard eval report
openshard eval compare
openshard eval stats
The goal is not just to ask “which model is best?”
The better question is:
Which model or workflow succeeds most reliably for this type of task, at what cost, with what safety profile?
The eval system can track:
- Pass rate
- Verification outcomes
- Duration
- Token usage where available
- Cost where available
- Cost per passing run
- Unsafe file attempts
- Model ranking across eval runs
This is the foundation for smarter routing over time: routing based on actual task outcomes.
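Two of the metrics above, pass rate and cost per passing run, are simple to compute by hand. The sketch below shows one plausible definition, where cost per pass is total spend (including failed runs) divided by the number of passes. The EvalRun type, the score function, the model names, and the figures are all made up for illustration, and OpenShard's own calculation may differ.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    model: str
    passed: bool
    cost_usd: float


def score(runs: list[EvalRun]) -> dict[str, dict[str, float]]:
    """Compute pass rate and cost per passing run for each model."""
    by_model: dict[str, list[EvalRun]] = {}
    for run in runs:
        by_model.setdefault(run.model, []).append(run)
    results: dict[str, dict[str, float]] = {}
    for model, model_runs in by_model.items():
        passes = sum(r.passed for r in model_runs)
        total_cost = sum(r.cost_usd for r in model_runs)
        results[model] = {
            "pass_rate": passes / len(model_runs),
            # Failed runs still cost money, so divide total spend by passes.
            "cost_per_pass": total_cost / passes if passes else float("inf"),
        }
    return results


runs = [
    EvalRun("model-a", True, 0.10), EvalRun("model-a", False, 0.10),
    EvalRun("model-b", True, 0.02), EvalRun("model-b", True, 0.02),
]
scores = score(runs)
ranking = sorted(scores, key=lambda m: scores[m]["cost_per_pass"])
print(ranking)
```

In this made-up data, model-a passes half the time at 0.20 per passing run, while model-b passes every time at 0.02 per passing run, so model-b ranks first. A model that never passes gets an infinite cost per pass and sorts last.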
Current validation state
OpenShard is still early, but it is not just a prototype.
Current validation includes:
- 5,500+ passing tests
- Green CI
- Ruff-clean Python codebase
- Local CLI/TUI workflow
- Production-shaped Terraform demo
- Workflow packs for repeatable reviews
- Shard receipts for run history
- Eval tooling for model and workflow comparison
- Pre-launch usage from developers testing it on real work
The project is alpha, but the core loop is working:
Run the task -> inspect what happened -> verify the output -> create a receipt
Roadmap
Near-term roadmap:
- Public open-source launch
- More real-world developer testing
- Better repo-aware planning
- Stronger model/workflow ranking from real outcomes
- More workflow packs
- More repo analyzers for common stacks
- Cleaner setup and release packaging
- Hosted/team run history
- Team policies and shared approval gates
- Dashboards for cost, model usage, and verification outcomes
Longer-term, OpenShard should become the control plane teams use to manage AI engineering work.
Why open source?
Routing decisions should be inspectable.
If a tool decides which model touches security-sensitive code, developers should be able to see why.
OpenShard is open because trust, integrations, and routing policies improve when real users can inspect and extend the system.
Open source also keeps the local-first layer useful on its own. Hosted and team features can come later, but the core control layer should be understandable and inspectable.
Contributing
Contributions are welcome around:
- Routing policies and scoring logic
- Repo analyzers for new stacks
- Model profiles and capability data
- Evaluation datasets
- Provider integrations
- Workflow packs
- CLI/TUI UX improvements
- Documentation and examples
See CONTRIBUTING.md for details.
Security
If you find a security issue, please report it privately before opening a public issue.
See SECURITY.md.
License
MIT