The control layer for AI coding agents.

OpenShard

AI coding agents can write code, but engineering teams still need to know what ran, what changed, what context was used, whether checks passed, and what it cost, and they need a way to prove it. OpenShard wraps AI coding agent runs with routing, review boundaries, checks, evidence, cost tracking, evals, feedback, and durable Shard receipts.

Agents write code. OpenShard controls the run and proves what happened.


Why OpenShard exists

AI coding agents are becoming good enough to work on real repos, infrastructure, and production-shaped systems.

That creates a new problem. Not “can the model write code?” but:

  • Which model or workflow handled the task?
  • What files did it inspect?
  • What did it change?
  • Did checks pass, fail, skip, or not run?
  • What did the run cost?
  • Was anything risky gated or reviewed?
  • Is there a durable receipt of what happened?

OpenShard is built for the work around the agent: routing, verification, policy, evidence, cost awareness, and auditability. The valuable unit is not a single model call. It is a completed engineering task with evidence, checks, cost, and a receipt.


What OpenShard does

OpenShard is a CLI tool for controlling and recording AI coding agent runs.

It can:

  • Run real repo tasks through a controlled execution path
  • Route work across models and workflows where available
  • Classify task risk
  • Gate risky writes and commands
  • Record model used, risk, checks, changed files, evidence, cost, and result
  • Produce durable Shard receipts for every run
  • Support read-only review flows that preserve Changed 0 files
  • Provide workflow packs for repeatable engineering reviews
  • Compare models and workflows through local evals
  • Track feedback and session signals around runs

OpenShard is not trying to replace Claude Code, Codex, Cursor, OpenCode, or other coding agents.

Those tools do the coding work.

OpenShard sits around them as the control and audit layer.


Current developer loop

The current local developer loop is:

Ask -> Plan -> Run -> Inspect -> Feedback

Ask
Ask OpenShard product, model, command, and policy questions.

Plan
Generate a local execution plan. Plan Mode v1 is deterministic and local: it does not scan the repo, call a provider, or write files.

Run
Send a real repo task through OpenShard’s controlled execution path.

Inspect
Review the result, actions taken, evidence, checks, changed files, cost estimate, model choice, and Shard receipt.

Feedback
Record whether the result was accepted, partial, rejected, or abandoned.


Quick install

Recommended: pipx

pipx install openshard
openshard tui

Alternative: uv

uv tool install openshard
openshard tui

Upgrade later:

pipx upgrade openshard

See docs/install.md for upgrade instructions and notes.


Quick demo

Launch the TUI:

openshard tui

Inside the TUI:

/ask what models do you support?
/plan review this repo for production readiness
/packs
/pack production-iac-hardening

Run a real repo task:

Review and harden this deliberately flawed Terraform codebase. Assess it through security/compliance posture, 2am operability, and developer experience for a 5-10 person engineering team. Identify critical, high, and medium risks. Explain trade-offs. Do not apply changes directly without review.

Inspect the latest run:

/last more

Or from the shell:

openshard last --more

The --more view includes a PROOF SUMMARY block when OSN proof metadata is present, showing observation, progress, verification, loop, retry, and PR comment status.

Optional local follow-up commands after a run:

openshard reflect last                        # advisory reflection on the run (local, no model calls)
openshard pr comment                          # generate a GitHub-ready PR comment from the run
openshard pr comment --output pr-comment.md  # write the PR comment to a file

Leave feedback:

openshard feedback --outcome accepted --reason "Useful review"

See the demo scripts for a recorded walkthrough.


Production IaC demo

The examples/production-infra-demo/ directory contains a fictional GCP workload called DocuVault — a sanitised demo scenario for OpenShard.

The infrastructure is intentionally production-shaped: networking, IAM, Cloud SQL, Cloud Run, storage, secrets, monitoring, and logging.

It is deliberately flawed to serve as the input for an infrastructure-as-code hardening review.

All names, project IDs, resource IDs, CIDRs, and accounts are fake and public-safe, and the demo contains no employer or customer details. It is designed to show a serious IaC review, not a toy example.

A typical production IaC review can show:

  • Critical, high, and medium findings
  • File-level evidence such as iam.tf, secrets.tf, database.tf, network.tf, and storage.tf
  • Verification output from tools like terraform fmt, terraform validate, and tflint when available
  • A clear Changed 0 files receipt for read-only reviews
  • Model selection and cost tracking
  • A /last more view with the full Shard, findings, checks, evidence, and cost comparison

This is the core OpenShard use case: let AI help with serious engineering work, but keep the control, evidence, and receipt layer visible.


Shard receipts

A Shard is the durable receipt for an AI engineering run.

It can show:

  • Task and agent
  • Model used
  • Strategy
  • Risk level
  • Context provenance
  • Inspected files
  • Changed and touched files
  • Checks and their outcomes
  • Findings, when structured findings exist
  • Cost
  • Actions timeline
  • Result

OpenShard can also record feedback and infer session signals around a run.

openshard last --more    # expanded receipt for the latest run
openshard last --full    # full stored details

Every Shard receipt can power two local follow-up commands:

openshard reflect last                        # local advisory reflection on the run
openshard pr comment                          # generate a GitHub-ready PR comment
openshard pr comment --output pr-comment.md  # write the PR comment to a file instead

Both commands are local and deterministic. They do not make additional model calls.

Raw developer content is not stored by default.


One run, end to end

A normal OpenShard run can capture:

  1. Task - the user request or workflow pack prompt.
  2. Routing - which model or workflow was selected.
  3. Risk - whether the task is low, medium, high, or requires stronger review.
  4. Execution - what the agent did during the run.
  5. Checks - verification results, including passed, failed, skipped, or not run.
  6. Evidence - files inspected, findings, and relevant source references.
  7. Changes - files changed, touched, or left untouched.
  8. Cost - estimated spend for the run.
  9. Receipt - a durable Shard record that can be inspected later.

The point is simple: every AI coding run should leave behind enough evidence for a developer or team to understand what happened.
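To make the list above concrete, here is a minimal Python sketch of what a receipt-like record could look like. It is not OpenShard's actual Shard schema: the `RunReceipt` and `CheckOutcome` names, every field name, and the `example-model` value are assumptions made for illustration. Only the four check states and the idea of a read-only run with no changed files come from the text.

```python
from dataclasses import dataclass, field, asdict
from enum import Enum
import json


class CheckOutcome(str, Enum):
    """The four check states named above (enum name is hypothetical)."""
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"
    NOT_RUN = "not_run"


@dataclass
class RunReceipt:
    """Hypothetical receipt shape; not OpenShard's real storage format."""
    task: str
    model: str
    risk: str  # e.g. "low", "medium", "high"
    checks: dict[str, CheckOutcome] = field(default_factory=dict)
    inspected_files: list[str] = field(default_factory=list)
    changed_files: list[str] = field(default_factory=list)
    estimated_cost_usd: float = 0.0

    def is_read_only(self) -> bool:
        # A read-only review leaves the changed-file list empty.
        return not self.changed_files

    def to_json(self) -> str:
        # str-based enum members serialise as their plain string values.
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only; "example-model" is a placeholder.
receipt = RunReceipt(
    task="Review this repo for risks",
    model="example-model",
    risk="medium",
    checks={"terraform validate": CheckOutcome.SKIPPED},
    inspected_files=["iam.tf"],
)
print(receipt.is_read_only())
print(receipt.to_json())
```

Keeping "skipped" and "not run" as distinct states, rather than folding them into a boolean, is what lets a reader tell a check that was deliberately bypassed from one that never started.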


How OpenShard is different

OpenShard is not a chatbot, IDE, or even a generic agent framework. It's the layer around agentic coding work.

| Layer | What it does |
| --- | --- |
| Coding agent | Generates code, edits files, answers task prompts |
| Model router | Chooses which model or workflow should handle the job |
| Verification layer | Runs checks and records whether they passed, failed, skipped, or were not run |
| Policy layer | Gates risky writes, commands, and high-risk work |
| Receipt layer | Records model, cost, evidence, checks, changed files, and result |
| Eval layer | Compares models and workflows by outcome, cost, speed, and safety |

OpenShard can work alongside tools like Claude Code, Codex, Cursor, OpenCode, LangChain, LangGraph, OpenRouter, and provider APIs.

The goal is not to replace every coding agent. The goal is to make AI coding work controllable, inspectable, and measurable.


Workflow packs

Workflow packs are pre-built prompts for repeatable engineering reviews.

openshard packs list
openshard packs show production-iac-hardening
openshard packs prompt production-iac-hardening

Built-in packs include:

  • repo-explanation
  • production-iac-hardening
  • terraform-networking-review
  • iam-security-review
  • cicd-safety-review
  • powershell-automation-review

Workflow packs make common review patterns repeatable without forcing users to rewrite long prompts every time.


Command reference

Most developers should start with the TUI:

openshard tui                                      # Launch the OpenShard terminal UI

Run tasks:

openshard run "Review this repo for risks"         # Run a task through OpenShard from the shell
openshard run --workflow native "Fix this bug"     # Run using the native workflow path

Inspect the latest run:

openshard last                                     # Show the latest run summary
openshard last --more                              # Show the expanded Shard receipt
openshard last --full                              # Show full stored/debug details

Reflect and export:

openshard reflect last                             # Advisory reflection on the last run (local, no model calls)
openshard pr comment                               # Generate a GitHub-ready PR comment from the last run
openshard pr comment --output pr-comment.md        # Write the PR comment to a file

Record feedback:

openshard feedback --outcome accepted              # Mark the latest run as accepted
openshard feedback --outcome partial               # Mark the latest run as partly useful
openshard feedback --outcome rejected              # Mark the latest run as not useful
openshard feedback --outcome abandoned             # Mark the latest run as abandoned
openshard feedback --outcome accepted --reason "kept as-is"  # Optionally include a free-text reason
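
If you want to record feedback from a script rather than by hand, a small wrapper can validate the outcome before building the command. This is a sketch, not part of OpenShard: `feedback_command` and `ALLOWED_OUTCOMES` are hypothetical names, and the only CLI details assumed are the `--outcome` values and `--reason` flag shown above.

```python
import shlex
from typing import Optional

# Outcome values as listed in the command reference above.
ALLOWED_OUTCOMES = ("accepted", "partial", "rejected", "abandoned")


def feedback_command(outcome: str, reason: Optional[str] = None) -> list[str]:
    """Build the argv list for `openshard feedback` (hypothetical helper)."""
    if outcome not in ALLOWED_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    argv = ["openshard", "feedback", "--outcome", outcome]
    if reason:
        argv += ["--reason", reason]
    return argv


cmd = feedback_command("accepted", reason="kept as-is")
# Print a shell-safe rendering of the command without executing it.
print(shlex.join(cmd))
```

With OpenShard installed, the returned list could be passed to `subprocess.run(cmd, check=True)`. Building an argv list instead of a shell string avoids quoting problems in free-text reasons.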

Infer local session signals:

openshard session infer                            # Infer local behavioural/session signals from run history

Workflow packs:

openshard packs list                               # List available workflow packs
openshard packs show production-iac-hardening      # Show details for a workflow pack
openshard packs prompt production-iac-hardening    # Print the pack prompt

Model registry and policy:

openshard models list                              # List registered models
openshard models role reasoning                    # Show reasoning-capable models
openshard models role cheap_control                # Show low-cost/control models
openshard models mode ask                          # Show Ask Mode model policy
openshard models mode plan                         # Show Plan Mode model policy

Local evals:

openshard eval list                                # List eval suites
openshard eval validate --suite basic              # Validate an eval suite
openshard eval run --suite basic                   # Run an eval suite
openshard eval report                              # Show latest eval report
openshard eval compare                             # Compare models by eval results
openshard eval stats                               # Show eval stats

Useful TUI commands:

/ask what models do you support?                   # Ask OpenShard product/model questions
/plan review this repo for production readiness    # Generate a local plan without writing files
/packs                                             # List workflow packs inside the TUI
/pack production-iac-hardening                     # Load a workflow pack inside the TUI
/last                                              # Show the latest run
/last more                                         # Show expanded run details
/last full                                         # Show full debug/audit details
/feedback accepted                                 # Record feedback for the latest run
/clear                                             # Clear the output panel
/quit                                              # Exit the TUI

After a run completes, the TUI shows command hints for openshard reflect last and openshard pr comment.


What works today

OpenShard is still alpha, but the core local loop is working.

Current features include:

  • Local CLI and TUI (openshard tui)
  • Ask Mode for local product/model/command Q&A
  • Plan Mode v1 for deterministic local plans
  • Controlled run path for real repo tasks
  • OpenShard Native execution harness
  • Task classification and risk handling
  • Model registry and model policy inspection
  • Routing across models/workflows where available
  • Shard receipts with model, risk, files, checks, cost, evidence, and result
  • /last, /last more, and /last full
  • Read-only review handling that preserves Changed 0 files
  • Intent-specific review handling for Terraform/IaC, CI/CD, auth/security, tests, and docs/onboarding
  • Workflow packs for repeatable engineering reviews
  • Feedback signals
  • Session signal inference
  • Local run history
  • Local eval harness
  • Eval comparison by pass rate and cost-per-pass
  • Cost comparison in /last more
  • OSN proof pipeline with PROOF SUMMARY in openshard last --more (Observation, Progress, Verification, Loop, Retry)
  • openshard reflect last for local advisory run reflection (deterministic, no model calls)
  • openshard pr comment for local GitHub PR comment generation
  • TUI post-run command hints for reflect and pr comment
  • Production-shaped Terraform demo
  • 5,500+ passing tests and green CI

What is not built yet

OpenShard is early and intentionally local-first.

Not built yet:

  • No hosted team platform yet
  • No cloud sync yet
  • No hosted dashboard for teams yet
  • No IDE integration yet
  • No Homebrew, winget, or one-line shell installer yet
  • Plan Mode is not repo-aware yet
  • Ask Mode and Plan Mode are local deterministic v1 flows
  • Feedback advisory does not automatically change routing yet
  • External harness adapters are experimental and not guaranteed
  • Not a full Claude Code, Codex, or Cursor replacement

Local data and privacy

All run receipts, history, and proof metadata are stored locally in ~/.openshard/.

No run data, file contents, or task metadata is sent to OpenShard servers, because no such servers exist.

Model calls go directly to the provider you configure (Anthropic, OpenRouter, etc.) under your own API key.

openshard pr comment generates markdown locally and outputs to stdout or a local file. Nothing is posted to GitHub automatically.
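
Because everything lives under ~/.openshard/, you can check how much local data has accumulated with a few lines of Python. This sketch assumes nothing about the directory's internal layout; `summarize_dir` is a hypothetical helper that only counts files and bytes.

```python
from pathlib import Path


def summarize_dir(root: Path) -> tuple[int, int]:
    """Return (file_count, total_bytes) for everything under root."""
    if not root.is_dir():
        # Nothing has been stored yet, or the path is wrong.
        return (0, 0)
    files = [p for p in root.rglob("*") if p.is_file()]
    return (len(files), sum(p.stat().st_size for p in files))


# ~/.openshard/ is the location named above; its contents are not assumed.
count, size = summarize_dir(Path.home() / ".openshard")
print(f"{count} files, {size} bytes")
```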


Developer setup

Use this if you want to modify OpenShard locally or contribute code:

git clone https://github.com/MichaelObasa/openshard.git
cd openshard
pip install -e ".[dev]"
python -m pytest -q
python -m ruff check .

Advanced: evals

OpenShard includes a local eval harness for checking routing and workflow behaviour.

openshard eval list
openshard eval validate --suite basic
openshard eval run --suite basic
openshard eval report
openshard eval compare
openshard eval stats

The goal is not just to ask “which model is best?”

The better question is:

Which model or workflow succeeds most reliably for this type of task, at what cost, with what safety profile?

The eval system can track:

  • Pass rate
  • Verification outcomes
  • Duration
  • Token usage where available
  • Cost where available
  • Cost per passing run
  • Unsafe file attempts
  • Model ranking across eval runs

This is the foundation for smarter routing over time: routing based on actual task outcomes.
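
As a rough illustration of two of the metrics above, the sketch below computes pass rate and cost per passing run from a list of run records. The `EvalRun` record, the function names, and the sample numbers are all hypothetical, and the definition of cost per pass used here (total spend, including failed runs, divided by the number of passing runs) is an assumption rather than OpenShard's documented formula.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalRun:
    """Hypothetical record of one eval run; not OpenShard's stored format."""
    model: str
    passed: bool
    cost_usd: float


def pass_rate(runs: list[EvalRun]) -> float:
    """Fraction of runs that passed; 0.0 for an empty list."""
    return sum(r.passed for r in runs) / len(runs) if runs else 0.0


def cost_per_pass(runs: list[EvalRun]) -> Optional[float]:
    """Total spend divided by passing runs; None when nothing passed."""
    passes = sum(r.passed for r in runs)
    if passes == 0:
        return None
    return sum(r.cost_usd for r in runs) / passes


# Made-up sample data: four runs, three of which passed.
runs = [
    EvalRun("example-model", True, 0.03),
    EvalRun("example-model", True, 0.03),
    EvalRun("example-model", False, 0.03),
    EvalRun("example-model", True, 0.03),
]
print(pass_rate(runs), cost_per_pass(runs))
```

Charging failed runs to the passing ones is the point of the metric: a cheap model that fails often can end up costing more per useful result than a pricier one that passes reliably.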


Current validation state

OpenShard is still early, but it is not just a prototype.

Current validation includes:

  • 5,500+ passing tests
  • Green CI
  • Ruff-clean Python codebase
  • Local CLI/TUI workflow
  • Production-shaped Terraform demo
  • Workflow packs for repeatable reviews
  • Shard receipts for run history
  • Eval tooling for model and workflow comparison
  • Pre-launch usage from developers testing it on real work

The project is alpha, but the core loop is working:

Run the task -> inspect what happened -> verify the output -> create a receipt

Roadmap

Near-term roadmap:

  • Public open-source launch
  • More real-world developer testing
  • Better repo-aware planning
  • Stronger model/workflow ranking from real outcomes
  • More workflow packs
  • More repo analyzers for common stacks
  • Cleaner setup and release packaging
  • Hosted/team run history
  • Team policies and shared approval gates
  • Dashboards for cost, model usage, and verification outcomes

Longer-term, OpenShard should become the control plane teams use to manage AI engineering work.


Why open source?

Routing decisions should be inspectable.

If a tool decides which model touches security-sensitive code, developers should be able to see why.

OpenShard is open because trust, integrations, and routing policies improve when real users can inspect and extend the system.

Open source also keeps the local-first layer useful on its own. Hosted and team features can come later, but the core control layer should be understandable and inspectable.


Contributing

Contributions are welcome around:

  • Routing policies and scoring logic
  • Repo analyzers for new stacks
  • Model profiles and capability data
  • Evaluation datasets
  • Provider integrations
  • Workflow packs
  • CLI/TUI UX improvements
  • Documentation and examples

See CONTRIBUTING.md for details.


Security

If you find a security issue, please report it privately before opening a public issue.

See SECURITY.md.


License

MIT
