OpenShard

The control layer for AI coding agents.

AI coding agents can write code, but engineering teams still need to know what ran, what changed, what context was used, whether checks passed, and what it cost, and they need a way to prove it. OpenShard wraps AI coding agent runs with routing, review boundaries, checks, evidence, cost tracking, evals, feedback, and durable Shard receipts.

Agents write code. OpenShard controls the run and proves what happened.


Why OpenShard exists

AI coding agents are becoming good enough to work on real repos, infrastructure, and production-shaped systems.

That creates a new problem. Not “can the model write code?” but:

  • Which model or workflow handled the task?
  • What files did it inspect?
  • What did it change?
  • Did checks pass, fail, skip, or not run?
  • What did the run cost?
  • Was anything risky gated or reviewed?
  • Is there a durable receipt of what happened?

OpenShard is built for the work around the agent: routing, verification, policy, evidence, cost awareness, and auditability. The valuable unit is not a single model call. It is a completed engineering task with evidence, checks, cost, and a receipt.


What OpenShard does

OpenShard is a CLI tool for controlling and recording AI coding agent runs.

It can:

  • Run real repo tasks through a controlled execution path
  • Route work across models and workflows where available
  • Classify task risk
  • Gate risky writes and commands
  • Record model used, risk, checks, changed files, evidence, cost, and result
  • Produce durable Shard receipts for every run
  • Support read-only review flows that preserve Changed 0 files
  • Provide workflow packs for repeatable engineering reviews
  • Compare models and workflows through local evals
  • Track feedback and session signals around runs

OpenShard is not trying to replace Claude Code, Codex, Cursor, OpenCode, or other coding agents.

Those tools do the coding work.

OpenShard sits around them as the control and audit layer.


Current developer loop

The current local developer loop is:

Ask -> Plan -> Run -> Inspect -> Feedback

Ask
Ask OpenShard product, model, command, and policy questions.

Plan
Generate a local execution plan. Plan Mode v1 is deterministic and local: it does not scan the repo, call a provider, or write files.

Run
Send a real repo task through OpenShard’s controlled execution path.

Inspect
Review the result, actions taken, evidence, checks, changed files, cost estimate, model choice, and Shard receipt.

Feedback
Record whether the result was accepted, partial, rejected, or needs more work.


Quick install

Recommended: pipx

pipx install git+https://github.com/MichaelObasa/openshard.git

Run:

openshard tui

Alternative: uv

uv tool install git+https://github.com/MichaelObasa/openshard.git

Local development:

git clone https://github.com/MichaelObasa/openshard.git
cd openshard
pip install -e .

See docs/install.md for upgrade instructions and notes.


Quick demo

Launch the TUI:

openshard tui

Inside the TUI:

/ask what models do you support?
/plan review this repo for production readiness
/packs
/pack production-iac-hardening

Run a real repo task:

Review and harden this deliberately flawed Terraform codebase. Assess it through security/compliance posture, 2am operability, and developer experience for a 5-10 person engineering team. Identify critical, high, and medium risks. Explain trade-offs. Do not apply changes directly without review.

Inspect the latest run:

/last more

Or from the shell:

openshard last --more

Leave feedback:

openshard feedback --outcome accepted --note "Useful review"

Production IaC demo

The examples/production-infra-demo/ directory contains a fictional GCP workload called DocuVault — a sanitised demo scenario for OpenShard.

The infrastructure is intentionally production-shaped: networking, IAM, Cloud SQL, Cloud Run, storage, secrets, monitoring, and logging.

It is deliberately flawed to serve as the input for an infrastructure-as-code hardening review.

All names, project IDs, resource IDs, CIDRs, and accounts are fake and public-safe, with no employer or customer details. The demo is designed to show a serious IaC review, not a toy example.

A typical production IaC review can show:

  • Critical, high, and medium findings
  • File-level evidence such as iam.tf, secrets.tf, database.tf, network.tf, and storage.tf
  • Verification output from tools like terraform fmt, terraform validate, and tflint when available
  • A clear Changed 0 files receipt for read-only reviews
  • Model selection and cost tracking
  • A /last more view with the full Shard, findings, checks, evidence, and cost comparison

This is the core OpenShard use case: let AI help with serious engineering work, but keep the control, evidence, and receipt layer visible.
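The verification step described above can be sketched in a few lines of Python. This is a minimal illustration, not OpenShard's implementation: `CheckResult`, `run_check`, `run_all`, and `CHECKS` are hypothetical names, and the sketch assumes that a tool missing from PATH should be recorded as "not_run" rather than as a pass or a failure.

```python
import shutil
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    status: str  # "passed", "failed", or "not_run"
    detail: str = ""


def run_check(name: str, argv: list[str], cwd: str = ".") -> CheckResult:
    """Run one verification command and map its exit code to a status."""
    # Assumption: a tool that is not installed counts as "not_run", never as a pass.
    if shutil.which(argv[0]) is None:
        return CheckResult(name, "not_run", f"{argv[0]} not found on PATH")
    proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
    status = "passed" if proc.returncode == 0 else "failed"
    return CheckResult(name, status, (proc.stdout + proc.stderr).strip())


# Read-only invocations of the tools named in the list above.
CHECKS = {
    "terraform fmt": ["terraform", "fmt", "-check", "-recursive"],
    "terraform validate": ["terraform", "validate"],
    "tflint": ["tflint"],
}


def run_all(checks: dict[str, list[str]], cwd: str = ".") -> list[CheckResult]:
    """Run every check in order and collect the results."""
    return [run_check(name, argv, cwd) for name, argv in checks.items()]
```

Calling `run_all(CHECKS, cwd="examples/production-infra-demo")` would return one result per tool. Keeping "not_run" distinct from "failed" matters for a receipt: a missing linter is a gap in evidence, not a finding.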


Shard receipts

A Shard is the durable receipt for an AI engineering run.

It can show:

  • Task and agent
  • Model used
  • Strategy
  • Risk level
  • Context provenance
  • Inspected files
  • Changed and touched files
  • Checks and their outcomes
  • Findings, when structured findings exist
  • Cost
  • Actions timeline
  • Result

OpenShard can also record feedback and infer session signals around a run.

openshard last --more    # expanded receipt for the latest run
openshard last --full    # full stored details

Raw developer content is not stored by default.


One run, end to end

A normal OpenShard run can capture:

  1. Task - the user request or workflow pack prompt.
  2. Routing - which model or workflow was selected.
  3. Risk - whether the task is low, medium, or high risk, or requires stronger review.
  4. Execution - what the agent did during the run.
  5. Checks - verification results, including passed, failed, skipped, or not run.
  6. Evidence - files inspected, findings, and relevant source references.
  7. Changes - files changed, touched, or left untouched.
  8. Cost - estimated spend for the run.
  9. Receipt - a durable Shard record that can be inspected later.

The point is simple: every AI coding run should leave behind enough evidence for a developer or team to understand what happened.
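To make the nine items above concrete, here is one way such a record could be modelled in Python. It is a sketch only: the `ShardReceipt` class and its field names are hypothetical and do not describe OpenShard's actual storage format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ShardReceipt:
    # Hypothetical field names that mirror the items in the list above.
    task: str
    model: str
    risk: str  # for example "low", "medium", or "high"
    actions: list[str] = field(default_factory=list)
    checks: dict[str, str] = field(default_factory=dict)  # name -> outcome
    inspected_files: list[str] = field(default_factory=list)
    changed_files: list[str] = field(default_factory=list)
    estimated_cost: float = 0.0  # currency unit is left unspecified here
    result: str = ""

    def summary(self) -> str:
        """One-line change summary, e.g. for a read-only review."""
        return f"Changed {len(self.changed_files)} files"

    def to_json(self) -> str:
        """Serialise the receipt so it can be stored and inspected later."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


receipt = ShardReceipt(
    task="Review this repo for risks",
    model="example-model",  # placeholder, not a real model name
    risk="low",
    checks={"terraform validate": "not_run"},
    inspected_files=["iam.tf", "network.tf"],
)
print(receipt.summary())
```

A read-only review leaves `changed_files` empty, so the summary reads "Changed 0 files" while the inspected files and check outcomes are still preserved.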


How OpenShard is different

OpenShard is not a chatbot, IDE, or even a generic agent framework. It's the layer around agentic coding work.

Layer              | What it does
-------------------|-------------------------------------------------------------------------------
Coding agent       | Generates code, edits files, answers task prompts
Model router       | Chooses which model or workflow should handle the job
Verification layer | Runs checks and records whether they passed, failed, skipped, or were not run
Policy layer       | Gates risky writes, commands, and high-risk work
Receipt layer      | Records model, cost, evidence, checks, changed files, and result
Eval layer         | Compares models and workflows by outcome, cost, speed, and safety

OpenShard can work alongside tools like Claude Code, Codex, Cursor, OpenCode, LangChain, LangGraph, OpenRouter, and provider APIs.

The goal is not to replace every coding agent. The goal is to make AI coding work controllable, inspectable, and measurable.
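As an illustration of what a policy layer does, the sketch below classifies a proposed write by the paths it touches and blocks high-risk writes until they are approved. Everything in it is an assumption made for the example: `HIGH_RISK_MARKERS`, `classify_risk`, `gate_write`, and the path-matching rule are hypothetical and are not OpenShard's real policy logic.

```python
from dataclasses import dataclass

# Hypothetical path markers; a real policy would be configurable.
HIGH_RISK_MARKERS = ("iam", "secrets", "auth", ".github/workflows")


@dataclass(frozen=True)
class Decision:
    risk: str  # "low", "medium", or "high"
    allowed: bool
    reason: str


def classify_risk(paths: list[str]) -> str:
    """Classify a proposed write by the files it would touch."""
    if not paths:
        return "low"  # nothing is written, as in a read-only review
    if any(marker in path.lower() for path in paths for marker in HIGH_RISK_MARKERS):
        return "high"
    return "medium"


def gate_write(paths: list[str], approved: bool = False) -> Decision:
    """Allow the write unless it is high risk and has not been approved."""
    risk = classify_risk(paths)
    if risk == "high" and not approved:
        return Decision(risk, False, "high-risk write needs review")
    return Decision(risk, True, "within policy")
```

With these rules, `gate_write(["iam.tf"])` is blocked until `approved=True` is passed, while a write to `README.md` is rated medium and allowed.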


Workflow packs

Workflow packs are pre-built prompts for repeatable engineering reviews.

openshard packs list
openshard packs show production-iac-hardening
openshard packs prompt production-iac-hardening

Built-in packs include:

  • repo-explanation
  • production-iac-hardening
  • terraform-networking-review
  • iam-security-review
  • cicd-safety-review
  • powershell-automation-review

Workflow packs make common review patterns repeatable without forcing users to rewrite long prompts every time.


Command reference

Most developers should start with the TUI:

openshard tui                                      # Launch the OpenShard terminal UI

Run tasks:

openshard run "Review this repo for risks"         # Run a task through OpenShard from the shell
openshard run --workflow native "Fix this bug"     # Run using the native workflow path

Inspect the latest run:

openshard last                                     # Show the latest run summary
openshard last --more                              # Show the expanded Shard receipt
openshard last --full                              # Show full stored/debug details

Record feedback:

openshard feedback --outcome accepted              # Mark the latest run as accepted
openshard feedback --outcome partial               # Mark the latest run as partly useful
openshard feedback --outcome rejected              # Mark the latest run as not useful
openshard feedback --outcome needs_work            # Mark the latest run as needing more work

Infer local session signals:

openshard session infer                            # Infer local behavioural/session signals from run history

Workflow packs:

openshard packs list                               # List available workflow packs
openshard packs show production-iac-hardening      # Show details for a workflow pack
openshard packs prompt production-iac-hardening    # Print the pack prompt

Model registry and policy:

openshard models list                              # List registered models
openshard models role reasoning                    # Show reasoning-capable models
openshard models role cheap_control                # Show low-cost/control models
openshard models mode ask                          # Show Ask Mode model policy
openshard models mode plan                         # Show Plan Mode model policy

Local evals:

openshard eval list                                # List eval suites
openshard eval validate --suite basic              # Validate an eval suite
openshard eval run --suite basic                   # Run an eval suite
openshard eval report                              # Show latest eval report
openshard eval compare                             # Compare models by eval results
openshard eval stats                               # Show eval stats

Useful TUI commands:

/ask what models do you support?                   # Ask OpenShard product/model questions
/plan review this repo for production readiness    # Generate a local plan without writing files
/packs                                             # List workflow packs inside the TUI
/pack production-iac-hardening                     # Load a workflow pack inside the TUI
/last                                              # Show the latest run
/last more                                         # Show expanded run details
/last full                                         # Show full debug/audit details
/feedback accepted                                 # Record feedback for the latest run
/clear                                             # Clear the output panel
/quit                                              # Exit the TUI
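
The shell commands above can also be driven from a script. The following is a small sketch, assuming `openshard` is installed and on PATH; the helper names (`run_cmd`, `last_cmd`, `feedback_cmd`, `run_loop`) are hypothetical, and only subcommands and flags shown in this reference are used.

```python
import subprocess
from typing import Optional

# The four feedback outcomes listed in the command reference.
OUTCOMES = {"accepted", "partial", "rejected", "needs_work"}


def run_cmd(task: str) -> list[str]:
    """Build the command that sends a task through OpenShard."""
    return ["openshard", "run", task]


def last_cmd(detail: str = "") -> list[str]:
    """Build the inspect command; detail is "", "more", or "full"."""
    if detail not in {"", "more", "full"}:
        raise ValueError(f"unknown detail level: {detail!r}")
    return ["openshard", "last"] + ([f"--{detail}"] if detail else [])


def feedback_cmd(outcome: str, note: Optional[str] = None) -> list[str]:
    """Build the feedback command, rejecting outcomes the CLI does not list."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    cmd = ["openshard", "feedback", "--outcome", outcome]
    if note:
        cmd += ["--note", note]
    return cmd


def run_loop(task: str, outcome: str, note: Optional[str] = None) -> None:
    """Run, inspect, and record feedback; stop at the first failing command."""
    for cmd in (run_cmd(task), last_cmd("more"), feedback_cmd(outcome, note)):
        subprocess.run(cmd, check=True)
```

Building each command as an argument list, rather than a shell string, avoids quoting problems when the task text contains spaces or punctuation.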

What works today

OpenShard is still alpha, but the core local loop is working.

Current features include:

  • Local CLI and TUI (openshard tui)
  • Ask Mode for local product/model/command Q&A
  • Plan Mode v1 for deterministic local plans
  • Controlled run path for real repo tasks
  • OpenShard Native execution harness
  • Task classification and risk handling
  • Model registry and model policy inspection
  • Routing across models/workflows where available
  • Shard receipts with model, risk, files, checks, cost, evidence, and result
  • /last, /last more, and /last full
  • Read-only review handling that preserves Changed 0 files
  • Intent-specific review handling for Terraform/IaC, CI/CD, auth/security, tests, and docs/onboarding
  • Workflow packs for repeatable engineering reviews
  • Feedback signals
  • Session signal inference
  • Local run history
  • Local eval harness
  • Eval comparison by pass rate and cost-per-pass
  • Cost comparison in /last more
  • Production-shaped Terraform demo
  • 5,500+ passing tests and green CI

What is not built yet

OpenShard is early and intentionally local-first.

Not built yet:

  • No hosted team platform yet
  • No cloud sync yet
  • No hosted dashboard for teams yet
  • No IDE integration yet
  • No PyPI or Homebrew release yet — install from GitHub
  • Plan Mode is not repo-aware yet
  • Ask Mode and Plan Mode are local deterministic v1 flows
  • Feedback advisory does not automatically change routing yet
  • External harness adapters are experimental and not guaranteed
  • Not a full Claude Code, Codex, or Cursor replacement

Developer setup

git clone https://github.com/MichaelObasa/openshard.git
cd openshard
pip install -e .
python -m pytest -q
python -m ruff check .

Advanced: evals

OpenShard includes a local eval harness for checking routing and workflow behaviour.

openshard eval list
openshard eval validate --suite basic
openshard eval run --suite basic
openshard eval report
openshard eval compare
openshard eval stats

The goal is not just to ask “which model is best?”

The better question is:

Which model or workflow succeeds most reliably for this type of task, at what cost, with what safety profile?

The eval system can track:

  • Pass rate
  • Verification outcomes
  • Duration
  • Token usage where available
  • Cost where available
  • Cost per passing run
  • Unsafe file attempts
  • Model ranking across eval runs

This is the foundation for smarter routing over time: routing based on actual task outcomes.
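
The metrics above reduce to simple arithmetic. The sketch below shows one possible definition, assuming that cost per passing run means total spend across all runs (including failures) divided by the number of passes; `EvalRun`, `pass_rate`, `cost_per_pass`, and `rank_models` are hypothetical names, not the eval harness's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalRun:
    model: str
    passed: bool
    cost: Optional[float]  # None when cost is not available


def pass_rate(runs: list[EvalRun]) -> float:
    """Fraction of runs that passed; 0.0 for an empty list."""
    return sum(r.passed for r in runs) / len(runs) if runs else 0.0


def cost_per_pass(runs: list[EvalRun]) -> Optional[float]:
    """Total spend divided by passes; None if there are no passes or a cost is missing."""
    passes = sum(r.passed for r in runs)
    if passes == 0 or any(r.cost is None for r in runs):
        return None
    return sum(r.cost for r in runs) / passes


def rank_models(runs: list[EvalRun]) -> list[str]:
    """Order models by pass rate (highest first), then by cost per pass (lowest first)."""
    by_model: dict[str, list[EvalRun]] = {}
    for run in runs:
        by_model.setdefault(run.model, []).append(run)

    def key(model: str) -> tuple[float, float]:
        cpp = cost_per_pass(by_model[model])
        return (-pass_rate(by_model[model]), cpp if cpp is not None else float("inf"))

    return sorted(by_model, key=key)
```

Charging failed runs to the cost of a pass is a deliberate choice in this sketch: a cheap model that fails half the time can end up costing more per successful task than its per-run price suggests.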


Current validation state

OpenShard is still early, but it is not just a prototype.

Current validation includes:

  • 5,500+ passing tests
  • Green CI
  • Ruff-clean Python codebase
  • Local CLI/TUI workflow
  • Production-shaped Terraform demo
  • Workflow packs for repeatable reviews
  • Shard receipts for run history
  • Eval tooling for model and workflow comparison
  • Pre-launch usage from developers testing it on real work

The project is alpha, but the core loop is working:

Run the task -> inspect what happened -> verify the output -> create a receipt

Roadmap

Near-term roadmap:

  • Public open-source launch
  • More real-world developer testing
  • Better repo-aware planning
  • Stronger model/workflow ranking from real outcomes
  • More workflow packs
  • More repo analyzers for common stacks
  • Cleaner setup and release packaging
  • Hosted/team run history
  • Team policies and shared approval gates
  • Dashboards for cost, model usage, and verification outcomes

Longer-term, OpenShard should become the control plane teams use to manage AI engineering work.


Why open source?

Routing decisions should be inspectable.

If a tool decides which model touches security-sensitive code, developers should be able to see why.

OpenShard is open because trust, integrations, and routing policies improve when real users can inspect and extend the system.

Open source also keeps the local-first layer useful on its own. Hosted and team features can come later, but the core control layer should be understandable and inspectable.


Contributing

Contributions are welcome around:

  • Routing policies and scoring logic
  • Repo analyzers for new stacks
  • Model profiles and capability data
  • Evaluation datasets
  • Provider integrations
  • Workflow packs
  • CLI/TUI UX improvements
  • Documentation and examples

See CONTRIBUTING.md for details.


Security

If you find a security issue, please report it privately before opening a public issue.

See SECURITY.md.


License

MIT
