Skip to main content

CLI for the ought behavioral test framework

Project description

Ought; Behavioral specs that test themselves

Ought separates test intent from test implementation. You write what your system ought to do in plain markdown. An LLM generates the tests. You run them.

The Problem

Today, test intent and test implementation are fused together in code. The assertion assert_eq!(response.status(), 401) buries the intent -- "invalid credentials must return 401" -- inside mechanical setup and plumbing. When the test fails, you see a stack trace, not the requirement that was violated. When requirements change, you rewrite test code instead of updating a sentence.

Ought pulls intent up into a human-readable spec and delegates the mechanical work to an LLM.

What Ought Does

Ought maintains a three-way sync between intent (.ought.md specs), source code, and tests — with the LLM as the mediator. Traditional test tools only see code, tests, and results; ought adds the intent layer and connects everything through it.

Quick Start

Install

Shell installer (Linux + macOS, recommended):

curl -sS https://sosein.ai/install.sh | sh

This downloads a prebuilt binary for your platform and installs it to ~/.local/bin/ought. Pin a version with OUGHT_VERSION=v0.1.0 or change the install location with OUGHT_INSTALL_DIR=/usr/local/bin. Inspect the script before running with curl -sS https://sosein.ai/install.sh | less.

Homebrew:

brew install soseinai/tap/ought

Cargo:

cargo install ought

From source:

git clone https://github.com/soseinai/ought
cd ought
just install

Or grab a prebuilt binary directly from GitHub Releases.

Initialize

ought init

This creates an ought/ directory with a sample spec and an ought.toml config file.

Write a spec

Edit ought/myapp.ought.md:

# User Authentication

context: REST API at /api/auth, uses JWT tokens
source: src/auth/

## Login

- **MUST** return a valid JWT token when given correct credentials
- **MUST** return 401 with a generic error when credentials are invalid
- **MUST NOT** leak timing differences between valid and invalid usernames
- **MUST BY 200ms** return a response under normal load
  - **OTHERWISE** return a cached session token
  - **OTHERWISE** return 503 with a Retry-After header
- **SHOULD** rate-limit to 5 attempts per minute per IP
- **WONT** support basic auth (deprecated in v2)

## Token Refresh

- **GIVEN** the refresh token is valid and not expired:
  - **MUST** issue a new access token
  - **SHOULD** rotate the refresh token (one-time use)
- **GIVEN** the refresh token is expired:
  - **MUST** return 401
  - **MUST** include a `WWW-Authenticate` header

## Invariants

- **MUST ALWAYS** return valid JSON from all endpoints
- **MUST ALWAYS** include a `X-Request-Id` header in every response

Generate tests

ought generate

The LLM reads your spec and source code, then writes concrete test files into ought/ought-gen/.

Run

ought run

Output:

 Authentication API          myapp.ought.md
 ------------------------------------------------
 Login
   ✓ MUST    return valid JWT on correct credentials
   ✗ MUST    return 401 on invalid credentials
   ✓ MUST    NOT leak timing differences
   ✓ MUST BY 200ms return a response              [47ms / 200ms]
     ↳ ~ OTHERWISE return cached session           (not reached)
     ↳ ~ OTHERWISE return 503                      (not reached)
   ✓ SHOULD  rate-limit to 5 attempts/min/ip
   ⊘ WONT   support basic auth                    (confirmed absent)

 Token Refresh
   GIVEN the refresh token is valid and not expired:
     ✓ MUST    issue a new access token
     ✓ SHOULD  rotate the refresh token
   GIVEN the refresh token is expired:
     ✓ MUST    return 401
     ✓ MUST    include WWW-Authenticate header

 Invariants
   ✓ MUST ALWAYS return valid JSON                 (tested 1000 inputs)
   ✓ MUST ALWAYS include X-Request-Id header       (tested 1000 inputs)

 11 passed · 1 failed · 1 confirmed absent
 MUST coverage: 8/9 (89%)

Spec Format

Spec files are standard CommonMark markdown with the .ought.md extension. They render in GitHub, display in any editor, and require no special tooling to read. The formal grammar is defined in docs/grammar.md — that file is the source of truth for what the parser accepts.

Structure:

  • H1 (#) -- spec name, one per file
  • H2+ (##, ###) -- sections, map to test groups
  • Bullet points (- **KEYWORD** ...) -- clauses, the testable units
  • Bold keywords (**MUST**) -- deontic operators (bare "must" in prose is ignored)
  • Prose between clauses -- context for humans and the LLM, not parsed as clauses
  • Code blocks after a clause -- hints for the LLM (example payloads, schemas, etc.)

Metadata appears below the H1:

# My Service

context: REST API using JWT tokens
source: src/auth/, src/models/user.rs
schema: db/migrations/
requires: [users](./users.ought.md)
Key Purpose
context: Free-text context for the LLM
source: Source code paths (hints for LLM context assembly)
schema: Schema, config, or migration files
requires: Dependencies on other spec files (builds a DAG)

Specs are hierarchical. A top-level spec captures broad product-level requirements, linking down to detail specs that flesh out specifics via Details: annotations and requires: links.

Keywords Reference

Standard obligations

Keyword Severity On failure Exit code
MUST required error 1
MUST NOT required error 1
SHOULD recommended warning 0 (1 with --fail-on-should)
SHOULD NOT recommended warning 0
MAY optional info 0
WONT negative error if present 1

Deontic extensions

Keyword What it does
GIVEN Conditional block. Nested clauses only apply when the precondition holds. Not itself testable.
OTHERWISE Contrary-to-duty fallback. Nested under an obligation, forms an ordered degradation chain. If the parent fails but an OTHERWISE passes, the overall result is a pass.
MUST ALWAYS Invariant. Must hold across all states, inputs, and time. Generates property-based / fuzz tests.
MUST BY Deadline. Must complete within a time bound (e.g., **MUST BY 200ms**). Duration suffixes: ms, s, m.

CLI Reference

Command Description
ought init Scaffold ought.toml and an example spec
ought generate Regenerate tests for stale clauses
ought generate --force Regenerate all tests
ought generate --check Exit 1 if any clause is stale (CI gate)
ought run Execute tests, report results mapped to clauses
ought run --fail-on-should Exit 1 on SHOULD failures too (default: MUST only)
ought check Validate spec syntax only (no LLM, no execution)
ought extract [paths...] Audit existing specs and reverse-engineer drafts for uncovered source
ought inspect <clause> Show generated test code for a clause
ought diff Show pending generation changes
ought analyze survey [path] Discover source behaviors not covered by any spec
ought debug blame <clause> Explain a failure with git history context
ought debug bisect <clause> Find the exact commit that broke a clause
ought watch Re-run on file changes
ought view Launch the visual spec viewer in the browser
ought mcp serve Start the MCP server
ought mcp install Register with Claude Code, Codex, OpenCode

Exit codes: 0 = success (or only SHOULD/MAY failures), 1 = MUST-level failure, 2 = usage error.

Global flags: --config, --quiet, --json, --junit <path>, --color, --verbose.

How It Works

The engine has four phases. Parse converts .ought.md files into a structured clause IR using a pure-Rust parser with zero LLM dependency. Generate takes the clause IR plus source code context and uses an LLM to produce concrete, idiomatic test files. Execute delegates to the project's existing test harness (cargo test, pytest, jest, go test) and collects per-test results. Report maps results back to spec clauses and renders them in the terminal with severity-appropriate formatting.

LLM Providers

Ought invokes LLM CLIs directly by exec-ing claude, chatgpt, or ollama -- no API keys to manage in ought itself. Use your consumer account, pro plan, or API key as you normally would with the CLI tool.

Configure the provider in ought.toml:

[generator]
provider = "anthropic"       # or "openai", "ollama"
model = "claude-sonnet-4-6"

Custom providers are supported by specifying an arbitrary executable.

Analysis Commands

Beyond test generation and execution, ought uses LLMs to reason about relationships between specs, source code, and results.

ought analyze survey [path] -- Scans source code and identifies behaviors not covered by any spec. Suggests concrete clauses with appropriate keywords. Never auto-adds clauses without user confirmation.

ought extract [paths...] -- Cold-start sibling of survey that writes files. Runs a rule-based audit over your existing specs (contradictions, gaps, missing OTHERWISE chains, deadline conflicts), then dispatches LLM agents to draft .ought.md files for uncovered source areas.

ought debug blame <clause> -- Correlates a failing clause with git history to build a causal narrative: what commit broke it, who authored it, and what the change was trying to do.

ought debug bisect <clause> -- Automated binary search through git history to find the exact breaking commit. Like git bisect but targeted at a specific clause. Always restores the working tree afterward.

MCP Server

Ought exposes an MCP (Model Context Protocol) server for AI assistants and IDE extensions. Running ought mcp serve starts a stdio-based server that exposes tools (ought_run, ought_generate, ought_survey, ought_audit, ought_blame, ought_bisect) and resources (ought://specs, ought://results/latest, ought://coverage, ought://manifest). This lets tools like Claude Code, Codex, and OpenCode interact with your specs and results programmatically. Install with ought mcp install.

Configuration

ought.toml in the project root:

[project]
name = "myapp"
version = "0.1.0"

[specs]
roots = ["ought/"]

[context]
search_paths = ["src/", "lib/"]
exclude = ["vendor/", "generated/"]
max_files = 50

[generator]
provider = "anthropic"
model = "claude-sonnet-4-6"

[generator.tolerance]
must_by_multiplier = 1.0     # CI timing tolerance for MUST BY (default 1.0; bump if your CI is slow)

[runner.rust]
command = "cargo test"
test_dir = "ought/ought-gen/"

[runner.python]
command = "pytest"
test_dir = "ought/ought-gen/"

[mcp]
enabled = true
transport = "stdio"

Philosophy

The spec language is grounded in deontic logic -- the formal logic of obligation, permission, and prohibition. The keywords are not arbitrary labels. MUST and MUST NOT are obligations. SHOULD is a prima facie duty (Ross). MAY is permission. GIVEN models conditional obligation from dyadic deontic logic. OTHERWISE models contrary-to-duty obligations (Chisholm's paradox). MUST ALWAYS and MUST BY draw from temporal deontic logic.

The name comes from Hume's is-ought gap (1739): you cannot derive an "ought" from an "is." The spec says what the system ought to do. The source code says what it does. Testing is detecting when they diverge. Ought lives in that gap.

See docs/design.md for the full design document.

Contributing

Ought is written in Rust and structured as a Cargo workspace, with a small Svelte UI for the proof viewer:

crates/
  ought-spec/        # parser + clause IR (the open standard)
  ought-gen/         # generator trait + providers
  ought-run/         # runner trait + language runners
  ought-report/      # reporter + TUI
  ought-analysis/    # survey, audit, blame, bisect
  ought-mcp/         # MCP server
  ought-server/      # viewer web UI (Svelte + shadcn-svelte)
  ought-cli/         # CLI binary

ought-spec has zero dependencies on LLM infrastructure and is published separately for ecosystem interop.

Building from source

Prerequisites: Rust (stable), Node.js (20+), and just.

just build       # build everything (UI + Rust)
just test        # run all tests
just lint        # lint UI (svelte-check) + Rust (clippy)
just ci          # full CI pipeline (test + lint)
just install     # build a release binary and install ought to ~/.local/bin
just --list      # list all recipes

The Svelte UI is bundled into the ought binary at compile time via rust-embed, so the UI must be built before any cargo command — just handles that ordering for you. See CONTRIBUTING.md for the contributor agreement.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

ought-0.2.1-py3-none-win_amd64.whl (3.5 MB view details)

Uploaded Python 3Windows x86-64

ought-0.2.1-py3-none-manylinux_2_28_aarch64.whl (3.5 MB view details)

Uploaded Python 3manylinux: glibc 2.28+ ARM64

ought-0.2.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.7 MB view details)

Uploaded Python 3manylinux: glibc 2.17+ x86-64

ought-0.2.1-py3-none-macosx_11_0_arm64.whl (3.4 MB view details)

Uploaded Python 3macOS 11.0+ ARM64

ought-0.2.1-py3-none-macosx_10_12_x86_64.whl (3.5 MB view details)

Uploaded Python 3macOS 10.12+ x86-64

File details

Details for the file ought-0.2.1-py3-none-win_amd64.whl.

File metadata

  • Download URL: ought-0.2.1-py3-none-win_amd64.whl
  • Upload date:
  • Size: 3.5 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.13.1

File hashes

Hashes for ought-0.2.1-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 616dc162444794a63270463cf3802cbda1f10dfad93c5b3b4e6893f2472738d8
MD5 c710e445b507b92651cd1fd062ea882a
BLAKE2b-256 b0ba3f3e67c894a97eecc404893af4ca83b0a9b869f1df20e42b0f9a6f8bdb6e

See more details on using hashes here.

File details

Details for the file ought-0.2.1-py3-none-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for ought-0.2.1-py3-none-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 f2392f0c762121138bedd6c6af12b08ae7a4fda3d011b033d9f72d1d7d96dfce
MD5 80cec8620a3785d3f4ce69fa412caaca
BLAKE2b-256 37b25d598764db2bde24b52a64ffa545ed19ccc7a24d31720196639c30bdbb93

See more details on using hashes here.

File details

Details for the file ought-0.2.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for ought-0.2.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 da8092cbfe45ddabe32b632f6337b0cb08f0770174e99da2ca34b8c683d92e2b
MD5 91d8008c2b606e5a8220361fbb060f7c
BLAKE2b-256 93745102aff9089b95062492488185138b079cbcbb1a3f56e14244dad4024cb2

See more details on using hashes here.

File details

Details for the file ought-0.2.1-py3-none-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ought-0.2.1-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d3ae430fd485f50481a1fc1991d22427a3b286ff7da63bbc524397dda5e61864
MD5 e089b77dee47e12987a2332fab243b59
BLAKE2b-256 472f1287d52229fe97179587c2804562a0fcce30674268e4b0c914b947e377e8

See more details on using hashes here.

File details

Details for the file ought-0.2.1-py3-none-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for ought-0.2.1-py3-none-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 d3e08c97d3d83670be2ddaa8fbb3b79f1df9ee4dafc17ff5429261a6bb17b99c
MD5 a1ad829a3c8fac38dd8c949303b3446b
BLAKE2b-256 3b5204cf3b2cb2ab3a6faba2bf6a91a6143e03415b592c87211dbfa7df2efb8d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page