CLI for the ought behavioral test framework

Maintainers

aishfenton

Project description

Ought: Behavioral specs that test themselves

Ought separates test intent from test implementation. You write what your system ought to do in plain markdown. An LLM generates the tests. You run them.

The Problem

Today, test intent and test implementation are fused together in code. The assertion assert_eq!(response.status(), 401) buries the intent -- "invalid credentials must return 401" -- inside mechanical setup and plumbing. When the test fails, you see a stack trace, not the requirement that was violated. When requirements change, you rewrite test code instead of updating a sentence.

Ought pulls intent up into a human-readable spec and delegates the mechanical work to an LLM.

What Ought Does

Ought maintains a three-way sync between intent (.ought.md specs), source code, and tests — with the LLM as the mediator. Traditional test tools only see code, tests, and results; ought adds the intent layer and connects everything through it.

Quick Start

Install

Shell installer (Linux + macOS, recommended):

curl -sS https://sosein.ai/install.sh | sh

This downloads a prebuilt binary for your platform and installs it to ~/.local/bin/ought. Pin a version with OUGHT_VERSION=v0.1.0 or change the install location with OUGHT_INSTALL_DIR=/usr/local/bin. Inspect the script before running with curl -sS https://sosein.ai/install.sh | less.

Homebrew:

brew install soseinai/tap/ought

Cargo:

cargo install ought

From source:

git clone https://github.com/soseinai/ought
cd ought
just install

Or grab a prebuilt binary directly from GitHub Releases.

Initialize

ought init

This creates an ought/ directory with a sample spec and an ought.toml config file.

Write a spec

Edit ought/myapp.ought.md:

# User Authentication

context: REST API at /api/auth, uses JWT tokens
source: src/auth/

## Login

- **MUST** return a valid JWT token when given correct credentials
- **MUST** return 401 with a generic error when credentials are invalid
- **MUST NOT** leak timing differences between valid and invalid usernames
- **MUST BY 200ms** return a response under normal load
  - **OTHERWISE** return a cached session token
  - **OTHERWISE** return 503 with a Retry-After header
- **SHOULD** rate-limit to 5 attempts per minute per IP
- **WONT** support basic auth (deprecated in v2)

## Token Refresh

- **GIVEN** the refresh token is valid and not expired:
  - **MUST** issue a new access token
  - **SHOULD** rotate the refresh token (one-time use)
- **GIVEN** the refresh token is expired:
  - **MUST** return 401
  - **MUST** include a `WWW-Authenticate` header

## Invariants

- **MUST ALWAYS** return valid JSON from all endpoints
- **MUST ALWAYS** include a `X-Request-Id` header in every response

Generate tests

ought generate

The LLM reads your spec and source code, then writes concrete test files into ought/ought-gen/.

Run

ought run

Output:

 User Authentication         myapp.ought.md
 ------------------------------------------------
 Login
   ✓ MUST    return valid JWT on correct credentials
   ✗ MUST    return 401 on invalid credentials
   ✓ MUST    NOT leak timing differences
   ✓ MUST BY 200ms return a response              [47ms / 200ms]
     ↳ ~ OTHERWISE return cached session           (not reached)
     ↳ ~ OTHERWISE return 503                      (not reached)
   ✓ SHOULD  rate-limit to 5 attempts/min/ip
   ⊘ WONT   support basic auth                    (confirmed absent)

 Token Refresh
   GIVEN the refresh token is valid and not expired:
     ✓ MUST    issue a new access token
     ✓ SHOULD  rotate the refresh token
   GIVEN the refresh token is expired:
     ✓ MUST    return 401
     ✓ MUST    include WWW-Authenticate header

 Invariants
   ✓ MUST ALWAYS return valid JSON                 (tested 1000 inputs)
   ✓ MUST ALWAYS include X-Request-Id header       (tested 1000 inputs)

 10 passed · 1 failed · 1 confirmed absent
 MUST coverage: 8/9 (89%)

Spec Format

Spec files are standard CommonMark markdown with the .ought.md extension. They render in GitHub, display in any editor, and require no special tooling to read. The formal grammar is defined in docs/grammar.md — that file is the source of truth for what the parser accepts.

Structure:

H1 (#) -- spec name, one per file
H2+ (##, ###) -- sections, map to test groups
Bullet points (- **KEYWORD** ...) -- clauses, the testable units
Bold keywords (**MUST**) -- deontic operators (bare "must" in prose is ignored)
Prose between clauses -- context for humans and the LLM, not parsed as clauses
Code blocks after a clause -- hints for the LLM (example payloads, schemas, etc.)
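
To make the structure rules above concrete, here is a minimal Python sketch of how clause lines could be recognized. It is not the actual pure-Rust parser (docs/grammar.md remains the source of truth); the `Clause` dataclass, `CLAUSE_RE`, and `parse_clauses` are hypothetical names, and the sketch ignores fenced code blocks and metadata lines.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Keywords from the tables in this README, longest first so that
# "MUST NOT" is never matched as plain "MUST".
KEYWORDS = [
    "MUST ALWAYS", "MUST NOT", "MUST BY", "SHOULD NOT",
    "MUST", "SHOULD", "MAY", "WONT", "GIVEN", "OTHERWISE",
]

# A clause is a bullet whose first token is a bold keyword, optionally
# followed by an argument inside the bold span (e.g. "MUST BY 200ms").
CLAUSE_RE = re.compile(
    r"^(?P<indent> *)- \*\*(?P<keyword>" + "|".join(KEYWORDS) + r")"
    r"(?: (?P<arg>[^*]+))?\*\* ?(?P<text>.*)$"
)


@dataclass
class Clause:
    keyword: str
    arg: Optional[str]  # e.g. "200ms" for MUST BY, otherwise None
    text: str
    depth: int          # nesting level, assuming two spaces per level


def parse_clauses(markdown: str) -> List[Clause]:
    """Collect clause bullets; prose and headings are skipped."""
    clauses = []
    for line in markdown.splitlines():
        match = CLAUSE_RE.match(line)
        if match is None:
            continue
        clauses.append(Clause(
            keyword=match["keyword"],
            arg=match["arg"],
            text=match["text"].strip(),
            depth=len(match["indent"]) // 2,
        ))
    return clauses


SAMPLE = """\
# User Authentication

The server must stay up (bare "must" in prose is not a clause).

- **MUST NOT** leak timing differences
- **MUST BY 200ms** return a response under normal load
  - **OTHERWISE** return a cached session token
"""

for clause in parse_clauses(SAMPLE):
    print(clause)
```

Only the three bulleted lines are reported; the prose sentence containing a bare "must" is skipped because it carries no bold keyword.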

Metadata appears below the H1:

# My Service

context: REST API using JWT tokens
source: src/auth/, src/models/user.rs
schema: db/migrations/
requires: [users](./users.ought.md)

Key	Purpose
`context:`	Free-text context for the LLM
`source:`	Source code paths (hints for LLM context assembly)
`schema:`	Schema, config, or migration files
`requires:`	Dependencies on other spec files (builds a DAG)

Specs are hierarchical. A top-level spec captures broad product-level requirements, linking down to detail specs that flesh out specifics via Details: annotations and requires: links.
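
As an illustration of what "builds a DAG" implies, the following Python sketch orders spec files so that each one comes after the specs it requires. Whether ought itself processes specs in this order is an assumption; the function name `generation_order` and the file names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def generation_order(requires: Dict[str, List[str]]) -> List[str]:
    """Return spec files with every dependency listed before its dependents.

    `requires` maps a spec file to the spec files named in its `requires:`
    metadata. Raises graphlib.CycleError if the links form a cycle, in
    which case they do not describe a DAG at all.
    """
    return list(TopologicalSorter(requires).static_order())


# Hypothetical project: billing depends on auth, auth depends on users.
REQUIRES = {
    "billing.ought.md": ["auth.ought.md", "users.ought.md"],
    "auth.ought.md": ["users.ought.md"],
    "users.ought.md": [],
}

print(generation_order(REQUIRES))
```

A cyclic pair of `requires:` links would surface as a `CycleError` rather than a silently wrong order.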

Keywords Reference

Standard obligations

Keyword	Severity	On failure	Exit code
MUST	required	error	1
MUST NOT	required	error	1
SHOULD	recommended	warning	0 (1 with `--fail-on-should`)
SHOULD NOT	recommended	warning	0
MAY	optional	info	0
WONT	negative	error if present	1

Deontic extensions

Keyword	What it does
GIVEN	Conditional block. Nested clauses only apply when the precondition holds. Not itself testable.
OTHERWISE	Contrary-to-duty fallback. Nested under an obligation, forms an ordered degradation chain. If the parent fails but an OTHERWISE passes, the overall result is a pass.
MUST ALWAYS	Invariant. Must hold across all states, inputs, and time. Generates property-based / fuzz tests.
MUST BY	Deadline. Must complete within a time bound (e.g., `MUST BY 200ms`). Duration suffixes: `ms`, `s`, `m`.
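
The OTHERWISE rule can be read as a short-circuiting chain: fallbacks are consulted in order, and only when the parent obligation fails. The Python sketch below shows that reading under stated assumptions; `resolve_chain` and the check callables are hypothetical and do not reflect how ought represents results internally.

```python
from typing import Callable, List, Optional, Tuple

Check = Callable[[], bool]


def resolve_chain(
    parent: Check,
    fallbacks: List[Tuple[str, Check]],
) -> Tuple[bool, Optional[str]]:
    """Evaluate an obligation and its ordered OTHERWISE fallbacks.

    Returns (passed, satisfied_by). Fallbacks are not evaluated when the
    parent passes, which matches the "(not reached)" lines in the sample
    output earlier in this README.
    """
    if parent():
        return True, "parent"
    for name, check in fallbacks:
        if check():
            # The parent failed, but a fallback held: overall pass.
            return True, name
    return False, None


# Hypothetical outcome: the 200ms deadline is missed, the cached token
# fallback also fails, and the 503 fallback holds.
result = resolve_chain(
    parent=lambda: False,
    fallbacks=[
        ("return a cached session token", lambda: False),
        ("return 503 with a Retry-After header", lambda: True),
    ],
)
print(result)
```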

CLI Reference

Command	Description
`ought init`	Scaffold `ought.toml` and an example spec
`ought generate`	Regenerate tests for stale clauses
`ought generate --force`	Regenerate all tests
`ought generate --check`	Exit 1 if any clause is stale (CI gate)
`ought run`	Execute tests, report results mapped to clauses
`ought run --fail-on-should`	Exit 1 on SHOULD failures too (default: MUST only)
`ought check`	Validate spec syntax only (no LLM, no execution)
`ought extract [paths...]`	Audit existing specs and reverse-engineer drafts for uncovered source
`ought inspect <clause>`	Show generated test code for a clause
`ought diff`	Show pending generation changes
`ought analyze survey [path]`	Discover source behaviors not covered by any spec
`ought debug blame <clause>`	Explain a failure with git history context
`ought debug bisect <clause>`	Find the exact commit that broke a clause
`ought watch`	Re-run on file changes
`ought view`	Launch the visual spec viewer in the browser
`ought mcp serve`	Start the MCP server
`ought mcp install`	Register with Claude Code, Codex, OpenCode

Exit codes: 0 = success (or only SHOULD/MAY failures), 1 = MUST-level failure, 2 = usage error.
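
The exit-code rules can be summarized in a few lines of Python. This is a sketch of the documented behavior, not ought's implementation: `exit_code` is a hypothetical helper, treating MUST ALWAYS and MUST BY like MUST is an assumption, and usage errors (exit code 2) are not modeled.

```python
from typing import Iterable, Tuple

# Keywords whose failure is documented as exit code 1.
REQUIRED = {"MUST", "MUST NOT", "WONT"}
# Assumption: the deontic extensions fail with the same severity as MUST.
REQUIRED_ASSUMED = {"MUST ALWAYS", "MUST BY"}


def exit_code(results: Iterable[Tuple[str, bool]], fail_on_should: bool = False) -> int:
    """Compute the process exit code from (keyword, passed) pairs."""
    for keyword, passed in results:
        if passed:
            continue
        if keyword in REQUIRED or keyword in REQUIRED_ASSUMED:
            return 1
        # The keyword table mentions --fail-on-should only for SHOULD, so
        # this sketch leaves SHOULD NOT and MAY failures at exit code 0.
        if keyword == "SHOULD" and fail_on_should:
            return 1
    return 0


run = [("MUST", True), ("SHOULD", False), ("MAY", False)]
print(exit_code(run))                       # SHOULD failure is only a warning
print(exit_code(run, fail_on_should=True))  # SHOULD failure now fails the run
```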

Global flags: --config, --quiet, --json, --junit <path>, --color, --verbose.

How It Works

The engine has four phases:

Parse -- converts .ought.md files into a structured clause IR using a pure-Rust parser with zero LLM dependency
Generate -- takes the clause IR plus source code context and uses an LLM to produce concrete, idiomatic test files
Execute -- delegates to the project's existing test harness (cargo test, pytest, jest, go test) and collects per-test results
Report -- maps results back to spec clauses and renders them in the terminal with severity-appropriate formatting

LLM Providers

Ought invokes LLM CLIs directly by exec-ing claude, chatgpt, or ollama -- no API keys to manage in ought itself. Use your consumer account, pro plan, or API key as you normally would with the CLI tool.

Configure the provider in ought.toml:

[generator]
provider = "anthropic"       # or "openai", "ollama"
model = "claude-sonnet-4-6"

Custom providers are supported by specifying an arbitrary executable.
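
The following Python sketch shows one way a provider name could be turned into an executable to run. The pairing of provider names with the claude, chatgpt, and ollama CLIs is inferred from the text above and is an assumption, as is treating any other value as a custom executable; `resolve_executable` and `find_on_path` are hypothetical helpers, and no CLI is actually invoked.

```python
import shutil
from typing import Optional

# Assumed mapping from the `provider` setting to the CLI named in this section.
PROVIDER_EXECUTABLES = {
    "anthropic": "claude",
    "openai": "chatgpt",
    "ollama": "ollama",
}


def resolve_executable(provider: str) -> str:
    """Map a provider name to an executable; unknown names pass through as-is."""
    return PROVIDER_EXECUTABLES.get(provider, provider)


def find_on_path(provider: str) -> Optional[str]:
    """Return the full path of the provider's executable, or None if not installed."""
    return shutil.which(resolve_executable(provider))


for name in ("anthropic", "openai", "ollama"):
    print(name, "->", resolve_executable(name), "->", find_on_path(name))
```

Checking the PATH up front gives a clear "CLI not installed" message instead of a failed exec later.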

Analysis Commands

Beyond test generation and execution, ought uses LLMs to reason about relationships between specs, source code, and results.

ought analyze survey [path] -- Scans source code and identifies behaviors not covered by any spec. Suggests concrete clauses with appropriate keywords. Never auto-adds clauses without user confirmation.

ought extract [paths...] -- Cold-start sibling of survey that writes files. Runs a rule-based audit over your existing specs (contradictions, gaps, missing OTHERWISE chains, deadline conflicts), then dispatches LLM agents to draft .ought.md files for uncovered source areas.

ought debug blame <clause> -- Correlates a failing clause with git history to build a causal narrative: what commit broke it, who authored it, and what the change was trying to do.

ought debug bisect <clause> -- Automated binary search through git history to find the exact breaking commit. Like git bisect but targeted at a specific clause. Always restores the working tree afterward.
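
The binary search behind a bisect can be sketched in Python as follows. This is the generic technique, not ought's code: `first_bad_commit` and the `clause_passes` callback are hypothetical, and the real command would also need to check out each commit, run the clause's test, and restore the working tree.

```python
from typing import Callable, Sequence


def first_bad_commit(
    commits: Sequence[str],
    clause_passes: Callable[[str], bool],
) -> str:
    """Find the first commit at which a clause fails.

    `commits` is ordered oldest to newest. The first commit is assumed to
    pass and the last to fail, as with `git bisect good` and `git bisect bad`.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    low, high = 0, len(commits) - 1  # low: known good, high: known bad
    while high - low > 1:
        mid = (low + high) // 2
        if clause_passes(commits[mid]):
            low = mid
        else:
            high = mid
    return commits[high]


# Hypothetical history of eight commits where c5 introduced the failure.
HISTORY = [f"c{i}" for i in range(8)]
print(first_bad_commit(HISTORY, lambda commit: int(commit[1:]) < 5))
```

Each step halves the candidate range, so eight commits need three test runs rather than eight.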

MCP Server

Ought exposes an MCP (Model Context Protocol) server for AI assistants and IDE extensions. Running ought mcp serve starts a stdio-based server that exposes tools (ought_run, ought_generate, ought_survey, ought_audit, ought_blame, ought_bisect) and resources (ought://specs, ought://results/latest, ought://coverage, ought://manifest). This lets tools like Claude Code, Codex, and OpenCode interact with your specs and results programmatically. Install with ought mcp install.

Configuration

ought.toml in the project root:

[project]
name = "myapp"
version = "0.1.0"

[specs]
roots = ["ought/"]

[context]
search_paths = ["src/", "lib/"]
exclude = ["vendor/", "generated/"]
max_files = 50

[generator]
provider = "anthropic"
model = "claude-sonnet-4-6"

[generator.tolerance]
must_by_multiplier = 1.0     # CI timing tolerance for MUST BY (default 1.0; bump if your CI is slow)

[runner.rust]
command = "cargo test"
test_dir = "ought/ought-gen/"

[runner.python]
command = "pytest"
test_dir = "ought/ought-gen/"

[mcp]
enabled = true
transport = "stdio"

Philosophy

The spec language is grounded in deontic logic -- the formal logic of obligation, permission, and prohibition. The keywords are not arbitrary labels. MUST and MUST NOT are obligations. SHOULD is a prima facie duty (Ross). MAY is permission. GIVEN models conditional obligation from dyadic deontic logic. OTHERWISE models contrary-to-duty obligations (Chisholm's paradox). MUST ALWAYS and MUST BY draw from temporal deontic logic.

The name comes from Hume's is-ought gap (1739): you cannot derive an "ought" from an "is." The spec says what the system ought to do. The source code says what it does. Testing is detecting when they diverge. Ought lives in that gap.

See docs/design.md for the full design document.

Contributing

Ought is written in Rust and structured as a Cargo workspace, with a small Svelte UI for the proof viewer:

crates/
  ought-spec/        # parser + clause IR (the open standard)
  ought-gen/         # generator trait + providers
  ought-run/         # runner trait + language runners
  ought-report/      # reporter + TUI
  ought-analysis/    # survey, audit, blame, bisect
  ought-mcp/         # MCP server
  ought-server/      # viewer web UI (Svelte + shadcn-svelte)
  ought-cli/         # CLI binary

ought-spec has zero dependencies on LLM infrastructure and is published separately for ecosystem interop.

Building from source

Prerequisites: Rust (stable), Node.js (20+), and just.

just build       # build everything (UI + Rust)
just test        # run all tests
just lint        # lint UI (svelte-check) + Rust (clippy)
just ci          # full CI pipeline (test + lint)
just install     # build a release binary and install ought to ~/.local/bin
just --list      # list all recipes

The Svelte UI is bundled into the ought binary at compile time via rust-embed, so the UI must be built before any cargo command — just handles that ordering for you. See CONTRIBUTING.md for the contributor agreement.

License

MIT

Release history

0.2.1 (this version) -- Apr 25, 2026
0.2.0 -- Apr 25, 2026
0.2.0rc3 (pre-release) -- Apr 24, 2026

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

ought-0.2.1-py3-none-win_amd64.whl (3.5 MB)

Uploaded Apr 25, 2026; Python 3, Windows x86-64

ought-0.2.1-py3-none-manylinux_2_28_aarch64.whl (3.5 MB)

Uploaded Apr 25, 2026; Python 3, manylinux: glibc 2.28+ ARM64

ought-0.2.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.7 MB)

Uploaded Apr 25, 2026; Python 3, manylinux: glibc 2.17+ x86-64

ought-0.2.1-py3-none-macosx_11_0_arm64.whl (3.4 MB)

Uploaded Apr 25, 2026; Python 3, macOS 11.0+ ARM64

ought-0.2.1-py3-none-macosx_10_12_x86_64.whl (3.5 MB)

Uploaded Apr 25, 2026; Python 3, macOS 10.12+ x86-64

File details

Details for the file ought-0.2.1-py3-none-win_amd64.whl.

File metadata

Download URL: ought-0.2.1-py3-none-win_amd64.whl
Upload date: Apr 25, 2026
Size: 3.5 MB
Tags: Python 3, Windows x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: maturin/1.13.1

File hashes

Hashes for ought-0.2.1-py3-none-win_amd64.whl
Algorithm	Hash digest
SHA256	`616dc162444794a63270463cf3802cbda1f10dfad93c5b3b4e6893f2472738d8`
MD5	`c710e445b507b92651cd1fd062ea882a`
BLAKE2b-256	`b0ba3f3e67c894a97eecc404893af4ca83b0a9b869f1df20e42b0f9a6f8bdb6e`


File details

Details for the file ought-0.2.1-py3-none-manylinux_2_28_aarch64.whl.

File metadata

Download URL: ought-0.2.1-py3-none-manylinux_2_28_aarch64.whl
Upload date: Apr 25, 2026
Size: 3.5 MB
Tags: Python 3, manylinux: glibc 2.28+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: maturin/1.13.1

File hashes

Hashes for ought-0.2.1-py3-none-manylinux_2_28_aarch64.whl
Algorithm	Hash digest
SHA256	`f2392f0c762121138bedd6c6af12b08ae7a4fda3d011b033d9f72d1d7d96dfce`
MD5	`80cec8620a3785d3f4ce69fa412caaca`
BLAKE2b-256	`37b25d598764db2bde24b52a64ffa545ed19ccc7a24d31720196639c30bdbb93`


File details

Details for the file ought-0.2.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: ought-0.2.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Apr 25, 2026
Size: 3.7 MB
Tags: Python 3, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: maturin/1.13.1

File hashes

Hashes for ought-0.2.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`da8092cbfe45ddabe32b632f6337b0cb08f0770174e99da2ca34b8c683d92e2b`
MD5	`91d8008c2b606e5a8220361fbb060f7c`
BLAKE2b-256	`93745102aff9089b95062492488185138b079cbcbb1a3f56e14244dad4024cb2`


File details

Details for the file ought-0.2.1-py3-none-macosx_11_0_arm64.whl.

File metadata

Download URL: ought-0.2.1-py3-none-macosx_11_0_arm64.whl
Upload date: Apr 25, 2026
Size: 3.4 MB
Tags: Python 3, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: maturin/1.13.1

File hashes

Hashes for ought-0.2.1-py3-none-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`d3ae430fd485f50481a1fc1991d22427a3b286ff7da63bbc524397dda5e61864`
MD5	`e089b77dee47e12987a2332fab243b59`
BLAKE2b-256	`472f1287d52229fe97179587c2804562a0fcce30674268e4b0c914b947e377e8`


File details

Details for the file ought-0.2.1-py3-none-macosx_10_12_x86_64.whl.

File metadata

Download URL: ought-0.2.1-py3-none-macosx_10_12_x86_64.whl
Upload date: Apr 25, 2026
Size: 3.5 MB
Tags: Python 3, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: maturin/1.13.1

File hashes

Hashes for ought-0.2.1-py3-none-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`d3e08c97d3d83670be2ddaa8fbb3b79f1df9ee4dafc17ff5429261a6bb17b99c`
MD5	`a1ad829a3c8fac38dd8c949303b3446b`
BLAKE2b-256	`3b5204cf3b2cb2ab3a6faba2bf6a91a6143e03415b592c87211dbfa7df2efb8d`

