
Specwright - Python framework for LLM-assisted development with runtime spec validation


Specwright


A specification-first framework for LLM-assisted development.

Humans write specifications and constraints. LLMs write implementations. Specwright enforces that implementations satisfy specifications — at decoration time and at runtime.

Read the full documentation at specwright.org

Quick Start

from specwright import spec

@spec
def add(x: int, y: int) -> int:
    """Add two integers."""
    return x + y

add(1, 2)        # 3
add("one", 2)    # raises InputValidationError

Every @spec-decorated function must have:

  • Complete type annotations on all parameters and the return type
  • A docstring describing its behavior

Specwright validates these at decoration time. At runtime, it checks that actual arguments and return values match the declared types.
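
To make the two phases concrete, here is a minimal sketch of how a decorator of this kind could work. The name mini_spec is hypothetical and this is not Specwright's implementation: it handles only plain classes and simple positional or keyword parameters, and it raises TypeError rather than Specwright's own exceptions.

```python
import functools
import inspect
from typing import get_type_hints


def mini_spec(func):
    """Hypothetical stand-in for @spec; not Specwright's actual code."""
    sig = inspect.signature(func)
    hints = get_type_hints(func)

    # Decoration-time checks: a docstring and complete annotations.
    if not inspect.getdoc(func):
        raise TypeError(f"{func.__name__!r} is missing a docstring")
    missing = [name for name in sig.parameters if name not in hints]
    if missing or "return" not in hints:
        raise TypeError(f"{func.__name__!r} has incomplete type annotations")

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Runtime checks: compare each bound argument with its annotation.
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            if not isinstance(value, hints[name]):
                raise TypeError(
                    f"parameter {name!r}: expected {hints[name]}, "
                    f"got {type(value).__name__}"
                )
        result = func(*args, **kwargs)
        if not isinstance(result, hints["return"]):
            raise TypeError(f"return value: expected {hints['return']}")
        return result

    return wrapper


@mini_spec
def add(x: int, y: int) -> int:
    """Add two integers."""
    return x + y
```

The decoration-time checks run once, when the module is imported, so a missing docstring or annotation surfaces before any call is made.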

Installation

pip install llmcraft

Or with Poetry:

poetry add llmcraft

Note: The PyPI package name is llmcraft, but you import it as specwright.

Requires Python 3.11+

Features

Runtime Type Validation

@spec
def divide(a: float, b: float) -> float:
    """Divide a by b."""
    return a / b

divide(10, 3)        # 3.333...
divide("ten", 3)     # InputValidationError with clear message

Complex Type Support

@spec works with generics, unions, and Pydantic models:

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

@spec
def find_adults(users: list[User]) -> list[str]:
    """Return names of users aged 18+."""
    return [u.name for u in users if u.age >= 18]

Spec Metadata for Doc Generation

Every decorated function carries machine-readable metadata:

@spec
def greet(name: str) -> str:
    """Greet someone by name."""
    return f"Hello, {name}!"

meta = greet.__spec__
meta.name          # "greet"
meta.parameters    # {"name": <class 'str'>}
meta.return_type   # <class 'str'>
meta.docstring     # "Greet someone by name."
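
As an illustration of how this metadata could feed a documentation generator, the sketch below renders a signature line from the four fields shown above. SpecMeta and render_signature are hypothetical names used only for this example. The real object on greet.__spec__ may carry more fields or a different type.

```python
from dataclasses import dataclass


@dataclass
class SpecMeta:
    """Hypothetical stand-in holding the four fields shown above."""
    name: str
    parameters: dict
    return_type: type
    docstring: str


def render_signature(meta: SpecMeta) -> str:
    """Render a one-line signature followed by the indented docstring."""
    params = ", ".join(f"{n}: {t.__name__}" for n, t in meta.parameters.items())
    return f"{meta.name}({params}) -> {meta.return_type.__name__}\n    {meta.docstring}"


meta = SpecMeta("greet", {"name": str}, str, "Greet someone by name.")
print(render_signature(meta))
# greet(name: str) -> str
#     Greet someone by name.
```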

Configurable Enforcement

Turn off specific checks when you need to:

@spec(validate_output=False)
def flexible_return(x: int) -> str:
    """May not return a string during development."""
    return x  # no error raised

Declarative Error Handling

The @handle_errors decorator maps exception types to handling strategies:

from specwright import handle_errors

@handle_errors({
    ValueError: "ignore",                      # suppress, return None
    KeyError: lambda e: f"missing: {e}",       # custom handler
    RuntimeError: "log",                       # log with traceback, re-raise
    ConnectionError: {"error": "offline"},     # return a fallback value
})
def process(data: dict) -> str:
    ...

Strategies:

Strategy          Behaviour
"ignore"          Suppress the exception, return None
"log"             Log with full traceback, then re-raise
callable          Call handler(exception), return its result
any other value   Return that value directly
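
The four strategies can be sketched in a few lines of plain Python. The decorator mini_handle_errors below is a hypothetical stand-in written for this example, not Specwright's implementation. In particular, how the real decorator resolves overlapping exception types is an assumption here (first matching entry wins).

```python
import functools
import logging

logger = logging.getLogger(__name__)


def mini_handle_errors(handlers: dict):
    """Hypothetical stand-in for @handle_errors; not Specwright's actual code."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except tuple(handlers) as exc:
                # Assumption: the first registered type that matches wins.
                strategy = next(s for t, s in handlers.items() if isinstance(exc, t))
                if strategy == "ignore":
                    return None
                if strategy == "log":
                    logger.exception("error in %s", func.__name__)
                    raise
                if callable(strategy):
                    return strategy(exc)
                return strategy  # any other value is returned as a fallback
        return wrapper
    return decorator


@mini_handle_errors({
    ValueError: "ignore",
    KeyError: lambda e: f"missing: {e}",
})
def lookup(data: dict, key: str) -> str:
    return str(int(data[key]))


print(lookup({"a": "1"}, "a"))   # 1
print(lookup({"a": "x"}, "a"))   # None
print(lookup({}, "a"))           # missing: 'a'
```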

Combining @spec and @handle_errors

The two decorators compose naturally. Place @handle_errors on the outside to catch exceptions that escape the spec-validated function:

from specwright import spec, handle_errors

USERS = {1: {"name": "Alice"}}  # sample data for illustration

@handle_errors({
    ValueError: lambda e: {"error": str(e)},
    KeyError: "ignore",
})
@spec
def get_user(user_id: int) -> dict:
    """Look up a user by ID."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return USERS[user_id]

get_user(1)        # {"name": "Alice", ...}
get_user(-1)       # {"error": "user_id must be non-negative"}
get_user(999)      # None (KeyError ignored)
get_user("bad")    # raises InputValidationError (not in handlers)

Or place @spec on the outside to validate the fallback return values too:

@spec
@handle_errors({ValueError: 0})
def parse_int(s: str) -> int:
    """Parse a string to int, defaulting to 0."""
    return int(s)

parse_int("abc")   # 0 (fallback passes int type check)

State Machine

The StateMachine base class enforces valid state transitions at runtime:

from specwright import StateMachine, transition

class OrderProcessor(StateMachine):
    states = ["pending", "paid", "shipped", "delivered", "cancelled"]
    initial_state = "pending"

    @transition(from_state="pending", to_state="paid")
    def pay(self, amount: float) -> str:
        return f"Paid ${amount:.2f}"

    @transition(from_state="paid", to_state="shipped")
    def ship(self, tracking: str) -> str:
        return f"Shipped ({tracking})"

    @transition(from_state=["pending", "paid"], to_state="cancelled")
    def cancel(self, reason: str) -> str:
        return f"Cancelled: {reason}"

order = OrderProcessor()
order.pay(99.99)         # state -> "paid"
order.ship("TRACK-123")  # state -> "shipped"
order.cancel("reason")   # raises InvalidTransitionError (can't cancel after shipping)
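
The enforcement itself can be pictured as a guard around each method. The sketch below uses hypothetical names (mini_transition, InvalidTransition, Order) and is not Specwright's implementation. It assumes the state is updated only after the method body returns, which the source does not state.

```python
import functools


class InvalidTransition(Exception):
    """Hypothetical error type for this sketch."""


def mini_transition(from_state, to_state):
    """Hypothetical stand-in for @transition; not Specwright's actual code."""
    sources = [from_state] if isinstance(from_state, str) else list(from_state)

    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            if self.state not in sources:
                raise InvalidTransition(
                    f"Cannot transition from {self.state!r} to {to_state!r} "
                    f"via {method.__name__!r}"
                )
            result = method(self, *args, **kwargs)
            self.state = to_state  # assumption: updated only on success
            return result
        return wrapper
    return decorator


class Order:
    def __init__(self):
        self.state = "pending"

    @mini_transition("pending", "paid")
    def pay(self, amount: float) -> str:
        return f"Paid ${amount:.2f}"

    @mini_transition(["pending", "paid"], "cancelled")
    def cancel(self, reason: str) -> str:
        return f"Cancelled: {reason}"
```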

State history is opt-in. With track_history = True, the machine records every state visited:

class Tracked(StateMachine):
    states = ["a", "b", "c"]
    initial_state = "a"
    track_history = True  # opt-in

    @transition(from_state="a", to_state="b")
    def go_b(self): ...

    @transition(from_state="b", to_state="c")
    def go_c(self): ...

sm = Tracked()
sm.go_b()
sm.go_c()
sm.state_history  # ["a", "b", "c"]

Lifecycle hooks run automatically on state changes:

class WithHooks(StateMachine):
    states = ["active", "suspended"]
    initial_state = "active"

    @transition(from_state="active", to_state="suspended")
    def suspend(self): ...

    def on_exit_active(self):
        print("Leaving active state")

    def on_enter_suspended(self):
        print("Entering suspended state")
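
Hooks named on_exit_<state> and on_enter_<state> suggest dispatch by method name. A minimal sketch of that idea follows. HookedMachine and _move_to are hypothetical, and the order shown (exit hook, state change, enter hook) is an assumption rather than documented Specwright behaviour.

```python
class HookedMachine:
    """Hypothetical sketch of hook dispatch by method name."""

    def __init__(self, initial_state: str):
        self.state = initial_state

    def _move_to(self, new_state: str) -> None:
        # Look up the optional hooks before changing state.
        exit_hook = getattr(self, f"on_exit_{self.state}", None)
        enter_hook = getattr(self, f"on_enter_{new_state}", None)
        if callable(exit_hook):
            exit_hook()
        self.state = new_state
        if callable(enter_hook):
            enter_hook()


class Account(HookedMachine):
    def __init__(self):
        super().__init__("active")
        self.events = []

    def on_exit_active(self):
        self.events.append("exit active")

    def on_enter_suspended(self):
        self.events.append("enter suspended")


account = Account()
account._move_to("suspended")
print(account.events)  # ['exit active', 'enter suspended']
```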

@transition combines with @spec for full validation:

class Machine(StateMachine):
    states = ["idle", "done"]
    initial_state = "idle"

    @transition(from_state="idle", to_state="done")
    @spec
    def finish(self, result: str) -> str:
        """Complete the task."""
        return f"done: {result}"

State machines help LLMs by making valid transitions explicit and machine-readable. An LLM can see exactly which states exist, which transitions are allowed, and what the current state is — eliminating an entire class of bugs where code attempts an impossible operation.

Test Requirements (@requires_tests)

Declare what tests a function needs — the pytest plugin enforces it:

from specwright import requires_tests, spec

@requires_tests(
    happy_path=True,
    edge_cases=["empty_input", "max_boundaries"],
    error_cases=["invalid_email", "negative_age"],
)
@spec
def create_user(email: str, age: int) -> dict:
    """Create a new user account."""
    ...

The decorator stores a TestRequirements object on the function:

reqs = create_user.__test_requirements__
reqs.expected_test_names
# ['test_create_user_happy_path',
#  'test_create_user_empty_input',
#  'test_create_user_max_boundaries',
#  'test_create_user_invalid_email',
#  'test_create_user_negative_age']

Naming convention: test_{function_name}_{case_name}
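
The convention is simple enough to reproduce by hand. The helper below is a hypothetical standalone function, not part of Specwright, that builds the same list as the example above, assuming the order is happy path, then edge cases, then error cases.

```python
def expected_test_names(function_name, happy_path=False, edge_cases=(), error_cases=()):
    """Build test names following test_{function_name}_{case_name}."""
    cases = (["happy_path"] if happy_path else []) + list(edge_cases) + list(error_cases)
    return [f"test_{function_name}_{case}" for case in cases]


names = expected_test_names(
    "create_user",
    happy_path=True,
    edge_cases=["empty_input", "max_boundaries"],
    error_cases=["invalid_email", "negative_age"],
)
print(names[0])   # test_create_user_happy_path
print(names[-1])  # test_create_user_negative_age
```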

The pytest plugin verifies at collection time that all required test functions exist:

# pyproject.toml
[tool.pytest.ini_options]
specwright_test_enforcement = "strict"   # "strict" | "warn" | "off"

Mode     Behaviour
strict   Fail the session if any required tests are missing
warn     Emit warnings but let the session continue
off      Skip the check entirely
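
The three modes amount to a set difference followed by a choice of reaction. The function below is a hypothetical sketch of that logic. It does not use pytest's plugin hooks, and the exception and warning types are assumptions chosen for the example.

```python
import warnings


def check_required_tests(required, collected, mode="strict"):
    """Hypothetical sketch of the three modes; returns the missing test names."""
    if mode not in {"strict", "warn", "off"}:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "off":
        return []
    missing = sorted(set(required) - set(collected))
    if missing and mode == "strict":
        raise RuntimeError(f"missing required tests: {', '.join(missing)}")
    if missing and mode == "warn":
        warnings.warn(f"missing required tests: {', '.join(missing)}")
    return missing


print(check_required_tests(["test_a"], ["test_a", "test_b"]))  # []
```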

This enforces a test-driven LLM workflow: humans declare what must be tested, LLMs write the implementations and the tests, and the framework ensures nothing is forgotten.

Clear Error Messages

InputValidationError: Input validation failed for 'add':
  - Parameter 'x': expected <class 'int'>, got str ('one')

InvalidTransitionError: Cannot transition from 'shipped' to 'cancelled'
via 'cancel'. Valid source state(s): paid, pending

Why Specwright?

Modern development increasingly involves LLMs generating code. This creates a new problem: how do you trust LLM-generated implementations?

The traditional answer — code review — doesn't scale. Specwright takes a different approach:

  1. Humans write specs — type signatures, docstrings, constraints, and state machines that define what a function should do
  2. LLMs write implementations — the code that fulfills the spec
  3. Specwright enforces correctness — runtime validation ensures implementations actually satisfy their specifications

This creates a workflow where humans stay in control of what the software does, while delegating how it does it. The framework is the bridge that ensures the two stay in sync.

CLI

Specwright includes a CLI for scaffolding projects, generating boilerplate, validating coverage, and producing docs.

specwright init

Scaffold a new project:

specwright init my_project

Creates my_project/ with pyproject.toml, a sample @spec function, tests/, and a .specwright.toml config file.

specwright new function

Generate a @spec-decorated function and its test file:

specwright new function calculate_score \
    --params "base: int, multiplier: float" \
    --returns float

Omit --params / --returns to be prompted interactively.

specwright new statemachine

Generate a StateMachine subclass with sequential transitions:

specwright new statemachine order_processor \
    --states pending,paid,shipped,delivered

specwright validate

Check that all @spec-decorated functions have tests and that state machines are well-formed:

specwright validate --path .

Exits with code 1 if issues are found.
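
Because the command signals problems through its exit code, a build script can gate on it. The sketch below shows one way to do this from Python. The helpers passes and validate_or_exit are hypothetical, and validate_or_exit assumes the specwright CLI is installed and on PATH.

```python
import subprocess
import sys


def passes(command: list[str]) -> bool:
    """Run a command and report whether it exited with code 0."""
    return subprocess.run(command).returncode == 0


def validate_or_exit(path: str = ".") -> None:
    """Stop the calling script if validation reports issues."""
    # Assumes the specwright CLI is installed and on PATH.
    if not passes(["specwright", "validate", "--path", path]):
        sys.exit("specwright validate reported issues")
```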

specwright docs

Generate API documentation from @spec metadata:

specwright docs --path .                    # to stdout
specwright docs --path . --output API.md    # to file
specwright docs --path . --diagram          # include DOT state diagrams
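
For readers unfamiliar with DOT, the sketch below shows what a state diagram for the OrderProcessor transitions could look like in that format. The function to_dot is hypothetical, and the exact text produced by specwright docs --diagram is not shown in this README, so treat the layout as an assumption.

```python
def to_dot(name: str, transitions: list[tuple[str, str, str]]) -> str:
    """Render (source, target, label) triples as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for source, target, label in transitions:
        lines.append(f'    "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot("OrderProcessor", [
    ("pending", "paid", "pay"),
    ("paid", "shipped", "ship"),
    ("pending", "cancelled", "cancel"),
    ("paid", "cancelled", "cancel"),
]))
```

The resulting text can be passed to any Graphviz renderer to produce an image.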

Workflow

specwright init my_project
cd my_project
specwright new function my_func
specwright new statemachine my_workflow
# ... fill in implementations ...
specwright validate
specwright docs --output API.md

Project Structure

specwright/
  __init__.py        # Public API
  cli.py             # CLI entry point (init, new, validate, docs)
  decorators.py      # @spec and @handle_errors decorators
  state_machine.py   # StateMachine base class and @transition
  testing.py         # @requires_tests decorator and TestRequirements
  pytest_plugin.py   # Pytest plugin for test enforcement
  validation.py      # Runtime type checking engine
  exceptions.py      # Clear, typed error hierarchy
  templates/         # Jinja2 templates for code generation
tests/               # Comprehensive test suite
examples/            # Runnable usage examples

Development

# Install dependencies
poetry install

# Run tests
poetry run pytest

# Run tests with coverage
poetry run pytest --cov=specwright --cov-report=term-missing

# Lint
poetry run ruff check .

# Format
poetry run black .

# Type check
poetry run mypy specwright

Contributing

See CONTRIBUTING.md for development setup, code style, and PR guidelines.

License

MIT — see LICENSE for details.

Why is the package called llmcraft?

The project is called Specwright, but the PyPI package name specwright was already taken. We use llmcraft on PyPI while maintaining Specwright as our brand and import name.


