Sound behavior-equivalence verification for refactors, using your own tests for inputs

Selfsame

License: MIT

Know whether your code still behaves the same — before, after, and across every AI edit.

Selfsame is a sound behavior checker for Python. It captures the real arguments your tests (or app) feed your code, replays two versions in isolated subprocesses, and compares the results structurally. Use it to prove a refactor didn't change behavior — or to catch the silent regressions that creep in when an AI agent ships features all day and "a new feature works, but the old ones quietly broke."
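
The replay-and-compare idea can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not Selfsame's implementation: `BASE_SRC`, `HEAD_SRC`, `RUNNER`, `replay`, and `compare` are hypothetical names, the two versions are passed as source strings, and values are limited to what JSON can carry.

```python
import json
import subprocess
import sys

# Two hypothetical versions of the same function, kept as source text so each
# one can be loaded in its own fresh interpreter.
BASE_SRC = "def total(xs):\n    return sum(xs)\n"
HEAD_SRC = (
    "def total(xs):\n"
    "    acc = 0\n"
    "    for x in xs:\n"
    "        acc += x\n"
    "    return acc\n"
)

# Child-process script: argv[1] = source, argv[2] = JSON args, argv[3] = function name.
RUNNER = """
import json, sys
namespace = {}
exec(sys.argv[1], namespace)              # load one version of the code
args = json.loads(sys.argv[2])            # the recorded positional arguments
try:
    outcome = {"ok": namespace[sys.argv[3]](*args)}
except Exception as exc:                  # a raised exception is behavior too
    outcome = {"raised": type(exc).__name__}
print(json.dumps(outcome, sort_keys=True))
"""


def replay(source, func_name, args):
    """Run one version on one recorded input in a separate interpreter."""
    proc = subprocess.run(
        [sys.executable, "-c", RUNNER, source, json.dumps(args), func_name],
        capture_output=True, text=True, check=True, timeout=30,
    )
    return json.loads(proc.stdout)


def compare(func_name, recorded_inputs):
    """Return the first input whose outcome differs, or None if all match."""
    for args in recorded_inputs:
        base = replay(BASE_SRC, func_name, args)
        head = replay(HEAD_SRC, func_name, args)
        if base != head:
            return {"input": args, "base": base, "head": head}
    return None
```

Running each version in its own process keeps module state, monkeypatches, and caches from one version out of the other, which is the point of the isolation step described above.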

The one promise: zero false confidence

Selfsame never says equivalent when behavior actually differs, and never says divergent when it doesn't. When it can't be sure, it refuses (unverifiable) instead of guessing. A green result means green.

  • 🧪 Inputs are real, not generated — recorded from your own test suite or app run. No type hints required; methods, packages, and relative imports just work.
  • 🔒 Sound by construction — uncontrolled I/O, threads, nondeterminism, and opaque values are refused, never certified.
  • 🤖 Built for AI-driven development — freeze an accepted build, then measure how far each generated change drifts from it. No second git branch needed.
  • 🎯 Proves assumptions are load-bearing (experimental): adjudicate violates a nominated dependency boundary and shows whether the passing result secretly depended on it. A judge, not a guesser.
  • 📄 Agent-consumable reports — every run drops .selfsame/report.json + Markdown with file:line, before→after witnesses, and what was not covered.
  • 🪶 Pure standard library — no runtime dependencies. pip install and go.
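
One way to picture "refuse instead of guess" is a structural comparison that only accepts values it fully understands. The sketch below is an assumption-laden illustration, not Selfsame's canonical schema: `Unverifiable`, `canonical`, and `verdict` are hypothetical names, and the set of supported types is deliberately small.

```python
import math


class Unverifiable(Exception):
    """Raised when a value cannot be compared with certainty."""


def canonical(value):
    """Convert a value to a comparable form, refusing anything opaque."""
    if value is None or isinstance(value, (bool, int, str, bytes)):
        # Tag with the type name so that True and 1 are not treated as equal.
        return (type(value).__name__, value)
    if isinstance(value, float):
        if math.isnan(value):
            return ("float", "nan")       # NaN != NaN, so use a stable token
        return ("float", value)
    if isinstance(value, (list, tuple)):
        return (type(value).__name__, tuple(canonical(v) for v in value))
    if isinstance(value, dict):
        items = [(canonical(k), canonical(v)) for k, v in value.items()]
        return ("dict", tuple(sorted(items, key=repr)))
    # Anything else (custom objects, file handles, ...) is opaque here.
    raise Unverifiable(f"opaque value of type {type(value).__name__}")


def verdict(base_value, head_value):
    """Three-valued result: equivalent, divergent, or unverifiable."""
    try:
        same = canonical(base_value) == canonical(head_value)
    except Unverifiable:
        return "unverifiable"
    return "equivalent" if same else "divergent"
```

The design choice worth noting is that the fallback branch raises rather than falling back to `==` or `repr`, so an unknown type can never produce a green result by accident.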

Install

pip install selfsame        # or: pipx install selfsame · uv tool install selfsame

Installs the selfsame command (probe is kept as an alias). Requires Python 3.8+.

60-second start

Did my refactor change behavior? (inputs come from your existing tests)

selfsame verify --base main --modules mypkg -- pytest -q
  parse_args                     n=11   equivalent
X slugify                        n=102  divergent     @ input #0
      input : ('Café', max_length=3)
      base  : 'caf'
      head  : 'caf-'
      minimized: ('ab', max_length=1)

Sound auto-verify : 4/4 = 100%
  ** 1 DIVERGENCE(S): behavior changed at a tested input **
selfsame: 3 equivalent · 1 divergent · 0 unverifiable  →  .selfsame/report.json

Exit code is non-zero on any divergence, so drop it straight into CI.
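
The "minimized" line in the sample output is a smaller input that still shows the divergence. A greedy shrinking loop is one common way to get such a witness; the sketch below assumes that approach and uses two hypothetical slugify versions (`slugify_base`, `slugify_head`) that are not taken from any real project.

```python
def slugify_base(text, max_length):
    # Hypothetical "before" version: truncate, then drop a trailing dash.
    return text.lower().replace(" ", "-")[:max_length].rstrip("-")


def slugify_head(text, max_length):
    # Hypothetical "after" version: the trailing-dash cleanup was lost.
    return text.lower().replace(" ", "-")[:max_length]


def diverges(text, max_length):
    return slugify_base(text, max_length) != slugify_head(text, max_length)


def minimize(text, max_length):
    """Greedily shrink a divergent input while it stays divergent."""
    if not diverges(text, max_length):
        raise ValueError("input does not diverge; nothing to minimize")
    changed = True
    while changed:
        changed = False
        # Try dropping one character at a time; keep the first drop that works.
        for i in range(len(text)):
            candidate = text[:i] + text[i + 1:]
            if diverges(candidate, max_length):
                text, changed = candidate, True
                break
        # Try a smaller length limit.
        if max_length > 0 and diverges(text, max_length - 1):
            max_length, changed = max_length - 1, True
    return text, max_length
```

With these two versions, `minimize("Hello World Again", 6)` shrinks the input to a single space with `max_length=1`, which is much easier to reason about than the original call.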

The AI use case: catch regressions against a confirmed build

When an AI agent generates code continuously, you rarely have a clean "before" branch — you have a build you accepted and whatever the next feature did to it. Freeze the accepted behavior once, then check drift after every change:

# 1. You confirm a build works. Freeze its behavior as the baseline.
selfsame snapshot --modules myapp -- pytest -q

# 2. The agent develops the next feature (adds code, edits existing code)...

# 3. How much of the accepted behavior changed?
selfsame drift          # exit 1 if anything deviated → blocks the bad build
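
Conceptually, the snapshot and drift steps separate "write the baseline" from "compare against it". The following sketch shows that split under simplifying assumptions: the function names, the JSON layout, and the use of `repr` for comparison are all hypothetical and do not describe Selfsame's baseline file format.

```python
import json
import os
import tempfile
from pathlib import Path


def observe(functions, recorded_calls):
    """Run each function on its recorded inputs and collect printable results."""
    return {
        name: [repr(functions[name](*args)) for args in calls]
        for name, calls in recorded_calls.items()
        if name in functions
    }


def snapshot(functions, recorded_calls, path):
    """Freeze the accepted build's behavior. Only this step writes the baseline."""
    Path(path).write_text(json.dumps(observe(functions, recorded_calls), indent=2))


def drift(functions, recorded_calls, path):
    """Compare current behavior with the frozen baseline; never rewrites it."""
    baseline = json.loads(Path(path).read_text())
    report = {}
    for name, outputs in observe(functions, recorded_calls).items():
        if name not in baseline:
            report[name] = "no-baseline"   # brand-new code: nothing to compare
        elif outputs == baseline[name]:
            report[name] = "equivalent"
        else:
            report[name] = "divergent"
    return report


def demo():
    """Accept a build, apply a change, and report what drifted."""
    accepted = {"greet": lambda n: f"Hello, {n}!", "total": lambda xs: sum(xs)}
    changed = {
        "greet": lambda n: f"Hi, {n}",            # regression
        "total": lambda xs: sum(x for x in xs),   # behavior-preserving rewrite
        "new_helper": lambda: 1,                  # new code, no baseline
    }
    calls = {"greet": [["Sam"]], "total": [[[1, 2, 3]]]}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "baseline.json")
        snapshot(accepted, calls, path)
        return drift(changed, {**calls, "new_helper": [[]]}, path)
```

A CI wrapper around this sketch would exit non-zero when any value in the report is `"divergent"`.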

A worked example — the agent adds a feature and accidentally breaks an existing function:

~ discount          n=2   interface-change   (added optional param 'currency' — back-compatible)
X greet             n=1   divergent          base 'Hello, Sam!'  →  head 'Hi, Sam'     ← regression caught
  total             n=1   equivalent         (rewritten as a loop — behavior preserved)
# new_helper: flagged separately as changed code with no test baseline
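
The interface-change line above distinguishes a signature change that old callers survive from one that breaks them. A conservative check of that property can be sketched with `inspect.signature`; the helper name `is_back_compatible` and the `discount_*` functions are hypothetical, and the sketch only reasons about plain positional-or-keyword parameters.

```python
import inspect


def is_back_compatible(old_func, new_func):
    """True only if every call valid for old_func is provably valid for new_func."""
    old = list(inspect.signature(old_func).parameters.values())
    new = list(inspect.signature(new_func).parameters.values())
    simple = inspect.Parameter.POSITIONAL_OR_KEYWORD
    empty = inspect.Parameter.empty
    if any(p.kind is not simple for p in old + new):
        return False      # refuse to reason about *args, **kwargs, keyword-only
    if len(new) < len(old):
        return False      # a parameter was removed
    for old_p, new_p in zip(old, new):
        if old_p.name != new_p.name:
            return False  # renamed or reordered: existing calls may break
        if old_p.default is not empty and new_p.default is empty:
            return False  # an optional parameter became required
    # Parameters added at the end must all be optional.
    return all(p.default is not empty for p in new[len(old):])


def discount_old(price, rate):
    return price * (1 - rate)


def discount_new(price, rate, currency="USD"):   # added optional parameter
    return price * (1 - rate)


def discount_bad(price, rate, currency):          # added required parameter
    return price * (1 - rate)
```

Here `is_back_compatible(discount_old, discount_new)` is true and `is_back_compatible(discount_old, discount_bad)` is false. The check says nothing about return values; whether old calls still produce the same results is what the replay step decides.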

The signal scales with behavior that actually changed, not lines of code: brand-new code has no baseline (no noise), behavior-preserving rewrites stay equivalent, and only real deviations at tested inputs are flagged.

To make it automatic, let pytest act as your regression gate:

# pytest.ini  ·  in pyproject.toml, put the same key under [tool.pytest.ini_options]
# The plugin runs a compare-only drift check after the suite.
[pytest]
selfsame = true

The plugin is compare-only: it never re-baselines on its own, so a regression can't silently become the new "correct" behavior — you bless a new baseline explicitly with selfsame snapshot.

👉 Full walkthrough: docs/ai-workflows.md

Commands at a glance

| command | what it does |
| --- | --- |
| selfsame verify | replay base vs head on your test inputs; per-function verdict + CI exit code |
| selfsame snapshot | freeze the current (accepted) build's behavior to a baseline file |
| selfsame drift | measure how much current code deviated from the baseline (no second branch) |
| selfsame capture | record real call arguments from any test or app command |
| selfsame replay | replay captured arguments across two git refs |
| selfsame attach | dump captures from a running, hook-enabled process without stopping it |
| selfsame check | generate inputs and check two files / git refs (for typed, pure functions) |
| selfsame fuzz | (experimental) mutate real inputs to find divergences your tests miss |
| selfsame adjudicate | (experimental) prove whether a nominated assumption is load-bearing on passing code |
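
To illustrate what "mutate real inputs" can mean for the fuzz command, here is a small deterministic sketch. It is not Selfsame's mutation strategy: `mutations`, `outcome`, `fuzz`, and the two `normalize_*` functions are hypothetical, and only integers, booleans, and strings are varied.

```python
def mutations(value):
    """Yield small variations of one recorded argument."""
    if isinstance(value, bool):
        yield not value
    elif isinstance(value, int):
        yield from (value - 1, value + 1, 0, -value)
    elif isinstance(value, str):
        for i in range(len(value)):
            yield value[:i] + value[i + 1:]       # drop one character
        for extra in (" ", "-", "é"):
            yield value + extra                   # append an edge-case character
            yield extra + value                   # prepend it


def outcome(func, args):
    """Capture a return value or the type of exception raised."""
    try:
        return ("ok", func(*args))
    except Exception as exc:
        return ("raised", type(exc).__name__)


def fuzz(base, head, recorded_inputs):
    """Vary one argument at a time; return the first divergent input found."""
    for args in recorded_inputs:
        for position, original in enumerate(args):
            for variant in mutations(original):
                candidate = list(args)
                candidate[position] = variant
                if outcome(base, candidate) != outcome(head, candidate):
                    return candidate
    return None


def normalize_base(text):
    return text.strip().lower()


def normalize_head(text):      # hypothetical refactor that dropped strip()
    return text.lower()
```

The recorded input `["Hello"]` gives the same result in both versions, so a test using it passes. `fuzz(normalize_base, normalize_head, [["Hello"]])` returns `["Hello "]`, a nearby input with trailing whitespace where the two versions disagree.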

Full reference with every flag: docs/commands.md.

Documentation

  • 🚀 Getting started: install, your first verify, your first snapshot/drift
  • 🤖 AI workflows: snapshot/drift, the pytest plugin, agent reports, working at LLM velocity
  • 📖 Command reference: every command and flag
  • ⚙️ Configuration: [tool.selfsame], environment variables, exit codes
  • 🛠️ How it works: capture → replay → compare, and the soundness model
  • 🧭 Limitations: the honest boundaries, worth reading before you rely on it
  • 📐 Architecture & spec: the engineering contract (data formats, canonical schema, verdict model), for contributors


