Skip to main content

An SMT-based verifier for Python with a machine-checked trust base.

Project description

Touchstone

An SMT-based verifier for Python. Touchstone takes a function and a property and returns PROVED (it holds for all inputs), REFUTED (with a counterexample), or UNKNOWN (with a reason), by translating the code to Z3 rather than running it.

import touchstone as t

# state the property in Python, over the parameters and `result`
t.prove("def f(x):\n    return x + x\n", "result == 2 * x").status        # 'PROVED'

# or write the contract as decorators on the function itself
t.verify_contracts('''
@require("n >= 0")
@ensure("result == n")
def count(n):
    i = 0
    while i < n:
        i = i + 1
    return i
''').status                                                              # 'PROVED'

# or check two implementations agree on every input
t.verify_equiv("double", "f", "def f(a):\n    return a + a\n",
               "def g(a):\n    return 2 * a\n", {}).status                # 'PROVED'

Command line

The same verbs run from the shell, with the process exit status mirroring the verdict (0 PROVED, 1 REFUTED, 2 UNKNOWN) so they compose in CI:

touchstone verify count.py                       # the @require / @ensure contracts written in a file
touchstone prove  f.py --ensures 'result == x'   # a postcondition over the parameters and `result`
touchstone equiv  impl.py spec.py --func f       # two implementations agree on every input
touchstone check  d.py                           # trap freedom (and any asserts) for all inputs
touchstone infer  m.py                           # sound over-approximate types of a return and its locals

A refutation comes back with the counterexample and the path it took:

$ touchstone prove f.py --ensures 'result == x'
REFUTED  [property via verified VC generator (Rocq-extracted wpg)]
  counterexample: x=0
  trace:
    line 2: return x + 1    [x=0]
    => returns 1

What it covers

Functional equivalence and predicates; whole-function and interprocedural reasoning over control flow with multiple loops, arbitrary nesting, break and continue, statements after a loop, and any step direction; self-recursion, mutual recursion, and recursion over lists; deductive and synthesized loop invariants; abstract interpretation (interval, zone, octagon, Karr, polyhedra, machine-integer); IEEE-754 floating point total over every double, with Inf and NaN as first-class inputs, exact floor division and modulo, and sound over-approximations of sin, cos, exp, and log; arrays with quantified specifications; termination of counted loops, iteration over containers, and data-dependent loops, and cost; exceptions; rely-guarantee concurrency for all schedules and all depths; and separation logic with the frame rule, the magic wand, and inductive heap predicates.

Values carry their real types through the symbolic core; the heap models object identity, aliasing, and mutation; and index-out-of-bounds, key errors, None in arithmetic, type mismatches, and division by zero are traps that refute a totality claim. Fixed-width wraparound is checked alongside every integer proof. A property can be stated in Python over the parameters and result (prove), written as @require / @ensure decorators on the function (verify_contracts, in the style of the contracts and icontract packages), mined from the code's own assertions (check), or written directly as a Z3 predicate; a counterexample comes back with the execution trace and the path taken (explain).

Soundness

Every construct is either encoded soundly or returned as UNKNOWN with a reason, so an unsupported feature is never silently skipped or assumed away. A PROVED is confirmed by a second independent solver (cvc5) and withheld unless both agree, runs under a deterministic resource bound so identical input yields an identical verdict on every machine, and carries a reproducibility certificate.

The trust base is machine-checked in Rocq (proofs/): the operational semantics of the modeled subset; the verification-condition generator over it, proven sound and complete for straight-line assignment and conditionals and sound for while-loops carrying a syntactic invariant, trap-aware so a reachable division or modulo by zero leaves the condition undischarged; a fixed-width two's-complement integer model proven to agree with unbounded arithmetic exactly when no operation overflows; the SMT-LIB division and modulo encoding proven to refine the theory for every conforming solver; the abstract-domain transfers (extracted to OCaml and run as the engine's operators); the translation as a semantics-preserving functor; and the end-to-end theorem that a discharged verification condition implies the property under the program's semantics. SMTCoq re-checks each integer obligation's certificate inside Coq's kernel.

Both the straight-line and the loop verification-condition generators are extracted from that proof to OCaml and transcribed in the engine, and a differential audit holds each transcription byte-for-byte equal to its extraction on a random corpus, so the generator code that runs is the one proven correct in Rocq, not a separate symbolic execution. The engine discharges the loop-free integer fragment through the straight-line generator directly, trap-aware so a reachable division by zero leaves the condition undischarged. With that core verified, the random differential checks against CPython are a completeness regression that measures precision rather than the barrier against an unsound verdict.

Type inference

The same symbolic core infers the type of every function return, parameter, and variable in unannotated code, in two modes. infer_types is over-approximating and sound: the reported set of type names is guaranteed to contain the value's runtime type, or the location is left UNKNOWN when no such bound can be established, so a stated type is never narrower than the truth. emit_facts is best-effort exact in the TypeEvalPy schema and discovers its own targets: it walks the module and emits a fact at every return, parameter, and binding it finds, typing each by carrying an argument's type across the call boundary into the parameter it reaches, following a value through reassignment and the narrowing of is None / isinstance guards, and resolving container element types and dict keys through the call graph, attributes set in a constructor, decorators, and generators.

Scored by TypeEvalPy's exact matcher at full source position, including column offset and with the analysis discovering each location, name, and kind on its own, the emitted facts are matched against the runtime-observed ground truth. As an independent check, the inference is run over pure-Python standard-library modules and compared against the type CPython produces at runtime: the sound mode's reported set is confirmed to contain the runtime type at every observed location, and the exact mode is held to the same emit-and-match standard, discovering the locations itself.

Evaluation Result
TypeEvalPy micro-benchmark (emit-and-match) 807 / 868 (92.97%)
TypeEvalPy autogen suite (emit-and-match) 71,451 / 77,268 (92.47%)
CPython standard library (emit-and-match) 770 / 958 (80.4%)

Run

pip install touchstone-prover  # z3-solver and cvc5, pinned in pyproject.toml
python -m touchstone.ci      # self-tests, soundness audits, completeness regressions -> "CI OK"
python -m touchstone.examples  # one runnable example per capability, each verdict asserted
python -m touchstone         # a demonstration

The machine-checked proofs run under the Rocq 9.0 opam switch; verify_coq.sh also invokes the SMTCoq certificate check when its separate toolchain (see proofs/toolchain.lock) is present, and skips it cleanly otherwise:

eval "$(opam env --switch=rocq9)" && cd proofs && bash verify_coq.sh

Layout

touchstone/      package: core, domains, engines, theories, vcgen, audit, ci, examples (_impl is the engine)
proofs/          Rocq + SMTCoq proofs, the extracted VC generators + interval operators, verify_coq.sh
.github/         continuous integration: the audits and the proof gate on every change
pyproject.toml   package metadata and pinned Python dependencies

License

MIT. See LICENSE.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

touchstone_prover-0.1.0.tar.gz (263.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

touchstone_prover-0.1.0-py3-none-any.whl (269.1 kB view details)

Uploaded Python 3

File details

Details for the file touchstone_prover-0.1.0.tar.gz.

File metadata

  • Download URL: touchstone_prover-0.1.0.tar.gz
  • Upload date:
  • Size: 263.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for touchstone_prover-0.1.0.tar.gz
Algorithm Hash digest
SHA256 86960b5390f7784156f2d3bc0cc50adc485dcd94234719e81938d86134457111
MD5 26baeeb522367b32f9d3ca7015b109db
BLAKE2b-256 cd01f91351408aa270708e5e9012340b352c0065a185666ab2de2bdbd8f152b5

See more details on using hashes here.

File details

Details for the file touchstone_prover-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for touchstone_prover-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9b3bd140bd415fac6efef46bccfd39bcb76d50914ce09c02ed1d11105ca1d4ec
MD5 bb01cab0d7e7c46aed3f1be0b62eeec9
BLAKE2b-256 69ef6f26d38865252b97eac2255d5f50954c4174ecba434126ad33ee4e6bbf83

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page