Skip to main content

An SMT-based verifier for Python with a machine-checked trust base.

Project description

Touchstone

An SMT-based verifier for Python. Touchstone takes a function and a property and returns PROVED (it holds for all inputs), REFUTED (with a counterexample), or UNKNOWN (with a reason), by translating the code to Z3 rather than running it.

import touchstone as t

# state the property in Python, over the parameters and `result`
t.prove("def f(x):\n    return x + x\n", "result == 2 * x").status        # 'PROVED'

# or write the contract as decorators on the function itself
t.verify_contracts('''
@require("n >= 0")
@ensure("result == n")
def count(n):
    i = 0
    while i < n:
        i = i + 1
    return i
''').status                                                              # 'PROVED'

# or check two implementations agree on every input
t.verify_equiv("double", "f", "def f(a):\n    return a + a\n",
               "def g(a):\n    return 2 * a\n", {}).status                # 'PROVED'

Command line

The same verbs run from the shell, with the process exit status mirroring the verdict (0 PROVED, 1 REFUTED, 2 UNKNOWN) so they compose in CI:

touchstone verify count.py                       # the @require / @ensure contracts written in a file
touchstone prove  f.py --ensures 'result == x'   # a postcondition over the parameters and `result`
touchstone equiv  impl.py spec.py --func f       # two implementations agree on every input
touchstone check  d.py                           # trap freedom (and any asserts) for all inputs
touchstone infer  m.py                           # sound over-approximate types of a return and its locals

A refutation comes back with the counterexample and the path it took:

$ touchstone prove f.py --ensures 'result == x'
REFUTED  [property via verified VC generator (Rocq-extracted wpg)]
  counterexample: x=0
  trace:
    line 2: return x + 1    [x=0]
    => returns 1

What it covers

Functional equivalence and predicates; whole-function and interprocedural reasoning over control flow with multiple loops, arbitrary nesting, break and continue, statements after a loop, and any step direction; self-recursion, mutual recursion, and recursion over lists; deductive and synthesized loop invariants; abstract interpretation (interval, zone, octagon, Karr, polyhedra, machine-integer); IEEE-754 floating point total over every double, with Inf and NaN as first-class inputs, exact floor division and modulo, and sound over-approximations of sin, cos, exp, and log; arrays with quantified specifications; termination of counted loops, iteration over containers, and data-dependent loops, and cost; exceptions; rely-guarantee concurrency for all schedules and all depths; and separation logic with the frame rule, the magic wand, and inductive heap predicates.

Values carry their real types through the symbolic core; the heap models object identity, aliasing, and mutation; and index-out-of-bounds, key errors, None in arithmetic, type mismatches, and division by zero are traps that refute a totality claim. Fixed-width wraparound is checked alongside every integer proof. A property can be stated in Python over the parameters and result (prove), written as @require / @ensure decorators on the function (verify_contracts, in the style of the contracts and icontract packages), mined from the code's own assertions (check), or written directly as a Z3 predicate; a counterexample comes back with the execution trace and the path taken (explain).

Soundness

Every construct is either encoded soundly or returned as UNKNOWN with a reason, so an unsupported feature is never silently skipped or assumed away. A PROVED is confirmed by a second independent solver (cvc5) and withheld unless both agree, runs under a deterministic resource bound so identical input yields an identical verdict on every machine, and carries a reproducibility certificate.

The trust base is machine-checked in Rocq (proofs/): the operational semantics of the modeled subset; the verification-condition generator over it, proven sound and complete for straight-line assignment and conditionals and sound for while-loops carrying a syntactic invariant, trap-aware so a reachable division or modulo by zero leaves the condition undischarged; a fixed-width two's-complement integer model proven to agree with unbounded arithmetic exactly when no operation overflows; the SMT-LIB division and modulo encoding proven to refine the theory for every conforming solver; the abstract-domain transfers (extracted to OCaml and run as the engine's operators); the translation as a semantics-preserving functor; and the end-to-end theorem that a discharged verification condition implies the property under the program's semantics. SMTCoq re-checks each integer obligation's certificate inside Coq's kernel.

Both the straight-line and the loop verification-condition generators are extracted from that proof to OCaml and transcribed in the engine, and a differential audit holds each transcription byte-for-byte equal to its extraction on a random corpus, so the generator code that runs is the one proven correct in Rocq, not a separate symbolic execution. The engine discharges the loop-free integer fragment through the straight-line generator directly, trap-aware so a reachable division by zero leaves the condition undischarged. With that core verified, the random differential checks against CPython are a completeness regression that measures precision rather than the barrier against an unsound verdict.

Type inference

The same symbolic core infers the type of every function return, parameter, and variable in code, in two modes. infer_types is over-approximating and sound: the reported set of type names is guaranteed to contain the value's runtime type (trusting a parameter's annotation as a contract where one is given), or the location is left UNKNOWN when no such bound can be established, so a stated type is never narrower than the truth. emit_facts is best-effort exact in the TypeEvalPy schema and discovers its own targets: it walks the module and emits a fact at every return, parameter, and binding it finds, typing each by carrying an argument's type across the call boundary into the parameter it reaches, following a value through reassignment and the narrowing of is None / isinstance guards, and resolving container element types and dict keys through the call graph, attributes set in a constructor, decorators, and generators.

Scored by TypeEvalPy's exact matcher at full source position, including column offset and with the analysis discovering each location, name, and kind on its own, the emitted facts are matched against the runtime-observed ground truth. As an independent check, the inference is run over pure-Python standard-library modules and compared against the type CPython produces at runtime: the sound mode's reported set is confirmed to contain the runtime type at every observed location, and the exact mode is held to the same emit-and-match standard, discovering the locations itself.

Evaluation Result
TypeEvalPy micro-benchmark (emit-and-match) 807 / 868 (92.97%)
TypeEvalPy autogen suite (emit-and-match) 71,451 / 77,268 (92.47%)
CPython standard library (emit-and-match) 779 / 958 (81.3%)

Run

pip install touchstone-prover  # z3-solver and cvc5, pinned in pyproject.toml
python -m touchstone.ci      # self-tests, soundness audits, completeness regressions -> "CI OK"
python -m touchstone.examples  # one runnable example per capability, each verdict asserted
python -m touchstone         # a demonstration

The machine-checked proofs run under the Rocq 9.0 opam switch; verify_coq.sh also invokes the SMTCoq certificate check when its separate toolchain (see proofs/toolchain.lock) is present, and skips it cleanly otherwise:

eval "$(opam env --switch=rocq9)" && cd proofs && bash verify_coq.sh

Layout

touchstone/      package: core, domains, engines, theories, vcgen, audit, ci, examples (_impl is the engine)
proofs/          Rocq + SMTCoq proofs, the extracted VC generators + interval operators, verify_coq.sh
.github/         continuous integration: the audits and the proof gate on every change
pyproject.toml   package metadata and pinned Python dependencies

License

MIT. See LICENSE.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

touchstone_prover-0.1.1.tar.gz (264.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

touchstone_prover-0.1.1-py3-none-any.whl (269.7 kB view details)

Uploaded Python 3

File details

Details for the file touchstone_prover-0.1.1.tar.gz.

File metadata

  • Download URL: touchstone_prover-0.1.1.tar.gz
  • Upload date:
  • Size: 264.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for touchstone_prover-0.1.1.tar.gz
Algorithm Hash digest
SHA256 4fcaad2d29f74b25e936668a3bfb7c8d478ba670595bba3eda07edf2bc21af6d
MD5 406eff0511fe6d921f468483de50b214
BLAKE2b-256 0a091b703a3b43870cc8a3104e30786e3e5bfaaa788516948a47ef26151c2b69

See more details on using hashes here.

File details

Details for the file touchstone_prover-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for touchstone_prover-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 71ce7657a8a71d734fb7a4b13b6e089030da0a78293b46570faa7e8e70e39732
MD5 da058bb3362fbfaddb17ccaf681ae2c0
BLAKE2b-256 dc4a7f53178d02e0494f3c29eb8ac93c3dbc8baa8a0b941570a845cfc725c0cd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page