An SMT-based verifier for Python with a machine-checked trust base.
Project description
Touchstone
An SMT-based verifier for Python. Touchstone takes a function and a property and returns PROVED (it holds for all inputs), REFUTED (with a counterexample), or UNKNOWN (with a reason), by translating the code to Z3 rather than running it.
import touchstone as t
# state the property in Python, over the parameters and `result`
t.prove("def f(x):\n return x + x\n", "result == 2 * x").status # 'PROVED'
# or write the contract as decorators on the function itself
t.verify_contracts('''
@require("n >= 0")
@ensure("result == n")
def count(n):
i = 0
while i < n:
i = i + 1
return i
''').status # 'PROVED'
# or check two implementations agree on every input
t.verify_equiv("double", "f", "def f(a):\n return a + a\n",
"def g(a):\n return 2 * a\n", {}).status # 'PROVED'
# or verify a whole program, resolving imports across modules through the call graph
t.check_program({
"lib": "def recip(n):\n return 100 // n\n", # traps when n == 0
"main": "import lib\ndef f(n):\n if n == 0:\n return 0\n return lib.recip(n)\n",
}, "main", "f").status # 'PROVED' (the guard covers the callee)
Command line
The same verbs run from the shell, with the process exit status mirroring the verdict (0 PROVED, 1 REFUTED, 2 UNKNOWN) so they compose in CI:
touchstone verify count.py # the @require / @ensure contracts written in a file
touchstone prove f.py --ensures 'result == x' # a postcondition over the parameters and `result`
touchstone equiv impl.py spec.py --func f # two implementations agree on every input
touchstone check d.py # trap freedom (and any asserts) for all inputs
touchstone infer m.py # sound over-approximate types of a return and its locals
A refutation comes back with the counterexample and the path it took:
$ touchstone prove f.py --ensures 'result == x'
REFUTED [property via verified VC generator (Rocq-extracted wpg)]
counterexample: x=0
trace:
line 2: return x + 1 [x=0]
=> returns 1
What it covers
Functional equivalence and predicates; whole-function and interprocedural reasoning over
control flow with multiple loops, arbitrary nesting, break and continue, statements after a
loop, and any step direction; self-recursion, mutual recursion, and recursion over lists;
whole-program verification across module boundaries, resolving from-imports, qualified m.f()
calls, and aliases through the call graph so a callee's trap is seen at the caller;
deductive and synthesized loop invariants; abstract interpretation (interval, zone, octagon,
Karr, polyhedra, machine-integer); IEEE-754 floating point total over every double, with Inf
and NaN as first-class inputs, exact floor division and modulo, and sound over-approximations
of sin, cos, exp, and log; arrays with quantified specifications; termination of counted
loops, iteration over containers, and data-dependent loops, and cost; exceptions;
rely-guarantee concurrency for all schedules and all depths; and separation logic with the
frame rule, the magic wand, and inductive heap predicates.
Values carry their real types through the symbolic core; the heap models object identity,
aliasing, and mutation; a sequence index is verified against the container's length and a dict
key against the keys provably present (from a membership guard, an iteration, or a prior store),
so a guarded access is proved safe and an unguarded one refuted with the out-of-range or
missing-key witness; and these, with None in arithmetic, type mismatches, and division by zero,
are traps that refute a totality claim. Fixed-width
wraparound is checked alongside every integer proof. A property can be stated in Python over
the parameters and result (prove), written as @require / @ensure decorators on the
function (verify_contracts, in the style of the contracts and icontract packages), mined from
the code's own assertions (check), or written directly as a Z3 predicate; a counterexample
comes back with the execution trace and the path taken (explain).
SUBSET.md specifies the modeled subset precisely, in three tiers of trust, and enumerates the constructs that return UNKNOWN (async, metaclasses, non-identity decorators, dynamic reflection, and the rest), so it is clear in advance where a verdict will be withheld.
Soundness
Every construct is either encoded soundly or returned as UNKNOWN with a reason, so an
unsupported feature is never silently skipped or assumed away. A PROVED is confirmed by a
second independent solver (cvc5) and withheld unless both agree, runs under a deterministic
resource bound so identical input yields an identical verdict on every machine, and carries a
reproducibility certificate. proof_bundle exports that certificate as a re-checkable bundle -- the
discharged SMT-LIB refutation queries, the solver versions, the deterministic configuration, and a content
hash -- and recheck_bundle (or any SMT solver) re-verifies it independently, trusting the re-run solver
rather than the engine that produced it.
prove, verify, and check establish partial correctness: a PROVED means that on every input
meeting the precondition the function returns without a trap and the postcondition holds, not that it
terminates -- termination is a separate obligation (verify_total, or check --total for trap freedom
plus termination). When a true property exceeds the default deterministic bound and comes back UNKNOWN,
a larger budget is available (touchstone ... --budget high), so that incompleteness is not invisible.
The trust base is machine-checked in Rocq (proofs/): the operational semantics of the
modeled subset; the verification-condition generator over it, proven sound and complete for
straight-line assignment and conditionals and sound for while-loops carrying a syntactic
invariant, trap-aware so a reachable division or modulo by zero leaves the condition
undischarged; a fixed-width two's-complement integer model proven to agree with unbounded
arithmetic exactly when no operation overflows; the SMT-LIB division and modulo encoding
proven to refine the theory for every conforming solver; the abstract-domain transfers
(extracted to OCaml and run as the engine's operators); the translation as a
semantics-preserving functor; and the end-to-end theorem that a discharged verification
condition implies the property under the program's semantics. SMTCoq re-checks each integer
obligation's certificate inside Coq's kernel.
Both the straight-line and the loop verification-condition generators are extracted from that proof to OCaml and transcribed in the engine, and a differential audit holds each transcription byte-for-byte equal to its extraction on a random corpus, so the generator code that runs is the one proven correct in Rocq, not a separate symbolic execution. The engine discharges the loop-free integer fragment through the straight-line generator directly, trap-aware so a reachable division by zero leaves the condition undischarged. With that core verified, the random differential checks against CPython are a completeness regression that measures precision rather than the barrier against an unsound verdict.
Type inference
The same symbolic core infers the type of every function return, parameter, and variable in
code, in two modes. infer_types is over-approximating and sound: the reported set of type
names is guaranteed to contain the value's runtime type (trusting a parameter's annotation as
a contract where one is given), or the location is left UNKNOWN when no such bound can be
established, so a stated type is never narrower than the truth. emit_facts is best-effort exact in the TypeEvalPy schema and discovers its own
targets: it walks the module and emits a fact at every return, parameter, and binding it
finds, typing each by carrying an argument's type across the call boundary into the parameter
it reaches, following a value through reassignment and the narrowing of is None /
isinstance guards, and resolving container element types and dict keys through the call
graph, attributes set in a constructor, decorators, and generators.
Scored by TypeEvalPy's exact matcher at full source position, including column offset and with the analysis discovering each location, name, and kind on its own, the emitted facts are matched against the runtime-observed ground truth. As an independent check, the inference is run over pure-Python standard-library modules and compared against the type CPython produces at runtime: the sound mode's reported set is confirmed to contain the runtime type at every observed location, and the exact mode is held to the same emit-and-match standard, discovering the locations itself.
Recall is the fraction of ground-truth facts an emitted fact matches; precision the fraction of
emitted facts that match one. The CPython cross-check knows a runtime type only at the locations its
corpus reached, so its precision is measured over the emitted facts at those locations.
python -m touchstone.typeeval runs the CPython cross-check; python -m touchstone.typeeval PATH
scores a TypeEvalPy directory.
The commit rate is the fraction of ground-truth locations where the heuristic emits any type rather than abstaining (the refusal rate is its complement), and the accuracy where committed is the fraction of those commitments that match. Recall is the product of the two, so it rises either by committing on more locations or by being right more often where it commits; tracking them apart shows which lever is short.
| Evaluation | Recall | Precision | Commit | Accuracy (committed) |
|---|---|---|---|---|
| TypeEvalPy micro-benchmark (emit-and-match) | 803 / 868 (92.5%) | 36.1% | 95.7% | 96.6% |
| TypeEvalPy autogen suite (emit-and-match) | 73,401 / 77,268 (95.0%) | 49.4% | 95.4% | 99.5% |
| CPython standard library (emit-and-match) | 1069 / 1218 (87.8%) | 89.6% | 93.1% | 94.3% |
The sound mode reports a guaranteed bound rather than a single prediction, so it is measured differently:
the commit rate is the fraction of locations where it gives a bound at all, the soundness the fraction of
those whose bound contains the runtime type, and the exact rate where the bound equals the observed type.
python -m touchstone.typeeval --sound reproduces it.
| Sound mode (infer_types) | Commit rate | Soundness | Exact |
|---|---|---|---|
| TypeEvalPy micro-benchmark | 101 / 868 (11.6%) | 100% | 99.0% |
| TypeEvalPy autogen suite | 13,829 / 77,268 (17.9%) | 100% | 98.5% |
| CPython standard library | 54 / 1218 (4.4%) | 100% | 83.3% |
The sound mode commits only where the type is fixed independent of the inputs (a literal, a fixed-return builtin, an operation over already-bounded values), so its reach is narrower than the heuristic's; but a committed bound is a proven over-approximation, and the runtime type fell inside every one.
The TypeEvalPy figures are scored against commit 3719de1 of
TypeEvalPy, and the CPython cross-check ran
on CPython 3.13.2; both the counts and the rates move with the benchmark commit and the standard-library
version, so the tables are snapshots python -m touchstone.typeeval reproduces rather than fixed scores.
The autogen suite is generated from templates the heuristic was tuned against and the micro suite is
hand-written, whereas the CPython standard library is independent of both: it is the held-out
measurement, and its lower accuracy is the out-of-distribution result the benchmark numbers do not show
on their own.
Run
pip install touchstone-prover # z3-solver and cvc5, pinned in pyproject.toml
python -m touchstone.ci # self-tests, soundness audits, completeness regressions -> "CI OK"
python -m touchstone.examples # one runnable example per capability, each verdict asserted
python -m touchstone.typeeval # type-inference recall + precision (CPython cross-check)
python -m touchstone # a demonstration
The machine-checked proofs run under the Rocq 9.0 opam switch; verify_coq.sh also invokes
the SMTCoq certificate check when its separate toolchain (see proofs/toolchain.lock) is
present, and skips it cleanly otherwise:
eval "$(opam env --switch=rocq9)" && cd proofs && bash verify_coq.sh
Layout
touchstone/ package: core, domains, engines, theories, vcgen, audit, ci, examples (_impl is the engine)
proofs/ Rocq + SMTCoq proofs, the extracted VC generators + interval operators, verify_coq.sh
.github/ continuous integration: the audits and the proof gate on every change
pyproject.toml package metadata and pinned Python dependencies
SUBSET.md the modeled subset of Python, in three tiers of trust, and what returns UNKNOWN
License
MIT. See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file touchstone_prover-0.4.4.tar.gz.
File metadata
- Download URL: touchstone_prover-0.4.4.tar.gz
- Upload date:
- Size: 321.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d9618ed56e9e54869f6c0c6a4fc8f5e31ab660d014501eec144e01ec045d5ef5
|
|
| MD5 |
19357279422dbb7960176430364707ec
|
|
| BLAKE2b-256 |
ad1804a404393378c07ab6dd74992d28bdfc7375920e5088fcfe5ec1f208427e
|
Provenance
The following attestation bundles were made for touchstone_prover-0.4.4.tar.gz:
Publisher:
publish.yml on CharlesCNorton/touchstone
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
touchstone_prover-0.4.4.tar.gz -
Subject digest:
d9618ed56e9e54869f6c0c6a4fc8f5e31ab660d014501eec144e01ec045d5ef5 - Sigstore transparency entry: 1884660276
- Sigstore integration time:
-
Permalink:
CharlesCNorton/touchstone@032583d387fadee8ed1724586e0c16c0156f9c2d -
Branch / Tag:
refs/tags/v0.4.4 - Owner: https://github.com/CharlesCNorton
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@032583d387fadee8ed1724586e0c16c0156f9c2d -
Trigger Event:
push
-
Statement type:
File details
Details for the file touchstone_prover-0.4.4-py3-none-any.whl.
File metadata
- Download URL: touchstone_prover-0.4.4-py3-none-any.whl
- Upload date:
- Size: 329.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
29b532332d49abbcb57d3c479c0c128d2b8f538c0c00f663a6aeabb458920e20
|
|
| MD5 |
34b7b8757a5823c9fc4ec699e348ed69
|
|
| BLAKE2b-256 |
056db42e0f26868898dbac51726e95f3f4f8fda810f21ff180d389d38a97c082
|
Provenance
The following attestation bundles were made for touchstone_prover-0.4.4-py3-none-any.whl:
Publisher:
publish.yml on CharlesCNorton/touchstone
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
touchstone_prover-0.4.4-py3-none-any.whl -
Subject digest:
29b532332d49abbcb57d3c479c0c128d2b8f538c0c00f663a6aeabb458920e20 - Sigstore transparency entry: 1884660378
- Sigstore integration time:
-
Permalink:
CharlesCNorton/touchstone@032583d387fadee8ed1724586e0c16c0156f9c2d -
Branch / Tag:
refs/tags/v0.4.4 - Owner: https://github.com/CharlesCNorton
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@032583d387fadee8ed1724586e0c16c0156f9c2d -
Trigger Event:
push
-
Statement type: