Assay

AI evidence that's harder to fake and easier to verify.

Assay turns AI execution claims into portable proof artifacts a buyer can verify offline -- without trusting your server. Change one byte, verification fails. Drop a locked check, the mismatch is exposed. Skip a contracted call site, completeness checks catch it.

Assay doesn't make fraud impossible. It makes fraud expensive, fragile, and much easier to catch.

We scanned 30 popular AI projects and found 202 high-confidence LLM call sites. Zero had tamper-evident audit trails. Full results.

# macOS / Linux
python3 -m pip install assay-ai
# Windows
py -m pip install assay-ai

Requires Python 3.9+. Verify the CLI is on PATH:

assay version

If pip isn't on your PATH, use the Python launcher (python3 -m pip on macOS/Linux, py -m pip on Windows).

Prefer a deterministic setup path? Start here: docs/START_HERE.md

Boundary: Assay proves the evidence artifact has not been quietly changed after the fact. It does not, by itself, prove every upstream component was honest. Stronger deployment patterns (CI-held signing keys, transparency logs, external timestamping) raise the cost of full fabrication. See trust tiers.

Not this: Assay is not a logging framework, an observability dashboard, or a monitoring tool. It produces signed evidence bundles that a third party can verify offline. If you need Datadog, this isn't it.

See It -- Then Understand It

Try it now (no API key needed -- demos use synthetic data; with real calls, Assay instruments OpenAI, Anthropic, Gemini, LiteLLM, LangChain, and local models):

python3 -m pip install assay-ai
assay demo-challenge    # tamper detection: one valid pack, one with a single byte changed

Two packs, one byte changed ("gpt-4" -> "gpt-5" in the receipts). Here's what happens (pack IDs and timestamps will differ on your machine):

$ assay verify-pack challenge_pack/good/

  VERIFICATION PASSED

  Pack ID:    pack_20260222_ca2bb665
  Integrity:  PASS
  Claims:     PASS
  Receipts:   3
  Signature:  Ed25519 valid

  Exit code: 0

$ assay verify-pack challenge_pack/tampered/

  VERIFICATION FAILED

  Pack ID:    pack_20260222_ca2bb665
  Integrity:  FAIL
  Error:      Hash mismatch for receipt_pack.jsonl

  Exit code: 2

One byte changed. Verification fails. No server access needed. No trust required. Just math.

Now try the policy violation demo:

assay demo-incident     # two-act scenario: honest PASS vs honest FAIL
  Act 1: Agent uses gpt-4 with guardian check
  Integrity: PASS    Claims: PASS    Exit code: 0

  Act 2: Someone swaps model to gpt-3.5-turbo, removes guardian
  Integrity: PASS    Claims: FAIL    Exit code: 1

Act 2 is an honest failure -- authentic evidence proving the run violated its declared standards. The evidence is real. The failure is real. Nobody can edit the history. Exit code 1.

Honest failure is a feature, not an embarrassment. Exit 1 is audit gold: a control failed, the failure is detectable and retained, and the evidence is authentic. A signed failure is stronger evidence than a vague pass. Auditors, regulators, and buyers trust systems that can show what went wrong -- not systems that only ever claim success.

How that works

Assay separates two questions on purpose:

  • Integrity: "Were these bytes tampered with after creation?" (signatures, hashes, required files)
  • Claims: "Does this evidence satisfy our declared governance checks?" (receipt types, counts, field values)

Integrity  Claims  Exit  Meaning
PASS       PASS    0     Evidence checks out, behavior meets standards
PASS       FAIL    1     Honest failure: authentic evidence of a standards violation
FAIL       --      2     Tampered evidence
--         --      3     Bad input (missing files, invalid arguments)

The split is the point. Systems that can prove they failed honestly are more trustworthy than systems that always claim to pass.
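
The decision order behind the table can be sketched in a few lines of Python. This is an illustration of the table only, not Assay's source; the enum and function names are made up for the example.

```python
from enum import IntEnum
from typing import Optional


class ExitCode(IntEnum):
    PASS = 0         # integrity PASS, claims PASS
    HONEST_FAIL = 1  # integrity PASS, claims FAIL
    TAMPERED = 2     # integrity FAIL, claims not evaluated
    BAD_INPUT = 3    # missing files, invalid arguments


def exit_code(input_ok: bool,
              integrity_ok: Optional[bool],
              claims_ok: Optional[bool]) -> ExitCode:
    """Map the two independent questions onto the four exit codes."""
    if not input_ok:
        return ExitCode.BAD_INPUT
    if not integrity_ok:
        # Claims say nothing useful about tampered bytes,
        # so integrity is decided first.
        return ExitCode.TAMPERED
    return ExitCode.PASS if claims_ok else ExitCode.HONEST_FAIL


print(exit_code(True, True, False).name)  # the "honest failure" row
```

Deciding integrity before claims is what keeps exit 1 meaningful: a claims failure is only reported for evidence that is known to be authentic.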

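With real calls: assay scan . finds your actual OpenAI / Anthropic / Gemini / LiteLLM / LangChain call sites, and assay patch . instruments the SDKs it can patch automatically (LangChain needs AssayCallbackHandler() added by hand; see Common Issues). Every real LLM call that runs through an instrumented SDK emits a signed receipt. The demos above use synthetic data so you can see verification without configuring anything.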

How Assay captures evidence

Installing Assay gives you the CLI, receipt store, and proof-pack builder. It does not automatically record your app.

Receipts are emitted only when your runtime is instrumented:

  • assay patch . inserts the right Assay integration for supported SDKs
  • patch() wrappers emit receipts when model calls happen
  • AssayCallbackHandler() does the same for LangChain callback flows
  • emit_receipt(...) lets you record events manually in any stack

assay run -- <your command> then does three things:

  1. creates a trace id
  2. runs your app with ASSAY_TRACE_ID in the environment
  3. packages any emitted receipts into proof_pack_<trace_id>/

The result is a signed, offline-verifiable artifact:

app execution
  -> instrumented SDK or emit_receipt(...)
  -> receipts written to ~/.assay/...
  -> assay run packages them into proof_pack_<trace_id>/
  -> assay verify-pack checks the artifact offline
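
The three steps of assay run can be mimicked with a short wrapper. This is a rough sketch of the flow, not Assay's implementation: the trace id format, the DEMO_RECEIPTS_DIR variable, the receipt fields, and the directory layout are all assumptions made for the example. Only ASSAY_TRACE_ID, proof_pack_<trace_id>/, and the receipt_pack.jsonl file name come from this README.

```python
import json
import os
import subprocess
import sys
import tempfile
import uuid
from pathlib import Path

# Demo child program: reads the trace id from the environment and appends
# one receipt line. DEMO_RECEIPTS_DIR is a made-up variable for this sketch.
CHILD = """
import json, os, pathlib
out = pathlib.Path(os.environ["DEMO_RECEIPTS_DIR"]) / "receipts.jsonl"
receipt = {"trace_id": os.environ["ASSAY_TRACE_ID"], "type": "model_call"}
with out.open("a") as fh:
    fh.write(json.dumps(receipt) + "\\n")
"""


def run_with_trace(command: list, receipts_dir: Path, out_root: Path) -> Path:
    """Hypothetical stand-in for the three steps listed above."""
    # 1. Create a trace id (the format here is invented).
    trace_id = uuid.uuid4().hex[:12]

    # 2. Run the app with ASSAY_TRACE_ID in its environment.
    env = dict(os.environ,
               ASSAY_TRACE_ID=trace_id,
               DEMO_RECEIPTS_DIR=str(receipts_dir))
    subprocess.run(command, env=env, check=True)

    # 3. Package the receipts that carry this trace id.
    pack_dir = out_root / f"proof_pack_{trace_id}"
    pack_dir.mkdir(parents=True)
    kept = []
    for path in sorted(receipts_dir.glob("*.jsonl")):
        for line in path.read_text().splitlines():
            if json.loads(line).get("trace_id") == trace_id:
                kept.append(line)
    (pack_dir / "receipt_pack.jsonl").write_text(
        "".join(line + "\n" for line in kept))
    return pack_dir


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    store = root / "receipts"
    store.mkdir()
    pack = run_with_trace([sys.executable, "-c", CHILD], store, root)
    print(pack.name, (pack / "receipt_pack.jsonl").read_text().strip())
```

The sketch stops before signing. It also shows why an uninstrumented app yields an empty pack: if the child never writes a receipt, there is nothing to package.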

Add to Your Project

# 1. Find uninstrumented LLM calls
assay scan . --report

# 2. Patch (one line per SDK, or auto-patch all)
assay patch .

# 3. Run + build a signed evidence pack
# -c receipt_completeness runs the built-in completeness check (see `assay cards list` for all options)
# everything after -- is your normal run command
assay run -c receipt_completeness -- python my_app.py

# 4. Verify
assay verify-pack ./proof_pack_*/

# 5. Generate report artifacts for security/compliance review
assay report . -o evidence_report.html --sarif

# 6. Optional: set and enforce score gates in CI
assay gate save-baseline
assay gate check . --min-score 60 --fail-on-regression

assay scan . --report looks for LLM call sites in the supported SDKs (OpenAI, Anthropic, Google Gemini, LiteLLM, LangChain) and generates a self-contained HTML gap report. assay patch inserts the two-line integration. assay run wraps your command, collects receipts, and produces a signed 5-file evidence pack. assay verify-pack checks integrity and claims, then exits with one of the four codes above. Run assay explain on any pack for a plain-English summary.

Local models: Any OpenAI-compatible server (Ollama, LM Studio, vLLM, llama.cpp) works automatically -- Assay patches the OpenAI SDK at the class level, so OpenAI(base_url="http://localhost:11434/v1") emits receipts like any other provider. LiteLLM users get the same coverage via the LiteLLM integration (ollama/llama3, etc.).
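
Why class-level patching covers local servers: the wrapper lives on the class, so every client instance goes through it no matter which base_url it was built with. The toy below uses a stand-in class, not the real OpenAI SDK and not Assay's patch code; FakeCompletions, patch_class, and RECEIPTS are names invented for the illustration.

```python
import functools


class FakeCompletions:
    """Stand-in for an SDK resource class (hypothetical)."""

    def __init__(self, base_url: str):
        self.base_url = base_url

    def create(self, model: str, prompt: str) -> dict:
        return {"model": model, "text": f"echo: {prompt}"}


RECEIPTS = []


def patch_class(cls) -> None:
    """Wrap cls.create once so calls on any instance record a receipt."""
    original = cls.create
    if getattr(original, "_patched", False):
        return  # already wrapped; patching twice would double-count

    @functools.wraps(original)
    def create(self, *args, **kwargs):
        response = original(self, *args, **kwargs)
        RECEIPTS.append({"base_url": self.base_url,
                         "model": response["model"]})
        return response

    create._patched = True
    cls.create = create


hosted = FakeCompletions("https://api.example.com/v1")  # built before patching
patch_class(FakeCompletions)
local = FakeCompletions("http://localhost:11434/v1")    # built after patching

hosted.create(model="gpt-4", prompt="hi")
local.create(model="llama3", prompt="hi")
print(len(RECEIPTS))  # both calls were recorded
```

Because method lookup goes through the class, even the instance created before patching is covered. The practical requirement is only that the patch runs before the first model call.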

Why now: EU AI Act Article 12 requires automatic logging for high-risk AI systems; Article 19 requires providers to retain automatically generated logs for at least 6 months. High-risk obligations apply from 2 Aug 2026 (Annex III) and 2 Aug 2027 (regulated products). SOC 2 CC7.2 requires monitoring of system components and analysis of anomalies as security events. "We have logs on our server" is not independently verifiable evidence. Assay produces evidence that is. See compliance citations for exact references.

CI Gate

Fastest path (recommended):

assay ci init github --run-command "python my_app.py" --min-score 60

This generates a 3-job GitHub Actions workflow:

  • assay-gate (score enforcement, regression checks, JSON gate report artifact)
  • assay-verify (proof pack generation + cryptographic verification)
  • assay-report (HTML evidence report artifact + SARIF upload)

Manual path (advanced):

assay gate save-baseline
assay gate check . --min-score 60 --fail-on-regression --save-report assay_gate_report.json --verbose --json
assay run -c receipt_completeness -- python my_app.py
assay verify-pack ./proof_pack_*/ --lock assay.lock --require-claim-pass
assay report . -o evidence_report.html --sarif

The lockfile catches config drift. verify-pack catches tampering. The gate blocks score regressions and scores below the minimum. The report produces the shareable artifact + SARIF. assay diff remains useful for deep forensics and budget/drift analysis. See Decision Escrow for the protocol model.
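
The score gate can be pictured as two comparisons. A minimal sketch, assuming a numeric score, a saved baseline, and the semantics suggested by the --min-score and --fail-on-regression flag names; gate_check is a hypothetical helper, not Assay's code.

```python
from typing import Optional


def gate_check(score: float,
               min_score: float,
               baseline: Optional[float] = None,
               fail_on_regression: bool = False) -> list:
    """Return failure reasons; an empty list means the gate passes."""
    failures = []
    if score < min_score:
        failures.append(f"score {score} is below the minimum {min_score}")
    if fail_on_regression and baseline is not None and score < baseline:
        failures.append(f"score {score} regressed from baseline {baseline}")
    return failures


# A score can clear the floor and still fail on regression.
print(gate_check(65, 60, baseline=70, fail_on_regression=True))
```

Keeping the two conditions separate lets a project start with a low floor and still refuse to slide backwards.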

# Lock your verification contract (write this before running verify-pack with --lock)
assay lock write --cards receipt_completeness -o assay.lock
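
One way to think about a lock: freeze the set of checks and their parameters into a digest, then compare later. The sketch below assumes a dict of cards and a made-up min_coverage parameter. It is not the assay.lock format, which this README does not describe.

```python
import hashlib
import json


def contract_digest(cards: dict) -> str:
    """Hash a canonical JSON encoding of the checks and their parameters."""
    canonical = json.dumps(cards, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def write_lock(cards: dict) -> dict:
    return {"cards": sorted(cards), "digest": contract_digest(cards)}


def check_lock(lock: dict, current_cards: dict) -> list:
    """Return mismatches; an empty list means no drift."""
    missing = set(lock["cards"]) - set(current_cards)
    if missing:
        return [f"locked checks dropped: {sorted(missing)}"]
    if contract_digest(current_cards) != lock["digest"]:
        return ["verification contract changed since the lock was written"]
    return []


# min_coverage is a hypothetical parameter used only for this example.
lock = write_lock({"receipt_completeness": {"min_coverage": 1.0}})
print(check_lock(lock, {"receipt_completeness": {"min_coverage": 0.5}}))
```

Both dropping a check and quietly loosening its parameters change the comparison result, which is the drift a lock is meant to surface.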

Daily use after CI is green

Regression forensics:

assay diff ./proof_pack_*/ --against-previous --why

--against-previous auto-discovers the baseline pack. --why traces receipt chains to explain what regressed and which call sites caused it.

Cost/latency drift (from receipts):

assay analyze --history --since 7

Shows cost, latency percentiles, error rates, and per-model breakdowns from your local trace history.
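
For intuition, the same kind of summary can be computed from a list of receipt-like records with the standard library. The field names latency_ms, cost_usd, and error are assumptions for this sketch; the actual receipt schema is not shown in this README.

```python
import statistics


def summarize(receipts: list) -> dict:
    """Aggregate latency percentiles, error rate, and cost."""
    latencies = sorted(r["latency_ms"] for r in receipts)
    # n=100 yields 99 cut points; index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    errors = sum(1 for r in receipts if r.get("error"))
    return {
        "calls": len(receipts),
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "error_rate": errors / len(receipts),
        "total_cost_usd": sum(r.get("cost_usd", 0.0) for r in receipts),
    }


sample = [
    {"latency_ms": 100, "cost_usd": 0.25},
    {"latency_ms": 200, "cost_usd": 0.25},
    {"latency_ms": 300, "cost_usd": 0.5, "error": "timeout"},
]
print(summarize(sample))
```

statistics.quantiles needs at least two data points, so a real script would guard against very short histories.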

VendorQ: Verifiable Vendor Questionnaires

Enterprise customers ask AI governance questions in security questionnaires. VendorQ compiles evidence-backed answer packets from Assay proof packs. Every answer traces to a signed receipt. Every modification is detectable.

# Ingest a questionnaire, compile answers against evidence, lock, verify
assay vendorq ingest --in questionnaire.csv --out questions.json
assay vendorq compile --questions questions.json --pack ./proof_pack --out answers.json
assay vendorq lock write --answers answers.json --pack ./proof_pack --out vendorq.lock
assay vendorq verify --answers answers.json --pack ./proof_pack --lock vendorq.lock --strict

10 deterministic verification rules. Tamper one answer and verification fails with exit code 2. The packet is forwardable to your customer's security team — they verify it offline with a public key.

See it live: Proof Gallery — three real proof packs demonstrating pass, honest fail, and tamper detection. All three are independently verifiable without any account or API key.

Adversarial testing: 16 attack scenarios, 16 catches, 0 false passes.

Reviewer Packets

Reviewer Packets are the buyer-facing wrapper around a signed proof pack. Assay produces the proof pack. The Reviewer Packet makes that proof usable across an organizational boundary: settlement, scope, coverage, and the nested proof-pack verification path in one forwardable artifact.

# Compile a reviewer packet from a proof pack plus declarative packet inputs
assay vendorq export-reviewer \
  --proof-pack tests/fixtures/reviewer_packet/sample_proof_pack \
  --boundary tests/fixtures/reviewer_packet/sample_boundary.json \
  --mapping tests/fixtures/reviewer_packet/sample_mapping.json \
  --out reviewer_packet_demo

# Verify the reviewer packet and derive the settlement
assay reviewer verify reviewer_packet_demo
assay reviewer verify reviewer_packet_demo --json

Canonical handoff flow:

proof pack -> reviewer packet -> assay reviewer verify -> browser verify

Buyer verdicts and CLI exit codes are different layers:

  • Buyer verdicts: VERIFIED, VERIFIED_WITH_GAPS, INCOMPLETE_EVIDENCE, EVIDENCE_REGRESSION, TAMPERED, OUT_OF_SCOPE
  • CLI exit codes: 0/1/2/3 for PASS, HONEST_FAIL, TAMPERED, and bad input

Use the proof pack when you need kernel-level verification. Use the Reviewer Packet when another team needs a bounded artifact they can inspect, forward, and challenge.

Verify online: Browser verifier — drop in a proof pack or reviewer packet and check it client-side.

Passports: Portable Signed Evidence

A passport is a signed, content-addressed JSON object that summarizes what was verified about an AI system: claims, coverage, reliance class, and a validity window. Built from proof pack evidence, not asserted by hand.

# Mint a passport from a proof pack, sign it, verify it
assay passport mint --pack ./proof_pack/ --subject-name "MyApp" \
  --system-id "my.app.v1" --owner "My Org" --output passport.json
assay passport sign passport.json
assay passport verify passport.json

# Check reliance posture under a policy mode
assay passport status passport.json --mode buyer-safe --json

# X-Ray diagnostic: structural grade (A-F) and improvement path
assay passport xray passport.json --report xray.html

Lifecycle governance is cryptographically backed:

# Challenge a passport (any identified signer)
assay passport challenge passport.json --reason "Missing coverage"

# Supersede with a new version
assay passport supersede old.json new.json --reason "Addressed gap"

# Compare two passports — flags regressions
assay passport diff old.json new.json --report diff.html

assay passport verify checks structural validity (signature, content-addressed ID). assay passport status reports reliance posture under a configurable policy mode.
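
The difference between a structural check and a time-dependent one can be shown with a toy passport. The layout here (id, body, valid_from, valid_until, and the sha256: prefix) is invented for illustration, and the sketch skips signatures and policy modes entirely.

```python
import hashlib
import json
from datetime import datetime, timezone


def content_id(body: dict) -> str:
    """Derive the ID from the content itself (canonical JSON, SHA-256)."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def mint(body: dict) -> dict:
    return {"id": content_id(body), "body": body}


def id_matches(passport: dict) -> bool:
    """Structural check: does the stored ID still match the content?"""
    return passport["id"] == content_id(passport["body"])


def within_validity(passport: dict, now: datetime) -> bool:
    """Time-dependent check: is 'now' inside the validity window?"""
    body = passport["body"]
    start = datetime.fromisoformat(body["valid_from"])
    end = datetime.fromisoformat(body["valid_until"])
    return start <= now < end


passport = mint({
    "subject": "MyApp",
    "valid_from": "2026-01-01T00:00:00+00:00",
    "valid_until": "2026-07-01T00:00:00+00:00",
})
print(id_matches(passport),
      within_validity(passport, datetime(2026, 8, 1, tzinfo=timezone.utc)))
```

A passport can be structurally intact and still be outside its window, which is one reason to keep the two questions apart.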

Run the full 10-step lifecycle demo:

assay passport demo

Worked example: Seeded referee gallery — pre-built signed passports, governance receipts, X-Ray diagnostic, and trust diff. All artifacts are regenerable via python3 docs/passport/generate_gallery.py.

Passport guide: See docs/passport/README.md for the bounded public story: what you can inspect today, what verify and status mean, and what remains future scope.

What this proves today:

  • Signed, content-addressed passport artifacts with Ed25519 signatures
  • Deterministic lifecycle governance: challenge, supersede, revoke, diff
  • Reproducible worked examples on seeded reference artifacts
  • Offline verification without network access

What is future scope:

  • Arbitrary external trust-surface scanning (URLs, PDFs, vendor pages)
  • Minting from external vendor documents (currently proof-pack only)
  • Generalized trust analysis across messy real-world inputs
  • Enterprise diff workflows (primitive exists, product does not)

AI Decision Credentials (ADC)

ADC is a structured schema for packaging AI decision evidence into verifiable, time-bounded credentials. An ADC wraps the proof pack with decision metadata: what was decided, by whom, under what policy, with what evidence, and how long the credential remains valid.

# Verify a pack with expiry enforcement
assay verify-pack ./proof_pack_*/ --check-expiry

# ADC v0.1 schema: 35 properties, 17 required, additionalProperties: false
# Schema: src/assay/schemas/adc_v0.1.schema.json

The conformance corpus includes 10 canonical packs (including stale_01 for expired credentials and superseded_01 for replaced decisions).

What Becomes Harder to Fake

Assay is not a truth oracle. It is an evidence-hardening layer.

If someone tries to...                   Without Assay                  With Assay
Edit evidence after a run                Hard to notice                 Verification fails
Drop or weaken locked checks             Easy to hide                   Lock mismatch exposes it
Omit covered call sites                  Easy to hand-wave              Completeness checks catch it
Hand buyer internal logs, ask for trust  Buyer must trust the operator  Buyer verifies offline
Fabricate a complete run from scratch    Possible                       Still possible at base tier; stronger deployment raises the cost

Why there is no quiet edit. Every file in a proof pack is fingerprinted. The fingerprints are recorded in a manifest. The manifest is digitally signed. Change a file -- the fingerprint won't match. Fix the manifest to cover it -- the signature breaks. Re-sign the manifest -- the signer identity changes. Every path to tampering leaves a visible trace.
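
The chain above (files, fingerprints, manifest, signature) fits in a short standard-library sketch. One loud substitution: Assay signs with Ed25519, while this example uses HMAC-SHA256 as a stand-in so that it runs without third-party packages. HMAC needs a shared secret, so unlike a real public-key signature it would not let a third party verify without the key. The manifest layout and function names are also assumptions.

```python
import hashlib
import hmac
import json


def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict) -> bytes:
    """Record one fingerprint per file, in a stable order."""
    entries = {name: fingerprint(data) for name, data in files.items()}
    return json.dumps(entries, sort_keys=True).encode("utf-8")


def sign(manifest: bytes, key: bytes) -> str:
    # Stand-in only: HMAC-SHA256 instead of an Ed25519 signature.
    return hmac.new(key, manifest, hashlib.sha256).hexdigest()


def verify(files: dict, manifest: bytes, signature: str, key: bytes) -> str:
    if not hmac.compare_digest(sign(manifest, key), signature):
        return "FAIL: manifest signature invalid"
    recorded = json.loads(manifest)
    if set(recorded) != set(files):
        return "FAIL: file set differs from manifest"
    for name, data in files.items():
        if recorded[name] != fingerprint(data):
            return f"FAIL: hash mismatch for {name}"
    return "PASS"


key = b"demo-key"
good = {"receipt_pack.jsonl": b'{"model": "gpt-4"}\n'}
manifest = build_manifest(good)
signature = sign(manifest, key)

tampered = {"receipt_pack.jsonl": b'{"model": "gpt-5"}\n'}
print(verify(good, manifest, signature, key))
print(verify(tampered, manifest, signature, key))
# "Fixing" the manifest to match the tampered file breaks the signature.
print(verify(tampered, build_manifest(tampered), signature, key))
```

Each escape route fails at a different layer, which mirrors the paragraph above: the file hash, then the manifest signature, then (with real asymmetric keys) the signer identity.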

Assay proves the evidence artifact has not been quietly changed after the fact. It does not, by itself, prove every upstream component was honest.

Deployment ladder -- start at Base, strengthen as your trust requirements grow:

  • Base -- self-signed artifact, offline-verifiable, tamper-evident
  • Hardened -- CI-held signing key + branch protection (separates signer from developer)
  • Anchored -- transparency ledger + external timestamping (RFC 3161)

Completeness is enforced relative to call sites enumerated by the scanner and/or declared by policy. Undetected call sites are a known residual risk, reduced via multi-detector scanning and CI gating.
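
In set terms, a completeness check asks which expected call sites have no receipt. A minimal sketch, assuming each receipt carries a call_site identifier; that field name and the file:line format are hypothetical.

```python
def missing_call_sites(expected: set, receipts: list) -> set:
    """Return expected call sites that produced no receipt."""
    seen = {r["call_site"] for r in receipts if "call_site" in r}
    return expected - seen


expected = {"app/chat.py:42", "app/summarize.py:17"}
receipts = [{"call_site": "app/chat.py:42", "model": "gpt-4"}]
print(missing_call_sites(expected, receipts))
```

The check is only as good as the expected set: a call site the scanner never enumerated cannot show up as missing, which is the residual risk noted above.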

Assay doesn't make fraud impossible -- it makes fraud expensive, fragile, and much easier to catch.

The Evidence Compiler

Assay is an evidence compiler for AI execution. If you've used a build system, you already know the mental model:

Concept   Build System              Assay
Source    .c / .ts files            Receipts (one per LLM call)
Artifact  Binary / bundle           Evidence pack (5 files, 1 signature)
Tests     Unit / integration tests  Verification (integrity + claims)
Lock      package-lock.json         assay.lock
Gate      CI deploy check           CI evidence gate

Commands

The core path is 6 commands:

assay quickstart          # discover
assay scan / assay patch  # instrument
assay run                 # produce evidence
assay verify-pack         # verify evidence
assay diff                # catch regressions
assay score               # evidence readiness (0-100, A-F)

Full command reference:

Getting started

Command                  Purpose
assay quickstart         One command: demo + scan + next steps
assay status             One-screen operational dashboard
assay start demo|ci|mcp  Guided entrypoints for trying, CI setup, or MCP auditing
assay onboard            Guided setup: doctor -> scan -> first run plan
assay doctor             Preflight check: is Assay ready here?
assay version            Print installed version

Instrument + produce evidence

Command      Purpose
assay scan   Find uninstrumented LLM call sites (--report for HTML)
assay patch  Auto-insert SDK integration patches into your entrypoint
assay run    Wrap command, collect receipts, build signed evidence pack

Verify + analyze

Command              Purpose
assay verify-pack    Verify integrity + claims (the 4 exit codes)
assay verify-signer  Extract and verify signer identity from a pack manifest
assay explain        Plain-English summary of an evidence pack
assay analyze        Cost, latency, error breakdown from pack or --history
assay diff           Compare packs: claims, cost, latency (--against-previous, --why, --gate-*)
assay score          Evidence Readiness Score (0-100, A-F) with anti-gaming caps

Workflows + CI

Command                            Purpose
assay flow try|adopt|ci|mcp|audit  Guided workflow executor (dry-run by default, --apply to execute)
assay ci init github               Generate a GitHub Actions workflow
assay ci doctor                    CI-profile preflight checks
assay audit bundle                 Create portable audit bundle (tar.gz with verify instructions)
assay compliance report            Generate compliance evidence report

Pack + baseline management

Command                   Purpose
assay packs list          List local proof packs
assay packs show          Show pack details
assay packs pin-baseline  Pin a pack as the diff baseline
assay baseline set|get    Set or get the baseline pack for diff

Key management

Command                  Purpose
assay key generate       Generate a new Ed25519 signing key
assay key list           List local signing keys and active signer
assay key info           Show key details (fingerprint, creation date)
assay key set-active     Set active signing key for future runs
assay key rotate         Generate a new key and switch active signer
assay key export|import  Export or import keys for CI or team sharing
assay key revoke         Revoke a signing key

Lockfile + cards

Command           Purpose
assay lock write  Freeze verification contract to lockfile
assay lock check  Validate lockfile against current card definitions
assay lock init   Initialize a new lockfile interactively
assay cards list  List built-in run cards and their claims
assay cards show  Show card details, claims, and parameters
MCP + policy

Command                    Purpose
assay mcp-proxy            Transparent MCP proxy: intercept tool calls, emit receipts
assay mcp policy init      Generate a starter MCP policy YAML file
assay mcp policy validate  Validate a policy file against the schema
assay policy impact        Analyze policy impact on existing evidence

Incident forensics

Command                  Purpose
assay incident timeline  Build incident timeline from receipts
assay incident replay    Replay an incident from receipt chain

Demos

Command               Purpose
assay demo-incident   Two-act scenario: passing run vs failing run
assay demo-challenge  CTF-style good + tampered pack pair
assay demo-pack       Generate demo packs (no config needed)

Documentation

Common Issues

  • "No receipts emitted" after assay run: First, check whether your code has call sites: assay scan . -- if scan finds 0 sites, you may not be using a supported SDK yet. Installing Assay alone does not emit receipts; your runtime must be instrumented. If scan finds sites, check: (1) Is # assay:patched in the file, or did you add patch() / a callback? Run assay scan . --report to see patch status per file. (2) Did you install the SDK extra (python3 -m pip install "assay-ai[openai]")? (3) Did patch() execute before the first model call? (4) Did you use -- before your command (assay run -- python app.py)? Run assay doctor for a full diagnostic.

  • LangChain projects: assay patch auto-instruments OpenAI and Anthropic SDKs but not LangChain (which uses callbacks, not monkey-patching). For LangChain, add AssayCallbackHandler() to your chain's callbacks parameter manually. See src/assay/integrations/langchain.py for the handler.

  • assay run python app.py gives "No command provided": You need the -- separator: assay run -c receipt_completeness -- python app.py. Everything after -- is passed to the subprocess.

  • Quickstart blocked on large directories: assay quickstart guards against scanning system directories (>10K Python files). Use --force to bypass: assay quickstart --force.

  • macOS: ModuleNotFoundError inside assay run but works outside it: On macOS, python3 on PATH may point to a different Python version than where assay and your SDK are installed (e.g. python3 → 3.14, but packages are in 3.11). Use a virtual environment (recommended), or specify the exact interpreter: assay run -- python3.11 app.py. Check with python3 --version and compare to the Python where you installed Assay.
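
To confirm an interpreter mismatch, a small script can report which Python is running and whether the packages are importable from it. This is a diagnostic sketch, not part of Assay. It assumes the import name is assay (suggested by the src/assay/ paths in this README) and uses openai as an example SDK. The file name check_env.py is arbitrary.

```python
import importlib.util
import sys


def interpreter_report(modules=("assay", "openai")) -> dict:
    """Report the running interpreter and which modules it can import."""
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "modules": {
            name: importlib.util.find_spec(name) is not None
            for name in modules
        },
    }


if __name__ == "__main__":
    print(interpreter_report())
```

Save it as check_env.py and run it with the same interpreter you pass after --, for example python3 check_env.py and then python3.11 check_env.py. If the two reports disagree about the modules, that interpreter mismatch is the likely cause of the ModuleNotFoundError.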

Get Involved

  • Try it: python3 -m pip install assay-ai && assay quickstart
  • Questions / feedback: GitHub Discussions
  • Bug reports: Issues
  • Want this in your stack in 2 weeks? Pilot program -- we instrument your AI workflows, set up CI gates, and hand you a working evidence pipeline. Open a pilot inquiry.

Related Repos

Repo                 Purpose
assay                Core CLI, SDK, conformance corpus (this repo)
assay-verify-action  GitHub Action for CI verification
assay-ledger         Public transparency ledger
assay-proof-gallery  Live demo packs (PASS / HONEST FAIL / TAMPERED)

License

Apache-2.0
