Epistemic infrastructure for AI scientists
Mareforma
Trust your AI agents' findings without taking them on faith.
Mareforma is the local store where research agents write their claims — signed, cross-referenced, and promoted when independent agents converge — so trust comes from evidence, not the agent's own confidence score.
Why
AI agents are being deployed on real research problems before any infrastructure exists to know which of their findings can be trusted. Tracing tools record what the agent did; they do not record what it means, whether it converges with independent evidence, or how far a conclusion is from its raw data. Without that structure, a silent pipeline failure, a prior-knowledge fallback, and a real result look identical.
Every primitive mareforma uses — Ed25519 signing, DSSE envelopes, Sigstore-Rekor transparency, GRADE evidence vectors, SQLite — already exists in mature form. What is missing in the OSS landscape is the combination: a runtime, opt-in Python library that bundles them as the place an agent writes claims to.
What it does
```python
import mareforma

with mareforma.open() as graph:
    # Query established prior claims. query_for_llm wraps text in
    # <untrusted_data>...</untrusted_data> tags so a downstream LLM
    # consumes it as data, not instructions.
    prior = graph.query_for_llm("topic X", min_support="REPLICATED")

    claim_id = graph.assert_claim(
        "Cell type A exhibits property X under condition Y (n=842, p<0.001)",
        classification="ANALYTICAL",
        generated_by="agent/model-a/lab_a",
        supports=[c["claim_id"] for c in prior],
    )

    # Walk the full lineage of any claim: upstream + downstream + signatures
    # + contradictions + verdicts in one deterministic dict.
    lineage = graph.query_provenance(claim_id, depth=4)
```
```mermaid
graph LR
    P(["ESTABLISHED upstream<br/>(prior literature)"]) --> A["ANALYTICAL · lab_a"]
    P --> B["ANALYTICAL · lab_b"]
    A --> R(["REPLICATED ✓"])
    B --> R
    R -->|"graph.validate()"| E(["ESTABLISHED ✓"])
    style P fill:#713f12,stroke:#f59e0b,color:#fde68a
    style A fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
    style B fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
    style R fill:#14532d,stroke:#22c55e,color:#86efac
    style E fill:#713f12,stroke:#f59e0b,color:#fde68a
```
`REPLICATED` fires when two enrolled keys sign claims with different `generated_by` strings, all citing the same `ESTABLISHED` upstream in `supports[]`. All three conditions are required. On a fresh graph, bootstrap an `ESTABLISHED` anchor with `seed=True` (enrolled validator only); see Example 03 for the full seed-then-converge pattern.
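The three conditions can be written as a single predicate. This is a minimal sketch over a hypothetical in-memory `Claim` record (`key_id`, `generated_by`, `supports`, `level` are illustrative field names, not mareforma's schema or implementation), and it assumes "enrolled" is a simple set lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    # Hypothetical record for illustration; not mareforma's schema.
    claim_id: str
    key_id: str            # identifier of the signing key
    generated_by: str
    supports: tuple = ()
    level: str = "PRELIMINARY"


def meets_replication_rule(a: Claim, b: Claim, enrolled: set, claims: dict) -> bool:
    """True when claims a and b jointly satisfy all three REPLICATED conditions."""
    # 1. Two distinct keys, both enrolled.
    if a.key_id == b.key_id or not {a.key_id, b.key_id} <= enrolled:
        return False
    # 2. Different generated_by strings.
    if a.generated_by == b.generated_by:
        return False
    # 3. At least one shared upstream in supports[] that is ESTABLISHED.
    shared = set(a.supports) & set(b.supports)
    return any(cid in claims and claims[cid].level == "ESTABLISHED" for cid in shared)


anchor = Claim("c0", "k_validator", "human/reviewer", level="ESTABLISHED")
a = Claim("c1", "k_a", "agent/model-a/lab_a", ("c0",))
b = Claim("c2", "k_b", "agent/model-b/lab_b", ("c0",))
claims = {c.claim_id: c for c in (anchor, a, b)}
enrolled = {"k_validator", "k_a", "k_b"}
print(meets_replication_rule(a, b, enrolled, claims))  # True
```

Every check here is structural: nothing in it establishes that the two keys belong to different people or machines.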
Trust ladder (derived from graph topology, never self-reported):

| Level | Meaning |
|---|---|
| `PRELIMINARY` | One agent asserted it. Cryptographic provenance, no convergence signal yet. |
| `REPLICATED` | ≥2 enrolled keys signed claims sharing an `ESTABLISHED` upstream with different `generated_by` strings. |
| `ESTABLISHED` | An enrolled human-typed key signed a validation envelope binding `evidence_seen=[...]` review citations. |
Classification is declared by the agent and records what kind of work produced the claim: `INFERRED` (LLM reasoning), `ANALYTICAL` (deterministic analysis against source data), `DERIVED` (explicitly built on `ESTABLISHED` / `REPLICATED` claims). Trust level and classification are independent axes, so query both: `graph.query(text, min_support="REPLICATED", classification="ANALYTICAL")`.
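Because the ladder is ordered, `min_support="REPLICATED"` most naturally reads as "this level or higher". The sketch below assumes that semantics (the README does not spell it out) and treats classification as an independent exact-match filter. Rows are plain dicts with hypothetical keys `support_level` and `classification`; this is an illustration, not mareforma's query code.

```python
from enum import IntEnum


class Support(IntEnum):
    # Ordered so that integer comparison follows the trust ladder.
    PRELIMINARY = 0
    REPLICATED = 1
    ESTABLISHED = 2


def filter_claims(rows, min_support="PRELIMINARY", classification=None):
    """Keep rows at or above min_support, optionally matching one classification."""
    floor = Support[min_support]
    return [
        r for r in rows
        if Support[r["support_level"]] >= floor
        and (classification is None or r["classification"] == classification)
    ]


rows = [
    {"id": "c1", "support_level": "REPLICATED", "classification": "ANALYTICAL"},
    {"id": "c2", "support_level": "REPLICATED", "classification": "INFERRED"},
    {"id": "c3", "support_level": "PRELIMINARY", "classification": "ANALYTICAL"},
    {"id": "c4", "support_level": "ESTABLISHED", "classification": "ANALYTICAL"},
]
hits = filter_claims(rows, min_support="REPLICATED", classification="ANALYTICAL")
print([r["id"] for r in hits])  # ['c1', 'c4']
```

Keeping the two axes separate means a well-replicated `INFERRED` claim (`c2`) and a data-backed but unreplicated `ANALYTICAL` claim (`c3`) are both excluded for different reasons.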
Core surface
```python
graph.assert_claim(text, classification, supports=[...], grounding_sensor=verifier)
graph.query(text, min_support="REPLICATED")          # filter by trust + classification
graph.query_for_llm(text, ...)                       # prompt-injection-safe wrapper
graph.query_provenance(claim_id, depth=4)            # full lineage view
graph.validate(claim_id, evidence_seen=[...])        # human promotes to ESTABLISHED
graph.refutation_status(claim_id)                    # clean / contested / contradicted / retracted
graph.find_drifted_dois(limit=100)                   # detect retraction / metadata drift
graph.find_dangling_supports()                       # audit references that point nowhere
graph.get_tools(generated_by="agent/model-a/lab_a")  # framework-ready callables
```

```bash
mareforma bootstrap                                 # one-time: generate Ed25519 signing key
mareforma status                                    # snapshot health report
mareforma activity --last 100                       # rolling op stats (verdict score, drift, ...)
mareforma export <claim_id> --format prov-o         # also in-toto-v1 / ro-crate-1.2 / jsonld
mareforma verify <bundle>                           # check signatures + chain hashes
```
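Auditing "references that point nowhere" reduces to a set difference between cited IDs and stored IDs. A minimal sketch over a hypothetical `claim_id -> supports list` mapping, not how `find_dangling_supports()` is implemented; it also assumes DOI references (all DOIs begin with `10.`) are left to the registry checks rather than resolved locally.

```python
def find_dangling(supports_by_claim: dict) -> list:
    """Return (claim_id, missing_ref) pairs for local references with no stored claim."""
    known = set(supports_by_claim)
    dangling = []
    for claim_id, refs in sorted(supports_by_claim.items()):
        for ref in refs:
            # Assumption: DOIs are verified against Crossref/DataCite, not the local store.
            if ref.startswith("10."):
                continue
            if ref not in known:
                dangling.append((claim_id, ref))
    return dangling


stored = {
    "c1": [],
    "c2": ["c1", "10.1000/example"],
    "c3": ["c1", "c9"],  # c9 was never stored
}
print(find_dangling(stored))  # [('c3', 'c9')]
```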
External verification, opt-in by component
- DOIs in `supports[]`/`contradicts[]` are HEAD-checked against Crossref and DataCite at assertion time. Failed verifications hold the claim out of `REPLICATED` until `refresh_unresolved()` succeeds. `refresh_all_dois()` force-re-checks every DOI and `find_drifted_dois()` surfaces registry metadata changes (catches retractions).
- Ed25519 signing is opt-in via `mareforma bootstrap`. Every claim then carries a tamper-evident DSSE signature; legacy single-sig and the role-bound `claim-with-roles:v1` multi-signature envelopes both verify on `restore()`.
- Sigstore-Rekor transparency log is opt-in via `mareforma.open(rekor_url=mareforma.signing.PUBLIC_REKOR_URL)`. Signed claims are submitted; entry uuid + logIndex + raw response bytes persist locally.
- RFC 6962 inclusion-proof verification is opt-in via `mareforma.open(rekor_log_pubkey_pem=...)`. The substrate re-fetches each entry and cryptographically verifies the Merkle audit path against the log's signed checkpoint. The key is TOFU-pinned to `.mareforma/rekor_log_pubkey.pem`; silent rotation is refused.
- Grounding sensors are opt-in via `assert_claim(grounding_sensor=verifier)`. Implement `mareforma.Verifier`; the verdict (score + rationale) is snapshotted into the signed predicate at assertion time. A reference `MockNLIVerifier` ships with the package.
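The Merkle audit-path check is plain hash arithmetic. The sketch below implements the RFC 6962 tree hash and the inclusion-proof verification procedure as restated in RFC 9162 (section 2.1.3.2), using only `hashlib`. It is an illustration, not mareforma's code: it does not fetch Rekor entries, parse checkpoints, or verify the checkpoint signature, and the function names are hypothetical.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 domain separation: leaves are hashed with a 0x00 prefix...
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # ...and interior nodes with 0x01, so a leaf can never pose as a node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def tree_root(entries: list) -> bytes:
    """Merkle Tree Hash of a non-empty list of entries (RFC 6962, section 2.1)."""
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = 1
    while k * 2 < len(entries):  # k = largest power of two smaller than n
        k *= 2
    return node_hash(tree_root(entries[:k]), tree_root(entries[k:]))


def verify_inclusion(entry: bytes, index: int, tree_size: int,
                     path: list, root: bytes) -> bool:
    """Recompute the root from an audit path and compare (RFC 9162, section 2.1.3.2)."""
    if index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf_hash(entry)
    for p in path:
        if sn == 0:
            return False  # path is longer than the tree allows
        if fn & 1 or fn == sn:
            r = node_hash(p, r)  # sibling is on the left
            # Rightmost node with no sibling at these levels: carry it up unchanged.
            while not fn & 1 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)  # sibling is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


entries = [b"claim-0", b"claim-1", b"claim-2"]
root = tree_root(entries)
# Audit path for index 1 of a 3-entry tree: leaf 0, then the subtree holding leaf 2.
path = [leaf_hash(entries[0]), leaf_hash(entries[2])]
print(verify_inclusion(entries[1], 1, 3, path, root))   # True
print(verify_inclusion(b"tampered", 1, 3, path, root))  # False
```

A matching root only proves inclusion relative to that root; binding the root to the log requires verifying the signed checkpoint with the pinned log key, which this sketch omits.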
Storage: local SQLite, WAL mode, ACID guarantees. Network calls only for the opt-in external verifications above.
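WAL is a per-database SQLite setting enabled with a PRAGMA, and transactions give the atomicity behind the ACID claim. A stdlib sketch of both, with a hypothetical file name and table; this is not mareforma's schema or connection code.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "graph.db")  # hypothetical file name
conn = sqlite3.connect(db_path)

# Switch the journal to write-ahead logging; SQLite reports the resulting mode.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("CREATE TABLE IF NOT EXISTS claims (claim_id TEXT PRIMARY KEY, text TEXT)")

# `with conn:` commits on success and rolls back on an exception.
with conn:
    conn.execute("INSERT INTO claims VALUES (?, ?)", ("c1", "first claim"))

# Atomicity: the second statement fails, so the first one in the same block is rolled back too.
try:
    with conn:
        conn.execute("INSERT INTO claims VALUES (?, ?)", ("c2", "second claim"))
        conn.execute("INSERT INTO claims VALUES (?, ?)", ("c1", "duplicate id"))
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
conn.close()
print(mode, count)  # wal 1
```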
Silent pipeline failures become visible
This is the use case mareforma was built for, and one worth reproducing. An AI agent runs a multi-step analysis: query a public dataset, regress a gene's expression against a phenotype, return the top hit. The data lookup silently returns null because of a stale identifier. The agent's LLM reasoning fills the gap with prior knowledge and returns a plausible-sounding answer. The output looks identical to a data-driven result.
```python
import mareforma

# run_pipeline, target_gene, phenotype, generated_code_ran and
# data_actually_loaded belong to the agent's own wrapper, not to mareforma.
finding_text = run_pipeline(target_gene, phenotype)

with mareforma.open() as graph:
    graph.assert_claim(
        finding_text,
        # The one line that breaks the symmetry: classification depends on
        # whether real data flowed through. The substrate doesn't compute
        # this; the agent's wrapper inspects the pipeline state and tells
        # the truth at assertion time.
        classification="ANALYTICAL" if generated_code_ran else "INFERRED",
        generated_by="agent/gpt-4o/lab_a",
        source_name="depmap_24q2" if data_actually_loaded else None,
    )
```
A downstream consumer querying min_support="REPLICATED", classification="ANALYTICAL" excludes the silent-fallback rows. The hallucinated finding stays in the graph (auditable, signed) but is NOT in the trustworthy result set. The wrapper that picks ANALYTICAL vs INFERRED is doing the work — the substrate makes that work visible and tamper-evident.
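The wrapper's decision can be made explicit. A minimal sketch, assuming a hypothetical `PipelineState` record filled in by the agent's own harness (none of these names come from mareforma). It requires both that generated code ran and that the lookup returned rows before labelling a claim `ANALYTICAL`, so a stale identifier that yields an empty result falls back to `INFERRED`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineState:
    # Hypothetical fields recorded by the agent's harness, not by mareforma.
    generated_code_ran: bool
    rows_loaded: int
    source_name: Optional[str]


def classify(state: PipelineState) -> tuple:
    """Return (classification, source_name) based on what actually ran."""
    if state.generated_code_ran and state.rows_loaded > 0:
        return "ANALYTICAL", state.source_name
    # No real data flowed through: whatever the LLM produced is reasoning, not analysis.
    return "INFERRED", None


ok = PipelineState(generated_code_ran=True, rows_loaded=100, source_name="depmap_24q2")
stale = PipelineState(generated_code_ran=True, rows_loaded=0, source_name="depmap_24q2")
print(classify(ok))     # ('ANALYTICAL', 'depmap_24q2')
print(classify(stale))  # ('INFERRED', None)
```

The returned pair maps onto the `classification=` and `source_name=` arguments of `assert_claim`.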
Example 05 — Drug Target Provenance wraps MEDEA (a real AI research agent published on arXiv), reproduces a real silent-failure mode in its identifier lookup, and shows the classification gate catching it.
Findings contradict — both stay in the graph
```python
prior = graph.query("Treatment X", min_support="ESTABLISHED")
graph.assert_claim(
    "Treatment X shows no effect (n=1240, p=0.21); larger and more diverse cohort",
    classification="ANALYTICAL",
    contradicts=[c["claim_id"] for c in prior],
)
```
Science advances by documented contestation, not by one side disappearing. Both claims coexist; a human reviewer sees the tension in the graph. graph.refutation_status(claim_id) surfaces whether a claim is clean, contested, contradicted, or retracted.
Honest scope
Mareforma signs what the asserter claimed. It does not verify that classification, generated_by, or verdict method labels match the actual computation behind them — they are typed strings under cryptographic stapling, not evidence on their own. Trust is local to a project's enrolled validators; there is no federation across installations.
A single attacker with shell access can produce a fully-signature-conforming REPLICATED chain (two keys, two generated_by strings, a shared upstream) and promote it to ESTABLISHED (a second key with validator_type="human") — every signature verifies, every export is spec-conformant, because one process on one machine is not a worldwide replication. Operators worried about this should pin a substrate-external identity anchor (ORCID resolution on validated_by, OIDC-anchored certificates, SCITT-style transparency-service receipts). The substrate makes the structural claims visible and tamper-evident; it does not adjudicate them.
See ARCHITECTURE.md for the full set of design boundaries and SECURITY.md for the threat model.
Related work mareforma does not replace: W3C PROV-O / PROV-AGENT (W3C-recommended provenance vocabulary), FAIRSCAPE's Evidence Graph Ontology (EVI, MIT-licensed), IETF SCITT (signed supply-chain transparency, currently draft-ietf-scitt-architecture-22). Mareforma is a runtime substrate for an agent's working graph, not a publication-grade provenance record.
Get started
```bash
uv add mareforma
mareforma bootstrap   # optional: enable signing + transparency
```
mareforma bootstrap is optional. Without it, claims are stored unsigned. With it, every claim carries a tamper-evident signature and can be published to a Sigstore-Rekor transparency log on demand.
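DSSE signatures are computed over a Pre-Authentication Encoding (PAE) of the payload rather than the raw bytes, which binds the payload type into what is signed. The sketch follows the DSSE v1 PAE definition and envelope shape using only the stdlib; the payload type, key id, and signature bytes are placeholders, and real Ed25519 signing (for example `Ed25519PrivateKey.sign` from the `cryptography` package) is left out. It is not mareforma's signing code.

```python
import base64


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 Pre-Authentication Encoding: the exact bytes a signer signs."""
    t = payload_type.encode("utf-8")
    # "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body, lengths as ASCII decimals.
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def make_envelope(payload_type: str, payload: bytes, keyid: str, sig: bytes) -> dict:
    """Wrap a payload and one signature (over pae(...)) in DSSE's JSON envelope shape."""
    return {
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": payload_type,
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


message = pae("http://example.com/HelloWorld", b"hello world")
print(message)  # b'DSSEv1 29 http://example.com/HelloWorld 11 hello world'

# Placeholder bytes stand in for an Ed25519 signature over `message`; not a real signature.
env = make_envelope("http://example.com/HelloWorld", b"hello world", "example-key", b"\x00" * 64)
```

Because the length-prefixed type is part of the signed bytes, altering either `payloadType` or `payload` in a stored envelope invalidates the signature.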
Examples
| Example | Title | What it shows |
|---|---|---|
| 01 | API Walkthrough | Full API reference |
| 02 | Compounding Agents | Findings accumulate across agent runs |
| 03 | Documented Contestation | Agent challenges established consensus |
| 04 | Private Data, Public Findings | Two labs share provenance without sharing data |
| 05 | Drug Target Provenance | Real AI research agent with honest evidence labels |
- AGENTS.md: execution contract, forbidden patterns, signing and transparency log, idempotency convention, `generated_by` requirements.
- ARCHITECTURE.md: substrate design (rails not trains), trust ladder topology, full design boundaries.
- CONTRIBUTING.md: dev workflow.
- CHANGELOG.md: release notes.
- SECURITY.md: threat model and disclosure channel.
Full documentation: https://docs.mareforma.com
File details
Details for the file mareforma-0.3.1.tar.gz.

File metadata
- Download URL: mareforma-0.3.1.tar.gz
- Size: 324.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | c3de55d6ae9765d41c507694e0c04895aa0833321d3ae79cfe1aef5445a76e93 |
| MD5 | 9ee341676545e2305b68f1f4ac2e0ad7 |
| BLAKE2b-256 | 942197a3b2f90b6bb1356c89690686e2bfb64924120b647ba52fdae8589e299d |

Provenance
The following attestation bundles were made for mareforma-0.3.1.tar.gz:
- Publisher: publish.yml on mareforma/mareforma
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mareforma-0.3.1.tar.gz
- Subject digest: c3de55d6ae9765d41c507694e0c04895aa0833321d3ae79cfe1aef5445a76e93
- Sigstore transparency entry: 1601855132
- Permalink: mareforma/mareforma@79569cde3bafe69ed584f7e950c093c6983e363b
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/mareforma
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@79569cde3bafe69ed584f7e950c093c6983e363b
- Trigger Event: release
File details
Details for the file mareforma-0.3.1-py3-none-any.whl.

File metadata
- Download URL: mareforma-0.3.1-py3-none-any.whl
- Size: 197.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 5e3ebb803fb4930bf4b1918f7ee27ba73317075acabde914fe124dfcd251a517 |
| MD5 | e015b48704b095844917a5995e728bbd |
| BLAKE2b-256 | 477257841cd1dbca1ec7d20ead8a49b91c10bd1a8c7f7846eef016cf6d1fadbc |

Provenance
The following attestation bundles were made for mareforma-0.3.1-py3-none-any.whl:
- Publisher: publish.yml on mareforma/mareforma
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mareforma-0.3.1-py3-none-any.whl
- Subject digest: 5e3ebb803fb4930bf4b1918f7ee27ba73317075acabde914fe124dfcd251a517
- Sigstore transparency entry: 1601855138
- Permalink: mareforma/mareforma@79569cde3bafe69ed584f7e950c093c6983e363b
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/mareforma
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@79569cde3bafe69ed584f7e950c093c6983e363b
- Trigger Event: release