Epistemic infrastructure for AI scientists

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

felipeyanez

These details have not been verified by PyPI

Project links

Documentation

Project description

Mareforma

Trust your AI agents' findings without taking them on faith.

Mareforma is the local store where research agents write their claims — signed, cross-referenced, and promoted when independent agents converge — so trust comes from evidence, not the agent's own confidence score.

Why

AI agents are being deployed on real research problems before any infrastructure exists to know which of their findings can be trusted. Tracing tools record what the agent did; they do not record what it means, whether it converges with independent evidence, or how far a conclusion is from its raw data. Without that structure, a silent pipeline failure, a prior-knowledge fallback, and a real result look identical.

Every primitive mareforma uses — Ed25519 signing, DSSE envelopes, Sigstore-Rekor transparency, GRADE evidence vectors, SQLite — already exists in mature form. What is missing in the OSS landscape is the combination: a runtime, opt-in Python library that bundles them as the place an agent writes claims to.

What it does

import mareforma

with mareforma.open() as graph:

    # Query established prior claims. query_for_llm wraps text in
    # <untrusted_data>...</untrusted_data> tags so a downstream LLM
    # consumes it as data, not instructions.
    prior = graph.query_for_llm("topic X", min_support="REPLICATED")

    claim_id = graph.assert_claim(
        "Cell type A exhibits property X under condition Y (n=842, p<0.001)",
        classification="ANALYTICAL",
        generated_by="agent/model-a/lab_a",
        supports=[c["claim_id"] for c in prior],
    )

    # Walk the full lineage of any claim: upstream + downstream + signatures
    # + contradictions + verdicts in one deterministic dict.
    lineage = graph.query_provenance(claim_id, depth=4)

graph LR
    P(["ESTABLISHED upstream<br/>(prior literature)"]) --> A["ANALYTICAL · lab_a"]
    P --> B["ANALYTICAL · lab_b"]
    A --> R(["REPLICATED ✓"])
    B --> R
    R -->|"graph.validate()"| E(["ESTABLISHED ✓"])

    style P fill:#713f12,stroke:#f59e0b,color:#fde68a
    style A fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
    style B fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
    style R fill:#14532d,stroke:#22c55e,color:#86efac
    style E fill:#713f12,stroke:#f59e0b,color:#fde68a

REPLICATED fires when two enrolled keys sign claims with different generated_by strings, all citing the same ESTABLISHED upstream in supports[] — three conditions, all required. On a fresh graph, bootstrap an ESTABLISHED anchor with seed=True (enrolled validator only); see Example 03 for the full seed-then-converge pattern.

Trust ladder — derived from graph topology, never self-reported:

Level	Meaning
`PRELIMINARY`	One agent asserted it. Cryptographic provenance, no convergence signal yet.
`REPLICATED`	≥2 enrolled keys signed claims sharing an `ESTABLISHED` upstream with different `generated_by` strings.
`ESTABLISHED`	An enrolled human-typed key signed a validation envelope binding `evidence_seen=[...]` review citations.

Classification — declared by the agent, records what kind of work produced it: INFERRED (LLM reasoning), ANALYTICAL (deterministic analysis against source data), DERIVED (explicitly built on ESTABLISHED / REPLICATED claims). Trust level and classification are independent axes — query both: graph.query(text, min_support="REPLICATED", classification="ANALYTICAL").

Core surface

graph.assert_claim(text, classification, supports=[...], grounding_sensor=verifier)
graph.query(text, min_support="REPLICATED")           # filter by trust + classification
graph.query_for_llm(text, ...)                        # prompt-injection-safe wrapper
graph.query_provenance(claim_id, depth=4)             # full lineage view
graph.validate(claim_id, evidence_seen=[...])         # human promotes to ESTABLISHED
graph.refutation_status(claim_id)                     # clean / contested / contradicted / retracted
graph.find_drifted_dois(limit=100)                    # detect retraction / metadata drift
graph.find_dangling_supports()                        # audit references that point nowhere
graph.get_tools(generated_by="agent/model-a/lab_a")   # framework-ready callables

mareforma bootstrap                  # one-time: generate Ed25519 signing key
mareforma status                     # snapshot health report
mareforma activity --last 100        # rolling op stats (verdict score, drift, ...)
mareforma export <claim_id> --format prov-o   # also in-toto-v1 / ro-crate-1.2 / jsonld
mareforma verify <bundle>            # check signatures + chain hashes

External verification, opt-in by component

DOIs in supports[] / contradicts[] are HEAD-checked against Crossref and DataCite at assertion time. Failed verifications hold the claim out of REPLICATED until refresh_unresolved() succeeds. refresh_all_dois() force-re-checks every DOI and find_drifted_dois() surfaces registry metadata changes (catches retractions).
Ed25519 signing is opt-in via mareforma bootstrap. Every claim then carries a tamper-evident DSSE signature; legacy single-sig and the role-bound claim-with-roles:v1 multi-signature envelopes both verify on restore().
Sigstore-Rekor transparency log is opt-in via mareforma.open(rekor_url=mareforma.signing.PUBLIC_REKOR_URL). Signed claims are submitted; entry uuid + logIndex + raw response bytes persist locally.
RFC 6962 inclusion-proof verification is opt-in via mareforma.open(rekor_log_pubkey_pem=...). The substrate re-fetches each entry and cryptographically verifies the Merkle audit path against the log's signed checkpoint. The key is TOFU-pinned to .mareforma/rekor_log_pubkey.pem — silent rotation is refused.
Grounding sensors are opt-in via assert_claim(grounding_sensor=verifier). Implement mareforma.Verifier; the verdict (score + rationale) is snapshotted into the signed predicate at assertion time. A reference MockNLIVerifier ships with the package.

Storage: local SQLite, WAL mode, ACID guarantees. Network calls only for the opt-in external verifications above.

Silent pipeline failures become visible

The reproduction-worthy use case mareforma was built for. An AI agent runs a multi-step analysis: query a public dataset, regress a gene's expression against a phenotype, return the top hit. The data lookup silently returns null because of a stale identifier. The agent's LLM reasoning fills the gap with prior knowledge and returns a plausible-sounding answer. The output looks identical to a data-driven result.

finding_text = run_pipeline(target_gene, phenotype)

graph.assert_claim(
    finding_text,
    # The one line that breaks the symmetry: classification depends on
    # whether real data flowed through. The substrate doesn't compute
    # this — the agent's wrapper inspects the pipeline state and tells
    # the truth at assertion time.
    classification="ANALYTICAL" if generated_code_ran else "INFERRED",
    generated_by="agent/gpt-4o/lab_a",
    source_name="depmap_24q2" if data_actually_loaded else None,
)

A downstream consumer querying min_support="REPLICATED", classification="ANALYTICAL" excludes the silent-fallback rows. The hallucinated finding stays in the graph (auditable, signed) but is NOT in the trustworthy result set. The wrapper that picks ANALYTICAL vs INFERRED is doing the work — the substrate makes that work visible and tamper-evident.

Example 05 — Drug Target Provenance wraps MEDEA (a real AI research agent published on arXiv), reproduces a real silent-failure mode in its identifier lookup, and shows the classification gate catching it.

Findings contradict — both stay in the graph

prior = graph.query("Treatment X", min_support="ESTABLISHED")

graph.assert_claim(
    "Treatment X shows no effect (n=1240, p=0.21) — larger and more diverse cohort",
    classification="ANALYTICAL",
    contradicts=[c["claim_id"] for c in prior],
)

Science advances by documented contestation, not by one side disappearing. Both claims coexist; a human reviewer sees the tension in the graph. graph.refutation_status(claim_id) surfaces whether a claim is clean, contested, contradicted, or retracted.

Honest scope

Mareforma signs what the asserter claimed. It does not verify that classification, generated_by, or verdict method labels match the actual computation behind them — they are typed strings under cryptographic stapling, not evidence on their own. Trust is local to a project's enrolled validators; there is no federation across installations.

A single attacker with shell access can produce a fully-signature-conforming REPLICATED chain (two keys, two generated_by strings, a shared upstream) and promote it to ESTABLISHED (a second key with validator_type="human") — every signature verifies, every export is spec-conformant, because one process on one machine is not a worldwide replication. Operators worried about this should pin a substrate-external identity anchor (ORCID resolution on validated_by, OIDC-anchored certificates, SCITT-style transparency-service receipts). The substrate makes the structural claims visible and tamper-evident; it does not adjudicate them.

See ARCHITECTURE.md for the full set of design boundaries and SECURITY.md for the threat model.

Related work mareforma does not replace: W3C PROV-O / PROV-AGENT (W3C-recommended provenance vocabulary), FAIRSCAPE's Evidence Graph Ontology (EVI, MIT-licensed), IETF SCITT (signed supply-chain transparency, currently draft-ietf-scitt-architecture-22). Mareforma is a runtime substrate for an agent's working graph, not a publication-grade provenance record.

Get started

uv add mareforma
mareforma bootstrap            # optional: enable signing + transparency

mareforma bootstrap is optional. Without it, claims are stored unsigned. With it, every claim carries a tamper-evident signature and can be published to a Sigstore-Rekor transparency log on demand.

Examples

	Example	What it shows
01	API Walkthrough	Full API reference
02	Compounding Agents	Findings accumulate across agent runs
03	Documented Contestation	Agent challenges established consensus
04	Private Data, Public Findings	Two labs share provenance without sharing data
05	Drug Target Provenance	Real AI research agent with honest evidence labels

AGENTS.md — execution contract, forbidden patterns, signing and transparency log, idempotency convention, generated_by requirements. ARCHITECTURE.md — substrate design (rails not trains), trust ladder topology, full design boundaries. CONTRIBUTING.md — dev workflow. CHANGELOG.md — release notes. SECURITY.md — threat model and disclosure channel.

Full documentation: https://docs.mareforma.com

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

felipeyanez

These details have not been verified by PyPI

Project links

Documentation

Release history Release notifications | RSS feed

This version

0.3.1

May 22, 2026

0.3.0

May 15, 2026

0.2.1

May 8, 2026

0.2.0

Apr 8, 2026

0.1.0

Mar 25, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mareforma-0.3.1.tar.gz (324.9 kB view details)

Uploaded May 22, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

mareforma-0.3.1-py3-none-any.whl (197.3 kB view details)

Uploaded May 22, 2026 Python 3

File details

Details for the file mareforma-0.3.1.tar.gz.

File metadata

Download URL: mareforma-0.3.1.tar.gz
Upload date: May 22, 2026
Size: 324.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for mareforma-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`c3de55d6ae9765d41c507694e0c04895aa0833321d3ae79cfe1aef5445a76e93`
MD5	`9ee341676545e2305b68f1f4ac2e0ad7`
BLAKE2b-256	`942197a3b2f90b6bb1356c89690686e2bfb64924120b647ba52fdae8589e299d`

See more details on using hashes here.

Provenance

The following attestation bundles were made for mareforma-0.3.1.tar.gz:

Publisher: publish.yml on mareforma/mareforma

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mareforma-0.3.1.tar.gz
- Subject digest: c3de55d6ae9765d41c507694e0c04895aa0833321d3ae79cfe1aef5445a76e93
- Sigstore transparency entry: 1601855132
- Sigstore integration time: May 22, 2026
Source repository:
- Permalink: mareforma/mareforma@79569cde3bafe69ed584f7e950c093c6983e363b
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/mareforma
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@79569cde3bafe69ed584f7e950c093c6983e363b
- Trigger Event: release

File details

Details for the file mareforma-0.3.1-py3-none-any.whl.

File metadata

Download URL: mareforma-0.3.1-py3-none-any.whl
Upload date: May 22, 2026
Size: 197.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for mareforma-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5e3ebb803fb4930bf4b1918f7ee27ba73317075acabde914fe124dfcd251a517`
MD5	`e015b48704b095844917a5995e728bbd`
BLAKE2b-256	`477257841cd1dbca1ec7d20ead8a49b91c10bd1a8c7f7846eef016cf6d1fadbc`

See more details on using hashes here.

Provenance

The following attestation bundles were made for mareforma-0.3.1-py3-none-any.whl:

Publisher: publish.yml on mareforma/mareforma

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mareforma-0.3.1-py3-none-any.whl
- Subject digest: 5e3ebb803fb4930bf4b1918f7ee27ba73317075acabde914fe124dfcd251a517
- Sigstore transparency entry: 1601855138
- Sigstore integration time: May 22, 2026
Source repository:
- Permalink: mareforma/mareforma@79569cde3bafe69ed584f7e950c093c6983e363b
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/mareforma
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@79569cde3bafe69ed584f7e950c093c6983e363b
- Trigger Event: release

mareforma 0.3.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Mareforma

Why

What it does

Core surface

External verification, opt-in by component

Silent pipeline failures become visible

Findings contradict — both stay in the graph

Honest scope

Get started

Examples

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance