Automatic provenance for AI-driven research pipelines
mareforma
Automatic epistemic provenance for life sciences pipelines. Write transforms, run build, and mareforma figures out what kind of result you produced and how well-supported it is — no manual annotation required.
Install
pip install mareforma
Requires Python ≥ 3.10.
How it works
Write normal Python pipeline functions. mareforma auto-classifies each result.
from mareforma import transform, BuildContext
import pandas as pd


@transform("morphology.load")
def load(ctx: BuildContext) -> None:
    # Collect every SWC skeleton file from the registered "morphology" source.
    files = list(ctx.source_path("morphology").glob("*.swc"))
    ctx.save("skeletons", files, fmt="pickle")


@transform("morphology.features", depends_on=["morphology.load"])
def compute_features(ctx: BuildContext) -> None:
    # Load the artifact saved upstream by morphology.load.
    skeletons = ctx.load("morphology.load.skeletons")
    # _extract_features is a user-defined helper (not shown here).
    df = pd.DataFrame([_extract_features(s) for s in skeletons])
    ctx.save("features", df, fmt="csv")

mareforma build
# ✓ morphology.load done (1.2s)
# ✓ morphology.features done (3.8s)
mareforma trace morphology.features
# morphology
# └── morphology.load ──────── RAW ── SINGLE
#     └── morphology.features ANALYSED ── REPLICATED ◇
That's it. No annotations. mareforma reads your artifacts, classifies each transform, and tracks support level automatically.
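The example above calls `_extract_features` without defining it, because that helper is yours to write. As a minimal sketch, assume each skeleton is a path to an SWC file in the usual seven-column layout (id, type, x, y, z, radius, parent). The function name `swc_features` and the feature names `n_nodes` and `cable_length` are illustrative choices, not part of mareforma.

```python
import math
from pathlib import Path


def swc_features(text: str) -> dict[str, float]:
    """Compute two simple morphology features from the text of one SWC file."""
    nodes: dict[int, tuple[float, float, float, int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        node_id, _type, x, y, z, _radius, parent = line.split()[:7]
        nodes[int(node_id)] = (float(x), float(y), float(z), int(parent))

    # Cable length: sum of node-to-parent distances (a parent of -1 marks a root).
    length = 0.0
    for x, y, z, parent in nodes.values():
        if parent in nodes:
            px, py, pz, _ = nodes[parent]
            length += math.dist((x, y, z), (px, py, pz))

    return {"n_nodes": len(nodes), "cable_length": length}


def _extract_features(path: Path) -> dict[str, float]:
    """Hypothetical helper for compute_features: one feature row per file."""
    return swc_features(Path(path).read_text())
```

Each returned dict becomes one row of the DataFrame built in `compute_features`.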
What gets classified automatically
Transform class — inferred from artifact content:
| Class | Meaning |
|---|---|
| `RAW` | Root node with no upstream dependencies |
| `PROCESSED` | Output values ⊆ input values, row count ≤ input count |
| `ANALYSED` | New values computed, within input value range |
| `INFERRED` | Output values outside all input ranges |
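To make the four rules concrete, here is a rough sketch of how they could be applied to plain numeric columns. It is not mareforma's actual classifier: the function `classify` is hypothetical, it assumes non-empty numeric inputs, and it treats "input range" as the combined minimum and maximum over all upstream columns.

```python
from collections.abc import Sequence


def classify(inputs: Sequence[Sequence[float]], output: Sequence[float]) -> str:
    """Classify one numeric output column against its upstream input columns."""
    if not inputs:
        return "RAW"  # root node: nothing upstream

    seen = {v for column in inputs for v in column}
    max_rows = max(len(column) for column in inputs)
    if set(output) <= seen and len(output) <= max_rows:
        return "PROCESSED"  # a filter or subset of existing values

    lo, hi = min(seen), max(seen)  # assumption: one combined input range
    if all(lo <= v <= hi for v in output):
        return "ANALYSED"  # new values, but inside the input range

    return "INFERRED"  # at least one value falls outside the input range
```

For example, filtering `[1, 2, 3, 4]` down to `[2, 4]` would come out as `PROCESSED`, a mean of `2.5` as `ANALYSED`, and an extrapolated `10` as `INFERRED`.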
Support level — inferred from run history:
| Level | Meaning |
|---|---|
| `SINGLE` | One run |
| `REPLICATED` ◇ | Same output hash across ≥2 runs |
| `CONVERGED` ● | Same step name across ≥2 independent sources |
| `CONSISTENT` ◆ | A run has a DOI-linked claim in supports |
| `ESTABLISHED` ●● | `CONVERGED` + `CONSISTENT` |
SINGLE through CONVERGED require no annotation. CONSISTENT and ESTABLISHED require one DOI string in a claim.
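The support levels can be read as a small decision procedure over run history. The sketch below is one possible reading, not mareforma's implementation: the `Run` record and `support_level` function are hypothetical, and the precedence between levels (claims and convergence checked before replication) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    source: str                     # registered data source that produced the run
    step: str                       # step name shared across sources
    output_hash: str                # hash of the run's output artifact
    supports: tuple[str, ...] = ()  # DOIs attached through a claim


def support_level(step: str, runs: list[Run]) -> str:
    """Infer a support level for one step from its recorded runs."""
    mine = [r for r in runs if r.step == step]
    if not mine:
        raise ValueError(f"no runs recorded for {step!r}")

    converged = len({r.source for r in mine}) >= 2  # >=2 independent sources
    consistent = any(r.supports for r in mine)      # a DOI-linked claim exists
    if converged and consistent:
        return "ESTABLISHED"
    if consistent:
        return "CONSISTENT"
    if converged:
        return "CONVERGED"

    hashes = [r.output_hash for r in mine]
    if any(hashes.count(h) >= 2 for h in set(hashes)):
        return "REPLICATED"  # same output hash seen in at least two runs
    return "SINGLE"
```

Under these assumptions, two runs of the same step with identical hashes yield `REPLICATED`, and adding a single DOI to a claim is what moves a result to `CONSISTENT` or `ESTABLISHED`.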
Quickstart
# 1. Init
cd my_project/
mareforma init
# 2. Register a data source
mareforma add-source morphology --path data/morphology/raw/ \
    --description "Neuron skeleton reconstructions"
# 3. Build — classification is automatic
mareforma build
# 4. Inspect the epistemic graph
mareforma trace morphology.features
# 5. Check overall health
mareforma status
# 6. Optional: link a result to literature (unlocks CONSISTENT)
mareforma claim add "Neuron size increases with cortical depth" \
    --source morphology --supports 10.64898/2026.03.05.709819
# 7. Export provenance graph
mareforma export
BuildContext API
| Method | Description |
|---|---|
| `ctx.source_path("name")` | Raw data path for a registered source |
| `ctx.save("name", data, fmt=...)` | Persist artifact (pickle, parquet, csv, numpy) |
| `ctx.load("transform.artifact")` | Load upstream artifact |
| `ctx.claim("text", supports=[DOI])` | Optional: link this run to literature |
| `ctx.log("message")` | Write to console |
CLI reference
| Command | Description |
|---|---|
| `mareforma init` | Initialise project |
| `mareforma add-source <name>` | Register a data source |
| `mareforma check` | Validate paths and required fields |
| `mareforma build [source]` | Run the pipeline DAG (`--dry-run`, `--force`) |
| `mareforma trace <transform>` | Ancestry tree with class and support level (`--json`) |
| `mareforma status` | Epistemic health dashboard (`--json`) |
| `mareforma diff <transform>` | Compare the two most recent runs (`--json`) |
| `mareforma log` | Last build status (`--json`) |
| `mareforma explain [source]` | Dump project ontology (`--json`) |
| `mareforma export` | Write `ontology.jsonld` |
| `mareforma claim add TEXT` | Link a result to literature (`--supports DOI`) |
| `mareforma claim list` | List claims (`--status`, `--source`, `--json`) |
| `mareforma claim show ID` | Full claim detail |
| `mareforma claim update ID` | Update confidence, status, or supports |
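Several commands accept `--json`, which makes them easy to call from scripts. The helper below is a generic sketch for running any command and parsing its standard output as JSON. The name `run_json` is hypothetical, and the shape of the JSON that mareforma emits is not documented here, so the result is returned unmodified.

```python
import json
import subprocess
from typing import Any


def run_json(argv: list[str]) -> Any:
    """Run a command and parse its standard output as JSON."""
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Usage (assumes mareforma is on PATH and the current directory is an
# initialised project):
#   status = run_json(["mareforma", "status", "--json"])
#   print(json.dumps(status, indent=2))
```

Keeping the helper generic avoids baking in assumptions about the output schema; inspect the parsed value before relying on specific keys.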
Project structure
my_project/
├── .mareforma/
│   └── graph.db              ← provenance graph (commit this)
├── mareforma.project.toml    ← project ontology (commit this)
├── claims.toml               ← claims backup, auto-generated (commit this)
├── ontology.jsonld           ← JSON-LD export (commit this)
└── data/
    └── source_name/
        ├── raw/              ← your data
        └── preprocessing/
            └── build_transform.py