Verifiable knowledge graph for scientific experiments
SciTeX Clew (scitex-clew)
Full Documentation · pip install scitex-clew
Problem
Scientific publications are growing exponentially — accelerated by LLM-assisted writing — yet peer review remains a manual bottleneck. 70% of researchers report failed replication attempts, and only 11-36% of high-profile findings are successfully reproduced. Existing tools (pre-registration, containerization, workflow managers) address whether research could be reproduced, but not whether it has been.
Solution
SciTeX Clew records every artifact produced during research — code, data, figures, statistics — into a hash-linked DAG (directed acyclic graph). This creates a verifiable knowledge graph of scientific experiments, which can be explored by humans or AI agents.
Named after the thread Ariadne gave Theseus to trace his path through the labyrinth, Clew serves two purposes:
- Reproducibility verification — confirm that outputs remain unchanged and that every step in the pipeline is intact.
- Research logic comprehension — visualize and navigate the structural skeleton of a research project, from raw data through analysis to manuscript claims.
The DAG is a structured, machine-readable representation of an entire research project — enabling both human reviewers and AI agents to inspect, verify, and understand the logic programmatically. It lets you:
- Verify that outputs remain consistent with recorded hashes
- Trace provenance chains from any file back to its source
- Visualize the structural logic of a research project as a navigable graph
- Re-execute scripts in a sandbox to confirm reproducibility
- Link manuscript claims to the computational sessions that produced them
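The hash-linked idea above can be pictured with a small standard-library sketch. It is an illustration only, not Clew's internal implementation: the `record` and `is_intact` helpers and the node dictionary layout are hypothetical names chosen for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record(path: Path, parents: list[dict]) -> dict:
    """Build a node whose ID commits to its content hash and its parents' IDs.

    Hypothetical layout for illustration; Clew's real schema may differ.
    """
    content_hash = sha256_of(path)
    parent_ids = sorted(p["id"] for p in parents)
    node_id = hashlib.sha256("|".join([content_hash, *parent_ids]).encode()).hexdigest()
    return {"id": node_id, "path": path, "hash": content_hash, "parents": parent_ids}


def is_intact(node: dict) -> bool:
    """Check whether the file on disk still matches the recorded hash."""
    return sha256_of(node["path"]) == node["hash"]


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw_data.csv"
    out = Path(tmp) / "results.csv"
    raw.write_text("x,y\n1,2\n")
    out.write_text("mean_y\n2.0\n")

    raw_node = record(raw, parents=[])
    out_node = record(out, parents=[raw_node])

    intact_before = is_intact(out_node)
    out.write_text("mean_y\n2.5\n")  # simulate an output edited after recording
    intact_after = is_intact(out_node)

print(intact_before, intact_after)
```

Because each node ID is derived from its parents' IDs as well as its own content, a change anywhere upstream would produce a different ID for every downstream node in this sketch.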
Five Node Classes
Every node in the DAG is classified into one of five semantic roles:
| Class | Role | Examples |
|---|---|---|
| Source | Data acquisition scripts | 01_download.py, collect.sh |
| Input | Raw data and configuration | raw_data.csv, config.yaml |
| Processing | Transform and analysis scripts | 03_analyze.py, train.R |
| Output | Intermediate and final data products | results.csv, figure1.png |
| Claim | Manuscript assertions tied to evidence | "Fig 1 shows p<0.05", "Table 2" |
Table 1. Five node classes. Classification is inferred automatically from file extensions and session roles, or set explicitly via set_node_class().
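As a rough picture of how classification from file extensions and session roles might work, the sketch below applies one plausible rule set to the examples in Table 1. The rules, the `infer_node_class` function, and its two flags are assumptions made for this illustration; the actual inference logic in Clew is not described here and may differ.

```python
from pathlib import PurePath

# Assumed set of script extensions, taken from the examples in Table 1.
SCRIPT_EXTENSIONS = {".py", ".r", ".sh"}


def infer_node_class(
    name: str,
    reads_tracked_input: bool = False,
    written_by_session: bool = False,
) -> str:
    """Guess a node class from the file extension plus two session-role flags.

    Hypothetical rules:
    - a script that reads tracked inputs is Processing, otherwise Source
    - a non-script file written by a session is Output, otherwise Input
    """
    extension = PurePath(name).suffix.lower()
    if extension in SCRIPT_EXTENSIONS:
        return "Processing" if reads_tracked_input else "Source"
    return "Output" if written_by_session else "Input"


for example in [
    ("01_download.py", False, False),
    ("raw_data.csv", False, False),
    ("train.R", True, False),
    ("figure1.png", False, True),
]:
    print(example[0], "->", infer_node_class(*example))
```

Claim nodes are not files, so they fall outside this extension-based sketch.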
This classification turns the DAG into a navigable map of the research project. The key operation is backpropagation from claims to sources: starting from a manuscript assertion (claim), Clew traces backward through outputs, processing scripts, and inputs to the original raw data — verifying every hash along the way.
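The backward trace from a claim can be sketched as a breadth-first walk over child-to-parent edges. The graph, the toy hash strings, and the `trace_back` helper below are hypothetical and only illustrate the traversal; they are not part of the Clew API.

```python
from collections import deque

# Child -> parents. A toy project shaped like the node classes in Table 1.
PARENTS = {
    "claim: Fig 1 shows p<0.05": ["figure1.png"],
    "figure1.png": ["03_analyze.py", "raw_data.csv"],
    "03_analyze.py": [],
    "raw_data.csv": ["01_download.py"],
    "01_download.py": [],
}

# Toy digests: what was recorded versus what is on disk now.
RECORDED = {"figure1.png": "aa11", "03_analyze.py": "bb22", "raw_data.csv": "cc33", "01_download.py": "dd44"}
CURRENT = {"figure1.png": "aa11", "03_analyze.py": "bb22", "raw_data.csv": "ff00", "01_download.py": "dd44"}


def trace_back(start: str, parents: dict[str, list[str]]) -> list[str]:
    """Return every node reachable from start by following parent edges."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order


path = trace_back("claim: Fig 1 shows p<0.05", PARENTS)
broken = [node for node in path if RECORDED.get(node) != CURRENT.get(node)]
print(path)
print(broken)
```

Here the claim itself carries no file hash, so only the file nodes along the path are compared.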
Three Verification Modes
| Mode | Scope | API | Description |
|---|---|---|---|
| Project | Entire pipeline | clew.dag() | Verifies every session recorded in the database in topological order. A navigation map for ongoing project monitoring. Answers: "Is the whole project intact?" |
| Files | Specific outputs | clew.dag(["output.csv"]) | Traces backward from target files through their dependency chain and verifies each session. Answers: "Can I trust this specific file?" |
| Claims | Manuscript assertions | clew.verify_claim("Fig 1") | Verifies individual claims linked to source sessions. Answers: "Is this figure/statistic still backed by the data?" |
Table 2. Three verification modes. Each mode supports both cache verification (millisecond hash comparison) and re-run verification (sandbox re-execution with rerun_dag / rerun_claims).
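Verifying sessions in topological order means each session is checked only after everything it depends on. A minimal sketch using the standard-library `graphlib` module follows; the session names, the `HASH_OK` lookup, and the three status labels are made up for this example and are not Clew's actual result format.

```python
from graphlib import TopologicalSorter

# Session -> sessions it depends on (hypothetical project).
DEPENDS_ON = {
    "download": set(),
    "clean": {"download"},
    "analyze": {"clean"},
    "plot": {"analyze", "clean"},
}

# Stand-in for a per-session hash comparison against recorded values.
HASH_OK = {"download": True, "clean": False, "analyze": True, "plot": True}


def verify_project(depends_on: dict[str, set[str]], hash_ok: dict[str, bool]) -> dict[str, str]:
    """Walk sessions so dependencies come first and label each one."""
    status: dict[str, str] = {}
    for session in TopologicalSorter(depends_on).static_order():
        if not hash_ok[session]:
            status[session] = "mismatch"
        elif any(status[dep] != "verified" for dep in depends_on[session]):
            status[session] = "upstream-failed"
        else:
            status[session] = "verified"
    return status


result = verify_project(DEPENDS_ON, HASH_OK)
for session, label in result.items():
    print(session, label)
```

Processing in this order lets a single mismatch be reported once at its origin while downstream sessions are flagged as affected rather than independently broken.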
Installation
Requires Python >= 3.10. Zero third-party dependencies: Clew uses only the standard library, including sqlite3.
pip install scitex-clew
SciTeX users:
pip install scitex already includes Clew. Tracking is automatic via @scitex.session + scitex.io.
Quickstart
import scitex_clew as clew
# Git-status-like overview
clew.status()
# Verify a run (hash check)
result = clew.run("session_20250301_143022")
# Trace a file's provenance chain
chain = clew.chain("output/figure.png")
# Verify the DAG behind a target file (call clew.dag() with no arguments for the whole project)
dag_result = clew.dag(["output/figure.png"])
# Re-execute in sandbox and compare
rerun_result = clew.rerun("session_20250301_143022")
Figure 1. Example DAG visualization. Green nodes indicate verified sessions; red nodes indicate hash mismatches. Clew traces the dependency graph backward from target files to raw data sources.
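A diagram like the one described in Figure 1 can be expressed in Mermaid text. The sketch below builds such text from a list of edges and a per-node status; the `to_mermaid` function, the node names, and the colour values are assumptions for illustration and do not reproduce the output of `clew.mermaid()`.

```python
def to_mermaid(edges: list[tuple[str, str]], status: dict[str, str]) -> str:
    """Render edges as a top-down Mermaid graph with verified/mismatch styling."""
    ids: dict[str, str] = {}

    def node_id(name: str) -> str:
        # Mermaid IDs must be simple tokens, so map each file name to n0, n1, ...
        if name not in ids:
            ids[name] = f"n{len(ids)}"
        return ids[name]

    lines = ["graph TD"]
    for src, dst in edges:
        lines.append(f'    {node_id(src)}["{src}"] --> {node_id(dst)}["{dst}"]')
    lines.append("    classDef verified fill:#c8e6c9,stroke:#2e7d32")
    lines.append("    classDef mismatch fill:#ffcdd2,stroke:#c62828")
    for name, ident in ids.items():
        lines.append(f"    class {ident} {status[name]}")
    return "\n".join(lines)


diagram = to_mermaid(
    edges=[("raw_data.csv", "03_analyze.py"), ("03_analyze.py", "figure1.png")],
    status={"raw_data.csv": "verified", "03_analyze.py": "verified", "figure1.png": "mismatch"},
)
print(diagram)
```

The resulting text can be pasted into any Mermaid renderer to get green and red nodes.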
Four Interfaces
Python API
import scitex_clew as clew
clew.status() # overview
clew.run("session_id") # verify one run
clew.chain("output/figure.png") # trace provenance
clew.dag(["output/figure.png"]) # verify the DAG behind a file
clew.rerun("session_id") # sandbox re-execution
clew.mermaid(claims=True) # Mermaid DAG diagram
clew.add_claim("Fig 1 shows p<0.05", source_files=["fig1.png"])
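Re-run verification, as offered by clew.rerun, goes beyond comparing stored hashes: the script is executed again and its fresh output is compared with the recorded one. The sketch below shows that idea with a temporary directory standing in for the sandbox. The `rerun_matches` helper is hypothetical, and a temporary working directory is only a weak form of isolation; how Clew actually sandboxes execution is not specified here.

```python
import hashlib
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def rerun_matches(script: Path, output_name: str, recorded_hash: str) -> bool:
    """Copy the script to a scratch directory, run it there, compare the output hash."""
    with tempfile.TemporaryDirectory() as sandbox:
        sandbox_script = Path(sandbox) / script.name
        shutil.copy(script, sandbox_script)
        subprocess.run([sys.executable, sandbox_script.name], cwd=sandbox, check=True)
        new_bytes = (Path(sandbox) / output_name).read_bytes()
    return hashlib.sha256(new_bytes).hexdigest() == recorded_hash


with tempfile.TemporaryDirectory() as project:
    # A tiny deterministic "analysis" script that writes one output file.
    script = Path(project) / "03_analyze.py"
    script.write_text('open("results.csv", "w").write("mean\\n2.0\\n")\n')

    # First run: record the hash of the output it produces.
    subprocess.run([sys.executable, script.name], cwd=project, check=True)
    recorded = hashlib.sha256((Path(project) / "results.csv").read_bytes()).hexdigest()

    reproduced = rerun_matches(script, "results.csv", recorded)
    stale = rerun_matches(script, "results.csv", "0" * 64)  # a hash that cannot match

print(reproduced, stale)
```

A check like this only passes for deterministic scripts; anything depending on random seeds, timestamps, or network data would need those pinned first.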
CLI Commands
clew --help-recursive # Show all commands
clew status # Git-status-like overview
clew verify <session_id> # Verify a run
clew list # List tracked runs
clew stats # Database statistics
clew mermaid # Generate Mermaid diagram
clew list-python-apis # List Python API tree
clew mcp list-tools # List MCP tools
MCP Server — for AI Agents
AI agents can verify reproducibility and trace provenance autonomously.
| Tool | Description |
|---|---|
| clew_status | Git-status-like overview |
| clew_run | Verify a specific run |
| clew_chain | Trace file provenance chain |
| clew_dag | Verify full DAG |
| clew_list | List tracked runs |
| clew_stats | Database statistics |
| clew_mermaid | Generate Mermaid DAG diagram |
| clew_rerun_dag | Rerun full DAG in sandbox |
| clew_rerun_claims | Rerun all claim-backing sessions |
Table 3. Nine MCP tools available for AI-assisted verification. All tools accept JSON parameters and return JSON results.
clew mcp start
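Since the tools accept JSON parameters and return JSON results, an agent's request can be pictured as a JSON-RPC 2.0 message using the `tools/call` method defined by the Model Context Protocol. The sketch below only builds such a message; the `tool_call` helper is hypothetical, and the `file_path` argument name is an assumption, as the tools' parameter names are not listed here.

```python
import json


def tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


# "file_path" is a guessed argument name, used only to show the message shape.
request = tool_call(1, "clew_chain", {"file_path": "output/figure.png"})
print(request)
```

In practice an MCP client library handles this framing, so agents normally call tools by name rather than writing messages by hand.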
Skills — for AI Agent Discovery
Skills provide workflow-oriented guides that AI agents query to discover capabilities and usage patterns.
clew skills list # List available skill pages
clew skills get SKILL # Show main skill page
scitex-dev skills export --package scitex-clew # Export to Claude Code
| Skill | Content |
|---|---|
| quick-start | Basic API, session tracking, first verification |
| cli-commands | CLI reference (clew status, clew verify, etc.) |
| mcp-tools-for-ai-agents | MCP tool reference for AI agents |
| common-workflows | Claims, DAG patterns, stamps, reproducibility |
Part of SciTeX
Clew is part of SciTeX. When used inside the SciTeX framework, tracking is automatic:
import scitex

@scitex.session
def main(CONFIG=scitex.INJECTED):
    data = scitex.io.load("input.csv")  # auto-tracked as input
    result = process(data)
    scitex.io.save(result, "output.csv")  # auto-tracked as output
    return 0
All file I/O through scitex.io is recorded in the clew database:
scitex.clew.status() # overview
scitex.clew.run("session_id") # verify
scitex.clew.mermaid(claims=True) # DAG diagram
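To make the idea of recording file I/O in a database concrete, here is a minimal sqlite3 sketch. The `file_events` table, its columns, and the `track` helper are invented for this example; the real Clew schema is not shown here and may look quite different.

```python
import hashlib
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical tracking table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_events (session_id TEXT, path TEXT, role TEXT, sha256 TEXT)"
    )
    return conn


def track(conn: sqlite3.Connection, session_id: str, path: str, role: str, content: bytes) -> None:
    """Record that a session read (input) or wrote (output) a file with this content."""
    digest = hashlib.sha256(content).hexdigest()
    conn.execute("INSERT INTO file_events VALUES (?, ?, ?, ?)", (session_id, path, role, digest))


conn = open_db()
track(conn, "session_a", "input.csv", "input", b"x\n1\n")
track(conn, "session_a", "output.csv", "output", b"y\n2\n")
track(conn, "session_b", "output.csv", "input", b"y\n2\n")

# Two sessions are linked when one session's output reappears, with the same
# hash, as another session's input. These links are the edges of the DAG.
links = conn.execute(
    """
    SELECT producer.session_id, consumer.session_id, producer.path
    FROM file_events AS producer
    JOIN file_events AS consumer
      ON producer.path = consumer.path AND producer.sha256 = consumer.sha256
    WHERE producer.role = 'output' AND consumer.role = 'input'
    """
).fetchall()
print(links)
```

Matching on the hash as well as the path means a consumer that read a later, modified version of the file would not be linked to the original producer.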
The SciTeX system follows the Four Freedoms for Research below, inspired by the Free Software Definition:
Four Freedoms for Research
- The freedom to run your research anywhere — your machine, your terms.
- The freedom to study how every step works — from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.