A content-addressed computation graph with an interactive notebook UI
Strata
Strata is a content-addressed computation graph with an interactive notebook UI.
Every cell output is a versioned artifact keyed by its provenance: source, inputs, and environment. Strata reads each cell's AST to build the dependency graph automatically, so re-running a notebook is mostly a series of cache hits. Touch one cell and the cascade re-executes only the cells that depend on it. Identical inputs produce the same artifact whether the second run comes a minute later or a year later, on the same machine or a different one.
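To make the AST-based dependency idea concrete, here is a minimal sketch using Python's standard `ast` module. It is not Strata's implementation: the helper names (`defined_and_used`, `build_graph`) and the three toy cells are hypothetical, and the sketch ignores imports, function scopes, and cells that both read and rebind the same name.

```python
import ast


def defined_and_used(source: str) -> tuple[set[str], set[str]]:
    """Return (names assigned, names read but not assigned) for one cell."""
    defined: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)
    # Simplification: a name the cell assigns itself is never treated as an input.
    return defined, used - defined


def build_graph(cells: dict[str, str]) -> dict[str, set[str]]:
    """Map each cell id to the earlier cells whose variables it reads."""
    producers: dict[str, str] = {}  # variable name -> cell that last defined it
    graph: dict[str, set[str]] = {}
    for cell_id, source in cells.items():  # dicts preserve insertion order
        defined, used = defined_and_used(source)
        # Builtins such as `sorted` show up in `used` but have no producer cell.
        graph[cell_id] = {producers[name] for name in used if name in producers}
        for name in defined:
            producers[name] = cell_id
    return graph


# Hypothetical three-cell notebook.
cells = {
    "load": "raw = [3, 1, 2]",
    "clean": "tidy = sorted(raw)",
    "report": "summary = (len(tidy), max(raw))",
}
graph = build_graph(cells)
print(graph)
```

Here `clean` depends on `load` because it reads `raw`, and `report` depends on both earlier cells, with no manual wiring.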
Prompt cells make AI calls first-class DAG nodes, cached by template,
inputs, and model config. `# @worker gpu-fly` dispatches a cell to a remote
GPU. `# @mount data s3://bucket/prefix ro` makes an S3 prefix available as a
local `pathlib.Path` inside the cell. The whole notebook is plain `.py`
files plus a manifest, so commits are git-diffable and there are no JSON
blobs or execution metadata bleeding into the history.
Docs: bearing-research.github.io/strata
Quick Start
Both paths below run in personal mode: single-user, writes enabled, no proxy auth. For multi-tenant or hosted deployments, see Deployment Modes.
# Docker (recommended). docker-compose.yml sets personal mode for you.
docker compose up -d --build
# Then open http://localhost:8765
# Or from source — requires uv (see Requirements below).
uv sync
cd frontend && npm ci && npm run build && cd ..
uv run strata-server
# Then open http://localhost:8765
#
# For the full inventory of installed commands
# (strata-server, strata, strata-worker, python -m strata),
# see docs/getting-started/installation.md#commands-reference.
Requirements
Runtime (Docker or uv run strata-server):
- uv ≥ 0.8: install via the uv installer (`curl -LsSf https://astral.sh/uv/install.sh | sh` on macOS/Linux; PowerShell installer on Windows). Strata refuses to start outside a uv-managed environment: the startup check looks for the `uv = <version>` marker that uv writes to `pyvenv.cfg`; `uv run` and `uvx` produce envs with this marker, hand-rolled `python -m venv` venvs do not. Conda and pip-venv users need to install uv and re-launch Strata from a uv-managed env. Existing data and other environments are untouched, but Strata's own runtime has to be uv-managed. uv fetches a matching Python for you, so you don't need Python pre-installed.
Source build (this is currently the only install path —
strata-notebook isn't on PyPI yet; PyPI wheels are planned for
0.1.0 and will let you skip the Rust step):
- Rust toolchain (rustup): for `maturin` to compile the native extension. Once installed, `cargo` and `rustc` need to be on `PATH` so `uv sync` can invoke them.
- Node 25+ / npm: for the frontend `npm ci && npm run build` step.
- Python 3.12+ is handled automatically by `uv sync`.
Windows: source builds work via WSL2 (smoother) or native Windows (uv + rustup + Node have Windows installers). Day-to-day dev is on macOS/Linux, so WSL2 is the better-trodden path.
Why uv at runtime: the notebook subsystem shells out to uv to
manage per-notebook .venv/ directories, and the project's dev
workflow assumes uv as the install path. Failing fast at startup with
a clear message beats a confusing subprocess error later.
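The startup check described above can be approximated in a few lines. This is a sketch under assumptions, not Strata's actual code: the function name `is_uv_managed` is hypothetical, and the version string `0.8.4` in the demo is only an illustrative value for the `uv = <version>` marker.

```python
import sys
import tempfile
from pathlib import Path


def is_uv_managed(env_root: Path) -> bool:
    """Return True if env_root/pyvenv.cfg contains a non-empty `uv = ...` entry."""
    cfg = env_root / "pyvenv.cfg"
    if not cfg.is_file():
        return False
    for line in cfg.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "uv" and value.strip():
            return True
    return False


# Demo against throwaway directories so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    env = Path(tmp)
    (env / "pyvenv.cfg").write_text("home = /usr/bin\nuv = 0.8.4\n", encoding="utf-8")
    uv_env_ok = is_uv_managed(env)
    (env / "pyvenv.cfg").write_text("home = /usr/bin\nversion = 3.12.1\n", encoding="utf-8")
    plain_env_ok = is_uv_managed(env)

# sys.prefix is the root of the active environment, if there is one.
print("current interpreter uv-managed:", is_uv_managed(Path(sys.prefix)))
```

Checking a marker file at startup is cheap, which fits the fail-fast rationale: the process can exit with a clear message before any notebook subprocess is spawned.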
Notebook Features
- Content-addressed caching. Same code plus same inputs equals an instant cache hit, zero recomputation.
- Automatic dependency tracking. DAG built from variable analysis, no manual wiring.
- Cascade execution. Change upstream code, downstream cells auto-invalidate.
- Distributed workers. Annotate `@worker gpu-fly` and the cell dispatches to a remote GPU.
- Prompt cells. LLM-powered cells with `{{ variable }}` template injection.
- SQL cells. First-class SQL cells with `# @sql connection=<name>`, named-bind parameters, and DuckDB / SQLite / PostgreSQL / Snowflake / BigQuery drivers.
- AI assistant. Streaming chat with conversation memory, agent mode for autonomous notebook building.
- Environment management. Per-notebook Python venvs via uv, isolated from each other.
- Rich outputs. DataFrames, matplotlib plots, markdown, images.
- Cell operations. Reorder, duplicate, fold, keyboard shortcuts.
- Headless runner. `strata run ./my-notebook` for CI and scheduled execution.
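The prompt-cell bullet above mentions `{{ variable }}` template injection, and the introduction says prompt cells are cached by template, inputs, and model config. The following sketch shows one way those two ideas could fit together. The function names, the JSON-plus-SHA-256 key format, and the `example-model` config are all assumptions for illustration, not Strata's API.

```python
import hashlib
import json
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Substitute each {{ name }} placeholder with str(variables[name])."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


def prompt_cache_key(template: str, variables: dict, model_config: dict) -> str:
    """Hash the template, only the variables it references, and the model config."""
    referenced = sorted(set(PLACEHOLDER.findall(template)))
    payload = json.dumps(
        {
            "template": template,
            "inputs": {name: variables[name] for name in referenced},
            "model": model_config,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


template = "Summarize {{ title }} in {{n}} words"
variables = {"title": "Strata", "n": 5, "unused": "ignored"}
config = {"model": "example-model", "temperature": 0}  # hypothetical config

prompt = render(template, variables)
key = prompt_cache_key(template, variables, config)
print(prompt)
```

Because only referenced variables enter the key, changing an unrelated notebook variable would not invalidate the cached response, while changing the template or the model config would.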
The Cache Advantage
Most notebook platforms re-execute from scratch when you change one cell. Strata doesn't. The artifact store deduplicates by provenance hash, so if a cell's code, inputs, and environment haven't changed, its result is served from the cache instead of being recomputed.
First run: load data (10s) → clean (3s) → train (20s) → evaluate (1s) = 34s
Change model: load data (✓) → clean (✓) → train (20s) → evaluate (1s) = 21s
Re-run: load data (✓) → clean (✓) → train (✓) → evaluate (✓) = <1s
This isn't a feature bolted on. It's the architecture. Every cell
execution is a materialize(inputs, transform, environment) → artifact operation,
and the cache is correct by construction because it's keyed on content,
not time.
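The three runs above can be reproduced with a toy in-memory store. This is a sketch of the general content-addressing technique, assuming a SHA-256 over JSON as the provenance hash; the `ArtifactStore` class, its method signature, and the `*_v1` source labels are hypothetical stand-ins, not Strata's real interfaces.

```python
import hashlib
import json


def provenance_key(source: str, input_keys: list[str], environment: dict) -> str:
    """Derive a cache key from cell source, upstream artifact keys, and environment."""
    payload = json.dumps(
        {"source": source, "inputs": input_keys, "environment": environment},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ArtifactStore:
    """Toy in-memory store; `executed` records which steps actually ran."""

    def __init__(self) -> None:
        self._artifacts: dict[str, object] = {}
        self.executed: list[str] = []

    def materialize(self, name, source, inputs, environment, run):
        # `inputs` is a list of (key, value) pairs returned by earlier calls.
        key = provenance_key(source, [k for k, _ in inputs], environment)
        if key not in self._artifacts:
            self.executed.append(name)
            self._artifacts[key] = run(*[v for _, v in inputs])
        return key, self._artifacts[key]


ENV = {"python": "3.12"}


def run_pipeline(store: ArtifactStore, train_source: str):
    load = store.materialize("load", "load_v1", [], ENV, lambda: [1.0, 2.0, 3.0])
    clean = store.materialize("clean", "clean_v1", [load], ENV,
                              lambda rows: [r for r in rows if r > 1])
    train = store.materialize("train", train_source, [clean], ENV,
                              lambda rows: sum(rows) / len(rows))
    return store.materialize("evaluate", "eval_v1", [train], ENV,
                             lambda model: round(model, 2))


store = ArtifactStore()
run_pipeline(store, "train_v1")
first = list(store.executed)      # everything runs

store.executed.clear()
run_pipeline(store, "train_v2")   # "change model"
second = list(store.executed)     # only train and evaluate run

store.executed.clear()
run_pipeline(store, "train_v2")
third = list(store.executed)      # pure cache hits
print(first, second, third)
```

Changing the train step's source changes its key, which in turn changes the input key of `evaluate`, so invalidation propagates downstream without any timestamps.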
Distributed Execution
Each cell can declare which worker it runs on via a single annotation:
# @worker my-gpu
embeddings = model.encode(abstracts, batch_size=256)
You define workers in notebook.toml. Each one points at an HTTP
endpoint that implements the Strata executor protocol. A worker can be
a GPU box on RunPod, a DataFusion cluster on Fly, a beefy EC2 instance,
or anything else that speaks HTTP. The notebook routes the cell to the
declared worker at execution time, and the UI shows a live
"dispatching to my-gpu" badge while it runs.
No deployment code, no infrastructure glue. Bring your own compute, one annotation per cell.
Source Annotations
Every piece of per-cell metadata is a comment directive in the cell's source. The source is the single canonical place for cell config: annotations always win over any stored defaults.
# @name Extract embeddings
# @worker gpu-fly
# @timeout 600
# @env MODEL_PATH=/models/bge-large
# @mount dataset s3://corpus/2024-q4 ro
embeddings = model.encode(dataset / "abstracts.jsonl")
Diagnostics fire on open, reload, and after an edit settles:
worker_unknown, mount_uri_unsupported, mount_shadows_notebook,
timeout_not_numeric, env_malformed. They surface as a pill in the
cell header and log structured warnings for headless runs.
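A minimal sketch of how such comment directives and diagnostics might be parsed follows. It assumes one directive per comment line and a default mount mode of `ro`; the function name, the returned shapes, and the exact conditions for each diagnostic are guesses, and `mount_shadows_notebook` is omitted because its semantics are not described here.

```python
import re

DIRECTIVE = re.compile(r"^#\s*@(\w+)\s*(.*)$")
SUPPORTED_SCHEMES = {"file", "s3", "gs", "az"}


def parse_annotations(source: str, known_workers: set[str]):
    """Return (config, diagnostics) for the `# @key value` lines in a cell."""
    config = {"env": {}, "mounts": []}
    diagnostics = []
    for line in source.splitlines():
        match = DIRECTIVE.match(line.strip())
        if match is None:
            continue
        key, value = match.group(1), match.group(2).strip()
        if key == "name":
            config["name"] = value
        elif key == "worker":
            config["worker"] = value
            if value not in known_workers:
                diagnostics.append("worker_unknown")
        elif key == "timeout":
            try:
                config["timeout"] = int(value)
            except ValueError:
                diagnostics.append("timeout_not_numeric")
        elif key == "env":
            name, sep, val = value.partition("=")
            if sep and name:
                config["env"][name] = val
            else:
                diagnostics.append("env_malformed")
        elif key == "mount":
            parts = value.split()
            has_uri = len(parts) >= 2 and "://" in parts[1]
            scheme = parts[1].partition("://")[0] if has_uri else ""
            if scheme in SUPPORTED_SCHEMES:
                mode = parts[2] if len(parts) > 2 else "ro"  # assumed default
                config["mounts"].append((parts[0], parts[1], mode))
            else:
                diagnostics.append("mount_uri_unsupported")
    return config, diagnostics


good_cell = """# @name Extract embeddings
# @worker gpu-fly
# @timeout 600
# @env MODEL_PATH=/models/bge-large
# @mount dataset s3://corpus/2024-q4 ro
embeddings = model.encode(dataset / "abstracts.jsonl")
"""
bad_cell = "# @worker nowhere\n# @timeout soon\n# @env BROKEN\n# @mount d ftp://x ro\n"

config, diagnostics = parse_annotations(good_cell, {"gpu-fly"})
_, bad_diagnostics = parse_annotations(bad_cell, {"gpu-fly"})
print(config, diagnostics, bad_diagnostics)
```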
Mounts
Mounts bind a remote URI to a local path inside the cell. Supported
schemes: file://, s3://, gs://, az://. Credentials flow through
fsspec options: set anon = true for public buckets, or drop it to
use the standard credential chain.
[[mounts]]
name = "taxi_zones"
uri = "s3://nyc-tlc/misc"
mode = "ro"
options = { anon = true }
Inside the cell, taxi_zones is a pathlib.Path. Strata materializes
it on first read and caches the bytes locally for the session.
Examples
| Example | What it shows |
|---|---|
| pandas_basics | Linear DataFrame chain, caching, staleness propagation |
| iris_classification | End-to-end ML, DAG branching, mixed output types |
| titanic_ml | Feature engineering + model comparison |
| s3_mount | Reading a public S3 bucket via a mount |
| arxiv_classifier | Distributed execution via @worker + Modal GPU + Fly cluster |
| markdown_showcase | Markdown cells, dynamic Markdown(...) outputs, security cases |
| library_cells | Cross-cell library code: pure module cells, mixed runtime+library cells, the limits |
| news_alpha_trader | Multi-stage trading pipeline with prompt cells and structured LLM outputs |
Known rough edges
Strata is at 0.1 and a few surfaces are explicitly exploratory. The core (materialization, artifact store, DAG, caching, headless run) is stable in the alpha sense; these are the bits where the API or coverage is still moving:
- Prompt-cell API. Streaming, conversation memory, and structured-output validation are not yet finalized — expect breaking changes in 0.x.
- SQL cell cloud drivers. DuckDB / SQLite / PostgreSQL are exercised in CI. BigQuery and Snowflake adapters ship but lack integration test coverage; pin a Strata version in production until that lands. MotherDuck and MySQL are planned but not yet implemented.
- Wire / on-disk formats. `notebook.toml`, `runtime.json`, and the artifact cache layout may change between minor versions during 0.x. Rely on the Python API surface, not the file shapes.
Library usage
Strata's HTTP API exposes the materialization layer directly,
driveable from Python via StrataClient. Useful for direct table
scans, custom transforms, and headless workflows; the notebook
executor is a separate pipeline that writes to the same artifact
store. The client talks to a running Strata server, so this workflow
has two steps: start the server, then call it from your code.
# 1. Install + start the server (in a uv-managed env).
# Until 0.1.0 ships to PyPI, install from a git checkout — needs
# the Rust toolchain (see Requirements above).
git clone https://github.com/bearing-research/strata.git
cd strata
uv sync --all-extras
uv run strata-server
# 2. From another process, point the client at it:
from strata import StrataClient

client = StrataClient(base_url="http://localhost:8765")
artifact = client.materialize(
    inputs=["file:///warehouse#db.events"],
    transform={"executor": "scan@v1", "params": {"columns": ["id", "value"]}},
)
table = client.fetch(artifact.uri)  # Arrow table, cached by provenance
The server provides: provenance-based deduplication, immutable versioned artifacts, lineage tracking, Iceberg table scanning with row-group caching, pluggable blob storage (local/S3/GCS/Azure), multi-tenancy, trusted-proxy auth, and an executor protocol for external compute.
Architecture
┌─────────────────────────────────────────────┐
│ Notebook UI (Vue.js + WebSocket) │
│ cells, DAG view, AI assistant, workers │
└─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Notebook Backend (FastAPI) │
│ session, cascade, executor, prompt cells │
└─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Strata Core │
│ materialize, artifacts, lineage, dedupe │
└─────────────────────────────────────────────┘
The notebook is an orchestration layer over Core. It decides what to run next (cascade planning, staleness tracking). The cell harness is an executor. Core decides whether results already exist and persists them.
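The "what to run next" decision can be sketched as a graph traversal: mark the edited cell and everything downstream of it as stale, then order the stale cells so that upstream cells run first. The `cascade` function and the five-cell graph below are hypothetical; the sketch uses the standard-library `graphlib.TopologicalSorter` and does not reflect how Strata's planner is actually written.

```python
from graphlib import TopologicalSorter


def cascade(deps: dict[str, set[str]], changed: str) -> list[str]:
    """Return `changed` plus all transitive dependents, upstream cells first.

    `deps` maps each cell to the set of cells it reads from.
    """
    # Invert the edges: cell -> cells that read from it.
    dependents: dict[str, set[str]] = {cell: set() for cell in deps}
    for cell, upstream in deps.items():
        for parent in upstream:
            dependents.setdefault(parent, set()).add(cell)

    # Walk downstream from the changed cell to collect everything stale.
    stale, frontier = {changed}, [changed]
    while frontier:
        for child in dependents.get(frontier.pop(), ()):
            if child not in stale:
                stale.add(child)
                frontier.append(child)

    # static_order() yields every cell after all of its dependencies.
    return [cell for cell in TopologicalSorter(deps).static_order() if cell in stale]


# Hypothetical notebook graph.
deps = {
    "load": set(),
    "clean": {"load"},
    "train": {"clean"},
    "evaluate": {"train"},
    "plot": {"clean"},
}
plan_after_train_edit = cascade(deps, "train")
plan_after_clean_edit = cascade(deps, "clean")
print(plan_after_train_edit, plan_after_clean_edit)
```

Editing `train` leaves `load`, `clean`, and `plot` untouched; in the layering described above, the skipped cells would then be resolved by Core as cache hits.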
Development
uv sync # Install deps + build Rust extension
uv run pytest # Run all tests
uv run pre-commit run --all-files # Lint + format
cd frontend && npm run dev # Frontend dev server (hot reload)
License
Apache 2.0