
A content-addressed computation graph with an interactive notebook UI


Strata


Strata is a content-addressed computation graph with an interactive notebook UI.

Every cell output is a versioned artifact keyed by its provenance: source, inputs, and environment. Strata reads each cell's AST to build the dependency graph automatically, so re-running a notebook is mostly a series of cache hits. Edit one cell and the cascade re-executes that cell plus only the cells downstream of it. Identical source, inputs, and environment produce the same artifact whether the second run comes a minute later or a year later, on the same machine or a different one.
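A rough sketch of how AST-based wiring can work, using only Python's standard `ast` module. This is an illustration under simplifying assumptions, not Strata's analyzer: it ignores scopes, assumes every name is assigned by exactly one cell, and the helpers `cell_symbols` and `build_edges` are hypothetical.

```python
import ast


def cell_symbols(source: str) -> tuple[set[str], set[str]]:
    """Return (defined, used) names for one cell's source.

    Deliberately coarse: walks the whole tree and ignores scopes, so names
    bound inside functions or comprehensions count as "defined" too.
    """
    defined: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            # Store = assignment target; Load/Del = a read of an existing name
            (defined if isinstance(node.ctx, ast.Store) else used).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import x as y" binds "y"
                defined.add(alias.asname or alias.name.split(".")[0])
    # Drop names the cell binds itself; this also hides rebinding like x = x + 1
    return defined, used - defined


def build_edges(cells: dict[str, str]) -> dict[str, set[str]]:
    """Map each cell id to the ids of the cells it reads names from (its upstreams)."""
    symbols = {cid: cell_symbols(src) for cid, src in cells.items()}
    producer: dict[str, str] = {}
    for cid, (defined, _) in symbols.items():
        for name in defined:
            producer[name] = cid  # assumes one defining cell per name
    return {
        cid: {producer[n] for n in used if n in producer and producer[n] != cid}
        for cid, (_, used) in symbols.items()
    }


# Source is only parsed, never executed, so pandas need not be installed.
cells = {
    "load": "import pandas as pd\ndf = pd.read_csv('data.csv')",
    "clean": "clean_df = df.dropna()",
    "train": "model = fit(clean_df)",
}
print(build_edges(cells))  # {'load': set(), 'clean': {'load'}, 'train': {'clean'}}
```

Names with no defining cell (builtins, or `fit` above) simply produce no edge, which is why a real analyzer also needs scope handling and diagnostics for names that are defined twice.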

Prompt cells make AI calls first-class DAG nodes, cached by template, inputs, and model config. # @worker gpu-fly dispatches a cell to a remote GPU. # @mount data s3://bucket/prefix ro makes an S3 prefix available as a local pathlib.Path inside the cell. The whole notebook is plain .py files plus a manifest, so commits are git-diffable and there are no JSON blobs or execution metadata bleeding into the history.

Docs: bearing-research.github.io/strata

Highlights

  • content-addressed: every cell output is keyed by source + inputs + environment — identical work is served from the cache instead of re-running
  • reactive: edit a cell and the cascade re-runs it plus only the downstream cells that depend on it
  • dag-from-ast: Strata reads each cell's AST to wire upstream/downstream — no decorators, no manual edges
  • dag-view: the dependency graph renders alongside the cells — double-click any node to jump to its source
  • git-friendly: notebooks are plain .py files plus a TOML manifest — readable diffs, no JSON blobs
  • prompt cells: LLM calls are first-class DAG nodes with {{ variable }} interpolation from upstream cells, cached by template + inputs + model config
  • SQL cells: named connections, bind-parameter templating, drivers for DuckDB / SQLite / Postgres / Snowflake / BigQuery
  • R cells (0.2.0): Python and R cells share a DAG, and cross-language Arrow exchange means a pandas.DataFrame arrives as a data.frame in a downstream R cell. R is first-class in the UI: an Add-R-cell menu, an R environment panel with one-click renv bootstrap + package install, automatic renv::restore() on open, and inline plots (ggplot2 / base graphics render to PNG). R cells also run headlessly: strata run executes them for CI
  • loop cells: # @loop max_iter=N carry=state iterates a cell with explicit carry between steps — each iteration is its own artifact
  • distributed: # @worker gpu-fly dispatches a single cell to a remote box — bring your own compute
  • mounts: # @mount data s3://bucket/prefix ro makes any S3 / GCS / Azure prefix a local pathlib.Path
  • isolated envs: every notebook gets its own uv-managed .venv/, locked and reproducible
  • auto-install: missing import in a cell? one click adds the package via uv and re-runs
  • headless: strata run ./my-notebook for CI and scheduled execution — same DAG, same cache
  • also a library: the materialization layer is exposed via HTTP + a StrataClient, usable from any Python process
  • deployment features: Iceberg-aware scans, trusted-proxy auth, multi-tenancy, S3 / GCS / Azure / local blob backends
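The caching rule for prompt cells can be sketched as a hash over a canonical encoding of template, inputs, and model config. Everything here is an assumption for illustration: the `{{ name }}` regex, the SHA-256-over-JSON key, and the names `render_prompt` and `prompt_cache_key` are hypothetical, the model name is a placeholder, and no LLM is called.

```python
import hashlib
import json
import re

_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")  # matches {{ name }} with optional inner spaces


def render_prompt(template: str, inputs: dict[str, object]) -> str:
    """Replace each {{ name }} with str(inputs[name]); a missing name raises KeyError."""
    return _VAR.sub(lambda m: str(inputs[m.group(1)]), template)


def prompt_cache_key(template: str, inputs: dict[str, object],
                     model_config: dict[str, object]) -> str:
    """Hash template, inputs, and model config together; sort_keys makes dict order irrelevant."""
    payload = json.dumps(
        {"template": template, "inputs": inputs, "model": model_config},
        sort_keys=True,
        default=str,  # fall back to str() for values JSON can't encode
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


template = "Summarize these headlines for {{ ticker }}:\n{{ headlines }}"
inputs = {"ticker": "ACME", "headlines": "Q3 beat; guidance raised"}
config = {"model": "example-model", "temperature": 0.0}  # placeholder config

print(render_prompt(template, inputs))
key_a = prompt_cache_key(template, inputs, config)
key_b = prompt_cache_key(template, dict(reversed(list(inputs.items()))), config)
print(key_a == key_b)  # same inputs in a different order -> same key
print(key_a == prompt_cache_key(template, inputs, {**config, "temperature": 0.7}))  # config change -> new key
```

Including the model config in the key means a temperature or model swap is treated as a different computation, even when the rendered prompt text is identical.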

Quick Start

Both paths below run in personal mode: single-user, writes enabled, no proxy auth. For multi-tenant or hosted deployments, see Deployment Modes.

# Docker. docker-compose.yml sets personal mode for you.
docker compose up -d --build
# Then open http://localhost:8765

# Or install via uv (recommended). Fetches the wheel from PyPI into a
# uv-managed tool env at ~/.local/share/uv/tools/strata-notebook with
# the CLI on PATH. Plain `pip install` is not supported — Strata refuses
# to start outside a uv-managed env (see Requirements below).
uv tool install strata-notebook
strata-notebook
# Then open http://localhost:8765

For the full inventory of installed commands (strata-notebook, strata, strata-worker, python -m strata), see the Commands reference.

Source builds — git clone + uv sync — work too and are documented in Installation; needed only if you're modifying Strata itself.

Requirements

  • uv ≥ 0.8 — install via the uv installer (curl -LsSf https://astral.sh/uv/install.sh | sh on macOS/Linux; PowerShell installer on Windows). Strata refuses to start outside a uv-managed environment: the startup check looks for the uv = <version> marker that uv writes to pyvenv.cfg. Environments created by uv tool install, uv add, and uv run carry this marker; a hand-rolled python -m venv populated with plain pip install does not. Conda and pip-venv users need to install uv and relaunch from a uv-managed env; existing data and other environments are untouched. uv fetches a matching Python for you, so you don't need Python pre-installed.
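The startup check described above amounts to reading pyvenv.cfg at the interpreter prefix and looking for a uv key. A minimal stdlib sketch of that rule, assuming the file sits at `sys.prefix` (true inside a virtual environment); `is_uv_managed` is a hypothetical name, not Strata's function.

```python
from __future__ import annotations

import sys
import tempfile
from pathlib import Path


def is_uv_managed(prefix: str | Path = sys.prefix) -> bool:
    """True if <prefix>/pyvenv.cfg has a non-empty `uv = <version>` entry."""
    cfg = Path(prefix) / "pyvenv.cfg"
    if not cfg.is_file():
        return False  # not a venv at all, e.g. a system or conda base interpreter
    for line in cfg.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "uv" and value.strip():
            return True
    return False


print("this interpreter is uv-managed:", is_uv_managed())

# Exercise the rule against a throwaway fake venv directory.
with tempfile.TemporaryDirectory() as fake_venv:
    (Path(fake_venv) / "pyvenv.cfg").write_text("home = /usr/bin\nuv = 0.8.0\n", encoding="utf-8")
    fake_is_uv = is_uv_managed(fake_venv)
print("fake venv is uv-managed:", fake_is_uv)
```

Checking a marker file is cheap and happens before any subprocess is spawned, which fits the fail-fast rationale given below.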

Source build (only if you're building Strata itself from a git clone, not using PyPI or Docker):

  • Rust toolchain (rustup) — for maturin to compile the native extension. PyPI wheels skip this step.
  • Node 24+ / npm — for the frontend npm ci && npm run build step. PyPI wheels bundle the prebuilt SPA.
  • Python 3.12+ is handled automatically by uv sync.

Windows: uv tool install strata-notebook works directly. Source builds work via WSL2 (smoother) or native Windows (uv + rustup + Node have Windows installers).

Why uv at runtime: the notebook subsystem shells out to uv to manage per-notebook .venv/ directories, and the project's dev workflow assumes uv as the install path. Failing fast at startup with a clear message beats a confusing subprocess error later.

The Cache Advantage

Changing one cell in a conventional notebook usually means re-running everything from scratch to trust the result. Strata skips that. The artifact store deduplicates by provenance hash: if a cell's code, inputs, and environment haven't changed, its stored result is returned without re-executing the cell.

First run:     load data (10s) → clean (3s) → train (20s) → evaluate (1s)  = 34s
Change model:  load data (✓)   → clean (✓)  → train (20s) → evaluate (1s)  = 21s
Re-run:        load data (✓)   → clean (✓)  → train (✓)   → evaluate (✓)   = <1s

This isn't a feature bolted on. It's the architecture. Every cell execution is a materialize(inputs, transform, environment) → artifact operation, and the cache is correct by construction because it's keyed on content, not time.
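A toy model of the materialize(inputs, transform, environment) → artifact idea reproduces the cache behavior in the table above. Assumptions: an in-memory dict stands in for the artifact store, each cell is keyed by SHA-256 over its source text, its upstream artifact keys, and an environment dict, and `provenance_key`, `materialize`, `STORE`, and `EXECUTED` are hypothetical names rather than Strata's API.

```python
import hashlib
import json
from typing import Any, Callable

STORE: dict[str, Any] = {}  # toy artifact store: provenance key -> result
EXECUTED: list[str] = []    # cells that actually ran (cache misses)


def provenance_key(source: str, input_keys: list[str], env: dict[str, str]) -> str:
    """SHA-256 over the cell's source, its upstream artifact keys, and its environment."""
    blob = json.dumps({"source": source, "inputs": input_keys, "env": env}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def materialize(name: str, source: str, fn: Callable[..., Any],
                inputs: list[str], env: dict[str, str]) -> str:
    """Return the cell's artifact key, running fn only when that key is not stored yet."""
    key = provenance_key(source, inputs, env)
    if key not in STORE:
        EXECUTED.append(name)
        STORE[key] = fn(*(STORE[k] for k in inputs))  # feed upstream results in order
    return key


def run_pipeline(scale: int, env: dict[str, str]) -> None:
    """load -> clean -> train -> evaluate, where `scale` stands in for a model change."""
    load = materialize("load", "data = list(range(10))", lambda: list(range(10)), [], env)
    clean = materialize("clean", "evens = [x for x in data if x % 2 == 0]",
                        lambda xs: [x for x in xs if x % 2 == 0], [load], env)
    train = materialize("train", f"model = sum(evens) * {scale}",
                        lambda xs: sum(xs) * scale, [clean], env)
    materialize("evaluate", "ok = model > 0", lambda m: m > 0, [train], env)


env = {"python": "3.12", "lock": "example-lock-hash"}  # hypothetical environment fingerprint
history: dict[str, list[str]] = {}
for label, scale in [("first run", 1), ("change model", 2), ("re-run", 2)]:
    EXECUTED.clear()
    run_pipeline(scale, env)
    history[label] = list(EXECUTED)
    print(f"{label}: executed {history[label]}")
```

Keying on upstream artifact keys rather than upstream values is what makes evaluate re-run after train changes even though its own source is untouched: a new upstream key yields a new downstream key. Changing the environment dict would likewise invalidate every cell.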

Distributed Execution

Each cell can declare which worker it runs on via a single annotation:

# @worker my-gpu
embeddings = model.encode(abstracts, batch_size=256)

You define workers in notebook.toml. Each one points at an HTTP endpoint that implements the Strata executor protocol. A worker can be a GPU box on RunPod, a DataFusion cluster on Fly, a beefy EC2 instance, or anything else that speaks HTTP. The notebook routes the cell to the declared worker at execution time, and the UI shows a live "dispatching to my-gpu" badge while it runs.

No deployment code, no infrastructure glue. Bring your own compute, one annotation per cell.

Source Annotations

Every piece of per-cell metadata is a comment directive in the cell's source. The source is the single canonical place for cell config: annotations always win over any stored defaults.

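# @name Extract embeddings
# @worker gpu-fly
# @timeout 600
# @env MODEL_PATH=/models/bge-large
# @mount dataset s3://corpus/2024-q4 ro
import json

# `dataset` is the mount (a pathlib.Path); `model` is defined in an upstream cell.
# Assumes each JSONL record carries an "abstract" field.
with (dataset / "abstracts.jsonl").open() as f:
    abstracts = [json.loads(line)["abstract"] for line in f]
embeddings = model.encode(abstracts)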

Diagnostics fire on open, reload, and after an edit settles: worker_unknown, mount_uri_unsupported, mount_shadows_notebook, timeout_not_numeric, env_malformed. They surface as a pill in the cell header and log structured warnings for headless runs.
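A minimal sketch of directive parsing with the diagnostics named above, assuming one # @key value directive per comment line and an optional mount mode. `parse_annotations` and `CellConfig` are hypothetical; mount_shadows_notebook is left out because it needs the notebook-level mount list, and unrecognized directives such as @loop are simply skipped.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

DIRECTIVE = re.compile(r"^\s*#\s*@(\w+)\s*(.*?)\s*$")
MOUNT_SCHEMES = ("file://", "s3://", "gs://", "az://")


@dataclass
class CellConfig:
    name: str | None = None
    worker: str | None = None
    timeout: float | None = None
    env: dict[str, str] = field(default_factory=dict)
    mounts: dict[str, tuple[str, str | None]] = field(default_factory=dict)  # name -> (uri, mode)
    diagnostics: list[str] = field(default_factory=list)


def parse_annotations(source: str, known_workers: set[str]) -> CellConfig:
    """Collect `# @key value` directives and record diagnostics instead of raising."""
    cfg = CellConfig()
    for line in source.splitlines():
        m = DIRECTIVE.match(line)
        if not m:
            continue  # ordinary code or comment
        key, value = m.groups()
        if key == "name":
            cfg.name = value
        elif key == "worker":
            cfg.worker = value
            if value not in known_workers:
                cfg.diagnostics.append("worker_unknown")
        elif key == "timeout":
            try:
                cfg.timeout = float(value)
            except ValueError:
                cfg.diagnostics.append("timeout_not_numeric")
        elif key == "env":
            var, sep, val = value.partition("=")
            if sep and var.strip():
                cfg.env[var.strip()] = val
            else:
                cfg.diagnostics.append("env_malformed")
        elif key == "mount":
            parts = value.split()
            uri = parts[1] if len(parts) > 1 else ""
            if uri.startswith(MOUNT_SCHEMES):
                cfg.mounts[parts[0]] = (uri, parts[2] if len(parts) > 2 else None)
            else:
                cfg.diagnostics.append("mount_uri_unsupported")
        # other directives (e.g. @loop) are ignored in this sketch
    return cfg


cell = """# @name Extract embeddings
# @worker gpu-fly
# @timeout 600
# @env MODEL_PATH=/models/bge-large
# @mount dataset s3://corpus/2024-q4 ro
embeddings = model.encode(abstracts)
"""
cfg = parse_annotations(cell, known_workers={"gpu-fly"})
print(cfg.timeout, cfg.env, cfg.mounts, cfg.diagnostics)

bad = "# @worker gpu-big\n# @timeout soon\n# @env NOVALUE\n# @mount x ftp://host/p ro\n"
bad_diags = parse_annotations(bad, known_workers={"gpu-fly"}).diagnostics
print(bad_diags)
```

Collecting diagnostics instead of raising lets the same parse drive both the header pill in the UI and structured warnings in headless runs.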

Mounts

Mounts bind a remote URI to a local path inside the cell. Supported schemes: file://, s3://, gs://, az://. Credentials flow through fsspec options: set anon = true for public buckets, or drop it to use the standard credential chain.

[[mounts]]
name = "taxi_zones"
uri = "s3://nyc-tlc/misc"
mode = "ro"
options = { anon = true }

Inside the cell, taxi_zones is a pathlib.Path. Strata materializes it on first read and caches the bytes locally for the session.
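The read-through behavior can be illustrated without cloud credentials. Strata routes remote access through fsspec; this sketch instead uses only the standard library and accepts file:// URIs only, copying each file into a local cache on first read and serving later reads from the copy. `LazyMount` is a hypothetical class, and unlike Strata it hands out paths through a method rather than exposing the mount itself as a pathlib.Path.

```python
import shutil
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


class LazyMount:
    """Toy read-through cache over a file:// prefix (stand-in for a remote bucket)."""

    def __init__(self, uri: str, cache_dir: Path) -> None:
        parsed = urlparse(uri)
        if parsed.scheme != "file":
            raise ValueError(f"sketch only supports file:// URIs, got {uri!r}")
        self.remote = Path(url2pathname(parsed.path))
        self.cache_dir = cache_dir
        self.fetches = 0  # number of copies made from the "remote" side

    def path(self, relative: str) -> Path:
        """Local path for `relative`; copied from the remote on first access only."""
        local = self.cache_dir / relative
        if not local.exists():
            local.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(self.remote / relative, local)
            self.fetches += 1
        return local


with tempfile.TemporaryDirectory() as remote, tempfile.TemporaryDirectory() as cache:
    (Path(remote) / "zones.csv").write_text("id,zone\n1,A\n")
    mount = LazyMount(Path(remote).as_uri(), Path(cache))
    first = mount.path("zones.csv").read_text()
    second = mount.path("zones.csv").read_text()  # served from the cached copy
print(mount.fetches, first == second)
```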

Examples

Example              What it shows
pandas_basics        Linear DataFrame chain, caching, staleness propagation
iris_classification  End-to-end ML, DAG branching, mixed output types
titanic_ml           Feature engineering + model comparison
s3_mount             Reading a public S3 bucket via a mount
arxiv_classifier     Distributed execution via @worker + Modal GPU + Fly cluster
markdown_showcase    Markdown cells, dynamic Markdown(...) outputs, security cases
library_cells        Cross-cell library code: pure module cells, mixed runtime+library cells, the limits
news_alpha_trader    Multi-stage trading pipeline with prompt cells and structured LLM outputs

Known rough edges

Strata is pre-1.0 (0.x), and a few surfaces are explicitly exploratory. The core (materialization, artifact store, DAG, caching, headless run) is stable in the alpha sense; these are the bits where the API or coverage is still moving:

  • Prompt-cell API. Streaming, conversation memory, and structured-output validation are not yet finalized — expect breaking changes in 0.x.
  • SQL cell cloud drivers. DuckDB / SQLite / PostgreSQL are exercised in CI. BigQuery and Snowflake adapters ship but lack integration test coverage; pin a Strata version in production until that lands. MotherDuck and MySQL are planned but not yet implemented.
  • Wire / on-disk formats. notebook.toml, runtime.json, and the artifact cache layout may change between minor versions during 0.x. Rely on the Python API surface, not the file shapes.

Library usage

Strata's HTTP API exposes the materialization layer directly, drivable from Python via StrataClient. Useful for direct table scans, custom transforms, and headless workflows; the notebook executor is a separate pipeline that writes to the same artifact store. The client talks to a running Strata server, so this workflow has two steps: start the server, then call it from your code.

# 1. Install + start the server (in a uv-managed env).
uv tool install strata-notebook
strata-notebook

# 2. From another process, point the client at it:
from strata import StrataClient

client = StrataClient(base_url="http://localhost:8765")
artifact = client.materialize(
    inputs=["file:///warehouse#db.events"],
    transform={"executor": "scan@v1", "params": {"columns": ["id", "value"]}},
)
table = client.fetch(artifact.uri)  # Arrow table, cached by provenance

The server provides: provenance-based deduplication, immutable versioned artifacts, lineage tracking, Iceberg table scanning with row-group caching, pluggable blob storage (local/S3/GCS/Azure), multi-tenancy, trusted-proxy auth, and an executor protocol for external compute.

Library docs →


Architecture

┌─────────────────────────────────────────────┐
│ Notebook UI (Vue.js + WebSocket)            │
│ cells, DAG view, AI assistant, workers      │
└─────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────┐
│ Notebook Backend (FastAPI)                  │
│ session, cascade, executor, prompt cells    │
└─────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────┐
│ Strata Core                                 │
│ materialize, artifacts, lineage, dedupe     │
└─────────────────────────────────────────────┘

The notebook is an orchestration layer over Core. It decides what to run next (cascade planning, staleness tracking). The cell harness is an executor. Core decides whether results already exist and persists them.
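Cascade planning, as split out above, only has to decide which cells are stale and in what order to run them; whether each one is actually recomputed is Core's call. A sketch using the standard library's `graphlib`, assuming the DAG is given as a map from each cell to its upstream cells; `plan_cascade` is a hypothetical name.

```python
from graphlib import TopologicalSorter


def plan_cascade(upstreams: dict[str, set[str]], edited: str) -> list[str]:
    """Return the edited cell plus everything downstream of it, in a valid run order."""
    # Invert the edges so we can walk from a cell to its dependents.
    downstream: dict[str, set[str]] = {cell: set() for cell in upstreams}
    for cell, deps in upstreams.items():
        for dep in deps:
            downstream.setdefault(dep, set()).add(cell)

    # Breadth of the cascade: every cell reachable from the edited one.
    stale, frontier = {edited}, [edited]
    while frontier:
        for child in downstream.get(frontier.pop(), ()):
            if child not in stale:
                stale.add(child)
                frontier.append(child)

    # Order only the stale cells, keeping the upstream edges among them.
    order = TopologicalSorter({c: upstreams.get(c, set()) & stale for c in stale})
    return list(order.static_order())


dag = {
    "load": set(),
    "clean": {"load"},
    "features": {"clean"},
    "train": {"features"},
    "report": {"clean"},
}
print(plan_cascade(dag, "features"))  # ['features', 'train']
```

Unaffected cells such as load and report never enter the plan, and in the layering described above, any stale cell whose provenance key still matches a stored artifact would come back from Core as a cache hit rather than a re-execution.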

Development

uv sync                                # Install deps + build Rust extension
uv run pytest                          # Run all tests
uv run pre-commit run --all-files      # Lint + format
cd frontend && npm run dev             # Frontend dev server (hot reload)

License

Apache 2.0



Download files


Source Distribution

strata_notebook-0.2.0.tar.gz (1.0 MB)

Uploaded Source

Built Distributions


strata_notebook-0.2.0-cp312-abi3-win_amd64.whl (1.9 MB)

Uploaded CPython 3.12+Windows x86-64

strata_notebook-0.2.0-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)

Uploaded CPython 3.12+manylinux: glibc 2.17+ x86-64

strata_notebook-0.2.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.0 MB)

Uploaded CPython 3.12+manylinux: glibc 2.17+ ARM64

strata_notebook-0.2.0-cp312-abi3-macosx_11_0_arm64.whl (1.9 MB)

Uploaded CPython 3.12+macOS 11.0+ ARM64

strata_notebook-0.2.0-cp312-abi3-macosx_10_12_x86_64.whl (2.0 MB)

Uploaded CPython 3.12+macOS 10.12+ x86-64

File details

Details for the file strata_notebook-0.2.0.tar.gz.

File metadata

  • Download URL: strata_notebook-0.2.0.tar.gz
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for strata_notebook-0.2.0.tar.gz
Algorithm Hash digest
SHA256 e7cc53f2ce06ee8bca3f2114cd7e8b6561d2e092e3a8945c81d44872e14206b2
MD5 2c71171e5b45e6a615b264efad28e736
BLAKE2b-256 907fb355df4dc807b54137fc1c1390c1bdfe55edefb99ab4d400f8bb31b2618a


Provenance

The following attestation bundles were made for strata_notebook-0.2.0.tar.gz:

Publisher: release.yml on bearing-research/strata

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file strata_notebook-0.2.0-cp312-abi3-win_amd64.whl.


File hashes

Hashes for strata_notebook-0.2.0-cp312-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 c9f900f60960756feee927d1b4208f2924c3bf33e09e500864ed78e01b095c94
MD5 472a1046e08fb2608b2504892aeae79e
BLAKE2b-256 7d83950d964e60ed845b33af134b51d120f530b6d74cfe3e92254f113c76dbef


Provenance

The following attestation bundles were made for strata_notebook-0.2.0-cp312-abi3-win_amd64.whl:

Publisher: release.yml on bearing-research/strata


File details

Details for the file strata_notebook-0.2.0-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for strata_notebook-0.2.0-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 16a7c8904bce613494c935cf92f651660391ff7f42f492cdad2cbb8922118ce3
MD5 889d4d73124e5e2072eb3d3db4187cf0
BLAKE2b-256 9b1039cbc0ee17a79fbe4163aaa9f70b17f55d950d89e9761d23316c6d6d9431


Provenance

The following attestation bundles were made for strata_notebook-0.2.0-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on bearing-research/strata


File details

Details for the file strata_notebook-0.2.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.


File hashes

Hashes for strata_notebook-0.2.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 3c139d5d33274976a27b3054a7702479219915919ab8b989b81da3d6f0da6f94
MD5 da246a35a539ca89b01c3eab69b47121
BLAKE2b-256 aeaf08c3fb35207b5543c66f3f5524adf94e2195d56676b8e2252c063b54b88d


Provenance

The following attestation bundles were made for strata_notebook-0.2.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on bearing-research/strata


File details

Details for the file strata_notebook-0.2.0-cp312-abi3-macosx_11_0_arm64.whl.


File hashes

Hashes for strata_notebook-0.2.0-cp312-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e5547a3355ecf04c638d189024c41a08997cf03d83f88098553fc9e1b5e57894
MD5 081e4702f5f4c886e41a740ba8ff9897
BLAKE2b-256 d96f6fa416d9e336638c9f366966b144051a8cce5570596b21561c4044b780dd


Provenance

The following attestation bundles were made for strata_notebook-0.2.0-cp312-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on bearing-research/strata


File details

Details for the file strata_notebook-0.2.0-cp312-abi3-macosx_10_12_x86_64.whl.


File hashes

Hashes for strata_notebook-0.2.0-cp312-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 c30d707054053acd160580977ad60d5adae050b00e877378327f5b49555930aa
MD5 0c79cd64f3f5dfa90a7eb07900081fc1
BLAKE2b-256 365aafe6428fdff61598403b1cd5f04bd8bd5e7e698dd7f92554ff76fb374483


Provenance

The following attestation bundles were made for strata_notebook-0.2.0-cp312-abi3-macosx_10_12_x86_64.whl:

Publisher: release.yml on bearing-research/strata

