
A content-addressed computation graph with an interactive notebook UI


Strata


Strata is a content-addressed computation graph with an interactive notebook UI.

Every cell output is a versioned artifact keyed by its provenance: source, inputs, and environment. Strata reads each cell's AST to build the dependency graph automatically, so re-running a notebook is mostly a series of cache hits. Touch one cell and the cascade re-executes only the cells that depend on it. Identical inputs produce the same artifact whether the second run comes a minute later or a year later, on the same machine or a different one.
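As a rough illustration of what "keyed by its provenance" can mean, the sketch below hashes a cell's source, the hashes of its input artifacts, and an environment description into one key. The function name `provenance_key`, the JSON encoding, and the choice of SHA-256 are assumptions made for this example, not Strata's actual scheme.

```python
import hashlib
import json


def provenance_key(source: str, input_hashes: list[str], environment: dict[str, str]) -> str:
    """Hash a cell's source, input artifact hashes, and environment into one key.

    Hypothetical helper: the field names and encoding are illustrative only.
    """
    payload = json.dumps(
        {"source": source, "inputs": input_hashes, "environment": environment},
        sort_keys=True,          # stable field order, so equal content gives equal bytes
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


env = {"python": "3.12", "pandas": "2.2.0"}

key_a = provenance_key("df = load()", [], env)
key_b = provenance_key("df = load()", [], dict(env))    # same content, a later run
key_c = provenance_key("df = load(limit=10)", [], env)  # edited source

# A downstream cell's key includes its upstream key, so an upstream edit
# changes the downstream key too, even though the downstream source is the same.
downstream_a = provenance_key("clean(df)", [key_a], env)
downstream_c = provenance_key("clean(df)", [key_c], env)
```

Because the key depends only on content, the same source, inputs, and environment give the same key on any machine at any time, which is the property the paragraph above relies on.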

Prompt cells make AI calls first-class DAG nodes, cached by template, inputs, and model config. The # @worker gpu-fly annotation dispatches a cell to a remote GPU, and # @mount data s3://bucket/prefix ro makes an S3 prefix available as a local pathlib.Path inside the cell. The whole notebook is plain .py files plus a manifest, so commits are git-diffable, with no JSON blobs or execution metadata bleeding into the history.
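To make "cached by template, inputs, and model config" concrete, here is a minimal sketch that renders {{ variable }} placeholders and derives a cache key from the three parts. The helper names, the regular expression, and the decision to hash only the variables the template actually references are assumptions for illustration; Strata's real prompt-cell implementation may differ.

```python
import hashlib
import json
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Substitute each {{ name }} placeholder with its value."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


def prompt_cache_key(template: str, variables: dict[str, str], model_config: dict) -> str:
    """Hash the template, the variables it references, and the model config."""
    used = sorted(set(PLACEHOLDER.findall(template)))
    payload = json.dumps(
        {
            "template": template,
            "inputs": {name: variables[name] for name in used},
            "model": model_config,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


TEMPLATE = "Summarize {{ title }} in {{n}} words."
variables = {"title": "Strata", "n": "5", "unused": "a"}
model = {"name": "example-model", "temperature": 0.0}  # hypothetical config

prompt = render(TEMPLATE, variables)
key_1 = prompt_cache_key(TEMPLATE, variables, model)
key_2 = prompt_cache_key(TEMPLATE, {**variables, "unused": "b"}, model)
key_3 = prompt_cache_key(TEMPLATE, variables, {**model, "temperature": 0.7})
```

In this sketch, changing a variable the template never references leaves the key alone, while changing the model config produces a new key and therefore a fresh call.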

Docs: bearing-research.github.io/strata

Quick Start

Both paths below run in personal mode: single-user, writes enabled, no proxy auth. For multi-tenant or hosted deployments, see Deployment Modes.

# Docker. docker-compose.yml sets personal mode for you.
docker compose up -d --build
# Then open http://localhost:8765

# Or via PyPI install (recommended) — requires uv. Drops the wheel into
# a uv-managed tool env at ~/.local/share/uv/tools/strata-notebook with
# the CLI on PATH.
uv tool install strata-notebook
strata-notebook
# Then open http://localhost:8765

For the full inventory of installed commands (strata-notebook, strata, strata-worker, python -m strata), see the Commands reference.

Source builds (git clone + uv sync) also work and are documented in Installation; you need one only if you're modifying Strata itself.

Requirements

  • uv ≥ 0.8: install via the uv installer (curl -LsSf https://astral.sh/uv/install.sh | sh on macOS/Linux; PowerShell installer on Windows). Strata refuses to start outside a uv-managed environment: the startup check looks for the uv = <version> marker that uv writes to pyvenv.cfg. uv tool install, uv add, and uv run all produce environments with this marker; a plain pip install into a hand-rolled python -m venv does not. Conda and pip-venv users need to install uv and relaunch from a uv-managed environment; existing data and other environments are untouched. uv fetches a matching Python for you, so you don't need Python pre-installed.
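The marker check described above can be approximated in a few lines. This is a sketch under stated assumptions, not Strata's actual startup code: `read_pyvenv_cfg` and `is_uv_managed` are hypothetical names, and the demo writes two throwaway pyvenv.cfg files rather than inspecting a real environment.

```python
import tempfile
from pathlib import Path


def read_pyvenv_cfg(path: Path) -> dict[str, str]:
    """Parse pyvenv.cfg's simple 'key = value' lines into a dict."""
    settings: dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def is_uv_managed(prefix: Path) -> bool:
    """True if the environment at `prefix` carries a 'uv = <version>' marker."""
    cfg = prefix / "pyvenv.cfg"
    return cfg.is_file() and "uv" in read_pyvenv_cfg(cfg)


with tempfile.TemporaryDirectory() as tmp:
    uv_env = Path(tmp) / "uv-env"
    pip_env = Path(tmp) / "pip-env"
    uv_env.mkdir()
    pip_env.mkdir()
    # Illustrative file contents; a real pyvenv.cfg has more keys.
    (uv_env / "pyvenv.cfg").write_text("home = /usr/bin\nuv = 0.8.4\n", encoding="utf-8")
    (pip_env / "pyvenv.cfg").write_text("home = /usr/bin\nversion = 3.12.4\n", encoding="utf-8")

    uv_result = is_uv_managed(uv_env)
    pip_result = is_uv_managed(pip_env)
```

To inspect the running interpreter, the same function could be called with `Path(sys.prefix)`.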

Source build (only if you're building Strata itself from a git clone, not using PyPI or Docker):

  • Rust toolchain (rustup) — for maturin to compile the native extension. PyPI wheels skip this step.
  • Node 24+ / npm — for the frontend npm ci && npm run build step. PyPI wheels bundle the prebuilt SPA.
  • Python 3.12+ is handled automatically by uv sync.

Windows: PyPI install works directly. Source builds work via WSL2 (smoother) or native Windows (uv + rustup + Node have Windows installers).

Why uv at runtime: the notebook subsystem shells out to uv to manage per-notebook .venv/ directories, and the project's dev workflow assumes uv as the install path. Failing fast at startup with a clear message beats a confusing subprocess error later.

Notebook Features

  • Content-addressed caching. Same code, same inputs, and same environment give an instant cache hit with zero recomputation.
  • Automatic dependency tracking. DAG built from variable analysis, no manual wiring.
  • Cascade execution. Change upstream code, downstream cells auto-invalidate.
  • Distributed workers. Annotate a cell with # @worker gpu-fly and it dispatches to a remote GPU.
  • Prompt cells. LLM-powered cells with {{ variable }} template injection.
  • SQL cells. First-class SQL cells with # @sql connection=<name>, named-bind parameters, and DuckDB / SQLite / PostgreSQL / Snowflake / BigQuery drivers.
  • AI assistant. Streaming chat with conversation memory, agent mode for autonomous notebook building.
  • Environment management. Per-notebook Python venvs via uv, isolated from each other.
  • Rich outputs. DataFrames, matplotlib plots, markdown, images.
  • Cell operations. Reorder, duplicate, fold, keyboard shortcuts.
  • Headless runner. strata run ./my-notebook for CI and scheduled execution.
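The automatic dependency tracking in the list above can be sketched with Python's standard ast module. The version below is deliberately simplified and is an assumption about the general approach, not Strata's analyzer: it treats any name read in a cell as a dependency on the most recent earlier cell that assigned or imported that name, and it ignores scoping details such as function-local variables.

```python
import ast


def defined_and_used(source: str) -> tuple[set[str], set[str]]:
    """Return (names bound, names read) anywhere in a cell's source."""
    defined: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                defined.add(alias.asname or alias.name.split(".")[0])
    return defined, used


def build_dag(cells: dict[str, str]) -> dict[str, set[str]]:
    """Map each cell id to the earlier cells whose variables it reads."""
    last_definer: dict[str, str] = {}
    dag: dict[str, set[str]] = {}
    for cell_id, source in cells.items():
        defined, used = defined_and_used(source)
        dag[cell_id] = {last_definer[name] for name in used if name in last_definer}
        for name in defined:
            last_definer[name] = cell_id
    return dag


cells = {
    "load": "raw = [3, 1, 2]",
    "clean": "cleaned = sorted(raw)",
    "stats": "total = sum(cleaned)\ncount = len(raw)",
}
dag = build_dag(cells)
```

Builtins such as `sorted` and `sum` are never bound by a cell, so they do not create edges. Here `stats` depends on both `clean` and `load`.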

The Cache Advantage

Most notebook platforms re-execute from scratch when you change one cell. Strata doesn't. The artifact store deduplicates by provenance hash: if the code, inputs, and environment haven't changed, the result is served from the cache.

First run:     load data (10s) → clean (3s) → train (20s) → evaluate (1s)  = 34s
Change model:  load data (✓)   → clean (✓)  → train (20s) → evaluate (1s)  = 21s
Re-run:        load data (✓)   → clean (✓)  → train (✓)   → evaluate (✓)   = <1s

This isn't a feature bolted on. It's the architecture. Every cell execution is a materialize(inputs, transform, environment) → artifact operation, and the cache is correct by construction because it's keyed on content, not time.
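The three runs in the timeline can be reproduced with a toy materialize function. This is a sketch under stated assumptions: the in-memory dict stands in for the artifact store, the key is a SHA-256 of the source string, upstream keys, and an environment label, and the `name` argument exists only to record which cells executed.

```python
import hashlib

store: dict[str, object] = {}  # provenance key -> artifact (stand-in for the artifact store)
executed: list[str] = []       # names of cells that actually ran


def materialize(name, source, inputs, transform, environment="py312"):
    """Run `transform` only if no artifact exists for this provenance.

    `inputs` is a list of (key, value) pairs returned by upstream calls.
    """
    upstream_keys = [key for key, _ in inputs]
    digest = hashlib.sha256(repr((source, upstream_keys, environment)).encode()).hexdigest()
    if digest not in store:
        executed.append(name)
        store[digest] = transform(*[value for _, value in inputs])
    return digest, store[digest]


def run_pipeline(model_source: str) -> list[str]:
    """Run the four-stage pipeline and report which stages executed."""
    executed.clear()
    data = materialize("load", "load v1", [], lambda: [1, 2, 3, 4])
    clean = materialize("clean", "clean v1", [data], lambda rows: [r for r in rows if r > 1])
    model = materialize("train", model_source, [clean], lambda rows: sum(rows) / len(rows))
    materialize("evaluate", "eval v1", [model], lambda m: round(m, 2))
    return list(executed)


first = run_pipeline("train v1")    # everything runs
changed = run_pipeline("train v2")  # only train and evaluate run
rerun = run_pipeline("train v2")    # nothing runs
```

Note that `evaluate` re-runs after the model change even though its own source is untouched, because its key includes the new key of `train`.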

Distributed Execution

Each cell can declare which worker it runs on via a single annotation:

# @worker my-gpu
embeddings = model.encode(abstracts, batch_size=256)

You define workers in notebook.toml. Each one points at an HTTP endpoint that implements the Strata executor protocol. A worker can be a GPU box on RunPod, a DataFusion cluster on Fly, a beefy EC2 instance, or anything else that speaks HTTP. The notebook routes the cell to the declared worker at execution time, and the UI shows a live "dispatching to my-gpu" badge while it runs.

No deployment code, no infrastructure glue. Bring your own compute, one annotation per cell.

Source Annotations

Every piece of per-cell metadata is a comment directive in the cell's source. The source is the single canonical place for cell config: annotations always win over any stored defaults.

# @name Extract embeddings
# @worker gpu-fly
# @timeout 600
# @env MODEL_PATH=/models/bge-large
# @mount dataset s3://corpus/2024-q4 ro
embeddings = model.encode(dataset / "abstracts.jsonl")

Diagnostics fire on open, reload, and after an edit settles: worker_unknown, mount_uri_unsupported, mount_shadows_notebook, timeout_not_numeric, env_malformed. They surface as a pill in the cell header and log structured warnings for headless runs.
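A comment-directive parser along these lines might look like the sketch below. The diagnostic names come from the list above, but the parsing rules, the `parse_annotations` function, the returned dict layout, and the default mount mode are assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

DIRECTIVE = re.compile(r"^#\s*@(\w+)\s+(.*\S)\s*$")
SUPPORTED_SCHEMES = {"file", "s3", "gs", "az"}


def parse_annotations(source: str) -> tuple[dict, list[str]]:
    """Collect '# @key value' directives and any diagnostics they trigger."""
    config: dict = {"env": {}, "mounts": []}
    diagnostics: list[str] = []
    for line in source.splitlines():
        match = DIRECTIVE.match(line.strip())
        if match is None:
            continue  # ordinary code or a plain comment
        key, value = match.groups()
        if key == "timeout":
            if value.isdigit():
                config["timeout"] = int(value)
            else:
                diagnostics.append("timeout_not_numeric")
        elif key == "env":
            name, sep, env_value = value.partition("=")
            if sep and name:
                config["env"][name] = env_value
            else:
                diagnostics.append("env_malformed")
        elif key == "mount":
            parts = value.split()
            if len(parts) < 2:
                continue  # this sketch silently skips incomplete mounts
            mode = parts[2] if len(parts) > 2 else "ro"  # assumed default
            if urlsplit(parts[1]).scheme in SUPPORTED_SCHEMES:
                config["mounts"].append({"name": parts[0], "uri": parts[1], "mode": mode})
            else:
                diagnostics.append("mount_uri_unsupported")
        else:
            config[key] = value  # e.g. name, worker
    return config, diagnostics


CELL = """# @name Extract embeddings
# @worker gpu-fly
# @timeout 600
# @env MODEL_PATH=/models/bge-large
# @mount dataset s3://corpus/2024-q4 ro
embeddings = model.encode(dataset / "abstracts.jsonl")
"""
config, diagnostics = parse_annotations(CELL)

BAD = "# @timeout soon\n# @env MODEL_PATH\n# @mount d ftp://host/x ro\n"
_, bad_diagnostics = parse_annotations(BAD)
```

Checks that need notebook context, such as worker_unknown and mount_shadows_notebook, are left out because they depend on the manifest rather than on the cell source alone.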

Mounts

Mounts bind a remote URI to a local path inside the cell. Supported schemes: file://, s3://, gs://, az://. Credentials flow through fsspec options: set anon = true for public buckets, or drop it to use the standard credential chain.

[[mounts]]
name = "taxi_zones"
uri = "s3://nyc-tlc/misc"
mode = "ro"
options = { anon = true }

Inside the cell, taxi_zones is a pathlib.Path. Strata materializes it on first read and caches the bytes locally for the session.

Examples

Example              What it shows
-------------------  -----------------------------------------------------------------------------------
pandas_basics        Linear DataFrame chain, caching, staleness propagation
iris_classification  End-to-end ML, DAG branching, mixed output types
titanic_ml           Feature engineering + model comparison
s3_mount             Reading a public S3 bucket via a mount
arxiv_classifier     Distributed execution via @worker + Modal GPU + Fly cluster
markdown_showcase    Markdown cells, dynamic Markdown(...) outputs, security cases
library_cells        Cross-cell library code: pure module cells, mixed runtime+library cells, the limits
news_alpha_trader    Multi-stage trading pipeline with prompt cells and structured LLM outputs

Known rough edges

Strata is at 0.1 and a few surfaces are explicitly exploratory. The core (materialization, artifact store, DAG, caching, headless run) is stable in the alpha sense; these are the bits where the API or coverage is still moving:

  • Prompt-cell API. Streaming, conversation memory, and structured-output validation are not yet finalized — expect breaking changes in 0.x.
  • SQL cell cloud drivers. DuckDB / SQLite / PostgreSQL are exercised in CI. BigQuery and Snowflake adapters ship but lack integration test coverage; pin a Strata version in production until that lands. MotherDuck and MySQL are planned but not yet implemented.
  • Wire / on-disk formats. notebook.toml, runtime.json, and the artifact cache layout may change between minor versions during 0.x. Rely on the Python API surface, not the file shapes.

Library usage

Strata's HTTP API exposes the materialization layer directly, driveable from Python via StrataClient. Useful for direct table scans, custom transforms, and headless workflows; the notebook executor is a separate pipeline that writes to the same artifact store. The client talks to a running Strata server, so this workflow has two steps: start the server, then call it from your code.

# 1. Install + start the server (in a uv-managed env).
uv tool install strata-notebook
strata-notebook

# 2. From another process, point the client at it:
from strata import StrataClient

client = StrataClient(base_url="http://localhost:8765")
artifact = client.materialize(
    inputs=["file:///warehouse#db.events"],
    transform={"executor": "scan@v1", "params": {"columns": ["id", "value"]}},
)
table = client.fetch(artifact.uri)  # Arrow table, cached by provenance

The server provides: provenance-based deduplication, immutable versioned artifacts, lineage tracking, Iceberg table scanning with row-group caching, pluggable blob storage (local/S3/GCS/Azure), multi-tenancy, trusted-proxy auth, and an executor protocol for external compute.

Library docs →


Architecture

┌─────────────────────────────────────────────┐
│ Notebook UI (Vue.js + WebSocket)            │
│ cells, DAG view, AI assistant, workers      │
└─────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────┐
│ Notebook Backend (FastAPI)                  │
│ session, cascade, executor, prompt cells    │
└─────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────┐
│ Strata Core                                 │
│ materialize, artifacts, lineage, dedupe     │
└─────────────────────────────────────────────┘

The notebook is an orchestration layer over Core. It decides what to run next (cascade planning, staleness tracking). The cell harness is an executor. Core decides whether results already exist and persists them.
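The cascade planning step can be sketched with the standard-library graphlib. This is an illustration of the idea rather than the notebook backend's planner: `cascade` is a hypothetical function, and the dependency map uses the convention that each cell maps to the set of cells it reads from.

```python
from graphlib import TopologicalSorter


def cascade(deps: dict[str, set[str]], changed: str) -> list[str]:
    """Return the changed cell plus everything downstream of it, in run order."""
    order = list(TopologicalSorter(deps).static_order())  # upstream cells come first
    stale = {changed}
    for cell in order:
        # A cell is stale if any of its upstream cells is stale.
        if deps.get(cell, set()) & stale:
            stale.add(cell)
    return [cell for cell in order if cell in stale]


deps = {
    "load": set(),
    "clean": {"load"},
    "train": {"clean"},
    "evaluate": {"train", "clean"},
    "plot": {"load"},
}

after_train_edit = cascade(deps, "train")
after_clean_edit = cascade(deps, "clean")
after_load_edit = cascade(deps, "load")
```

Editing `train` leaves `load`, `clean`, and `plot` untouched; in the layering described above, the notebook would hand only the stale cells to Core, which then decides whether an artifact already exists for each one.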

Development

uv sync                                # Install deps + build Rust extension
uv run pytest                          # Run all tests
uv run pre-commit run --all-files      # Lint + format
cd frontend && npm run dev             # Frontend dev server (hot reload)

License

Apache 2.0


