Strata

Strata is a content-addressed computation graph with an interactive notebook UI.

Every cell output is a versioned artifact keyed by its provenance: source, inputs, and environment. Strata reads each cell's AST to build the dependency graph automatically, so re-running a notebook is mostly a series of cache hits. Touch one cell and the cascade re-executes only the cells that depend on it. Identical inputs produce the same artifact whether the second run comes a minute later or a year later, on the same machine or a different one.
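As a rough illustration of AST-based dependency tracking, the sketch below uses Python's standard ast module to find which names each cell defines and reads, then links every cell to the most recent earlier cell that defined a name it reads. The function names (defined_and_used, build_graph) are hypothetical and this is not Strata's actual analyzer; in particular it ignores scoping, so a name that is read only inside a function body still counts as a dependency.

```python
import ast


def defined_and_used(source: str) -> tuple[set[str], set[str]]:
    """Return (names the cell binds, names the cell reads). Simplified: no scoping."""
    defined: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                defined.add((alias.asname or alias.name).split(".")[0])
    return defined, used


def build_graph(cells: dict[str, str]) -> dict[str, set[str]]:
    """Map each cell id to the ids of earlier cells it depends on.

    `cells` is an ordered mapping of cell id -> source (hypothetical shape).
    """
    last_definer: dict[str, str] = {}
    deps: dict[str, set[str]] = {}
    for cell_id, source in cells.items():
        defined, used = defined_and_used(source)
        deps[cell_id] = {last_definer[name] for name in used if name in last_definer}
        for name in defined:
            last_definer[name] = cell_id
    return deps
```

The sources are only parsed, never executed, so the graph can be built before any cell runs.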

Prompt cells make AI calls first-class DAG nodes, cached by template, inputs, and model config. The annotation # @worker gpu-fly dispatches a cell to a remote GPU, and # @mount data s3://bucket/prefix ro makes an S3 prefix available as a local pathlib.Path inside the cell. The whole notebook is plain .py files plus a manifest, so commits are git-diffable, with no JSON blobs or execution metadata bleeding into the history.
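To make "cached by template, inputs, and model config" concrete, here is a minimal sketch of how such a cache key could be derived, alongside the {{ variable }} substitution that prompt cells use. The helper names (render, prompt_cache_key) and the shape of the model config dict are assumptions for illustration, not Strata's API.

```python
import hashlib
import json
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Substitute {{ name }} placeholders with values from `variables`."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


def prompt_cache_key(template: str, variables: dict, model_config: dict) -> str:
    """Hash the three things the cache is said to depend on (hypothetical layout)."""
    payload = json.dumps(
        {"template": template, "inputs": variables, "model": model_config},
        sort_keys=True,   # key order must not affect the hash
        default=repr,     # fall back to repr() for non-JSON values
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under this scheme, changing the temperature or the model name yields a new key and therefore a fresh call, while re-running with an identical template, inputs, and config would hit the cache.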

Docs: bearing-research.github.io/strata

Quick Start

Both paths below run in personal mode: single-user, writes enabled, no proxy auth. For multi-tenant or hosted deployments, see Deployment Modes.

# Docker (recommended). docker-compose.yml sets personal mode for you.
docker compose up -d --build
# Then open http://localhost:8765

# Or from source — requires uv (see Requirements below).
uv sync
cd frontend && npm ci && npm run build && cd ..
uv run strata-server
# Then open http://localhost:8765
#
# For the full inventory of installed commands
# (strata-server, strata, strata-worker, python -m strata),
# see docs/getting-started/installation.md#commands-reference.

Requirements

Runtime (Docker or uv run strata-server):

  • uv ≥ 0.8 — install via the uv installer (curl -LsSf https://astral.sh/uv/install.sh | sh on macOS/Linux; PowerShell installer on Windows). Strata refuses to start outside a uv-managed environment: the startup check looks for the uv = <version> marker that uv writes to pyvenv.cfg; uv run and uvx produce envs with this marker, hand-rolled python -m venv venvs do not. Conda and pip-venv users need to install uv and re-launch Strata from a uv-managed env — existing data and other environments are untouched, but Strata's own runtime has to be uv-managed. uv fetches a matching Python for you, so you don't need Python pre-installed.

Source build (this is currently the only install path — strata-notebook isn't on PyPI yet; PyPI wheels are planned for 0.1.0 and will let you skip the Rust step):

  • Rust toolchain (rustup) — for maturin to compile the native extension. Once installed, cargo and rustc need to be on PATH so uv sync can invoke them.
  • Node 25+ / npm — for the frontend npm ci && npm run build step.
  • Python 3.12+ is handled automatically by uv sync.

Windows: source builds work via WSL2 (smoother) or native Windows (uv + rustup + Node have Windows installers). Day-to-day dev is on macOS/Linux, so WSL2 is the better-trodden path.

Why uv at runtime: the notebook subsystem shells out to uv to manage per-notebook .venv/ directories, and the project's dev workflow assumes uv as the install path. Failing fast at startup with a clear message beats a confusing subprocess error later.

Notebook Features

  • Content-addressed caching. Same code, same inputs, and same environment produce an instant cache hit with zero recomputation.
  • Automatic dependency tracking. DAG built from variable analysis, no manual wiring.
  • Cascade execution. Change upstream code, downstream cells auto-invalidate.
  • Distributed workers. Annotate @worker gpu-fly and the cell dispatches to a remote GPU.
  • Prompt cells. LLM-powered cells with {{ variable }} template injection.
  • SQL cells. First-class SQL cells with # @sql connection=<name>, named-bind parameters, and DuckDB / SQLite / PostgreSQL / Snowflake / BigQuery drivers.
  • AI assistant. Streaming chat with conversation memory, agent mode for autonomous notebook building.
  • Environment management. Per-notebook Python venvs via uv, isolated from each other.
  • Rich outputs. DataFrames, matplotlib plots, markdown, images.
  • Cell operations. Reorder, duplicate, fold, keyboard shortcuts.
  • Headless runner. strata run ./my-notebook for CI and scheduled execution.
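Cascade execution from the list above boils down to a graph walk: given each cell's upstream dependencies, find everything downstream of the cell that changed. The sketch below assumes the dependency graph is a plain dict of cell id to upstream ids; the function name downstream is hypothetical.

```python
from collections import deque


def downstream(deps: dict[str, set[str]], changed: str) -> set[str]:
    """Return the changed cell plus every cell that transitively depends on it."""
    # Invert the "depends on" edges into "is depended on by" edges.
    children: dict[str, set[str]] = {}
    for cell, parents in deps.items():
        children.setdefault(cell, set())
        for parent in parents:
            children.setdefault(parent, set()).add(cell)

    stale = {changed}
    queue = deque([changed])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale
```

Cells outside the returned set keep their cached artifacts, which is why editing one cell does not force the whole notebook to re-run.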

The Cache Advantage

Most notebook platforms re-execute everything when you change one cell. Strata doesn't. The artifact store deduplicates by provenance hash: if a cell's code, inputs, and environment haven't changed, the stored result is served without re-running it.

First run:     load data (10s) → clean (3s) → train (20s) → evaluate (1s)  = 34s
Change model:  load data (✓)   → clean (✓)  → train (20s) → evaluate (1s)  = 21s
Re-run:        load data (✓)   → clean (✓)  → train (✓)   → evaluate (✓)   = <1s

This isn't a feature bolted on. It's the architecture. Every cell execution is a materialize(inputs, transform, environment) → artifact operation, and the cache is correct by construction because it's keyed on content, not time.
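The materialize(inputs, transform, environment) → artifact idea can be sketched with an in-memory store keyed on a hash of those three things. Everything here (ArtifactStore, the run callback, the environment dict) is a hypothetical stand-in for illustration, not Strata's implementation; it reproduces the execution pattern of the timing table above, counting executions instead of seconds.

```python
import hashlib
import json


class ArtifactStore:
    """Toy content-addressed store: one artifact per provenance key."""

    def __init__(self) -> None:
        self._artifacts: dict[str, object] = {}
        self.executions = 0  # how many times a transform actually ran

    def materialize(self, inputs, transform_source, environment, run):
        """Return (key, artifact); call `run` only when the key is new."""
        payload = json.dumps(
            {"inputs": inputs, "transform": transform_source, "environment": environment},
            sort_keys=True,
        )
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key not in self._artifacts:
            self.executions += 1
            self._artifacts[key] = run()
        return key, self._artifacts[key]


def run_pipeline(store: ArtifactStore, train_source: str) -> float:
    """load -> clean -> train -> evaluate, each keyed on its upstream artifact keys."""
    env = {"python": "3.12", "lockfile": "abc123"}  # placeholder environment fingerprint
    k_load, data = store.materialize([], "load()", env, lambda: [1, 2, 3, 4])
    k_clean, clean = store.materialize(
        [k_load], "clean(data)", env, lambda: [x for x in data if x > 1]
    )
    k_train, model = store.materialize([k_clean], train_source, env, lambda: sum(clean))
    _, score = store.materialize([k_train], "evaluate(model)", env, lambda: model / 10)
    return score
```

Because each step's key includes the keys of its inputs, changing the train source also changes the evaluate key, so exactly those two steps re-run while load and clean are served from the store.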

Distributed Execution

Each cell can declare which worker it runs on via a single annotation:

# @worker my-gpu
embeddings = model.encode(abstracts, batch_size=256)

You define workers in notebook.toml. Each one points at an HTTP endpoint that implements the Strata executor protocol. A worker can be a GPU box on RunPod, a DataFusion cluster on Fly, a beefy EC2 instance, or anything else that speaks HTTP. The notebook routes the cell to the declared worker at execution time, and the UI shows a live "dispatching to my-gpu" badge while it runs.

No deployment code, no infrastructure glue. Bring your own compute, one annotation per cell.

Source Annotations

Every piece of per-cell metadata is a comment directive in the cell's source. The source is the single canonical place for cell config: annotations always win over any stored defaults.

# @name Extract embeddings
# @worker gpu-fly
# @timeout 600
# @env MODEL_PATH=/models/bge-large
# @mount dataset s3://corpus/2024-q4 ro
embeddings = model.encode(dataset / "abstracts.jsonl")

Diagnostics fire on open, reload, and after an edit settles: worker_unknown, mount_uri_unsupported, mount_shadows_notebook, timeout_not_numeric, env_malformed. They surface as a pill in the cell header and log structured warnings for headless runs.

Mounts

Mounts bind a remote URI to a local path inside the cell. Supported schemes: file://, s3://, gs://, az://. Credentials flow through fsspec options: set anon = true for public buckets, or drop it to use the standard credential chain.

[[mounts]]
name = "taxi_zones"
uri = "s3://nyc-tlc/misc"
mode = "ro"
options = { anon = true }

Inside the cell, taxi_zones is a pathlib.Path. Strata materializes it on first read and caches the bytes locally for the session.

Examples

Example              What it shows
pandas_basics        Linear DataFrame chain, caching, staleness propagation
iris_classification  End-to-end ML, DAG branching, mixed output types
titanic_ml           Feature engineering + model comparison
s3_mount             Reading a public S3 bucket via a mount
arxiv_classifier     Distributed execution via @worker + Modal GPU + Fly cluster
markdown_showcase    Markdown cells, dynamic Markdown(...) outputs, security cases
library_cells        Cross-cell library code: pure module cells, mixed runtime+library cells, the limits
news_alpha_trader    Multi-stage trading pipeline with prompt cells and structured LLM outputs

Known rough edges

Strata is at 0.1 and a few surfaces are explicitly exploratory. The core (materialization, artifact store, DAG, caching, headless run) is stable in the alpha sense; these are the bits where the API or coverage is still moving:

  • Prompt-cell API. Streaming, conversation memory, and structured-output validation are not yet finalized — expect breaking changes in 0.x.
  • SQL cell cloud drivers. DuckDB / SQLite / PostgreSQL are exercised in CI. BigQuery and Snowflake adapters ship but lack integration test coverage; pin a Strata version in production until that lands. MotherDuck and MySQL are planned but not yet implemented.
  • Wire / on-disk formats. notebook.toml, runtime.json, and the artifact cache layout may change between minor versions during 0.x. Rely on the Python API surface, not the file shapes.

Library usage

Strata's HTTP API exposes the materialization layer directly, driveable from Python via StrataClient. Useful for direct table scans, custom transforms, and headless workflows; the notebook executor is a separate pipeline that writes to the same artifact store. The client talks to a running Strata server, so this workflow has two steps: start the server, then call it from your code.

# 1. Install + start the server (in a uv-managed env).
# Until 0.1.0 ships to PyPI, install from a git checkout — needs
# the Rust toolchain (see Requirements above).
git clone https://github.com/bearing-research/strata.git
cd strata
uv sync --all-extras
uv run strata-server

# 2. From another process, point the client at it:
from strata import StrataClient

client = StrataClient(base_url="http://localhost:8765")
artifact = client.materialize(
    inputs=["file:///warehouse#db.events"],
    transform={"executor": "scan@v1", "params": {"columns": ["id", "value"]}},
)
table = client.fetch(artifact.uri)  # Arrow table, cached by provenance

The server provides: provenance-based deduplication, immutable versioned artifacts, lineage tracking, Iceberg table scanning with row-group caching, pluggable blob storage (local/S3/GCS/Azure), multi-tenancy, trusted-proxy auth, and an executor protocol for external compute.

Library docs →


Architecture

┌─────────────────────────────────────────────┐
│ Notebook UI (Vue.js + WebSocket)            │
│ cells, DAG view, AI assistant, workers      │
└─────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────┐
│ Notebook Backend (FastAPI)                  │
│ session, cascade, executor, prompt cells    │
└─────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────┐
│ Strata Core                                 │
│ materialize, artifacts, lineage, dedupe     │
└─────────────────────────────────────────────┘

The notebook is an orchestration layer over Core. It decides what to run next (cascade planning, staleness tracking). The cell harness is an executor. Core decides whether results already exist and persists them.

Development

uv sync                                # Install deps + build Rust extension
uv run pytest                          # Run all tests
uv run pre-commit run --all-files      # Lint + format
cd frontend && npm run dev             # Frontend dev server (hot reload)

License

Apache 2.0

Download files

Source Distribution

  • strata_notebook-0.1.0a2.tar.gz (951.9 kB), source

Built Distributions

  • strata_notebook-0.1.0a2-cp312-abi3-win_amd64.whl (1.8 MB), CPython 3.12+, Windows x86-64
  • strata_notebook-0.1.0a2-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB), CPython 3.12+, manylinux: glibc 2.17+ x86-64
  • strata_notebook-0.1.0a2-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.9 MB), CPython 3.12+, manylinux: glibc 2.17+ ARM64
  • strata_notebook-0.1.0a2-cp312-abi3-macosx_11_0_arm64.whl (1.8 MB), CPython 3.12+, macOS 11.0+ ARM64
  • strata_notebook-0.1.0a2-cp312-abi3-macosx_10_12_x86_64.whl (1.9 MB), CPython 3.12+, macOS 10.12+ x86-64

File details

Details for the file strata_notebook-0.1.0a2.tar.gz.

File metadata

  • Download URL: strata_notebook-0.1.0a2.tar.gz
  • Size: 951.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for strata_notebook-0.1.0a2.tar.gz
Algorithm Hash digest
SHA256 20b9c03d146b84224e7cba459723f1dc8ae9a131a839d985a1162f5b1ad3aad5
MD5 7ec915bc1978eaa64cab8a81ab4319ef
BLAKE2b-256 e46327be88794c107f8e7c358b926628fe61f5f76fcd7eaf5fa0b83c77943d1a
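A published digest such as the SHA256 value above can be checked against a downloaded file with the standard hashlib module. The helper names below are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 with a published hex digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, verify("strata_notebook-0.1.0a2.tar.gz", "<SHA256 from the listing>") would return True only when the local file matches the listed digest.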

Provenance

The following attestation bundles were made for strata_notebook-0.1.0a2.tar.gz:

Publisher: release.yml on bearing-research/strata

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file strata_notebook-0.1.0a2-cp312-abi3-win_amd64.whl.

File hashes

Hashes for strata_notebook-0.1.0a2-cp312-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 13439e2c2db1fa9d64f46a679736dc14938749ac1f9ead8a11999409d814ba54
MD5 dcb981ff3e20ae8e264235562c7df9e6
BLAKE2b-256 f2f02ecea74d03992cfd7f456099bc24007772b2777f259f59c8ba03d538db94

Provenance

The following attestation bundles were made for strata_notebook-0.1.0a2-cp312-abi3-win_amd64.whl:

Publisher: release.yml on bearing-research/strata

File details

Details for the file strata_notebook-0.1.0a2-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for strata_notebook-0.1.0a2-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5430261774cecc4df9b036e3bf84c486e413e38752678ccdade5b79da37ce64a
MD5 22d401029b26dc13e83de24deb6e2af8
BLAKE2b-256 87510a112b99303bf46ff7be9e3ed4a62450db7e0065a23dfddd6f4deb889b7e

Provenance

The following attestation bundles were made for strata_notebook-0.1.0a2-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on bearing-research/strata

File details

Details for the file strata_notebook-0.1.0a2-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for strata_notebook-0.1.0a2-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 2a2af46c8caa1dd6426cfd976e9ee96f02d61eca3b213d56790a512cc42bb810
MD5 3e523d5c68a5d2edcd66fccd038635cc
BLAKE2b-256 91caa02adcf6524e148bdb6bca5b275c80701b9868eca927ef6f11e752b8071f

Provenance

The following attestation bundles were made for strata_notebook-0.1.0a2-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on bearing-research/strata

File details

Details for the file strata_notebook-0.1.0a2-cp312-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for strata_notebook-0.1.0a2-cp312-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f71e6274c9501d9e4582aa157fc7a74baf5b2f595212a96557702a51425916cb
MD5 06137e86ee468a1fac4ecbea9e580350
BLAKE2b-256 120eab242998218bd870bd57abc5e71a7d360f3571e39e4f7c731d748219b5ef

Provenance

The following attestation bundles were made for strata_notebook-0.1.0a2-cp312-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on bearing-research/strata

File details

Details for the file strata_notebook-0.1.0a2-cp312-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for strata_notebook-0.1.0a2-cp312-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 771e2a784204b575371ff723716292c7d64d0c93490fbf6288d1692e9f555a78
MD5 f2f21f4a477632e9c8892a76eb8f7682
BLAKE2b-256 13311e153e9699849fc4f54adc5b527c8ae14e889b5434fa5caf6e938748d378

Provenance

The following attestation bundles were made for strata_notebook-0.1.0a2-cp312-abi3-macosx_10_12_x86_64.whl:

Publisher: release.yml on bearing-research/strata
