Skip to main content

Self-hosted decision server for the dbt-state client, with pluggable state storage.

Project description

dbt-state-oss

An open-source, self-hosted decision server for the Apache-2.0 dbt-state client, keeping the state store in your own storage (local disk, S3, or Azure Blob) instead of dbt Labs' hosted, metered service.

Why

dbt-state skips redundant model executions ("NO-OP" on a second run) and auto-defers to prod, without a manifest. But the decision engine is a hosted, metered gRPC service (api.state.dbt.com); the pip package is only a client. With no auth, the client silently disables itself and dbt runs vanilla.

The client, the protobuf protocol, and the shared libs are all Apache-2.0. Only the server is closed. This project builds an open replacement server that:

  • speaks the same gRPC protocol (reuses the client's *Servicer stubs),
  • keeps all state in your own storage (local disk, S3, or Azure Blob),
  • needs no dbt Labs account (insecure channel for dev; your own OAuth/Entra ID for prod).

How the client/server split works (verified against the wheel)

  • Client (unchanged, Apache-2.0): compiles model SQL, extracts deps + table refs (sqlglot), reads each input's last_modified from the warehouse via an adapter extension, hashes seed files, ships raw SQL + metadata over gRPC, acts on the verdict, and reports outcomes back.
  • Server (this repo): computes a semantic fingerprint, matches it against stored history for the target table, checks freshness + execution_type, and returns skip / clone / execute. Persists run records to your chosen backend (local, S3, or Azure Blob).

Our fingerprint algorithm only has to be self-consistent between "record a run" and "check a run" - it does not need to match dbt Labs'.

Auth

  • Dev / trusted network: RUN_CACHE_API_URL=localhost:50051 (non-:443) or RUN_CACHE_API_SECURE=false -> insecure channel, zero OAuth. In CI/non-interactive, set RUN_CACHE_OAUTH_CLIENT_SECRET=<dummy> to pass the client's disable-gate (presence-checked only; never used on an insecure channel).
  • Production: TLS + override RUN_CACHE_AUTH_URL/RUN_CACHE_TOKEN_URL to your own IdP (e.g. Azure Entra ID, same identity that guards your storage). Client does OAuth2 and attaches a bearer token; the server validates the JWT.

Repo layout

(The pip package ships only dbt_state_oss/; the rest is for development.)

dbt_state_oss/   the gRPC decision server (the engine)
example_project/ a tiny dbt-postgres project (seed -> staging -> mart) for local testing
tests/           unit + S3 integration tests
docs/            PROTOCOL.md (the reverse-engineered contract), FINDINGS.md (the eval)
reference/       local copy of dbt-labs' Apache-2.0 client source (gitignored, not committed)

Status

v1 works — postgres warehouse; local, s3, and azure state stores. Verified end-to-end against our own server with zero dbt Labs:

scenario result
second run, nothing changed all models NO-OP (reused, no SQL run)
comment / whitespace-only edit NO-OP (semantic fingerprint)
real SQL change to a model that model rebuilds
real change upstream downstream rebuilds too (freshness check, cache stays safe)
seed file unchanged seed NO-OP (via values_hash)

Requires postgres track_commit_timestamp=on (the client reads freshness from pg_xact_commit_timestamp); the local docker postgres sets it.

State backends

Pick the backend with --store (or the STATE_STORE env var). Each backend's config takes a CLI flag that falls back to its env var. All backends implement the same two-method StateStore interface, so the roadmap entries are additive.

backend status flags env
local supported --dir DBTSTATE_LOCAL_DIR
s3 supported --bucket, --prefix DBTSTATE_S3_BUCKET, DBTSTATE_S3_PREFIX
azure supported --account, --container, --prefix DBTSTATE_AZURE_ACCOUNT, DBTSTATE_AZURE_CONTAINER, DBTSTATE_AZURE_PREFIX
memory dev/test only - -
dbt-state-oss --store s3    --bucket my-bucket
dbt-state-oss --store azure --account acct --container dbt-state
dbt-state-oss --store local --dir ./.state_data

Roadmap (not yet implemented):

  • Google Cloud Storage (gcs)
  • Fabric OneLake files

Azure auth: DefaultAzureCredential (az login locally, OIDC/workload-identity in CI, managed identity on Azure). The identity needs the Storage Blob Data Contributor role on the account (control-plane Owner/Contributor is NOT enough):

az role assignment create --assignee-object-id <your-oid> --assignee-principal-type User \
  --role "Storage Blob Data Contributor" \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<acct>

S3 auth: the boto3 default credential chain (IAM role, instance profile, SSO, AWS_* env vars, or ~/.aws/credentials). No keys are read from this repo. The identity needs read/write on the bucket; region comes from your standard AWS configuration. After pip install "dbt-state-oss[s3]", start the server with dbt-state-oss --store s3 --bucket <bucket>.

Next milestones: GCS / OneLake backends -> fabricspark adapter extension -> clone + prod auth.

Install & run

pip install dbt-state-oss          # add [s3] or [azure] for those backends
dbt-state-oss --store local --port 50051

Then point your dbt-state client at the server (client env vars use the RUN_CACHE_ prefix):

export RUN_CACHE_API_URL=localhost:50051 RUN_CACHE_API_SECURE=false RUN_CACHE_OAUTH_CLIENT_SECRET=dev
dbt build      # in your dbt project; run twice and the second run NO-OPs

RUN_CACHE_API_SECURE=false selects an insecure channel (no OAuth); RUN_CACHE_OAUTH_CLIENT_SECRET only needs to be present to pass the client's enable-gate in non-interactive runs. Switch backends with --store (see the table above), e.g. dbt-state-oss --store azure --account <acct> after az login.

The NO-OP demo (from a clone)

A runnable seed -> staging -> mart project that NO-OPs on the second run lives in example_project/. It ships only in the repo (not the pip package) and needs a postgres with track_commit_timestamp=on — the client reads freshness from pg_xact_commit_timestamp. The example profile expects postgres on :5433, database dbt_oss. Clone the repo, install with the dev extra, start the server (--store local), then dbt build --target prod twice from example_project/.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbt_state_oss-0.1.0.tar.gz (20.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dbt_state_oss-0.1.0-py3-none-any.whl (17.3 kB view details)

Uploaded Python 3

File details

Details for the file dbt_state_oss-0.1.0.tar.gz.

File metadata

  • Download URL: dbt_state_oss-0.1.0.tar.gz
  • Upload date:
  • Size: 20.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dbt_state_oss-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2742a53de9db68fa48ecc8d1bbbb041a4fd56bfde835bdd4ab3c6937afe31d09
MD5 6fbf44ab47c6d2147713b5b16301db0f
BLAKE2b-256 e2ae07be16f82e8b735f9d8701cd7da18a21f2543bbac10855a5f9c837c82896

See more details on using hashes here.

File details

Details for the file dbt_state_oss-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: dbt_state_oss-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 17.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dbt_state_oss-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 86db1a5756da1c1a0a08080f554c9e9e3ca4ae9ad794cef4f50a45e0153abc41
MD5 90ade124ea4a7f5092f82579ec69c3d3
BLAKE2b-256 58e8da7c6fecc12a00107d2c9a742774dec68b0067affb7a1e3dec3627d3caf0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page