Self-hosted decision server for the dbt-state client, with pluggable state storage.
dbt-state-oss
An open-source, self-hosted decision server for the Apache-2.0
dbt-state client, keeping the state
store in your own storage (local disk, S3, or Azure Blob) instead of dbt
Labs' hosted, metered service.
Why
dbt-state skips redundant model executions ("NO-OP" on a second run) and
auto-defers to prod, without a manifest. But the decision engine is a hosted,
metered gRPC service (api.state.dbt.com); the pip package is only a client.
With no auth, the client silently disables itself and dbt runs vanilla.
The client, the protobuf protocol, and the shared libs are all Apache-2.0. Only the server is closed. This project builds an open replacement server that:
- speaks the same gRPC protocol (reuses the client's `*Servicer` stubs),
- keeps all state in your own storage (local disk, S3, or Azure Blob),
- needs no dbt Labs account (insecure channel for dev; your own OAuth/Entra ID for prod).
How the client/server split works (verified against the wheel)
- Client (unchanged, Apache-2.0): compiles model SQL, extracts deps + table refs (sqlglot), reads each input's `last_modified` from the warehouse via an adapter extension, hashes seed files, ships raw SQL + metadata over gRPC, acts on the verdict, and reports outcomes back.
- Server (this repo): computes a semantic fingerprint, matches it against stored history for the target table, checks freshness + `execution_type`, and returns skip / clone / execute. Persists run records to your chosen backend (local, S3, or Azure Blob).
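The server-side verdict described above can be pictured with a small sketch. This is not the repo's implementation: `RunRecord`, `decide`, and the timestamp comparison are hypothetical names and rules chosen for illustration, and the `clone` verdict and the `execution_type` check are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One stored run for a target table (hypothetical shape)."""
    fingerprint: str
    recorded_at: float  # epoch seconds of the recorded run


def decide(fingerprint: str,
           inputs_last_modified: list[float],
           history: list[RunRecord]) -> str:
    """Return "skip" or "execute" for one model (assumed rules)."""
    # No earlier run with the same fingerprint: the SQL changed or is new.
    matches = [r for r in history if r.fingerprint == fingerprint]
    if not matches:
        return "execute"
    # Freshness: any input modified after the latest matching run forces a rebuild.
    latest = max(matches, key=lambda r: r.recorded_at)
    if any(ts > latest.recorded_at for ts in inputs_last_modified):
        return "execute"
    return "skip"
```

With one stored record at time `100.0`, inputs last modified at `90.0` yield `"skip"`, while an input modified at `150.0` yields `"execute"`.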
Our fingerprint algorithm only has to be self-consistent between "record a run" and "check a run" - it does not need to match dbt Labs'.
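To illustrate what "self-consistent" means here, the sketch below hashes a normalized form of the SQL so that comment-only and whitespace-only edits produce the same value. It is an assumption-laden stand-in, not the algorithm this server uses: it relies on plain regexes (so `--` inside a string literal would be mangled) and the function name `fingerprint` is hypothetical. A parser-based normalization would be more robust.

```python
import hashlib
import re

_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"--[^\n]*")
_WHITESPACE = re.compile(r"\s+")


def fingerprint(sql: str) -> str:
    """Hash SQL after dropping comments and collapsing whitespace (sketch)."""
    without_comments = _LINE_COMMENT.sub(" ", _BLOCK_COMMENT.sub(" ", sql))
    normalized = _WHITESPACE.sub(" ", without_comments).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

As long as the same function is used when a run is recorded and when a run is checked, the two values are comparable, which is the only property the text above requires.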
Auth
- Dev / trusted network: `RUN_CACHE_API_URL=localhost:50051` (non-`:443`) or `RUN_CACHE_API_SECURE=false` -> insecure channel, zero OAuth. In CI/non-interactive, set `RUN_CACHE_OAUTH_CLIENT_SECRET=<dummy>` to pass the client's disable-gate (presence-checked only; never used on an insecure channel).
- Production: TLS + override `RUN_CACHE_AUTH_URL` / `RUN_CACHE_TOKEN_URL` to your own IdP (e.g. Azure Entra ID, same identity that guards your storage). Client does OAuth2 and attaches a bearer token; the server validates the JWT.
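The channel rule from the dev bullet can be written down as a small predicate. This is a reading of the rule as stated above, not the client's actual code: `channel_mode` is a hypothetical helper, and the handling of an unset URL (treated as secure) is an assumption.

```python
def channel_mode(env: dict[str, str]) -> str:
    """Return "insecure" or "secure" from RUN_CACHE_* settings (sketch)."""
    url = env.get("RUN_CACHE_API_URL", "")
    secure_flag = env.get("RUN_CACHE_API_SECURE", "").strip().lower()
    # Either an explicit opt-out or a non-:443 endpoint selects the insecure channel.
    if secure_flag == "false" or (url and not url.endswith(":443")):
        return "insecure"
    # Assumption: anything else uses TLS plus OAuth.
    return "secure"
```

For example, `channel_mode({"RUN_CACHE_API_URL": "localhost:50051"})` gives `"insecure"`, while an endpoint on `:443` without `RUN_CACHE_API_SECURE=false` gives `"secure"`.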
Repo layout
(The pip package ships only dbt_state_oss/; the rest is for development.)
dbt_state_oss/ the gRPC decision server (the engine)
example_project/ a tiny dbt-postgres project (seed -> staging -> mart) for local testing
tests/ unit + S3 integration tests
docs/ PROTOCOL.md (the reverse-engineered contract), FINDINGS.md (the eval)
reference/ local copy of dbt-labs' Apache-2.0 client source (gitignored, not committed)
Status
v1 works — postgres warehouse; local, s3, and azure state stores.
Verified end-to-end against our own server, with no dbt Labs service involved:
| scenario | result |
|---|---|
| second run, nothing changed | all models NO-OP (reused, no SQL run) |
| comment / whitespace-only edit | NO-OP (semantic fingerprint) |
| real SQL change to a model | that model rebuilds |
| real change upstream | downstream rebuilds too (freshness check, cache stays safe) |
| seed file unchanged | seed NO-OP (via values_hash) |
Requires postgres track_commit_timestamp=on (the client reads freshness from
pg_xact_commit_timestamp); the local docker postgres sets it.
State backends
Pick the backend with --store (or the STATE_STORE env var). Each backend's
config takes a CLI flag that falls back to its env var. All backends implement
the same two-method StateStore interface, so the roadmap entries are additive.
| backend | status | flags | env |
|---|---|---|---|
| local | supported | --dir | DBTSTATE_LOCAL_DIR |
| s3 | supported | --bucket, --prefix | DBTSTATE_S3_BUCKET, DBTSTATE_S3_PREFIX |
| azure | supported | --account, --container, --prefix | DBTSTATE_AZURE_ACCOUNT, DBTSTATE_AZURE_CONTAINER, DBTSTATE_AZURE_PREFIX |
| memory | dev/test only | - | - |
dbt-state-oss --store s3 --bucket my-bucket
dbt-state-oss --store azure --account acct --container dbt-state
dbt-state-oss --store local --dir ./.state_data
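The "CLI flag that falls back to its env var" behaviour can be sketched with `argparse` by using the env value as each flag's default. This is an illustration of the pattern only, not the package's real parser: `build_parser` is a hypothetical name, and only the local and S3 flags from the table are covered.

```python
import argparse
import os


def build_parser(env=None) -> argparse.ArgumentParser:
    """Build a parser whose flag defaults come from env vars (sketch)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="dbt-state-oss")
    parser.add_argument("--store", default=env.get("STATE_STORE"),
                        choices=["local", "s3", "azure", "memory"])
    # An explicit flag wins; otherwise the env var (if set) supplies the value.
    parser.add_argument("--dir", default=env.get("DBTSTATE_LOCAL_DIR"))
    parser.add_argument("--bucket", default=env.get("DBTSTATE_S3_BUCKET"))
    parser.add_argument("--prefix", default=env.get("DBTSTATE_S3_PREFIX"))
    return parser
```

With `STATE_STORE=s3` and `DBTSTATE_S3_BUCKET=env-bucket` in the environment, parsing an empty argument list yields `store == "s3"` and `bucket == "env-bucket"`; passing `--bucket cli-bucket` overrides the latter.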
Roadmap (not yet implemented):
- Google Cloud Storage (`gcs`)
- Fabric OneLake files
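Because every backend implements the same two-method StateStore interface, a new backend mostly amounts to one small class. The sketch below shows the general shape under loud assumptions: the method names `read` and `write`, their signatures, and the `record_run` helper are hypothetical, since this README only states that the interface has two methods.

```python
from typing import Optional, Protocol


class StateStore(Protocol):
    """Assumed two-method contract; the real method names may differ."""

    def read(self, key: str) -> Optional[bytes]: ...

    def write(self, key: str, data: bytes) -> None: ...


class MemoryStore:
    """Dict-backed store, comparable in spirit to the dev/test backend."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def read(self, key: str) -> Optional[bytes]:
        return self._blobs.get(key)

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data


def record_run(store: StateStore, table: str, payload: bytes) -> None:
    """Persist one run record under a per-table key (hypothetical layout)."""
    store.write(f"runs/{table}.json", payload)
```

Code that only calls `read` and `write` never needs to know which backend it is talking to, which is what makes additional backends additive.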
Azure auth: DefaultAzureCredential (az login locally, OIDC/workload-identity
in CI, managed identity on Azure). The identity needs the Storage Blob Data
Contributor role on the account (control-plane Owner/Contributor is NOT enough):
az role assignment create --assignee-object-id <your-oid> --assignee-principal-type User \
--role "Storage Blob Data Contributor" \
--scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<acct>
S3 auth: the boto3 default credential chain (IAM role, instance profile, SSO,
AWS_* env vars, or ~/.aws/credentials). No keys are read from this repo. The
identity needs read/write on the bucket; region comes from your standard AWS
configuration. After pip install "dbt-state-oss[s3]", start the server with
dbt-state-oss --store s3 --bucket <bucket>.
Next milestones: GCS / OneLake backends -> fabricspark adapter extension -> clone + prod auth.
Install & run
pip install dbt-state-oss # add [s3] or [azure] for those backends
dbt-state-oss --store local --port 50051
Then point your dbt-state client at the server (client env vars use the
RUN_CACHE_ prefix):
export RUN_CACHE_API_URL=localhost:50051 RUN_CACHE_API_SECURE=false RUN_CACHE_OAUTH_CLIENT_SECRET=dev
dbt build # in your dbt project; run twice and the second run NO-OPs
RUN_CACHE_API_SECURE=false selects an insecure channel (no OAuth);
RUN_CACHE_OAUTH_CLIENT_SECRET only needs to be present to pass the client's
enable-gate in non-interactive runs. Switch backends with --store (see the
table above), e.g. dbt-state-oss --store azure --account <acct> after az login.
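Before running `dbt build`, it can help to confirm that something is listening on the server port. The helper below is a generic TCP reachability check using only the standard library; `server_is_listening` is a hypothetical name, and a successful connect does not prove the listener is this gRPC server.

```python
import socket


def server_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (sketch)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or unresolved host.
        return False
```

For the local setup above, the call would be `server_is_listening("localhost", 50051)`.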
The NO-OP demo (from a clone)
A runnable seed -> staging -> mart project that NO-OPs on the second run lives in
example_project/. It ships only in the repo (not the pip package) and needs a
postgres with track_commit_timestamp=on — the client reads freshness from
pg_xact_commit_timestamp. The example profile expects postgres on :5433,
database dbt_oss. Clone the repo, install with the dev extra, start the
server (--store local), then dbt build --target prod twice from
example_project/.