# SchemaX Python SDK & CLI
Declarative schema management and migration for Databricks Unity Catalog. Version control your catalog structure, generate SQL migrations, and deploy consistently across environments — from a single developer workflow through CI/CD.
## What is schema management?
Schema management is defining and maintaining the structure of your data catalog — catalogs, schemas, tables, views, columns, constraints, and their metadata. In Unity Catalog, this is the layer that determines what exists and who can access it.
Schema migration is applying changes to that structure over time in a controlled way: add a column, create a table, change grants — and track those changes so they can be reviewed, versioned, and deployed per environment (dev → test → prod).
Without it, schema changes are ad hoc, hard to audit, and risky when promoting to production. With it, changes are declarative, stored in Git, and applied consistently via CI/CD or manual deployment.
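As a conceptual illustration of "changes tracked as typed operations" (the field names below are invented for this sketch, not SchemaX's actual changelog format), a single declarative change might look like this:

```python
# Hypothetical sketch: a schema change recorded as a typed operation that can
# be reviewed in Git and rendered as DDL per environment. Field names are
# illustrative, not SchemaX's actual schema.
add_column_op = {
    "op": "add_column",
    "table": "sales.orders.line_items",
    "column": {"name": "discount_pct", "type": "DECIMAL(5,2)", "nullable": True},
}

def op_to_sql(op: dict) -> str:
    """Render one tracked operation as Unity Catalog DDL."""
    col = op["column"]
    null_sql = "" if col["nullable"] else " NOT NULL"
    return (
        f"ALTER TABLE {op['table']} "
        f"ADD COLUMN {col['name']} {col['type']}{null_sql}"
    )

print(op_to_sql(add_column_op))
# ALTER TABLE sales.orders.line_items ADD COLUMN discount_pct DECIMAL(5,2)
```

Because the change is data rather than hand-written SQL, the same operation can be reviewed once and rendered consistently for dev, test, and prod.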
## Works alongside Spark and Lakeflow Declarative Pipelines
If your tables are created by Spark jobs or DLT pipelines, SchemaX complements that workflow. Use governance-only mode to version and deploy comments, tags, grants, row filters, and column masks on existing objects — without touching CREATE TABLE statements. Your pipelines handle the data; SchemaX handles the governance layer.
## Features

### Full Unity Catalog object support
- Catalogs and schemas — create, rename, update managed locations, comments, tags, grants
- Tables — managed and external (Delta, Iceberg), partitioning, liquid clustering, column mapping
- Views — definitions, dependency tracking with automatic SQL extraction via sqlglot
- Materialized views — definitions, refresh schedules, dependency ordering
- Volumes — managed and external with storage locations
- Functions — SQL and Python UDFs, table functions, parameters with types and defaults
- Columns — add, rename, drop, reorder, change types, nullability, comments, tags
- Constraints — primary keys, foreign keys, check constraints (with NOT ENFORCED / RELY)
### Data governance
- Grants — GRANT and REVOKE on all securable types
- Tags — governance tags on catalogs, schemas, tables, views, and columns
- Row filters — row-level security policies
- Column masks — column-level data masking
- Table and view properties — TBLPROPERTIES configuration
- Governance-only mode — deploy only governance DDL, skip CREATE for pipeline-managed objects
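To make the grant features concrete, here is a rough sketch (not SchemaX's internal code; the operation shape is invented for this illustration) of how a grant described as data could be rendered as Unity Catalog DDL:

```python
# Illustrative only: a grant expressed as data, rendered as a GRANT statement.
# The dict layout is a sketch, not SchemaX's actual operation format.
grant_op = {
    "securable_type": "TABLE",
    "securable": "sales.orders.line_items",
    "principal": "data_analysts",
    "privileges": ["SELECT", "MODIFY"],
}

def grant_to_sql(op: dict) -> str:
    """Render a grant operation as Unity Catalog GRANT DDL."""
    privileges = ", ".join(op["privileges"])
    return (
        f"GRANT {privileges} ON {op['securable_type']} "
        f"{op['securable']} TO `{op['principal']}`"
    )

print(grant_to_sql(grant_op))
# GRANT SELECT, MODIFY ON TABLE sales.orders.line_items TO `data_analysts`
```

Keeping grants as version-controlled data is what makes governance-only mode possible: the same operations can be replayed against pipeline-managed tables without touching their CREATE statements.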
### Deployment and CI/CD
- Multi-environment — logical catalog names mapped to physical names per environment
- Apply with tracking — deploy via Databricks Statement Execution API with database-backed audit trail
- Rollback — partial (revert a failed deployment) and complete (to a snapshot), with safety classification
- Dry run — preview SQL without executing
- Auto-rollback — automatically revert on partial failure
- Deployment scope — governance-only mode, existing-object awareness
- CI/CD templates — GitHub Actions, GitLab CI, Azure DevOps
- Databricks Asset Bundles — generate DAB resource YAML with `schemax bundle`
## Version control
- Snapshots — semantic versioned state captures
- Changelogs — every change tracked as a typed operation
- State diffing — compute minimal operations between any two snapshots
- Dependency-ordered SQL — correct creation ordering for views and materialized views
- Stale detection — detect and rebase snapshots after Git rebases
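The state-diffing idea can be sketched in a few lines of generic Python. This is a conceptual illustration of computing minimal operations between two snapshots, not SchemaX's actual diff engine:

```python
# Conceptual sketch of state diffing: compare the column sets of two table
# snapshots and emit the minimal add/drop operations between them.
def diff_columns(old: dict, new: dict) -> list[dict]:
    """Return the minimal operations that transform `old` columns into `new`."""
    ops = []
    for name, col_type in new.items():
        if name not in old:
            ops.append({"op": "add_column", "name": name, "type": col_type})
    for name in old:
        if name not in new:
            ops.append({"op": "drop_column", "name": name})
    return ops

v1 = {"id": "BIGINT", "amount": "DECIMAL(10,2)"}
v2 = {"id": "BIGINT", "amount": "DECIMAL(10,2)", "currency": "STRING"}
print(diff_columns(v1, v2))
# [{'op': 'add_column', 'name': 'currency', 'type': 'STRING'}]
```

Diffing in both directions is also what makes rollback tractable: the operations from the current state back to a snapshot are just the reverse diff.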
## Installation

```bash
pip install schemaxpy
```
## Quick start

### 1. Initialize a project

```bash
schemax init
```

Creates a `.schemax/` directory with project configuration.
### 2. Import from a live workspace

```bash
schemax import --profile my-databricks --warehouse-id abc123
```

Brings your existing Unity Catalog hierarchy into a SchemaX project.
### 3. Make changes and generate SQL

Use the VS Code extension to design visually, or modify the state directly. Then:

```bash
# Preview SQL for an environment
schemax sql --target dev

# Save to file
schemax sql --target prod --output migration.sql
```
### 4. Deploy

```bash
# Dry run
schemax apply --target dev --profile my-databricks --warehouse-id abc123 --dry-run

# Apply
schemax apply --target dev --profile my-databricks --warehouse-id abc123

# With auto-rollback on failure
schemax apply --target prod --profile my-databricks --warehouse-id abc123 --auto-rollback
```
### 5. Rollback if needed

```bash
# Partial — revert a failed deployment
schemax rollback --partial --deployment <id> --target dev --profile my-databricks --warehouse-id abc123

# Complete — rollback to a snapshot
schemax rollback --to-snapshot v0.2.0 --target dev --profile my-databricks --warehouse-id abc123
```
## CLI reference

| Command | Description |
|---|---|
| `schemax init` | Initialize a new project |
| `schemax validate` | Validate project structure and schema |
| `schemax sql` | Generate SQL migration from changes |
| `schemax apply` | Deploy to a Databricks environment |
| `schemax rollback` | Rollback a deployment (partial or complete) |
| `schemax import` | Import from live Databricks or SQL file |
| `schemax snapshot create` | Create a versioned snapshot |
| `schemax snapshot validate` | Detect stale snapshots |
| `schemax snapshot rebase` | Rebase a stale snapshot |
| `schemax diff` | Compare two versions with optional SQL preview |
| `schemax bundle` | Generate Databricks Asset Bundles resource YAML |
| `schemax record-deployment` | Manually record deployment metadata |

Run `schemax <command> --help` for detailed options.
## CI/CD integration

### GitHub Actions

```yaml
name: Deploy Schema

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install schemaxpy
      - run: schemax validate
      - name: Apply to production
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
        run: |
          schemax apply \
            --target prod \
            --profile default \
            --warehouse-id ${{ secrets.WAREHOUSE_ID }} \
            --no-interaction \
            --auto-rollback
```
## Python API

```python
from pathlib import Path

from schemax.core.storage import load_current_state, read_project, get_environment_config
from schemax.providers.base.operations import Operation

# Load the project state, changelog, and provider from the current workspace
workspace = Path.cwd()
state, changelog, provider = load_current_state(workspace)

# Generate SQL with environment-specific catalog names
project = read_project(workspace)
env_config = get_environment_config(project, "prod")

catalog_mapping = {}
for catalog in state.get("catalogs", []):
    logical = str(catalog.get("name"))
    physical = env_config.get("catalogMappings", {}).get(logical, logical)
    catalog_mapping[logical] = physical

generator = provider.get_sql_generator(state)
generator.catalog_name_mapping = catalog_mapping

operations = [Operation(**op) for op in changelog["ops"]]
print(generator.generate_sql(operations))
```
## Multi-provider roadmap
SchemaX is built on a provider architecture. Unity Catalog is fully supported today (v0.2.x). Lakebase (PostgreSQL) support is in active development for v0.3.x.
| Provider | Status | Hierarchy |
|---|---|---|
| Unity Catalog | Available (v0.2.x) | Catalog → Schema → Table / View / Volume / Function / MV |
| Lakebase (PostgreSQL) | In development (v0.3.x) | Database → Schema → Table |
## Requirements

- Python 3.11+
- A SchemaX project (`.schemax/` directory)
- For deployment: Databricks workspace with SQL Warehouse access
## Links
- Documentation — Setup, quickstart, and reference
- VS Code Extension — Visual schema designer
- GitHub Repository — Source code and issues
- PyPI
Apache License 2.0 — see LICENSE for details.