A tool for detecting and managing Snowflake schema drift

These details have not been verified by PyPI

Project links

Project description

schemadrift

Detect and manage Snowflake schema drift.

schemadrift keeps Snowflake object definitions in source control in sync with a live Snowflake account. It extracts object definitions, stores them as YAML files, and performs bidirectional comparison to detect drift. The output is a structured changeset that AI agents or other tools can consume to generate SQL DDL scripts.

How it differs from traditional DCM tools

Tools like Flyway, schemachange, Liquibase, and the Snowflake Terraform provider are deployment-first: their primary job is to apply changes from a repository to Snowflake. schemadrift inverts this relationship.

schemadrift is comparison-first. It does not deploy anything. Instead, it:

Extracts the current state of objects from Snowflake into YAML definitions.
Compares those definitions against what is stored in source control.
Produces a structured YAML changeset describing the differences.

The changeset is the handoff point. An AI agent, a CI pipeline, or a human developer can consume it to decide what SQL to generate and when to apply it. schemadrift separates the concern of "what changed" from "how to apply it."

Why YAML, not SQL

Most database change management tools store object definitions as SQL DDL (CREATE OR REPLACE VIEW ...). This is natural but creates real problems:

Parsing SQL is hard. Snowflake DDL includes non-standard clauses (WITH MANAGED ACCESS, DATA_RETENTION_TIME_IN_DAYS, CLUSTER BY, etc.) that generic SQL parsers don't handle well. Every object type needs its own parser, and edge cases accumulate.
Diffing SQL is unreliable. Whitespace, quoting, keyword casing, and clause ordering all vary between what a human writes and what Snowflake returns from GET_DDL(). Meaningful comparison requires normalization that is fragile and object-type-specific.
Round-tripping is lossy. SQL captures some attributes (the query body of a view) but not others (comments, retention policies, managed access). A complete definition requires supplementing DDL with SHOW metadata, then stitching them back together.

schemadrift stores definitions as YAML attribute dictionaries instead. Each file contains the writable attributes of a single object:

# ANALYTICS_DB/REPORTING/views/DAILY_REVENUE.yaml
name: DAILY_REVENUE
query: SELECT date, SUM(amount) AS revenue FROM orders GROUP BY date
columns:
  - name: DATE
  - name: REVENUE
kind: PERMANENT

# ANALYTICS_DB/schemas/REPORTING.yaml
name: REPORTING
kind: PERMANENT
managed_access: false
data_retention_time_in_days: 1

This is simpler to parse, trivial to diff, and captures every attribute in one place.

Alignment with the Snowflake resource model

The field names and structure in these YAML files are not invented by schemadrift. They follow the Snowflake resource model — the same schema used by:

The Snowflake REST API (POST /api/v2/databases/{database}/schemas/{schema}/views)
The Snowflake Python API (ViewResource, SchemaResource, etc.)
The Snowflake CLI (snow object create view --json '{...}')
The DESCRIBE ... AS RESOURCE SQL command

This means a YAML file produced by schemadrift can be converted to JSON and passed directly to snow object create or the REST API without any transformation:

# Convert schemadrift YAML to JSON and deploy via Snowflake CLI
snow object create view \
  --database ANALYTICS_DB --schema REPORTING --replace \
  --json "$(python -c \
    "import yaml, json, sys; print(json.dumps(yaml.safe_load(open(sys.argv[1]))))" \
    repo/ANALYTICS_DB/REPORTING/views/DAILY_REVENUE.yaml)"

By aligning with the resource model, schemadrift avoids inventing its own schema and stays compatible with the broader Snowflake toolchain.

Installation

pip install schemadrift

Requires Python 3.10+.

Quick start

Extract objects from Snowflake

# Extract a single view to the file system
schemadrift extract view MYDB.MYSCHEMA.MYVIEW --output ./repo

# Extract all objects defined in the config file
schemadrift extract --config schemadrift.toml --output ./repo

# Print a view definition to stdout as YAML
schemadrift extract view MYDB.MYSCHEMA.MYVIEW --stdout

Compare source control against Snowflake

# Human-readable summary
schemadrift compare --source ./repo

# Structured YAML changeset to stdout
schemadrift compare --source ./repo --changeset

# Write changeset to a file
schemadrift compare --source ./repo --output changes.yaml

List objects in Snowflake

schemadrift list databases
schemadrift list schemas MYDB
schemadrift list views MYDB.MYSCHEMA

Connection options

schemadrift connects to Snowflake using snowflake-connector-python. Credentials can be supplied via CLI flags, environment variables, or a named connection from connections.toml:

# Named connection
schemadrift extract view MYDB.MYSCHEMA.MYVIEW --output ./repo --connection myconn

# Explicit credentials
schemadrift extract view MYDB.MYSCHEMA.MYVIEW --output ./repo \
  --account myaccount --user myuser --authenticator externalbrowser

Environment variables SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, and SNOWFLAKE_AUTHENTICATOR are also supported.

Configuration

schemadrift uses a TOML configuration file (schemadrift.toml) to define which objects are under management.

# Targets define the scope of objects to manage.

# Manage all schemas in ANALYTICS_DB
[[targets]]
database = "ANALYTICS_DB"
object_types = ["SCHEMA"]

# Manage all objects in ANALYTICS_DB.REPORTING
[[targets]]
database = "ANALYTICS_DB"
schemas = ["REPORTING"]

# Name mapping for multi-environment support.
# Logical names (used in source control) map to physical Snowflake names.
[name_mapping.account]
ANALYTICS_DB = "ANALYTICS_DB_DEV"

[name_mapping.schemas.ANALYTICS_DB]
REPORTING = "REPORTING_V2"

# Per-object-type exclusion rules (merged with built-in defaults).
[object_types.DATABASE]
exclude_names = ["SNOWFLAKE", "SNOWFLAKE_SAMPLE_DATA"]
exclude_prefixes = ["USER$"]

[object_types.SCHEMA]
exclude_names = ["INFORMATION_SCHEMA", "PUBLIC"]

Targets

Each [[targets]] entry defines a scope:

Fields	Behavior
(no database)	Account-level objects only (DATABASE, WAREHOUSE, ROLE)
`database`	The database itself + all schemas + all schema-level objects
`database` + `schemas`	Only schema-level objects within those schemas
`object_types`	Additional filter at any level

Name mapping

The [name_mapping] section translates between logical names (used in source control and file paths) and physical names (the actual object names in Snowflake). This enables a single repository to target multiple environments by swapping only the config file.

The mapping is bidirectional:

Extract: Objects are fetched from Snowflake using physical names, then the returned definitions are translated back to logical names before writing to disk.
Compare: Source files use logical names. When comparing against Snowflake, logical names are resolved to physical names for the query, then results are translated back.

Two levels of mapping are supported:

[name_mapping.account] renames account-level objects (databases, warehouses, roles).
[name_mapping.schemas.<LOGICAL_DB>] renames schemas within a specific logical database.

Leaf object names (views, tables, etc.) are never renamed — only their container names (database and schema) are mapped.

Example: A team maintains a single repo with logical names ANALYTICS_DB and REPORTING. Each environment has its own schemadrift.toml:

# schemadrift.toml (dev environment)
[[targets]]
database = "ANALYTICS_DB"
schemas = ["REPORTING"]

[name_mapping.account]
ANALYTICS_DB = "ANALYTICS_DB_DEV"

[name_mapping.schemas.ANALYTICS_DB]
REPORTING = "REPORTING_DEV"

# schemadrift.toml (production — no mapping needed)
[[targets]]
database = "ANALYTICS_DB"
schemas = ["REPORTING"]

The file system always uses logical names regardless of environment:

repo/ANALYTICS_DB/REPORTING/views/DAILY_REVENUE.yaml

When running against dev, schemadrift resolves ANALYTICS_DB to ANALYTICS_DB_DEV and REPORTING to REPORTING_DEV for all Snowflake queries. The view DAILY_REVENUE keeps its name — only the database and schema are remapped.

The mapping must be bijective (one-to-one). schemadrift raises an error if multiple logical names map to the same physical name.

Built-in exclusions

These objects are excluded by default and do not need to be configured:

DATABASE: SNOWFLAKE, SNOWFLAKE_SAMPLE_DATA, names starting with USER$
SCHEMA: INFORMATION_SCHEMA, PUBLIC

User-defined exclusions in [object_types.*] are merged additively with these defaults.

Filesystem layout

Object definitions are stored in a directory hierarchy that mirrors Snowflake's object hierarchy:

repo/
  databases/
    ANALYTICS_DB.yaml
  ANALYTICS_DB/
    schemas/
      REPORTING.yaml
    REPORTING/
      views/
        DAILY_REVENUE.yaml
        CUSTOMER_SUMMARY.yaml

The convention is:

Scope	Path
Account-level	`<root>/<type>s/<NAME>.yaml`
Database-level	`<root>/<DATABASE>/<type>s/<NAME>.yaml`
Schema-level	`<root>/<DATABASE>/<SCHEMA>/<type>s/<NAME>.yaml`

Object names and directory names are uppercased. The type directory is the lowercase plural of the object type (databases/, schemas/, views/).

External object support

If a directory exists where a YAML file would be (e.g., views/DAILY_REVENUE/ instead of views/DAILY_REVENUE.yaml), schemadrift treats that object as externally managed. This allows other tools, such as the Snowflake CLI (snow), to own specific objects within the same repository. Externally managed objects are reported as EXTERNALLY_MANAGED in comparison results and are skipped during extraction.

This makes it possible to integrate schemadrift into a Snowflake CLI project where some objects are deployed via snow and others are tracked for drift detection.

Comparison and changesets

The compare command performs a bidirectional comparison. For each object in scope, it determines one of:

Status	Meaning
`EQUIVALENT`	Source and Snowflake definitions match
`DIFFERS`	Both exist but definitions don't match (drift detected)
`MISSING_IN_TARGET`	Exists in source control but not in Snowflake
`MISSING_IN_SOURCE`	Exists in Snowflake but not in source control
`EXTERNALLY_MANAGED`	Managed by an external tool (directory detected)

The --changeset flag (or --output) produces a structured YAML changeset containing only actionable entries (everything except EQUIVALENT and EXTERNALLY_MANAGED):

changes:
- status: DIFFERS
  object_type: VIEW
  identifier: ANALYTICS_DB.REPORTING.DAILY_REVENUE
  definition:
    name: DAILY_REVENUE
    query: SELECT date, SUM(amount) AS revenue FROM orders GROUP BY date
  diff:
    modified:
      query:
        current: SELECT date, SUM(amount) AS revenue FROM sales GROUP BY date
        desired: SELECT date, SUM(amount) AS revenue FROM orders GROUP BY date
- status: MISSING_IN_SOURCE
  object_type: VIEW
  identifier: ANALYTICS_DB.REPORTING.TEMP_DEBUG
  definition:
    name: TEMP_DEBUG
    query: SELECT 1

This changeset is designed to be consumed by LLMs, AI agents, or automation pipelines that generate the appropriate SQL DDL.

Python API

The DriftService class is the primary programmatic interface. The CLI is a thin wrapper around it.

from schemadrift.core.service import DriftService
from schemadrift.core.file_manager import FileManager
from schemadrift.connection import SnowflakeConnection

conn = SnowflakeConnection(connection_name="myconn")
fm = FileManager("./repo")
service = DriftService(conn, fm)

# Extract a single object
obj = service.extract_object("VIEW", "MYDB.MYSCHEMA.MYVIEW")

# Extract to file
path = service.extract_to_file("VIEW", "MYDB.MYSCHEMA.MYVIEW")

# Compare a single object
entry = service.compare_object("VIEW", "MYDB.MYSCHEMA.MYVIEW")
print(entry.status)  # ComparisonStatus.EQUIVALENT, DIFFERS, etc.

# Batch comparison via config
from schemadrift.core.config import load_config
config = load_config(Path("schemadrift.toml"))
entries = service.compare_targets(config)

# Serialize as YAML changeset
from schemadrift.core.comparison import to_changeset_yaml
print(to_changeset_yaml(entries))

Architecture

+---------------------------------------------------------------+
|                           CLI                                 |
|                      (typer app)                              |
|              extract / compare / list                         |
+-------------------------------+-------------------------------+
                                |
+-------------------------------v-------------------------------+
|                       DriftService                            |
|                                                               |
|  Orchestrates extraction, comparison, batch operations.       |
|  Coordinates all subsystems. Primary API for integrations.    |
+---------+-----------------+-------------------+---------------+
          |                 |                   |
+---------v-------+ +-------v---------+ +-------v---------------+
| PluginDispatcher| |   FileManager   | | SnowflakeConnection   |
|                 | |                 | | Interface (Protocol)   |
| Routes calls by | | Reads/writes    | |                       |
| object type to  | | YAML/SQL files  | | execute()             |
| the correct     | | using           | | execute_scalar()      |
| plugin          | | FileStructure   | | get_current_database()|
+---------+-------+ | conventions     | | close()               |
          |         +-----------------+ +-----------+-----------+
+---------v-----------------------------------------+           |
|                  Object Model Layer               |           |
|                                                   |           |
|  SnowflakeObject (ABC)                            |           |
|    +-- Database  (ACCOUNT scope)                  |           |
|    +-- Schema    (DATABASE scope)                 |           |
|    +-- View      (SCHEMA scope)                   |           |
|                                                   |           |
|  ObjectScope: ORGANIZATION > ACCOUNT > DATABASE   |           |
|               > SCHEMA                            |           |
|                                                   |           |
|  ObjectDiff: added / removed / modified           |           |
|  DiffStrategy: pluggable comparison logic         |           |
+---------------------------------------------------+           |
                                                                |
+---------------------------------------------------------------+
|                  snowflake-connector-python                    |
+---------------------------------------------------------------+

Key classes

Class	Module	Role
`DriftService`	`core.service`	Top-level orchestrator. Coordinates plugins, file manager, and connection.
`PluginDispatcher`	`plugins.manager`	Routes operations to the correct plugin by object type.
`FileManager`	`core.file_manager`	Reads and writes object definition files. Uses `FileStructure` for path conventions.
`SnowflakeConnectionInterface`	`connection.interface`	Protocol that any connection implementation must satisfy.
`SnowflakeObject`	`core.base_object`	Abstract dataclass base for all object types. Provides generic `extract()`, `to_dict()`, `from_dict()`, and `compare()`.
`ObjectRenamer`	`core.config`	Bidirectional logical-to-physical name translation for multi-environment support.
`ComparisonEntry`	`core.comparison`	Result of comparing a single object. Carries status, definition, and diff.
`HookAdapter`	`plugins.hook_adapter`	Bridges any `SnowflakeObject` subclass to pluggy hooks without per-plugin boilerplate.
`DiffStrategy`	`core.diff_strategy`	Strategy pattern for object comparison. Supports custom normalizers (e.g., SQL whitespace normalization).

Object extraction methods

schemadrift supports multiple methods for extracting object definitions from Snowflake:

Method	Used by	Description
`DESCRIBE AS RESOURCE`	Schema, Database	Returns a JSON object with all attributes in a single query. Used by the generic `SnowflakeObject.extract()`.
`SHOW AS RESOURCE TERSE`	All (listing)	Lists objects within a scope, returning JSON rows. Used by `SnowflakeObject.list_objects()`.
Custom SQL	View	Overrides `extract()` for objects that need post-processing (e.g., stripping DDL wrappers from view queries).

Most object types can use the generic extraction from SnowflakeObject without any custom code. The base class introspects its own dataclass fields, calls DESCRIBE AS RESOURCE, filters out read-only fields, and constructs the instance.

Plugin system

schemadrift uses pluggy for its plugin architecture. Each object type (DATABASE, SCHEMA, VIEW) is implemented as a plugin.

Hook specification

Every plugin must implement the hooks defined in hookspecs.SnowflakeObjectSpec:

Hook	Returns	Purpose
`get_object_type()`	`str`	Object type name (e.g., `"VIEW"`)
`get_scope()`	`ObjectScope`	Scope level (ACCOUNT, DATABASE, SCHEMA)
`extract_object(connection, identifier)`	`SnowflakeObject`	Extract from Snowflake
`list_objects(connection, scope)`	`list[str]`	List objects in a scope
`object_from_dict(data, context)`	`SnowflakeObject`	Deserialize from dict
`generate_dict(obj)`	`dict`	Serialize to dict
`compare_objects(source, target)`	`ObjectDiff`	Compare two objects

Writing a plugin

Most plugins require only a model class and a one-line registration. The HookAdapter eliminates per-plugin boilerplate by bridging any SnowflakeObject subclass to the pluggy hooks.

1. Define the model:

# my_plugin/model.py
from dataclasses import dataclass
from typing import ClassVar
from schemadrift.core.base_object import ObjectScope, SnowflakeObject

@dataclass
class Table(SnowflakeObject):
    OBJECT_TYPE: ClassVar[str] = "TABLE"
    SCOPE: ClassVar[ObjectScope] = ObjectScope.SCHEMA

    name: str
    database_name: str = ""
    schema_name: str = ""
    kind: str | None = "PERMANENT"
    cluster_by: list[str] | None = None
    comment: str | None = None

2. Register the plugin:

# my_plugin/__init__.py
from my_plugin.model import Table
from schemadrift.plugins.hook_adapter import HookAdapter

plugin = HookAdapter(Table)

3. Declare the entry point in pyproject.toml:

[project.entry-points.schemadrift]
table = "my_plugin"

The generic extract(), to_dict(), from_dict(), and compare() methods on SnowflakeObject use field introspection, so most plugins need zero custom serialization code. Override extract() only when the Snowflake response requires special mapping (as the View plugin does to strip DDL wrappers).

Built-in plugins

Plugin	Scope	Module
DATABASE	ACCOUNT	`schemadrift.plugins.builtin.database`
SCHEMA	DATABASE	`schemadrift.plugins.builtin.schema`
VIEW	SCHEMA	`schemadrift.plugins.builtin.view`

External plugins are discovered automatically via setuptools entry points under the schemadrift group.

Development

# Clone and install in editable mode
git clone https://github.com/your-org/schemadrift.git
cd schemadrift
pip install -e ".[dev]"

# Run all tests
pytest

# Run a specific plugin's tests
pytest src/schemadrift/plugins/builtin/view/tests.py -v

# Lint and format
ruff check src/
black src/
mypy src/

Project structure

src/schemadrift/
  __init__.py
  hookspecs.py                 # Pluggy hook specifications
  cli/
    main.py                    # Typer CLI (extract, compare, list)
  core/
    service.py                 # DriftService (main orchestrator)
    config.py                  # TOML config loader, ObjectRenamer
    base_object.py             # SnowflakeObject ABC, ObjectScope, ObjectDiff
    file_manager.py            # FileManager, FileStructure
    comparison.py              # ComparisonEntry, changeset serialization
    diff_strategy.py           # DiffStrategy, DefaultDiffStrategy
    yaml_utils.py              # YAML serialization helpers
    resource_metadata.py       # Generated read-only field metadata
  connection/
    interface.py               # SnowflakeConnectionInterface (Protocol)
    snowflake_impl.py          # Concrete implementation + MockConnection
  plugins/
    manager.py                 # PluginDispatcher, plugin manager setup
    hook_adapter.py            # Generic HookAdapter
    builtin/
      database/                # DATABASE plugin (model + tests)
      schema/                  # SCHEMA plugin (model + tests)
      view/                    # VIEW plugin (model + tests)

License

Apache-2.0

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.0

Mar 3, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

schemadrift-0.1.0.tar.gz (69.5 kB view details)

Uploaded Mar 3, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

schemadrift-0.1.0-py3-none-any.whl (65.8 kB view details)

Uploaded Mar 3, 2026 Python 3

File details

Details for the file schemadrift-0.1.0.tar.gz.

File metadata

Download URL: schemadrift-0.1.0.tar.gz
Upload date: Mar 3, 2026
Size: 69.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for schemadrift-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`394b69d33f32b53df020b8f3269d48fa9b5bd7bc488b3828e360e1150cf2b064`
MD5	`4ec8847559105716ec53560eed1fa8d0`
BLAKE2b-256	`cb39a75fd39fcac3580734207a447000f1b38c7341c2e31570757ac7bfbdb6b8`

See more details on using hashes here.

File details

Details for the file schemadrift-0.1.0-py3-none-any.whl.

File metadata

Download URL: schemadrift-0.1.0-py3-none-any.whl
Upload date: Mar 3, 2026
Size: 65.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for schemadrift-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9df126493491e2f11b4f8b83854c0fb6c6037e54eb8bfdf65123210454dbe205`
MD5	`74891edee964533bc2ec015043aa872e`
BLAKE2b-256	`cc6430508d1d9bbd54fac0f28e5a3c983d88e7d8985d170de0db851f1d170566`

See more details on using hashes here.

schemadrift 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

schemadrift

How it differs from traditional DCM tools

Why YAML, not SQL

Alignment with the Snowflake resource model

Installation

Quick start

Extract objects from Snowflake

Compare source control against Snowflake

List objects in Snowflake

Connection options

Configuration

Targets

Name mapping

Built-in exclusions

Filesystem layout

External object support

Comparison and changesets

Python API

Architecture

Key classes

Object extraction methods

Plugin system

Hook specification

Writing a plugin

Built-in plugins

Development

Project structure

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes