Skip to main content

Snowflake Data Validation

Project description

Snowflake Data Validation

License Python

Snowflake Data Validation is a command-line tool and Python library for validating data migrations and ensuring data quality between source and target databases, with support for SQL Server, Redshift, Teradata, and Oracle as source systems.

📖 For detailed usage instructions, configuration examples, and CLI reference, see the official documentation.


🚀 Features

  • Multi-level validation: Schema validation, statistical metrics, and row-level data integrity checks.
  • Multiple source platforms: SQL Server, Redshift, Teradata, Oracle.
  • User-friendly CLI: Comprehensive commands for automation and orchestration.
  • Parallel processing: Multi-threaded table validation for faster execution.
  • Offline validation: Extract source data as Parquet files for validation without source access.
  • Flexible configuration: YAML-based workflows with per-table customization.
  • Partitioning support: Row and column partitioning helpers for large table validation.
  • Detailed reporting: CSV reports, console output, and comprehensive logging.
  • Extensible architecture: Ready for additional database engines.

📦 Installation

pip install snowflake-data-validation

For SQL Server support:

pip install "snowflake-data-validation[sqlserver]"

For development and testing:

pip install "snowflake-data-validation[all]"

🔄 Execution Modes

Mode Command Description
Sync Validation run-validation Real-time comparison between source and target databases
Source Extraction source-validate Extract source data to Parquet files for offline validation
Async Validation run-async-validation Validate using pre-extracted Parquet files
Script Generation generate-validation-scripts Generate SQL scripts for manual execution

Supported Dialects: sqlserver, snowflake, redshift, teradata, oracle

CLI entry point: from the repo (or an install), run the package module — for example:

python -m snowflake.snowflake_data_validation redshift run-validation -dvf path/to/config.yaml

Do not use python -m snowflake.snowflake_data_validation.main_cli for subcommands; that path exits after logging without dispatching to redshift / sqlserver / etc.


🔍 Validation Levels

Validation runs L1 schema, L2 metrics, and optionally L3 row (and cell / hybrid modes where configured). Each level emits per-column (or per-criterion) rows with a STATUS of SUCCESS or FAILURE; the run is considered passing at that level only when no failure rows are present.

Schema Validation

Compares table structure between source and target:

  • Column names and order
  • Data types with mapping support
  • Precision, scale, and length
  • Nullable constraints

Metrics Validation

Compares statistical metrics for each column:

  • Row count
  • Min/Max values
  • Sum and Average
  • Null count
  • Distinct count

Views

For Redshift and Teradata sources, views are validated on the view itself (no CREATE TABLE AS SELECT materialization in local sync mode). Other platforms may still materialize views to temporary tables before L1/L2/L3. Teradata views use basic schema comparison (existence + datatype) in the sync executor; Redshift uses full schema metadata like tables.

Row Validation

Performs row-by-row comparison:

  • Primary key matching
  • Field-level value comparison
  • Mismatch reporting

📊 Reports

  • Console Output: Real-time progress with success/failure indicators
  • CSV Reports: Detailed validation results with all comparison data
  • Log Files: Comprehensive debug and error logging

CLI commands

snowflake-data-validation <source_dialect> <command> [options]
# Short alias: sdv <source_dialect> <command> [options]

Supported dialects include sqlserver, teradata, redshift, and oracle. Common commands: run-validation, run-async-validation, source-validate, and generate-validation-scripts. The dialect must match source_platform in your configuration file.


📚 Documentation

For complete command reference, configuration options, and examples, see the Data Validation CLI.


Changelog

v1.9.2

Bug fixes

  • Fixed L3 row hashing crash when source and target tables had different column counts under existing column-selection or column-mapping config.

v1.9.1

Bug fixes

  • Fixed smart partition size calculation when using targetWhereClause.

v1.9.0

New features

  • Added Azure Synapse connection support (Phase 1).
  • Added Azure Synapse L1 schema validation (Phase 2).
  • Added Azure Synapse L2 metric validation (Phase 3).
  • Added Azure Synapse L3 row validation (Phase 4).
  • Added Azure Synapse CLI and agent integration (Phase 5).
  • Added PostgreSQL bulk extract via psql \copy with PG_CSV_FILE_FORMAT support.
  • Added Oracle ROWID and UROWID type support.
  • Replaced standalone Oracle connector with oracledb-backed agent connection and added an Oracle Docker image.

Improvements

  • Enhanced task telemetry by integrating MigrationTracker and current-step tracking.
  • Improved Oracle fully-qualified-name and view validation handling.
  • Updated VARCHAR limits in validation templates and constants.
  • Improved row-hashing performance.

Bug fixes

  • Fixed result-set normalization rules to handle whitespace correctly.
  • Fixed custom-template model imports.

v1.8.0

Breaking changes

  • DuplicateIndexKeysError has been removed. Cell-by-cell validation no longer raises on duplicate index keys; instead, duplicate-key rows are silently excluded and reported as DuplicateKeyInfo entries.
  • validate_cell now returns tuple[ValidationResult, list[DuplicateKeyInfo]] instead of ValidationResult.
  • CellDataValidator.validate_cell_comparison now returns tuple[pd.DataFrame, list[DuplicateKeyInfo]] instead of pd.DataFrame.
  • compare_md5_rows uses group-by merge: keys appearing more than once on either side emit DUPLICATE_SOURCE/DUPLICATE_TARGET/DUPLICATE_BOTH_SIDES for every row in the group; FAILURE is never emitted for keys with duplicates.
  • Added support for early stopping, hybrid L3, and Snowpipe; existing configurations may need to be updated.

New features

  • Added view validation support to the Cloud Data Validation pipeline.
  • Added DATA_MIGRATION_WORKFLOW and DATA_VALIDATION_WORKFLOW views with workflow-type filtering and per-workflow L1/L2/L3 progress rollups.
  • Added Oracle ODBC source support.
  • Added Teradata type mappings and Snowflake target fully-qualified-name helpers.
  • Added Teradata object_type query in the shared dispatcher.
  • Added Teradata to the data-migration-orchestrator workflow.
  • Added Oracle source platform with ALL_ schema discovery, type mappings, and preprocessing hooks.
  • Added early-stopping support for L3 row-hashing validation.
  • Added support for custom metrics and templates in Cloud Data Validation.
  • Added Oracle Data Validation foundation with L1 schema validation.
  • Added L2 metrics validation for Oracle.
  • Added L3 row and cell MD5 validation for Oracle.
  • Added Oracle wiring and factory registration in the SDV core.
  • Added the teradata optional install extra (pip install snowflake-data-exchange-agent[teradata]).
  • Added PostgreSQL data types and normalization rules.
  • Added Snowflake schema utilities and type-mapping updates for the orchestrator.
  • Added a metrics skill and PostgreSQL metrics templates.
  • Added PostgreSQL connector with L0 and L1 validation.
  • Added Redshift view validation.
  • Added Oracle as a supported Data Validation source across the orchestrator and agent.
  • Added L2 and L3 row and cell validation for PostgreSQL.
  • Added PostgreSQL support to Cloud Data Validation.

Improvements

  • Optimized L3 row-hashing queries.
  • Updated Teradata MD5 queries.
  • Extended Teradata ODBC connection configuration.

Bug fixes

  • Fixed DV CLI not exiting on Ctrl+C by installing SIGINT/SIGTERM handlers.
  • Fixed is_valid semantics so it indicates whether validations were executed rather than success or failure.
  • Prevented out-of-memory errors in cell-by-cell and row-hashing comparisons in workers.
  • Fixed L3 row-hashing producing false positives.
  • Prevented unnecessary shared-cache eviction when the loaded copy already matches the workspace.

v1.7.1

Improvements

  • Improved column metrics query performance by consolidating per-column CTEs into a single wide-row query; wide-table runs on a 24 GB test dropped from about 52 minutes to about 21 minutes.

Bug fixes

  • Fixed Value overflow in a SUM aggregate errors on STDDEV over 64-bit integer columns by casting SUM/AVG/STDDEV inputs to FLOAT on Snowflake, Teradata, and SQL Server; removed the VARIANCE metric from all dialects.

v1.7.0

Improvements

  • Cast value columns to Utf8 before unpivot and corrected IS_VALID evaluation.
  • Vertical partitioning for cell validation on wide tables.
  • Validated table-level ROW_COUNT during schema validation.
  • Table-centric Streamlit dashboards with CSV export.

Bug fixes

  • Fixed timestamp copy handling for SQL Server BCP loads.
  • Fixed Teradata row validation when chunk MD5 comparison was skipped incorrectly.
  • Fixed duplicate tasks created when evaluating L1 results under race conditions.
  • Fixed decimal partition coercion and parallelized L3 validation fixes.

v1.6.0

New features

  • Added hybrid row validation mode — two-phase MD5 + cell drilldown.
  • Added support for smart partitioning in Cloud Data Validation.
  • Added DEFAULT metrics templates and fallback logic for empty metrics templates.
  • Added DEFAULT normalization templates for various data types.

Improvements

  • Improved result set snapshots validation.
  • Added customization options for result set snapshot generation.
  • Improved Data Validation performance.
  • Improved the task queue to support a higher number of parallel workers.

Bug fixes

  • Fixed SQL compilation memory exhaustion by batching L2 metrics queries for wide tables.
  • Fixed cell validation key mismatch, partition WHERE clause handling, and metrics payload cleanup.
  • Fixed duplicate rows caused by an orchestrator crash during COPY INTO.
  • Fixed an issue with the incremental sync watermark on Redshift.
  • Fixed usage of the vectorized scanner.

v1.5.2

Bug fixes

  • Redshift: close cursors correctly after connection verification.
  • Teradata: safer row-count and metadata/metrics SQL (large COUNT(*) values; WHERE clauses applied as intended).

v1.5.1

Improvements

  • Cell validation streams rows in chunks for very large result sets when cloud validation runs on the worker.

📄 License

This project is licensed under the Snowflake Conversion Software Terms.

Conversion Software Terms of Use Revision - August 2025

Last Updated: September 2, 2025 This Conversion Software Terms of Use (“TOU”) govern use of the conversion software provided by or on behalf of Snowflake (the “Conversion Software”) and supersede any previous written or oral agreements and communications relating to the Conversion Software. As used herein, “Snowflake” shall mean the applicable Snowflake entity as set forth under “Contracting Entities” at https://www.snowflake.com/en/legal/; “Company” shall mean the person or entity agreeing to this TOU; and “Snowflake Service” means the generally available software-as-a-service offering hosted by or on behalf of Snowflake. BY CLICKING “ACCEPT” TO THIS TOU, OR USING THE CONVERSION SOFTWARE, YOU ACCEPT AND AGREE TO THE TERMS AND CONDITIONS OF THIS TOU. IF YOU ARE USING THE CONVERSION SOFTWARE AS AN EMPLOYEE, CONTRACTOR, OR AGENT OF A CORPORATION, PARTNERSHIP OR SIMILAR ENTITY, THEN YOU MUST BE AUTHORIZED TO SIGN FOR AND BIND SUCH ENTITY IN ORDER TO ACCEPT THE TERMS OF THIS TOU, AND YOU REPRESENT AND WARRANT THAT YOU HAVE THE AUTHORITY TO DO SO. THE RIGHTS GRANTED UNDER THIS TOU ARE EXPRESSLY CONDITIONED UPON ACCEPTANCE BY SUCH AUTHORIZED PERSONNEL. IF YOU DO NOT AGREE TO THESE TERMS AND CONDITIONS, YOU MAY NOT USE THE CONVERSION SOFTWARE. YOU AGREE THAT THIS TOU IS ENFORCEABLE LIKE ANY WRITTEN AGREEMENT SIGNED BY COMPANY. INTERNAL USE LICENSE. Subject to the terms and conditions of this TOU, Snowflake hereby provides Company a limited, non-exclusive, non-transferable, non-sublicensable, royalty-free license to install and use the Conversion Software in object code form solely for Company’s internal business purposes of converting code and other database objects in support of Company’s migration of data to the Snowflake Service (“Conversion License”). There are no restrictions on the above license related to the amount of code that may be converted by the Conversion Software or the number of Company users to which Company may grant access to the Conversion Software. PARTNER USE. If Company chooses to provide external products or services related to the Conversion Software, Company agrees to additionally comply with the terms and conditions in the attached Partner Services Addendum. COMPANY RESPONSIBILITIES. Company will be responsible for its and any of its users’ use of the Conversion Software and for its and their compliance with the Conversion License and this TOU. Company hereby warrants that it has sufficient rights in and access to any products, services, database objects, code and any other data being used in connection with the Conversion Software. Company will not (and will not permit any third party to): (a) sell, rent, lease, license, distribute, provide access to, sublicense, or otherwise make available the Conversion Software to a third party; (b) use the Conversion Software for any migration other than to the Snowflake Service; (c) reverse engineer, decompile or disassemble the Conversion Software, or seek to obtain its source code or non-public APIs, except to the extent expressly permitted by applicable law (and then only upon advance written notice to Snowflake); (d) modify, correct, adapt, translate, enhance, or otherwise prepare derivative works or improvements of the Conversion Software; (e) circumvent or disable any technological or security features or measures in the Conversion Software; (f) modify or create derivative works of the Conversion Software or incorporate the Conversion Software into any other product; (g) remove or obscure any proprietary or other notices contained in the Conversion Software; or (h) use the Conversion Software to benchmark Snowflake’s products or platforms, to generate competitive analysis or to build a competitive product or service. Company shall comply with applicable laws. EXTERNAL SERVICES CONNECTIVITY. The Conversion Software may interoperate with other software or services (the “External Services”), including but not limited to External Services that use artificial intelligence technology (e.g., the Snowflake Service). Use of such External Services may be subject to policies or agreements in place between Company and the External Services provider(s). Company is solely responsible for ensuring that it complies with and agrees to any policies or agreements that govern its use of or connections to External Services. Use of such External Services may incur charges by the External Service provider(s). Company is solely responsible for payment of any charges incurred by it on an External Service used via the Conversion Software. SNOWFLAKE RIGHTS. Any or all Conversion Licenses granted hereunder may be terminated by Snowflake at any time for any or no reason. Upon such termination or expiration, Company must cease use and delete all copies of the Conversion Software in its possession, custody or control. Except for the limited license granted herein, no rights, title or interest in or to the Conversion Software are granted or transferred to Company. Notwithstanding anything in this TOU to the contrary, Snowflake may update or change this TOU from time to time. Snowflake will provide at least thirty (30) days’ notice to Company via the email address Company used to download the Conversion Software, and any such update to this TOU will become effective at the conclusion of the notice period. Company’s sole and exclusive remedy if it does not agree to any updates or changes to this TOU will be to terminate this TOU and cease using the Conversion Software. Company’s failure to terminate these Terms within the notice period will constitute Company’s agreement to such update or change. SOFTWARE UPDATES AND USAGE DATA. Updates to Conversion Software may be made available from time to time by Snowflake, in its sole discretion. Such updates are considered part of the Conversion Software once installed. Notwithstanding the foregoing, the Conversion Software does not include any support services and is hereby expressly excluded from all support and service level obligations otherwise applicable to the Snowflake Service. Company acknowledges that the Conversion Software may send certain usage data and error logs to Snowflake, and Snowflake may use such usage data and error logs to develop, improve, support, and operate its products and services. NO WARRANTY. THE CONVERSION SOFTWARE IS PROVIDED “AS IS,” AND SNOWFLAKE MAKES NO WARRANTIES, EXPRESS OR IMPLIED, STATUTORY OR OTHERWISE, WITH RESPECT TO THE CONVERSION SOFTWARE, INCLUDING BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, TITLE, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. SNOWFLAKE DOES NOT WARRANT THAT THE USE OF THE CONVERSION SOFTWARE WILL BE UNINTERRUPTED OR ERROR-FREE, NOR DOES SNOWFLAKE WARRANT ANY LEVEL OF CONVERSION SUCCESS FOR SPECIFIC CODE USED WITH THE CONVERSION SOFTWARE. COMPANY IS RESPONSIBLE FOR TESTING THE CONVERSION SOFTWARE PRIOR TO USE AND DETERMINING ITS SUITABILITY FOR COMPANY’S USE AND PURPOSES. THE CONVERSION SOFTWARE IS NOT A DELIVERABLE NOR PART OF THE SNOWFLAKE SERVICE, AND IT IS EXCLUDED FROM SNOWFLAKE’S SECURITY, SUPPORT, SERVICE LEVEL, COMPLIANCE AND PRIVACY OBLIGATIONS APPLICABLE TO THE SNOWFLAKE SERVICE. LIMITATION OF LIABILITY. SNOWFLAKE’S TOTAL LIABILITY FOR ALL CLAIMS IN THE AGGREGATE (FOR DAMAGES OR LIABILITY OF ANY TYPE) ARISING FROM OR RELATED TO THE CONVERSION SOFTWARE SHALL NOT EXCEED FIFTY U.S. DOLLARS ($50). IN NO EVENT WILL SNOWFLAKE BE LIABLE FOR ANY LOSS OF USE, LOST OR INACCURATE TRANSLATED CODE, INTERRUPTION OF BUSINESS, COSTS OF DELAY, LOST PROFITS, OR ANY INDIRECT, SPECIAL, INCIDENTAL, RELIANCE, PUNITIVE, EXEMPLARY OR CONSEQUENTIAL DAMAGES OF ANY KIND ARISING FROM OR RELATED TO THE CONVERSION SOFTWARE, EVEN IF INFORMED OF THE POSSIBILITY OF SUCH DAMAGES IN ADVANCE. THE PARTIES AGREE THAT THIS SECTION WILL APPLY REGARDLESS OF THE FORM OF ACTION, WHETHER IN CONTRACT, TORT (INCLUDING NEGLIGENCE), STRICT LIABILITY OR OTHERWISE AND WILL APPLY EVEN IF ANY LIMITED REMEDY SPECIFIED IN THIS TOU IS FOUND TO HAVE FAILED OF ITS ESSENTIAL PURPOSE.

PARTNER SERVICES ADDENDUM The terms and conditions in this addendum apply only to the extent Company chooses to provide external products or services related to the Conversion Software. Any such use of the Conversion Software is subject to the TOU and this Partner Services Addendum (the “Addendum”). Notwithstanding anything to the contrary, any conflict between the terms in this Addendum and the rest of the TOU will be resolved in favor of the terms in this Addendum.

  1. PARTNER USE LICENSE. Subject to the terms and conditions of this Addendum, Snowflake hereby provides Company a limited, non-exclusive, non-transferable, non-sublicensable, royalty-free license to install and use the Conversion Software (the “Partner Conversion License”) in object code form solely to facilitate migration of code and other database objects to the Snowflake Service for a third party who has entered into a separate agreement with Company (such third party, an “End User” and services relating to such use, the “Partner Offering”).
  2. LICENSE RESTRICTIONS. In making the Partner Offering available, Company may not distribute the Conversion Software to End Users; instead, End Users must obtain the Conversion Software directly from Snowflake. Further, Company may not charge for access to the Conversion Software itself, but Company may charge End Users for the Partner Offering.
  3. PARTNER OBLIGATIONS & RESTRICTIONS. Company remains solely responsible and liable for its own relationships and agreements with the End User and for the Partner Offering. Company may not: (a) hold itself out as an agent of Snowflake or represent or imply to any third party that services or products offered by Company alongside the Conversion Software are provided by or affiliated in any way with Snowflake; (b) make any legal representations, guarantees or warranties of any type on behalf of Snowflake, or describe the Conversion Software in a manner inconsistent with this Addendum or the descriptions and terms contained in the applicable documentation; (c) obscure or remove any marks associating the Conversion Software with Snowflake; (d) represent or imply to any third party that the Conversion Software is provided by Company; or (e) violate any anti-corruption or other laws or engage in any deceptive, misleading, illegal, or unethical practices.
  4. SNOWFLAKE RIGHTS. Any or all Partner Conversion Licenses granted pursuant to this Addendum may be terminated by Snowflake at any time for any or no reason. Upon such termination or expiration, Company must cease use and delete all copies of the Conversion Software in its possession, custody or control. Except for the limited license granted herein, no rights, title or interest in or to the Conversion Software are granted or transferred to Company.
  5. NO THIRD-PARTY BENEFICIARY. The TOU and Addendum are made solely for the benefit of Snowflake and Company and each of their successors or assigns. No other person, including an End User, shall have any rights, interests, or claims hereunder or be entitled to any benefits under or on account of the TOU or Addendum as a third-party beneficiary or otherwise. Snowflake has no obligation or liability to such third parties for the Partner Offering.
  6. COMPANY INDEMNIFICATION. Company will defend Snowflake from and against any third-party claim arising from or relating to Company’s use of the Conversion Software or any Partner Offering, and will indemnify and hold harmless Snowflake from and against any damages and costs awarded against Snowflake or agreed in settlement by Company (including reasonable attorneys’ fees) resulting from such claim.
  7. INDEMNIFICATION PROCEDURES. In the event of a potential indemnity obligation under Section 6, Snowflake will: (a) promptly notify Company in writing of the claim, (b) allow Company the right to control the investigation, defense and settlement (if applicable) of such claim at Company’s sole cost and expense, and (c) upon request of Company, provide all necessary cooperation at Company’s expense. Failure by Snowflake to notify Company of a claim under this Section shall not relieve Company of its obligations under this Section, however Company shall not be liable for any litigation expenses that Snowflake incurred prior to the time when notice is given or for any damages and/or costs resulting from any material prejudice caused by the delay or failure to provide notice to Company in accordance with this Section. Company may not settle any claim in any matter that would require obligation on the part of Snowflake (other than payment by Company or ceasing to use infringing materials), or any admission of fault by Snowflake, without Snowflake’s prior written consent, such consent not to be unreasonably withheld, conditioned or delayed.

Visit the Conversion Software Terms for more information.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

snowflake_data_validation-1.9.2.tar.gz (486.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

snowflake_data_validation-1.9.2-py3-none-any.whl (613.6 kB view details)

Uploaded Python 3

File details

Details for the file snowflake_data_validation-1.9.2.tar.gz.

File metadata

File hashes

Hashes for snowflake_data_validation-1.9.2.tar.gz
Algorithm Hash digest
SHA256 f8f170c2831e0f47c61914069b357821b1a028e510377b64c2fd41a83669a002
MD5 9d2bcf65696e52f34b762f16fe46e817
BLAKE2b-256 25442a4f842e1f59e7c9e5c178946b9785357c814abb5ad60ea29ce5868c714a

See more details on using hashes here.

File details

Details for the file snowflake_data_validation-1.9.2-py3-none-any.whl.

File metadata

File hashes

Hashes for snowflake_data_validation-1.9.2-py3-none-any.whl
Algorithm Hash digest
SHA256 99654bf7a227bb0639993ea21b3a9c119279696a75b668273e10a6c49b4fc017
MD5 2df9a34b59c1bcba02e1c8ff13c62e15
BLAKE2b-256 53b381a5e3037bbae5e3122f1cc4d1a6dc44d5091fc96a05826a65e0852325d2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page