Snowflake Data Validation
Project description
Snowflake Data Validation
Snowflake Data Validation is a command-line tool and Python library for validating data migrations and ensuring data quality between source and target databases, with support for SQL Server, Redshift, Teradata, and Oracle as source systems.
📖 For detailed usage instructions, configuration examples, and CLI reference, see the official documentation.
🚀 Features
- Multi-level validation: Schema validation, statistical metrics, and row-level data integrity checks.
- Multiple source platforms: SQL Server, Redshift, Teradata, Oracle.
- User-friendly CLI: Comprehensive commands for automation and orchestration.
- Parallel processing: Multi-threaded table validation for faster execution.
- Offline validation: Extract source data as Parquet files for validation without source access.
- Flexible configuration: YAML-based workflows with per-table customization.
- Partitioning support: Row and column partitioning helpers for large table validation.
- Detailed reporting: CSV reports, console output, and comprehensive logging.
- Extensible architecture: Ready for additional database engines.
📦 Installation
pip install snowflake-data-validation
For SQL Server support:
pip install "snowflake-data-validation[sqlserver]"
For development and testing:
pip install "snowflake-data-validation[all]"
🔄 Execution Modes
| Mode | Command | Description |
|---|---|---|
| Sync Validation | run-validation |
Real-time comparison between source and target databases |
| Source Extraction | source-validate |
Extract source data to Parquet files for offline validation |
| Async Validation | run-async-validation |
Validate using pre-extracted Parquet files |
| Script Generation | generate-validation-scripts |
Generate SQL scripts for manual execution |
Supported Dialects: sqlserver, snowflake, redshift, teradata, oracle
CLI entry point: from the repo (or an install), run the package module — for example:
python -m snowflake.snowflake_data_validation redshift run-validation -dvf path/to/config.yaml
Do not use python -m snowflake.snowflake_data_validation.main_cli for subcommands; that path exits after logging without dispatching to redshift / sqlserver / etc.
🔍 Validation Levels
Validation runs L1 schema, L2 metrics, and optionally L3 row (and cell / hybrid modes where configured). Each level emits per-column (or per-criterion) rows with a STATUS of SUCCESS or FAILURE; the run is considered passing at that level only when no failure rows are present.
Schema Validation
Compares table structure between source and target:
- Column names and order
- Data types with mapping support
- Precision, scale, and length
- Nullable constraints
Metrics Validation
Compares statistical metrics for each column:
- Row count
- Min/Max values
- Sum and Average
- Null count
- Distinct count
Views
For Redshift and Teradata sources, views are validated on the view itself (no CREATE TABLE AS SELECT materialization in local sync mode). Other platforms may still materialize views to temporary tables before L1/L2/L3. Teradata views use basic schema comparison (existence + datatype) in the sync executor; Redshift uses full schema metadata like tables.
Row Validation
Performs row-by-row comparison:
- Primary key matching
- Field-level value comparison
- Mismatch reporting
📊 Reports
- Console Output: Real-time progress with success/failure indicators
- CSV Reports: Detailed validation results with all comparison data
- Log Files: Comprehensive debug and error logging
CLI commands
snowflake-data-validation <source_dialect> <command> [options]
# Short alias: sdv <source_dialect> <command> [options]
Supported dialects include sqlserver, teradata, redshift, and oracle. Common commands: run-validation, run-async-validation, source-validate, and generate-validation-scripts. The dialect must match source_platform in your configuration file.
📚 Documentation
For complete command reference, configuration options, and examples, see the Data Validation CLI.
Changelog
v1.20.2
New features
- Added ODBC data migration support for Azure Synapse serverless SQL pool.
Improvements
- Improved L3 row validation with a safeguard for mismatched target columns.
Bug fixes
- Fixed identifier quoting: uppercase identifiers remain unquoted while lowercase or mixed-case identifiers are auto-quoted.
v1.20.1
New features
- Added ODBC-based data migration support for Azure Synapse Analytics (Dedicated SQL pool), including object-type catalog queries.
Bug fixes
- Fixed datatype normalization templates that caused incorrect row index handling during cell-by-cell validation.
v1.20.0
New features
- Added
doctorsubcommand for the data exchange agent to supportscai data doctor. - Added
doctorsubcommand for the data migration orchestrator to supportscai data doctor. - Added support for Oracle quoted and mixed-case identifiers in catalog and object-type queries.
Improvements
- Improved L3 row validation by skipping per-partition
information_schemaqueries.
Bug fixes
- Fixed PostgreSQL special datatype extraction and loading in Snowflake.
v1.9.2
Bug fixes
- Fixed L3 row hashing crash when source and target tables had different column counts under existing column-selection or column-mapping config.
v1.9.1
Bug fixes
- Fixed smart partition size calculation when using
targetWhereClause.
v1.9.0
New features
- Added Azure Synapse connection support (Phase 1).
- Added Azure Synapse L1 schema validation (Phase 2).
- Added Azure Synapse L2 metric validation (Phase 3).
- Added Azure Synapse L3 row validation (Phase 4).
- Added Azure Synapse CLI and agent integration (Phase 5).
- Added PostgreSQL bulk extract via
psql \copywithPG_CSV_FILE_FORMATsupport. - Added Oracle
ROWIDandUROWIDtype support. - Replaced standalone Oracle connector with
oracledb-backed agent connection and added an Oracle Docker image.
Improvements
- Enhanced task telemetry by integrating
MigrationTrackerand current-step tracking. - Improved Oracle fully-qualified-name and view validation handling.
- Updated
VARCHARlimits in validation templates and constants. - Improved row-hashing performance.
Bug fixes
- Fixed result-set normalization rules to handle whitespace correctly.
- Fixed custom-template model imports.
v1.8.0
Breaking changes
DuplicateIndexKeysErrorhas been removed. Cell-by-cell validation no longer raises on duplicate index keys; instead, duplicate-key rows are silently excluded and reported asDuplicateKeyInfoentries.validate_cellnow returnstuple[ValidationResult, list[DuplicateKeyInfo]]instead ofValidationResult.CellDataValidator.validate_cell_comparisonnow returnstuple[pd.DataFrame, list[DuplicateKeyInfo]]instead ofpd.DataFrame.compare_md5_rowsuses group-by merge: keys appearing more than once on either side emitDUPLICATE_SOURCE/DUPLICATE_TARGET/DUPLICATE_BOTH_SIDESfor every row in the group;FAILUREis never emitted for keys with duplicates.- Added support for early stopping, hybrid L3, and Snowpipe; existing configurations may need to be updated.
New features
- Added view validation support to the Cloud Data Validation pipeline.
- Added
DATA_MIGRATION_WORKFLOWandDATA_VALIDATION_WORKFLOWviews with workflow-type filtering and per-workflow L1/L2/L3 progress rollups. - Added Oracle ODBC source support.
- Added Teradata type mappings and Snowflake target fully-qualified-name helpers.
- Added Teradata
object_typequery in the shared dispatcher. - Added Teradata to the data-migration-orchestrator workflow.
- Added Oracle source platform with
ALL_schema discovery, type mappings, and preprocessing hooks. - Added early-stopping support for L3 row-hashing validation.
- Added support for custom metrics and templates in Cloud Data Validation.
- Added Oracle Data Validation foundation with L1 schema validation.
- Added L2 metrics validation for Oracle.
- Added L3 row and cell MD5 validation for Oracle.
- Added Oracle wiring and factory registration in the SDV core.
- Added the
teradataoptional install extra (pip install snowflake-data-exchange-agent[teradata]). - Added PostgreSQL data types and normalization rules.
- Added Snowflake schema utilities and type-mapping updates for the orchestrator.
- Added a metrics skill and PostgreSQL metrics templates.
- Added PostgreSQL connector with L0 and L1 validation.
- Added Redshift view validation.
- Added Oracle as a supported Data Validation source across the orchestrator and agent.
- Added L2 and L3 row and cell validation for PostgreSQL.
- Added PostgreSQL support to Cloud Data Validation.
Improvements
- Optimized L3 row-hashing queries.
- Updated Teradata MD5 queries.
- Extended Teradata ODBC connection configuration.
Bug fixes
- Fixed DV CLI not exiting on Ctrl+C by installing SIGINT/SIGTERM handlers.
- Fixed
is_validsemantics so it indicates whether validations were executed rather than success or failure. - Prevented out-of-memory errors in cell-by-cell and row-hashing comparisons in workers.
- Fixed L3 row-hashing producing false positives.
- Prevented unnecessary shared-cache eviction when the loaded copy already matches the workspace.
v1.7.1
Improvements
- Improved column metrics query performance by consolidating per-column CTEs into a single wide-row query; wide-table runs on a 24 GB test dropped from about 52 minutes to about 21 minutes.
Bug fixes
- Fixed
Value overflow in a SUM aggregateerrors onSTDDEVover 64-bit integer columns by castingSUM/AVG/STDDEVinputs toFLOATon Snowflake, Teradata, and SQL Server; removed theVARIANCEmetric from all dialects.
v1.7.0
Improvements
- Cast value columns to
Utf8before unpivot and correctedIS_VALIDevaluation. - Vertical partitioning for cell validation on wide tables.
- Validated table-level
ROW_COUNTduring schema validation. - Table-centric Streamlit dashboards with CSV export.
Bug fixes
- Fixed timestamp copy handling for SQL Server BCP loads.
- Fixed Teradata row validation when chunk MD5 comparison was skipped incorrectly.
- Fixed duplicate tasks created when evaluating L1 results under race conditions.
- Fixed decimal partition coercion and parallelized L3 validation fixes.
v1.6.0
New features
- Added hybrid row validation mode — two-phase
MD5+ cell drilldown. - Added support for smart partitioning in Cloud Data Validation.
- Added
DEFAULTmetrics templates and fallback logic for empty metrics templates. - Added
DEFAULTnormalization templates for various data types.
Improvements
- Improved result set snapshots validation.
- Added customization options for result set snapshot generation.
- Improved Data Validation performance.
- Improved the task queue to support a higher number of parallel workers.
Bug fixes
- Fixed
SQLcompilation memory exhaustion by batching L2 metrics queries for wide tables. - Fixed cell validation key mismatch, partition
WHEREclause handling, and metrics payload cleanup. - Fixed duplicate rows caused by an orchestrator crash during
COPY INTO. - Fixed an issue with the incremental sync watermark on Redshift.
- Fixed usage of the vectorized scanner.
v1.5.2
Bug fixes
- Redshift: close cursors correctly after connection verification.
- Teradata: safer row-count and metadata/metrics SQL (large
COUNT(*)values;WHEREclauses applied as intended).
v1.5.1
Improvements
- Cell validation streams rows in chunks for very large result sets when cloud validation runs on the worker.
📄 License
This project is licensed under the Snowflake Conversion Software Terms.
Conversion Software Terms of Use Revision - August 2025
Last Updated: September 2, 2025 This Conversion Software Terms of Use (“TOU”) govern use of the conversion software provided by or on behalf of Snowflake (the “Conversion Software”) and supersede any previous written or oral agreements and communications relating to the Conversion Software. As used herein, “Snowflake” shall mean the applicable Snowflake entity as set forth under “Contracting Entities” at https://www.snowflake.com/en/legal/; “Company” shall mean the person or entity agreeing to this TOU; and “Snowflake Service” means the generally available software-as-a-service offering hosted by or on behalf of Snowflake. BY CLICKING “ACCEPT” TO THIS TOU, OR USING THE CONVERSION SOFTWARE, YOU ACCEPT AND AGREE TO THE TERMS AND CONDITIONS OF THIS TOU. IF YOU ARE USING THE CONVERSION SOFTWARE AS AN EMPLOYEE, CONTRACTOR, OR AGENT OF A CORPORATION, PARTNERSHIP OR SIMILAR ENTITY, THEN YOU MUST BE AUTHORIZED TO SIGN FOR AND BIND SUCH ENTITY IN ORDER TO ACCEPT THE TERMS OF THIS TOU, AND YOU REPRESENT AND WARRANT THAT YOU HAVE THE AUTHORITY TO DO SO. THE RIGHTS GRANTED UNDER THIS TOU ARE EXPRESSLY CONDITIONED UPON ACCEPTANCE BY SUCH AUTHORIZED PERSONNEL. IF YOU DO NOT AGREE TO THESE TERMS AND CONDITIONS, YOU MAY NOT USE THE CONVERSION SOFTWARE. YOU AGREE THAT THIS TOU IS ENFORCEABLE LIKE ANY WRITTEN AGREEMENT SIGNED BY COMPANY. INTERNAL USE LICENSE. Subject to the terms and conditions of this TOU, Snowflake hereby provides Company a limited, non-exclusive, non-transferable, non-sublicensable, royalty-free license to install and use the Conversion Software in object code form solely for Company’s internal business purposes of converting code and other database objects in support of Company’s migration of data to the Snowflake Service (“Conversion License”). There are no restrictions on the above license related to the amount of code that may be converted by the Conversion Software or the number of Company users to which Company may grant access to the Conversion Software. PARTNER USE. If Company chooses to provide external products or services related to the Conversion Software, Company agrees to additionally comply with the terms and conditions in the attached Partner Services Addendum. COMPANY RESPONSIBILITIES. Company will be responsible for its and any of its users’ use of the Conversion Software and for its and their compliance with the Conversion License and this TOU. Company hereby warrants that it has sufficient rights in and access to any products, services, database objects, code and any other data being used in connection with the Conversion Software. Company will not (and will not permit any third party to): (a) sell, rent, lease, license, distribute, provide access to, sublicense, or otherwise make available the Conversion Software to a third party; (b) use the Conversion Software for any migration other than to the Snowflake Service; (c) reverse engineer, decompile or disassemble the Conversion Software, or seek to obtain its source code or non-public APIs, except to the extent expressly permitted by applicable law (and then only upon advance written notice to Snowflake); (d) modify, correct, adapt, translate, enhance, or otherwise prepare derivative works or improvements of the Conversion Software; (e) circumvent or disable any technological or security features or measures in the Conversion Software; (f) modify or create derivative works of the Conversion Software or incorporate the Conversion Software into any other product; (g) remove or obscure any proprietary or other notices contained in the Conversion Software; or (h) use the Conversion Software to benchmark Snowflake’s products or platforms, to generate competitive analysis or to build a competitive product or service. Company shall comply with applicable laws. EXTERNAL SERVICES CONNECTIVITY. The Conversion Software may interoperate with other software or services (the “External Services”), including but not limited to External Services that use artificial intelligence technology (e.g., the Snowflake Service). Use of such External Services may be subject to policies or agreements in place between Company and the External Services provider(s). Company is solely responsible for ensuring that it complies with and agrees to any policies or agreements that govern its use of or connections to External Services. Use of such External Services may incur charges by the External Service provider(s). Company is solely responsible for payment of any charges incurred by it on an External Service used via the Conversion Software. SNOWFLAKE RIGHTS. Any or all Conversion Licenses granted hereunder may be terminated by Snowflake at any time for any or no reason. Upon such termination or expiration, Company must cease use and delete all copies of the Conversion Software in its possession, custody or control. Except for the limited license granted herein, no rights, title or interest in or to the Conversion Software are granted or transferred to Company. Notwithstanding anything in this TOU to the contrary, Snowflake may update or change this TOU from time to time. Snowflake will provide at least thirty (30) days’ notice to Company via the email address Company used to download the Conversion Software, and any such update to this TOU will become effective at the conclusion of the notice period. Company’s sole and exclusive remedy if it does not agree to any updates or changes to this TOU will be to terminate this TOU and cease using the Conversion Software. Company’s failure to terminate these Terms within the notice period will constitute Company’s agreement to such update or change. SOFTWARE UPDATES AND USAGE DATA. Updates to Conversion Software may be made available from time to time by Snowflake, in its sole discretion. Such updates are considered part of the Conversion Software once installed. Notwithstanding the foregoing, the Conversion Software does not include any support services and is hereby expressly excluded from all support and service level obligations otherwise applicable to the Snowflake Service. Company acknowledges that the Conversion Software may send certain usage data and error logs to Snowflake, and Snowflake may use such usage data and error logs to develop, improve, support, and operate its products and services. NO WARRANTY. THE CONVERSION SOFTWARE IS PROVIDED “AS IS,” AND SNOWFLAKE MAKES NO WARRANTIES, EXPRESS OR IMPLIED, STATUTORY OR OTHERWISE, WITH RESPECT TO THE CONVERSION SOFTWARE, INCLUDING BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, TITLE, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. SNOWFLAKE DOES NOT WARRANT THAT THE USE OF THE CONVERSION SOFTWARE WILL BE UNINTERRUPTED OR ERROR-FREE, NOR DOES SNOWFLAKE WARRANT ANY LEVEL OF CONVERSION SUCCESS FOR SPECIFIC CODE USED WITH THE CONVERSION SOFTWARE. COMPANY IS RESPONSIBLE FOR TESTING THE CONVERSION SOFTWARE PRIOR TO USE AND DETERMINING ITS SUITABILITY FOR COMPANY’S USE AND PURPOSES. THE CONVERSION SOFTWARE IS NOT A DELIVERABLE NOR PART OF THE SNOWFLAKE SERVICE, AND IT IS EXCLUDED FROM SNOWFLAKE’S SECURITY, SUPPORT, SERVICE LEVEL, COMPLIANCE AND PRIVACY OBLIGATIONS APPLICABLE TO THE SNOWFLAKE SERVICE. LIMITATION OF LIABILITY. SNOWFLAKE’S TOTAL LIABILITY FOR ALL CLAIMS IN THE AGGREGATE (FOR DAMAGES OR LIABILITY OF ANY TYPE) ARISING FROM OR RELATED TO THE CONVERSION SOFTWARE SHALL NOT EXCEED FIFTY U.S. DOLLARS ($50). IN NO EVENT WILL SNOWFLAKE BE LIABLE FOR ANY LOSS OF USE, LOST OR INACCURATE TRANSLATED CODE, INTERRUPTION OF BUSINESS, COSTS OF DELAY, LOST PROFITS, OR ANY INDIRECT, SPECIAL, INCIDENTAL, RELIANCE, PUNITIVE, EXEMPLARY OR CONSEQUENTIAL DAMAGES OF ANY KIND ARISING FROM OR RELATED TO THE CONVERSION SOFTWARE, EVEN IF INFORMED OF THE POSSIBILITY OF SUCH DAMAGES IN ADVANCE. THE PARTIES AGREE THAT THIS SECTION WILL APPLY REGARDLESS OF THE FORM OF ACTION, WHETHER IN CONTRACT, TORT (INCLUDING NEGLIGENCE), STRICT LIABILITY OR OTHERWISE AND WILL APPLY EVEN IF ANY LIMITED REMEDY SPECIFIED IN THIS TOU IS FOUND TO HAVE FAILED OF ITS ESSENTIAL PURPOSE.
PARTNER SERVICES ADDENDUM The terms and conditions in this addendum apply only to the extent Company chooses to provide external products or services related to the Conversion Software. Any such use of the Conversion Software is subject to the TOU and this Partner Services Addendum (the “Addendum”). Notwithstanding anything to the contrary, any conflict between the terms in this Addendum and the rest of the TOU will be resolved in favor of the terms in this Addendum.
- PARTNER USE LICENSE. Subject to the terms and conditions of this Addendum, Snowflake hereby provides Company a limited, non-exclusive, non-transferable, non-sublicensable, royalty-free license to install and use the Conversion Software (the “Partner Conversion License”) in object code form solely to facilitate migration of code and other database objects to the Snowflake Service for a third party who has entered into a separate agreement with Company (such third party, an “End User” and services relating to such use, the “Partner Offering”).
- LICENSE RESTRICTIONS. In making the Partner Offering available, Company may not distribute the Conversion Software to End Users; instead, End Users must obtain the Conversion Software directly from Snowflake. Further, Company may not charge for access to the Conversion Software itself, but Company may charge End Users for the Partner Offering.
- PARTNER OBLIGATIONS & RESTRICTIONS. Company remains solely responsible and liable for its own relationships and agreements with the End User and for the Partner Offering. Company may not: (a) hold itself out as an agent of Snowflake or represent or imply to any third party that services or products offered by Company alongside the Conversion Software are provided by or affiliated in any way with Snowflake; (b) make any legal representations, guarantees or warranties of any type on behalf of Snowflake, or describe the Conversion Software in a manner inconsistent with this Addendum or the descriptions and terms contained in the applicable documentation; (c) obscure or remove any marks associating the Conversion Software with Snowflake; (d) represent or imply to any third party that the Conversion Software is provided by Company; or (e) violate any anti-corruption or other laws or engage in any deceptive, misleading, illegal, or unethical practices.
- SNOWFLAKE RIGHTS. Any or all Partner Conversion Licenses granted pursuant to this Addendum may be terminated by Snowflake at any time for any or no reason. Upon such termination or expiration, Company must cease use and delete all copies of the Conversion Software in its possession, custody or control. Except for the limited license granted herein, no rights, title or interest in or to the Conversion Software are granted or transferred to Company.
- NO THIRD-PARTY BENEFICIARY. The TOU and Addendum are made solely for the benefit of Snowflake and Company and each of their successors or assigns. No other person, including an End User, shall have any rights, interests, or claims hereunder or be entitled to any benefits under or on account of the TOU or Addendum as a third-party beneficiary or otherwise. Snowflake has no obligation or liability to such third parties for the Partner Offering.
- COMPANY INDEMNIFICATION. Company will defend Snowflake from and against any third-party claim arising from or relating to Company’s use of the Conversion Software or any Partner Offering, and will indemnify and hold harmless Snowflake from and against any damages and costs awarded against Snowflake or agreed in settlement by Company (including reasonable attorneys’ fees) resulting from such claim.
- INDEMNIFICATION PROCEDURES. In the event of a potential indemnity obligation under Section 6, Snowflake will: (a) promptly notify Company in writing of the claim, (b) allow Company the right to control the investigation, defense and settlement (if applicable) of such claim at Company’s sole cost and expense, and (c) upon request of Company, provide all necessary cooperation at Company’s expense. Failure by Snowflake to notify Company of a claim under this Section shall not relieve Company of its obligations under this Section, however Company shall not be liable for any litigation expenses that Snowflake incurred prior to the time when notice is given or for any damages and/or costs resulting from any material prejudice caused by the delay or failure to provide notice to Company in accordance with this Section. Company may not settle any claim in any matter that would require obligation on the part of Snowflake (other than payment by Company or ceasing to use infringing materials), or any admission of fault by Snowflake, without Snowflake’s prior written consent, such consent not to be unreasonably withheld, conditioned or delayed.
Visit the Conversion Software Terms for more information.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file snowflake_data_validation-1.20.2.tar.gz.
File metadata
- Download URL: snowflake_data_validation-1.20.2.tar.gz
- Upload date:
- Size: 487.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5b17566f4223a90241c55f734f2bae9eb58a4a42823e225ed8e8f45ba9f4bcff
|
|
| MD5 |
246afbe43c631223b2d30ccdf2f4515e
|
|
| BLAKE2b-256 |
3cbd8a4c9c2e9f4f363a846b6d27e49b2c40e4b52a8e3839403f6efea5885396
|
File details
Details for the file snowflake_data_validation-1.20.2-py3-none-any.whl.
File metadata
- Download URL: snowflake_data_validation-1.20.2-py3-none-any.whl
- Upload date:
- Size: 619.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ea70c1570ea004330aa03efd171d2525b2a4b21ce80200396b6457e15166c95f
|
|
| MD5 |
dab6ec9c83855fedda9eea82b09dfb94
|
|
| BLAKE2b-256 |
05d69b235ebfd5d704b08d714be73b83aa73933175616a3407e610ced65ee1e9
|