Snowflake Data Validation

Snowflake Data Validation is a command-line tool and Python library for validating data migrations and ensuring data quality between source and target databases, with a focus on Snowflake and SQL Server.

📖 For detailed usage instructions, configuration examples, and CLI reference, please check the official documentation.


🚀 Features

  • Multi-level validation: Schema validation, statistical metrics, and row-level data integrity checks.
  • Multiple source platforms: SQL Server, Redshift, Teradata.
  • User-friendly CLI: Comprehensive commands for automation and orchestration.
  • Parallel processing: Multi-threaded table validation for faster execution.
  • Offline validation: Extract source data as Parquet files for validation without source access.
  • Flexible configuration: YAML-based workflows with per-table customization.
  • Partitioning support: Row and column partitioning helpers for large table validation.
  • Detailed reporting: CSV reports, console output, and comprehensive logging.
  • Extensible architecture: Ready for additional database engines.
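
To illustrate the parallel-processing feature, here is a minimal sketch that validates several tables concurrently on a thread pool. It is not the tool's implementation: `validate_table` is a hypothetical placeholder that always passes, and the table names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def validate_table(table_name: str) -> tuple[str, bool]:
    """Hypothetical stand-in for one table's validation; always passes here."""
    # A real check would query the source and target and compare the results.
    return table_name, True


def validate_tables(table_names: list[str], max_workers: int = 4) -> dict[str, bool]:
    """Run validate_table for every table on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with table_names.
        return dict(pool.map(validate_table, table_names))


results = validate_tables(["dbo.customers", "dbo.orders", "dbo.items"])
```

Threads suit this kind of workload because each worker mostly waits on database I/O rather than competing for the CPU.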

📦 Installation

pip install snowflake-data-validation

For SQL Server support:

pip install "snowflake-data-validation[sqlserver]"

For development and testing:

pip install "snowflake-data-validation[all]"

🔄 Execution Modes

Mode                Command                       Description
Sync Validation     run-validation                Real-time comparison between source and target databases
Source Extraction   source-validate               Extract source data to Parquet files for offline validation
Async Validation    run-async-validation          Validate using pre-extracted Parquet files
Script Generation   generate-validation-scripts   Generate SQL scripts for manual execution

Supported Dialects: sqlserver, snowflake, redshift, teradata
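
An orchestration script might look up the command for a given mode and check the dialect before calling the CLI. The sketch below only encodes the command names and dialects listed above; the `resolve` helper is hypothetical. This description does not show how the dialect and command are passed on the command line, so the sketch stops short of building one.

```python
# Command names and dialects copied from the execution-mode table above.
MODE_COMMANDS = {
    "Sync Validation": "run-validation",
    "Source Extraction": "source-validate",
    "Async Validation": "run-async-validation",
    "Script Generation": "generate-validation-scripts",
}
SUPPORTED_DIALECTS = ("sqlserver", "snowflake", "redshift", "teradata")


def resolve(mode: str, dialect: str) -> tuple[str, str]:
    """Return (dialect, command) after checking both against the lists above."""
    if dialect not in SUPPORTED_DIALECTS:
        raise ValueError(f"unsupported dialect: {dialect!r}")
    if mode not in MODE_COMMANDS:
        raise ValueError(f"unknown mode: {mode!r}")
    return dialect, MODE_COMMANDS[mode]


selection = resolve("Async Validation", "sqlserver")
```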


🔍 Validation Levels

Schema Validation

Compares table structure between source and target:

  • Column names and order
  • Data types with mapping support
  • Precision, scale, and length
  • Nullable constraints

Metrics Validation

Compares statistical metrics for each column:

  • Row count
  • Min/Max values
  • Sum and Average
  • Null count
  • Distinct count
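
As a rough sketch of the idea, the metrics listed above can be computed for one column on each side and then compared key by key. The function names and sample values are hypothetical, and the comparison uses exact equality, which floating-point columns would likely need to relax with a tolerance.

```python
from typing import Optional, Sequence


def column_metrics(values: Sequence[Optional[float]]) -> dict:
    """Compute the per-column metrics listed above, ignoring NULLs (None)."""
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min": min(non_null, default=None),
        "max": max(non_null, default=None),
        "sum": sum(non_null),
        "avg": sum(non_null) / len(non_null) if non_null else None,
    }


def diff_metrics(source: dict, target: dict) -> dict:
    """Map each differing metric to its (source, target) pair."""
    return {k: (source[k], target[k]) for k in source if source[k] != target[k]}


source_values = [10.0, 20.0, None, 20.0]
target_values = [10.0, 20.0, None, 25.0]
mismatches = diff_metrics(column_metrics(source_values), column_metrics(target_values))
```

In this example the row count, null count, and minimum agree, so only the distinct count, maximum, sum, and average appear in `mismatches`.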

Row Validation

Performs row-by-row comparison:

  • Primary key matching
  • Field-level value comparison
  • Mismatch reporting
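
A minimal sketch of these three steps, assuming rows are plain dictionaries and the primary key is a single column. `compare_rows` and the sample data are hypothetical and do not reflect how the tool itself performs the comparison.

```python
def compare_rows(source_rows: list[dict], target_rows: list[dict], key: str) -> dict:
    """Match rows by primary key, then compare every source field."""
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    report = {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "missing_in_source": sorted(tgt.keys() - src.keys()),
        "mismatches": [],
    }
    # Only keys present on both sides can be compared field by field.
    for pk in sorted(src.keys() & tgt.keys()):
        for field, src_value in src[pk].items():
            tgt_value = tgt[pk].get(field)
            if src_value != tgt_value:
                report["mismatches"].append((pk, field, src_value, tgt_value))
    return report


source_rows = [
    {"id": 1, "name": "Ann", "total": 10},
    {"id": 2, "name": "Bob", "total": 20},
    {"id": 3, "name": "Cy", "total": 30},
]
target_rows = [
    {"id": 1, "name": "Ann", "total": 10},
    {"id": 2, "name": "Bob", "total": 25},
    {"id": 4, "name": "Di", "total": 40},
]
report = compare_rows(source_rows, target_rows, key="id")
```

Indexing both sides by key first keeps the comparison linear in the number of rows, at the cost of holding both sets in memory; large tables would need partitioning or a database-side join instead.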

📊 Reports

  • Console Output: Real-time progress with success/failure indicators
  • CSV Reports: Detailed validation results with all comparison data
  • Log Files: Comprehensive debug and error logging
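
For a sense of what a CSV report could contain, the sketch below writes validation results with Python's standard `csv` module. The column names (`table`, `level`, `status`, `detail`) and the rows are assumptions made for illustration; the actual report layout is defined by the tool and may differ.

```python
import csv
import io


def write_report(results: list[dict], stream) -> None:
    """Write one CSV row per validation result, preceded by a header row."""
    writer = csv.DictWriter(stream, fieldnames=["table", "level", "status", "detail"])
    writer.writeheader()
    writer.writerows(results)


results = [
    {"table": "dbo.orders", "level": "schema", "status": "PASS", "detail": ""},
    {"table": "dbo.orders", "level": "metrics", "status": "FAIL", "detail": "sum(total): 50 != 55"},
]

# An in-memory buffer stands in for a report file opened with newline="".
buffer = io.StringIO()
write_report(results, buffer)
report_text = buffer.getvalue()
```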

📚 Documentation

For the complete command reference, configuration options, and examples, see the Data Validation CLI documentation.


🤝 Contributing

We welcome contributions! See our Contributing Guide for details on how to collaborate, set up your development environment, and submit PRs.


📄 License

This project is licensed under the Snowflake Conversion Software Terms. See the LICENSE file for the full text or visit the Conversion Software Terms for more information.


🆘 Support


Developed with ❄️ by Snowflake
