Snowflake Data Validation
Snowflake Data Validation is a command-line tool and Python library for validating data migrations and ensuring data quality between source and target databases, with a focus on Snowflake and SQL Server.
📖 For detailed usage instructions, configuration examples, and CLI reference, please check the official documentation.
🚀 Features
- Multi-level validation: Schema validation, statistical metrics, and row-level data integrity checks.
- Multiple source platforms: SQL Server, Redshift, Teradata.
- User-friendly CLI: Comprehensive commands for automation and orchestration.
- Parallel processing: Multi-threaded table validation for faster execution.
- Offline validation: Extract source data as Parquet files for validation without source access.
- Flexible configuration: YAML-based workflows with per-table customization.
- Partitioning support: Row and column partitioning helpers for large table validation.
- Detailed reporting: CSV reports, console output, and comprehensive logging.
- Extensible architecture: Ready for additional database engines.
📦 Installation
pip install snowflake-data-validation
For SQL Server support:
pip install "snowflake-data-validation[sqlserver]"
For development and testing:
pip install "snowflake-data-validation[all]"
🔄 Execution Modes
| Mode | Command | Description |
|---|---|---|
| Sync Validation | run-validation | Real-time comparison between source and target databases |
| Source Extraction | source-validate | Extract source data to Parquet files for offline validation |
| Async Validation | run-async-validation | Validate using pre-extracted Parquet files |
| Script Generation | generate-validation-scripts | Generate SQL scripts for manual execution |
Supported Dialects: sqlserver, snowflake, redshift, teradata
🔍 Validation Levels
Schema Validation
Compares table structure between source and target:
- Column names and order
- Data types with mapping support
- Precision, scale, and length
- Nullable constraints
Metrics Validation
Compares statistical metrics for each column:
- Row count
- Min/Max values
- Sum and Average
- Null count
- Distinct count
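The metrics above can be sketched in plain Python for a single column held in memory. This is only a conceptual example under that assumption; `column_metrics` and `compare_metrics` are hypothetical helpers, and a real validation would more likely compute these aggregates in SQL on each side.

```python
import math


def column_metrics(values):
    """Compute simple metrics for one column given as a list (None means NULL)."""
    non_null = [v for v in values if v is not None]
    metrics = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }
    # Sum and average are only computed for numeric columns.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if non_null and numeric:
        total = sum(non_null)
        metrics["sum"] = total
        metrics["avg"] = total / len(non_null)
    return metrics


def compare_metrics(source, target, rel_tol=1e-9):
    """Return {metric: (source_value, target_value)} for metrics that differ."""
    mismatches = {}
    for key in sorted(source.keys() | target.keys()):
        s, t = source.get(key), target.get(key)
        # Floats are compared with a tolerance to absorb rounding differences.
        if isinstance(s, float) and isinstance(t, float):
            equal = math.isclose(s, t, rel_tol=rel_tol)
        else:
            equal = s == t
        if not equal:
            mismatches[key] = (s, t)
    return mismatches
```

Comparing floating-point aggregates with a tolerance rather than exact equality is a design choice of this sketch, since two engines may round averages differently.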
Row Validation
Performs row-by-row comparison:
- Primary key matching
- Field-level value comparison
- Mismatch reporting
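The three steps above can be illustrated with a small in-memory sketch. It assumes each row is a dictionary and that a single column serves as the primary key; `compare_rows` is a hypothetical helper and does not reflect how the tool itself performs the comparison.

```python
def compare_rows(source_rows, target_rows, key):
    """Match rows (dicts) on a primary-key column and report differences."""
    # Index both sides by primary key for direct lookup.
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    report = {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "missing_in_source": sorted(tgt.keys() - src.keys()),
        "mismatches": [],
    }
    # Field-level comparison for keys present on both sides.
    for pk in sorted(src.keys() & tgt.keys()):
        for field, src_value in src[pk].items():
            tgt_value = tgt[pk].get(field)
            if src_value != tgt_value:
                report["mismatches"].append((pk, field, src_value, tgt_value))
    return report
```

Each mismatch tuple holds the primary key, the field name, the source value, and the target value, which is roughly the information a mismatch report needs.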
📊 Reports
- Console Output: Real-time progress with success/failure indicators
- CSV Reports: Detailed validation results with all comparison data
- Log Files: Comprehensive debug and error logging
📚 Documentation
For the complete command reference, configuration options, and examples, see the Data Validation CLI documentation.
🤝 Contributing
We welcome contributions! See our Contributing Guide for details on how to collaborate, set up your development environment, and submit PRs.
📄 License
This project is licensed under the Snowflake Conversion Software Terms. See the LICENSE file for the full text or visit the Conversion Software Terms for more information.
🆘 Support
- Documentation: Full documentation
- Issues: GitHub Issues
Developed with ❄️ by Snowflake
File details
Details for the file snowflake_data_validation-1.5.1.tar.gz.
File metadata
- Download URL: snowflake_data_validation-1.5.1.tar.gz
- Upload date:
- Size: 385.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d30a650ea602db042b8b16dd0498a7f132d70a1efbe04bdafb5233138fc396e4 |
| MD5 | 113d36fab7cb01ddd02275efb430edcd |
| BLAKE2b-256 | 464644ea757c4dd55a668e4366c0008464a9d0c638410827aa755b0fbd4dbfdd |
File details
Details for the file snowflake_data_validation-1.5.1-py3-none-any.whl.
File metadata
- Download URL: snowflake_data_validation-1.5.1-py3-none-any.whl
- Upload date:
- Size: 459.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2b25a01f9b19e21980c24e7ab345bc7b168da0cd19b8d22e3575ae348e2b6690 |
| MD5 | 9a09d4bb5ca5d34342da7abeb3bcd816 |
| BLAKE2b-256 | ea67a2aa64d395fa14b9291b479985d6089711ec323cab7173f601a604bd38b3 |
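A downloaded file can be checked against the published SHA256 digest with the Python standard library. The sketch below is one way to do that; `sha256_of` and `matches_published` are hypothetical helper names, and the file path is assumed to point at a local copy of the download.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's SHA256 digest with the expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches_published("snowflake_data_validation-1.5.1-py3-none-any.whl", "2b25a01f9b19e21980c24e7ab345bc7b168da0cd19b8d22e3575ae348e2b6690")` should return `True` for an intact copy of the wheel listed above.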