A flexible, extensible command-line tool for automated data quality validation
ValidateLite
ValidateLite: A lightweight data validation tool for engineers who need answers, fast.
Unlike heavier data validation tools, ValidateLite provides two focused commands for different scenarios:
- vlite check: For quick, ad-hoc data checks. Need to verify right now that a column is unique or not null? The check command gets you an answer in 30 seconds, with zero config required.
- vlite schema: For robust, repeatable database schema validation. It's your best defense against schema drift. Embed it in your CI/CD and ETL pipelines to enforce data contracts and catch data integrity problems before they spread.
Core Use Case: Automated Schema Validation
The vlite schema command is key to ensuring the stability of your data pipelines. It allows you to quickly verify that a database table or data file conforms to a defined structure.
Scenario 1: Gate Deployments in CI/CD
Automatically check for breaking schema changes before they get deployed, preventing production issues caused by unexpected modifications.
Example Workflow (.github/workflows/ci.yml)
jobs:
  validate-db-schema:
    name: Validate Database Schema
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install ValidateLite
        run: pip install validatelite
      - name: Run Schema Validation
        run: |
          vlite schema --conn "mysql://${{ secrets.DB_USER }}:${{ secrets.DB_PASS }}@${{ secrets.DB_HOST }}/sales" \
            --rules ./schemas/customers_schema.json
Scenario 2: Monitor ETL/ELT Pipelines
Set up validation checkpoints at various stages of your data pipelines to guarantee data quality and avoid "garbage in, garbage out."
Example Rule File (customers_schema.json)
{
  "customers": {
    "rules": [
      { "field": "id", "type": "integer", "required": true },
      { "field": "name", "type": "string", "required": true },
      { "field": "email", "type": "string", "required": true },
      { "field": "age", "type": "integer", "min": 18, "max": 100 },
      { "field": "gender", "enum": ["Male", "Female", "Other"] },
      { "field": "invalid_col" }
    ]
  }
}
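To make the rule keys above concrete, here is a minimal Python sketch of how "type", "required", "min", "max", and "enum" could be read against a single record. It is an illustration only, not ValidateLite's implementation: the function name check_record, the TYPE_MAP table, and the message wording are all hypothetical, and the sketch does not model whether a field actually exists in the table.

```python
from typing import Any

# Hypothetical mapping from rule "type" names to Python types (an assumption).
TYPE_MAP = {"integer": int, "string": str}

# A subset of the rules from the example rule file.
RULES = [
    {"field": "id", "type": "integer", "required": True},
    {"field": "name", "type": "string", "required": True},
    {"field": "email", "type": "string", "required": True},
    {"field": "age", "type": "integer", "min": 18, "max": 100},
    {"field": "gender", "enum": ["Male", "Female", "Other"]},
]


def check_record(record: dict[str, Any], rules: list[dict[str, Any]]) -> list[str]:
    """Return a list of human-readable violations for one record."""
    problems = []
    for rule in rules:
        field = rule["field"]
        value = record.get(field)
        if value is None:
            if rule.get("required"):
                problems.append(f"{field}: required value is missing")
            continue  # absent and optional: nothing else to check
        expected = TYPE_MAP.get(rule.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (not isinstance(value, expected) or isinstance(value, bool)):
            problems.append(f"{field}: expected {rule['type']}")
            continue
        # Range checks only make sense for numeric values.
        if isinstance(value, (int, float)):
            if "min" in rule and value < rule["min"]:
                problems.append(f"{field}: {value} is below min {rule['min']}")
            if "max" in rule and value > rule["max"]:
                problems.append(f"{field}: {value} is above max {rule['max']}")
        if "enum" in rule and value not in rule["enum"]:
            problems.append(f"{field}: {value!r} is not an allowed value")
    return problems


if __name__ == "__main__":
    for problem in check_record({"id": 2, "name": "Bob", "age": 17, "gender": "X"}, RULES):
        print(problem)
```

The real tool runs these checks against a database table or data file; the sketch only shows the intent of each key on in-memory data.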
Run Command:
vlite schema --conn "mysql://user:pass@host:3306/sales" --rules customers_schema.json
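Inside a Python-based pipeline, the same command can act as a checkpoint between stages. The sketch below wraps it with the standard subprocess module. The helper names are hypothetical, and it assumes that vlite exits with a non-zero status when validation fails; check the Usage Guide for the actual exit-code behavior before relying on it.

```python
import subprocess


def build_schema_command(conn: str, rules_path: str) -> list[str]:
    """Build the argument list for `vlite schema` using the flags shown above."""
    return ["vlite", "schema", "--conn", conn, "--rules", rules_path]


def run_checkpoint(conn: str, rules_path: str) -> None:
    """Run schema validation and stop the pipeline stage if it does not succeed.

    Assumption: vlite returns a non-zero exit code when validation fails.
    """
    result = subprocess.run(
        build_schema_command(conn, rules_path),
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    if result.returncode != 0:
        raise RuntimeError(
            f"Schema validation did not succeed (exit code {result.returncode})"
        )


# Example call between two pipeline stages (requires vlite on PATH):
# run_checkpoint("mysql://user:pass@host:3306/sales", "customers_schema.json")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the connection string contains special characters.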
Quick Start: Ad-Hoc Checks with check
For temporary, one-off validation needs, the check command is your best friend.
1. Install (if you haven't already):
pip install validatelite
2. Run a check:
# Check for nulls in a CSV file's 'id' column
vlite check --conn "customers.csv" --table customers --rule "not_null(id)"
# Check for uniqueness in a database table's 'email' column
vlite check --conn "mysql://user:pass@host/db" --table customers --rule "unique(email)"
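For readers who want to see what the two rules above are asking, the following Python sketch performs comparable checks on a small in-memory CSV using only the standard library. It is not how ValidateLite evaluates rules. The function names and sample data are hypothetical, and treating an empty CSV cell as null is an assumption made for this illustration.

```python
import csv
import io
from collections import Counter


def count_nulls(rows, column):
    """Count rows whose value in `column` is missing or blank."""
    return sum(1 for row in rows if not (row.get(column) or "").strip())


def duplicate_values(rows, column):
    """Return the sorted non-empty values that appear more than once in `column`."""
    counts = Counter(row[column] for row in rows if row.get(column))
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical sample data: one blank id and one repeated email.
SAMPLE = """id,email
1,a@example.com
2,b@example.com
,a@example.com
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(count_nulls(rows, "id"))          # prints 1
print(duplicate_values(rows, "email"))  # prints ['a@example.com']
```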
Learn More
- Usage Guide (USAGE.md): Learn about all commands, arguments, and advanced features.
- Configuration Reference (CONFIG_REFERENCE.md): See how to configure the tool via toml files.
- Contributing Guide (CONTRIBUTING.md): We welcome contributions!
📝 Development Blog
Follow the journey of building ValidateLite through our development blog posts:
- DevLog #1: Building a Zero-Config Data Validation Tool
- DevLog #2: Why I Scrapped My Half-Built Data Validation Platform
- Rule-Driven Schema Validation: A Lightweight Solution
📄 License
This project is licensed under the MIT License.