ValidateLite
A flexible, extensible command-line tool for automated data quality validation, profiling, and rule-based checks across diverse data sources. Designed for data engineers, analysts, and developers to ensure data reliability and compliance in modern data pipelines.
📝 Development Blog
Follow the journey of building ValidateLite through our development blog posts:
- DevLog #1: Building a Zero-Config Data Validation Tool - The initial vision and architecture of ValidateLite
- DevLog #2: Why I Scrapped My Half-Built Data Validation Platform - Lessons learned from scope creep and the pivot to a focused CLI tool
🚀 Quick Start
For Regular Users
Option 1: Install from PyPI (Recommended)
pip install validatelite
vlite --help
Option 2: Install from pre-built package
# Download the latest release wheel from GitHub, then install it (use the file name of the version you downloaded)
pip install validatelite-0.1.0-py3-none-any.whl
vlite --help
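The wheel file name encodes compatibility tags in the order name, version, Python tag, ABI tag, and platform tag. `py3-none-any` means pure Python 3 code with no compiled ABI dependency and no platform restriction, so the same file installs on Linux, macOS, and Windows. The helper below is a simplified sketch that assumes there is no optional build tag, which holds for the ValidateLite wheels listed on this page.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    """Components of a wheel file name (simplified: no build tag)."""
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name like 'pkg-1.0-py3-none-any.whl' into its tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Five parts expected: name, version, python tag, ABI tag, platform tag.
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name layout: {filename}")
    return WheelName(*parts)


if __name__ == "__main__":
    print(parse_wheel_name("validatelite-0.1.0-py3-none-any.whl"))
```

For anything beyond a quick check, `packaging.utils.parse_wheel_filename` from the `packaging` library also handles build tags and compressed tag sets.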
Option 3: Run from source
git clone https://github.com/litedatum/validatelite.git
cd validatelite
pip install -r requirements.txt
python cli_main.py --help
Option 4: Install with pip-tools (for development)
git clone https://github.com/litedatum/validatelite.git
cd validatelite
pip install pip-tools
pip-compile requirements.in
pip install -r requirements.txt
python cli_main.py --help
For Developers & Contributors
If you want to contribute to the project or need the latest development version:
git clone https://github.com/litedatum/validatelite.git
cd validatelite
# Install dependencies (choose one approach)
# Option 1: Install from pinned requirements
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Option 2: Use pip-tools for development
pip install pip-tools
python scripts/update_requirements.py
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
See DEVELOPMENT_SETUP.md for detailed development setup instructions.
✨ Features
- 🔧 Rule-based Data Quality Engine: Supports completeness, uniqueness, validity, and custom rules
- 🖥️ Extensible CLI: Easily integrate with CI/CD and automation workflows
- 🗄️ Multi-Source Support: Validate data from files (CSV, Excel) and databases (MySQL, PostgreSQL, SQLite)
- ⚙️ Configurable & Modular: Flexible configuration via TOML and environment variables
- 🛡️ Comprehensive Error Handling: Robust exception and error classification system
- 🧪 Tested & Reliable: High code coverage, modular tests, and pre-commit hooks
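When a pipeline is orchestrated from Python rather than a shell step, the CLI can be wrapped as a quality gate. The sketch below assumes that `vlite check` exits with a non-zero status when checks fail; that exit-code contract is an assumption, so confirm it in USAGE.md before relying on it. `run_quality_gate` is a hypothetical helper, not part of ValidateLite.

```python
from __future__ import annotations

import subprocess
import sys


def run_quality_gate(cmd: list[str]) -> int:
    """Run a validation command, echo its output, and return its exit code.

    Hypothetical helper. Assumes a non-zero exit code means failed checks.
    """
    completed = subprocess.run(cmd, capture_output=True, text=True)
    # Forward the tool's report so it shows up in the CI log.
    sys.stdout.write(completed.stdout)
    sys.stderr.write(completed.stderr)
    return completed.returncode


if __name__ == "__main__":
    # Same invocation as in Basic Usage; propagate the exit code to the CI runner.
    code = run_quality_gate(["vlite", "check", "data.csv", "--rule", "not_null(id)"])
    sys.exit(code)
```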
📖 Documentation
- USAGE.md - Complete user guide with examples and best practices
- DEVELOPMENT_SETUP.md - Development environment setup and contribution guidelines
- CONFIG_REFERENCE.md - Configuration file reference
- ROADMAP.md - Development roadmap and future plans
- CHANGELOG.md - Release history and changes
🎯 Basic Usage
Validate a CSV file
vlite check data.csv --rule "not_null(id)" --rule "unique(email)"
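The two rules above express completeness (`id` must never be empty) and uniqueness (`email` must not repeat). The sketch below illustrates what those checks mean on a small in-memory CSV. It is plain Python written for this example, not ValidateLite's rule engine, and its handling of empty values (counted as nulls by `not_null`, skipped by `unique`) is an assumption.

```python
import csv
import io
from collections import Counter


def check_not_null(rows, column):
    """Return 1-based data-row numbers where `column` is missing or blank."""
    return [
        i
        for i, row in enumerate(rows, start=1)
        if row.get(column) is None or row[column].strip() == ""
    ]


def check_unique(rows, column):
    """Return the values of `column` that occur more than once (blanks ignored)."""
    counts = Counter(row[column] for row in rows if row.get(column))
    return sorted(value for value, n in counts.items() if n > 1)


SAMPLE = """id,email
1,a@example.com
2,b@example.com
,c@example.com
4,a@example.com
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    print(check_not_null(rows, "id"))   # [3]
    print(check_unique(rows, "email"))  # ['a@example.com']
```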
Validate a database table
vlite check "mysql://user:pass@host:3306/db.table" --rules validation_rules.json
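The database source is given as a URL whose path names the database and table as `<database>.<table>`. The hypothetical helper below splits such a URL into its parts to show how the example reads; it is not ValidateLite's parser. Passwords containing characters such as `@` or `:` must be percent-encoded in any URL, and the helper decodes them; whether ValidateLite does the same is an assumption.

```python
from urllib.parse import unquote, urlsplit


def parse_table_url(url: str) -> dict:
    """Split 'scheme://user:pass@host:port/database.table' into components.

    Hypothetical helper mirroring the URL shape used in the example above.
    """
    parts = urlsplit(url)
    database, sep, table = parts.path.lstrip("/").partition(".")
    if not sep or not database or not table:
        raise ValueError(f"expected /<database>.<table> in path: {url!r}")
    return {
        "scheme": parts.scheme,
        # urlsplit leaves credentials percent-encoded, so decode them here.
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,
        "port": parts.port,
        "database": database,
        "table": table,
    }


if __name__ == "__main__":
    print(parse_table_url("mysql://user:pass@host:3306/db.table"))
```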
Check with verbose output
vlite check data.csv --rules rules.json --verbose
For detailed usage examples and advanced features, see USAGE.md.
🏗️ Project Structure
validatelite/
├── cli/ # CLI logic and commands
├── core/ # Rule engine and core validation logic
├── shared/ # Common utilities, enums, exceptions, and schemas
├── config/ # Example and template configuration files
├── tests/ # Unit, integration, and E2E tests
├── scripts/ # Utility scripts
├── docs/ # Documentation
└── examples/ # Usage examples and sample data
🧪 Testing
For Regular Users
The project includes unit, integration, and end-to-end tests to ensure reliability. If you encounter issues, please check the troubleshooting section in USAGE.md.
For Developers
# Set up test databases (requires Docker)
./scripts/setup_test_databases.sh start
# Run all tests with coverage
pytest -vv --cov
# Run specific test categories
pytest tests/unit/ -v # Unit tests only
pytest tests/integration/ -v # Integration tests
pytest tests/e2e/ -v # End-to-end tests
# Code quality checks
pre-commit run --all-files
# Stop test databases when done
./scripts/setup_test_databases.sh stop
🤝 Contributing
We welcome contributions! Please see our Contributing Guidelines and Code of Conduct.
Development Setup
For detailed development setup instructions, see DEVELOPMENT_SETUP.md.
🔒 Security
For security issues, please review SECURITY.md and follow the recommended process.
📄 License
This project is licensed under the terms of the MIT License.
🙏 Acknowledgements
- Inspired by best practices in data engineering and open-source data quality tools
- Thanks to all contributors and users for their feedback and support
Download files
- Source Distribution: validatelite-0.3.0.tar.gz
- Built Distribution: validatelite-0.3.0-py3-none-any.whl
File details
Details for the file validatelite-0.3.0.tar.gz.
File metadata
- Download URL: validatelite-0.3.0.tar.gz
- Upload date:
- Size: 261.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c51685788869042a14cd5922c2581f2f89c7dabe2d3771b34a5d04c7daad4a05 |
| MD5 | d74a0ba0877de997da6847373de13fe6 |
| BLAKE2b-256 | 7d050c5f7ccaaef4f8442c368d03ba08002318990137622b92197c172371b681 |
File details
Details for the file validatelite-0.3.0-py3-none-any.whl.
File metadata
- Download URL: validatelite-0.3.0-py3-none-any.whl
- Upload date:
- Size: 145.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 995490353319dd9d89e81a20ca2b0715c6ded03415f95eeeb04eba2e08c1ebfe |
| MD5 | 841241d5c81ae172526d45c5a28024e3 |
| BLAKE2b-256 | 051b0a094f05bb904c178ed78fde7bb20a294470f2ac051ddcea4b3aeedf6537 |
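To confirm that a downloaded release file matches the digests listed above, compute the same three hashes locally. BLAKE2b-256 is BLAKE2b with a 32-byte digest. The sketch below reads the file in chunks so large files need not fit in memory; `file_digests` and `verify_sha256` are illustrative helper names, not part of ValidateLite.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digests(path: str | Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute the SHA256, MD5, and BLAKE2b-256 hex digests of a file."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    blake2b_256 = hashlib.blake2b(digest_size=32)  # 32 bytes = 256 bits
    with open(path, "rb") as fh:
        # Stream the file in fixed-size chunks and feed every hash.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in (sha256, md5, blake2b_256):
                h.update(chunk)
    return {
        "SHA256": sha256.hexdigest(),
        "MD5": md5.hexdigest(),
        "BLAKE2b-256": blake2b_256.hexdigest(),
    }


def verify_sha256(path: str | Path, expected: str) -> bool:
    """Return True if the file's SHA256 digest matches `expected`."""
    return file_digests(path)["SHA256"] == expected.strip().lower()


if __name__ == "__main__":
    expected = "995490353319dd9d89e81a20ca2b0715c6ded03415f95eeeb04eba2e08c1ebfe"
    ok = verify_sha256("validatelite-0.3.0-py3-none-any.whl", expected)
    print("OK" if ok else "MISMATCH")
```

pip can also enforce this itself: in hash-checking mode (`--require-hashes`), every requirement must carry a pinned `--hash=sha256:<digest>` entry, and installation fails on any mismatch.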