ValidateLite

ValidateLite is a flexible, extensible command-line tool for automated data quality validation, profiling, and rule-based checks across diverse data sources. It is designed to help data engineers, analysts, and developers ensure data reliability and compliance in modern data pipelines.

Python 3.8+ | License: MIT

🚀 Quick Start

For Regular Users

Option 1: Install from PyPI (Recommended)

pip install validatelite
vlite --help

Option 2: Install from pre-built package

# Download the latest wheel from the GitHub releases page, then install it
# (adjust the file name to match the version you downloaded)
pip install validatelite-0.3.0-py3-none-any.whl
vlite --help

Option 3: Run from source

git clone https://github.com/litedatum/validatelite.git
cd validatelite
pip install -r requirements.txt
python cli_main.py --help

Option 4: Install with pip-tools (for development)

git clone https://github.com/litedatum/validatelite.git
cd validatelite
pip install pip-tools
pip-compile requirements.in
pip install -r requirements.txt
python cli_main.py --help

For Developers & Contributors

If you want to contribute to the project or need the latest development version:

git clone https://github.com/litedatum/validatelite.git
cd validatelite

# Install dependencies (choose one approach)
# Option 1: Install from pinned requirements
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Option 2: Use pip-tools for development
pip install pip-tools
python scripts/update_requirements.py
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Install pre-commit hooks
pre-commit install

See DEVELOPMENT_SETUP.md for detailed development setup instructions.


✨ Features

  • 🔧 Rule-based Data Quality Engine: Supports completeness, uniqueness, validity, and custom rules
  • 🖥️ Extensible CLI: Easily integrate with CI/CD and automation workflows
  • 🗄️ Multi-Source Support: Validate data from files (CSV, Excel) and databases (MySQL, PostgreSQL, SQLite)
  • ⚙️ Configurable & Modular: Flexible configuration via TOML and environment variables
  • 🛡️ Comprehensive Error Handling: Robust exception and error classification system
  • 🧪 Tested & Reliable: High code coverage, modular tests, and pre-commit hooks

🎯 Basic Usage

Validate a CSV file

vlite check data.csv --rule "not_null(id)" --rule "unique(email)"
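
To make the two inline rules concrete, here is a minimal Python sketch of what a completeness check (`not_null`) and a uniqueness check (`unique`) conceptually test. It is not ValidateLite's implementation: the sample data, the helper names, and the choice to count every row in a duplicate group as a failure are assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the column names mirror the command above.
SAMPLE_CSV = """id,email
1,a@example.com
2,b@example.com
,c@example.com
4,a@example.com
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def not_null_failures(rows, column):
    """Return the rows whose value in `column` is missing or blank."""
    return [row for row in rows if not (row.get(column) or "").strip()]


def unique_failures(rows, column):
    """Return the rows whose value in `column` occurs more than once."""
    counts = Counter(row[column] for row in rows)
    return [row for row in rows if counts[row[column]] > 1]


rows = load_rows(SAMPLE_CSV)
print("not_null(id) failures:", len(not_null_failures(rows, "id")))
print("unique(email) failures:", len(unique_failures(rows, "email")))
```

The real tool may define and report failures differently (for example, counting duplicate groups rather than rows), so treat this only as a mental model of the rule types.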

Validate a database table

vlite check "mysql://user:pass@host:3306/db.table" --rules validation_rules.json
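
The connection string above follows the familiar URL layout of scheme, credentials, host, port, and path. The sketch below uses Python's standard `urllib.parse` to show how the parts break down. Reading the final segment as `database.table` is an assumption based on the example, and `describe_source` is a hypothetical helper, not part of ValidateLite.

```python
from urllib.parse import urlsplit


def describe_source(url):
    """Split a database URL into its components (password deliberately omitted)."""
    parts = urlsplit(url)
    # Assumption: the path has the form "/<database>.<table>".
    database, _, table = parts.path.lstrip("/").partition(".")
    return {
        "dialect": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": database,
        "table": table,
    }


info = describe_source("mysql://user:pass@host:3306/db.table")
for key, value in info.items():
    print(f"{key}: {value}")
```

A helper like this can be handy for logging which source a check ran against without echoing the password.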

Check with verbose output

vlite check data.csv --rules rules.json --verbose

For detailed usage examples and advanced features, see USAGE.md.
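
For CI/CD or other automation, the commands above can be assembled and launched from a script. The following is a minimal sketch using only the flags shown in this section (`--rule`, `--rules`, `--verbose`). The helper names are hypothetical, and it assumes that `vlite` is on the `PATH` and signals a failed check with a non-zero exit code, which you should confirm in USAGE.md.

```python
import shlex
import subprocess


def build_check_command(source, rules=(), rules_file=None, verbose=False):
    """Build the argument list for a `vlite check` invocation."""
    cmd = ["vlite", "check", source]
    for rule in rules:
        cmd += ["--rule", rule]          # one --rule flag per inline rule
    if rules_file:
        cmd += ["--rules", rules_file]   # rules loaded from a JSON file
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_check(cmd):
    """Run the command and return its exit code.

    Assumption: a non-zero code means the check failed or errored.
    """
    return subprocess.run(cmd, check=False).returncode


command = build_check_command("data.csv", rules=["not_null(id)", "unique(email)"])
print(shlex.join(command))  # shell-quoted form, useful for build logs
```

In a pipeline step you might end the script with `sys.exit(run_check(command))` so that the job fails whenever the check does. Passing the arguments as a list avoids shell quoting problems with rule expressions such as `not_null(id)`.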


🏗️ Project Structure

validatelite/
├── cli/           # CLI logic and commands
├── core/          # Rule engine and core validation logic
├── shared/        # Common utilities, enums, exceptions, and schemas
├── config/        # Example and template configuration files
├── tests/         # Unit, integration, and E2E tests
├── scripts/       # Utility scripts
├── docs/          # Documentation
└── examples/      # Usage examples and sample data

🧪 Testing

For Regular Users

The project includes comprehensive tests to ensure reliability. If you encounter issues, please check the troubleshooting section in the usage guide.

For Developers

# Set up test databases (requires Docker)
./scripts/setup_test_databases.sh start

# Run all tests with coverage
pytest -vv --cov

# Run specific test categories
pytest tests/unit/ -v          # Unit tests only
pytest tests/integration/ -v   # Integration tests
pytest tests/e2e/ -v           # End-to-end tests

# Code quality checks
pre-commit run --all-files

# Stop test databases when done
./scripts/setup_test_databases.sh stop

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines and Code of Conduct.

Development Setup

For detailed development setup instructions, see DEVELOPMENT_SETUP.md.


🔒 Security

For security issues, please review SECURITY.md and follow the recommended process.


📄 License

This project is licensed under the terms of the MIT License.


🙏 Acknowledgements

  • Inspired by best practices in data engineering and open-source data quality tools
  • Thanks to all contributors and users for their feedback and support
