
A flexible, extensible command-line tool for automated data quality validation

Project description

ValidateLite

Python 3.8+ · MIT License

ValidateLite: A lightweight, scenario-driven data validation tool for modern data practitioners.

Whether you're a data scientist cleaning a messy CSV, a data engineer building robust pipelines, or a developer needing a quick check, ValidateLite provides powerful, focused commands for your use case:

  • vlite check: For quick, ad-hoc data checks. Need to verify if a column is unique or not null right now? The check command gets you an answer in seconds, zero config required.

  • vlite schema: For robust, repeatable, and automated validation. Define your data's contract in a JSON schema and let ValidateLite verify everything from data types and ranges to complex type-conversion feasibility.


Who is it for?

For the Data Scientist: Preparing Data for Analysis

You have a messy dataset (legacy_data.csv) where everything is a string. Before you can build a model, you need to clean it up and convert columns to their proper types (integer, float, date). How much work will it be?

Instead of writing complex cleaning scripts first, use vlite schema to assess the feasibility of the cleanup.

1. Define Your Target Schema (rules.json)

Create a schema file that describes the current type and the desired type.

{
  "legacy_users": {
    "rules": [
      {
        "field": "user_id",
        "type": "string",
        "desired_type": "integer",
        "required": true
      },
      {
        "field": "salary",
        "type": "string",
        "desired_type": "float(10,2)",
        "required": true
      },
      {
        "field": "bio",
        "type": "string",
        "desired_type": "string(500)",
        "required": false
      }
    ]
  }
}
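Because the rules file is plain JSON, it can be generated or inspected with nothing but the standard library. A minimal sketch that summarizes the planned conversions (the structure is taken from the example above; this is not an official ValidateLite API):

```python
import json

# The same rules document shown above, embedded as a string for illustration.
RULES = """
{
  "legacy_users": {
    "rules": [
      {"field": "user_id", "type": "string", "desired_type": "integer", "required": true},
      {"field": "salary",  "type": "string", "desired_type": "float(10,2)", "required": true},
      {"field": "bio",     "type": "string", "desired_type": "string(500)", "required": false}
    ]
  }
}
"""

rules_doc = json.loads(RULES)

# Build one human-readable line per planned type conversion.
plan = [
    f"{table}.{rule['field']}: {rule['type']} -> {rule['desired_type']}"
    for table, spec in rules_doc.items()
    for rule in spec["rules"]
]
for line in plan:
    print(line)
```

This makes it straightforward to generate rule files from an existing data dictionary, or to lint them in code review before they reach a pipeline.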

2. Run the Validation

vlite schema --conn legacy_data.csv --rules rules.json

ValidateLite will generate a report telling you exactly what can and cannot be converted, saving you hours of guesswork.

FIELD VALIDATION RESULTS
========================

Field: user_id
  ✓ Field exists (string)
  ✓ Not Null constraint
  ✗ Type Conversion Validation (string → integer): 15 incompatible records found

Field: salary
  ✓ Field exists (string)
  ✗ Type Conversion Validation (string → float(10,2)): 8 incompatible records found

Field: bio
  ✓ Field exists (string)
  ✓ Length Constraint Validation (string → string(500)): PASSED
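To see what "incompatible records" means concretely, here is a small sketch with hypothetical rows (the data and helper below are illustrative, not part of ValidateLite): a `user_id` that cannot become an integer and a `salary` that cannot become a float are exactly the kinds of values the report counts.

```python
# Hypothetical rows illustrating records the feasibility report would flag.
rows = [
    {"user_id": "1001", "salary": "55000.00", "bio": "analyst"},
    {"user_id": "abc",  "salary": "60000.00", "bio": "engineer"},  # string -> integer fails
    {"user_id": "1003", "salary": "n/a",      "bio": "manager"},   # string -> float fails
]

def convertible(value, caster):
    """Return True if `value` can be cast with `caster` (e.g. int, float)."""
    try:
        caster(value)
        return True
    except ValueError:
        return False

bad_ids = [r for r in rows if not convertible(r["user_id"], int)]
bad_salaries = [r for r in rows if not convertible(r["salary"], float)]
print(len(bad_ids), len(bad_salaries))  # → 1 1
```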

For the Data Engineer: Ensuring Data Integrity in CI/CD

You need to prevent breaking schema changes and bad data from ever reaching production. Embed ValidateLite into your CI/CD pipeline to act as a quality gate.

Example Workflow (.github/workflows/ci.yml)

This workflow automatically validates the database schema on every pull request.

jobs:
  validate-db-schema:
    name: Validate Database Schema
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install ValidateLite
        run: pip install validatelite

      - name: Run Schema Validation
        run: |
          vlite schema --conn "mysql://${{ secrets.DB_USER }}:${{ secrets.DB_PASS }}@${{ secrets.DB_HOST }}/sales" \
                       --rules ./schemas/customers_schema.json \
                       --fail-on-error

This same approach can be used to monitor data quality at every stage of your ETL/ELT pipelines, preventing "garbage in, garbage out."
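The same invocation can be assembled from any orchestrator, not just GitHub Actions. A sketch that builds the connection string from environment variables (the variable names mirror the secrets in the workflow above; the fallback values are placeholders):

```python
import os

# Build the connection string the CI step passes to `vlite schema`,
# sourcing credentials from the environment as the workflow does.
user = os.environ.get("DB_USER", "ci_user")
password = os.environ.get("DB_PASS", "ci_pass")
host = os.environ.get("DB_HOST", "localhost")

conn = f"mysql://{user}:{password}@{host}/sales"
cmd = [
    "vlite", "schema",
    "--conn", conn,
    "--rules", "./schemas/customers_schema.json",
    "--fail-on-error",
]
print(" ".join(cmd))
```

From a pipeline stage you would hand `cmd` to your task runner (or `subprocess.run`) and fail the stage on a nonzero exit code, which is what `--fail-on-error` is for.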


Quick Start: Ad-Hoc Checks with check

For temporary, one-off validation needs, the check command is your best friend. You can run multiple rules on any supported data source (files or databases) directly from the command line.

1. Install (if you haven't already):

pip install validatelite

2. Run a check:

# Check for nulls and uniqueness in a CSV file
vlite check --conn "customers.csv" --table customers \
  --rule "not_null(id)" \
  --rule "unique(email)"

# Check value ranges and formats in a database table
vlite check --conn "mysql://user:pass@host/db" --table customers \
  --rule "range(age, 18, 99)" \
  --rule "enum(status, 'active', 'inactive')"
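For intuition, here is what the four rules above assert, expressed in plain Python (illustrative semantics only, not ValidateLite's implementation):

```python
# A tiny in-memory stand-in for the customers table.
customers = [
    {"id": 1, "email": "a@example.com", "age": 34, "status": "active"},
    {"id": 2, "email": "b@example.com", "age": 45, "status": "inactive"},
]

# not_null(id): no row may have a missing id.
not_null_id = all(row["id"] is not None for row in customers)
# unique(email): no two rows share an email.
unique_email = len({row["email"] for row in customers}) == len(customers)
# range(age, 18, 99): every age falls inside the closed interval.
age_in_range = all(18 <= row["age"] <= 99 for row in customers)
# enum(status, 'active', 'inactive'): every status is one of the allowed values.
status_enum = all(row["status"] in {"active", "inactive"} for row in customers)

print(not_null_id, unique_email, age_in_range, status_enum)  # → True True True True
```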

Learn More


📝 Development Blog

Follow the journey of building ValidateLite through our development blog posts.


📄 License

This project is licensed under the MIT License.
