A Python library that reads data contracts and generates Pydantic models for seamless data validation.

Data-Sitter

Overview

Data-Sitter is a Python library that converts data contracts into Pydantic models, so structured data can be validated against the rules and constraints declared in the contract.

Features

  • Define structured data contracts in JSON format.
  • Generate Pydantic models automatically from contracts.
  • Enforce validation rules at the field level.
  • Support for rule references within the contract.

Installation

pip install data-sitter

Development and Deployment

CI/CD Pipeline

The project uses GitHub Actions for continuous integration and deployment:

  1. Pull Request Checks

    • Automatically checks if the version has been bumped in pyproject.toml
    • Fails if the version is the same as in the main branch
    • Ensures every PR includes a version update
  2. Automatic Releases

    • When code is merged to the main branch:
      • Builds the package
      • Publishes to PyPI automatically
    • Uses a PyPI API token for secure authentication

To set up the CI/CD pipeline:

  1. Create a PyPI API token:

  2. Add the token to GitHub:

    • Go to your repository's Settings > Secrets and variables > Actions
    • Create a new secret named PYPI_API_TOKEN
    • Paste your PyPI API token

Setting Up Development Environment

To set up a development environment with all the necessary tools, install the package with development dependencies:

pip install -e ".[dev]"

This will install:

  • The package in editable mode
  • Testing tools (pytest, pytest-cov, pytest-mock)
  • Build tools (build, twine)

Building the Package

To build the package, run:

python -m build

This will create a dist directory containing both a source distribution (.tar.gz) and a wheel (.whl).

Deploying to PyPI

To upload to PyPI:

twine upload dist/*

Twine will prompt for credentials. Use a PyPI API token rather than your account password: the username is __token__ and the password is the token itself.

Usage

Creating a Pydantic Model from a Contract

To convert a data contract into a Pydantic model, build a Contract from a dictionary and read its pydantic_model attribute:

from data_sitter import Contract

contract_dict = {
    "name": "test",
    "fields": [
        {
            "name": "FID",
            "type": "Integer",
            "rules": ["Is positive"]
        },
        {
            "name": "SECCLASS",
            "type": "String",
            "rules": [
                "Is not null",
                "Is one of ['UNCLASSIFIED', 'CLASSIFIED']",
            ]
        }
    ],
}

contract = Contract.from_dict(contract_dict)
pydantic_contract = contract.pydantic_model

Using Rule References

Data-Sitter allows you to define reusable values in the values key and reference them in field rules using $values.[key]. For example:

{
    "name": "example_contract",
    "fields": [
        {
            "name": "CATEGORY",
            "type": "String",
            "rules": ["Is one of $values.categories"]
        },
        {
            "name": "NAME",
            "type": "String",
            "rules": [
                "Has length between $values.min_length and $values.max_length"
            ]
            ]
        }
    ],
    "values": {"categories": ["A", "B", "C"], "min_length": 5, "max_length": 50}
}
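
To make the reference mechanism concrete, here is a minimal sketch of how $values.[key] placeholders could be substituted into a rule string. It is an illustration only and may not match how Data-Sitter resolves references internally: the resolve_references helper is hypothetical, the JSON encoding of substituted values is an assumption, and the rule wording is taken from the Rule Definitions list.

```python
import json
import re

# Matches references of the form $values.<key>.
REFERENCE = re.compile(r"\$values\.(\w+)")


def resolve_references(rule: str, values: dict) -> str:
    """Replace each $values.<key> in a rule with its JSON-encoded value."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"Unknown reference: $values.{key}")
        return json.dumps(values[key])

    return REFERENCE.sub(substitute, rule)


values = {"categories": ["A", "B", "C"], "min_length": 5, "max_length": 50}

print(resolve_references("Is one of $values.categories", values))
# Is one of ["A", "B", "C"]
print(resolve_references(
    "Has length between $values.min_length and $values.max_length", values
))
# Has length between 5 and 50
```

Keeping shared values in one place means a change to the allowed categories or length limits is made once, rather than in every rule that uses them.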

Available Rules

The available validation rules can be retrieved programmatically:

from data_sitter import RuleRegistry

rules = RuleRegistry.get_rules_definition()
print(rules)

Rule Definitions

Below are the available rules grouped by field type:

Base

  • Is not null

String - (Inherits from Base)

  • Is not empty
  • Starts with {prefix:String}
  • Ends with {suffix:String}
  • Is not one of {possible_values:Strings}
  • Is one of {possible_values:Strings}
  • Has length between {min_val:Integer} and {max_val:Integer}
  • Has maximum length {max_len:Integer}
  • Has minimum length {min_len:Integer}
  • Is uppercase
  • Is lowercase
  • Matches regex {pattern:String}
  • Is valid email
  • Is valid URL
  • Has no digits

Numeric - (Inherits from Base)

  • Is not zero
  • Is positive
  • Is negative
  • Is at least {min_val:Number}
  • Is at most {max_val:Number}
  • Is greater than {threshold:Number}
  • Is less than {threshold:Number}
  • Is not between {min_val:Number} and {max_val:Number}
  • Is between {min_val:Number} and {max_val:Number}

Integer - (Inherits from Numeric)

Float - (Inherits from Numeric)

  • Has at most {decimal_places:Integer} decimal places
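
The {name:Type} placeholders in the definitions above mark the parameters a rule expects. As a rough sketch of how a concrete rule string could be matched against such a template, the code below turns a template into a regular expression and extracts typed parameters. This is an assumption about the general technique, not Data-Sitter's own parser; parse_rule and TYPE_PATTERNS are hypothetical names, and only the Integer, Number, and String placeholder types are handled.

```python
import re

# Hypothetical mapping from placeholder type to (regex fragment, converter).
TYPE_PATTERNS = {
    "Integer": (r"-?\d+", int),
    "Number": (r"-?\d+(?:\.\d+)?", float),
    "String": (r".+", str),
}

# Matches placeholders such as {min_val:Number}.
PLACEHOLDER = re.compile(r"\{(\w+):(\w+)\}")


def parse_rule(template: str, rule: str):
    """Return the parameters extracted from rule, or None if it does not match."""
    converters = {}
    pattern_parts = []
    position = 0
    for match in PLACEHOLDER.finditer(template):
        name, type_name = match.groups()
        fragment, converter = TYPE_PATTERNS[type_name]
        # Literal text between placeholders must match exactly.
        pattern_parts.append(re.escape(template[position:match.start()]))
        pattern_parts.append(f"(?P<{name}>{fragment})")
        converters[name] = converter
        position = match.end()
    pattern_parts.append(re.escape(template[position:]))

    found = re.fullmatch("".join(pattern_parts), rule)
    if found is None:
        return None
    return {
        name: converters[name](text)
        for name, text in found.groupdict().items()
    }


print(parse_rule("Is between {min_val:Number} and {max_val:Number}",
                 "Is between 1 and 10"))
# {'min_val': 1.0, 'max_val': 10.0}
print(parse_rule("Has maximum length {max_len:Integer}", "Has maximum length 50"))
# {'max_len': 50}
print(parse_rule("Is positive", "Is negative"))
# None
```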

Contributing

Contributions are welcome! Feel free to submit issues or pull requests in the GitHub repository.

License

Data-Sitter is licensed under the MIT License.
