A Python library that reads data contracts and generates Pydantic models for seamless data validation.

Data-Sitter

Overview

Data-Sitter is a Python library that converts data contracts into Pydantic models, so structured data can be validated against the rules and constraints the contract declares.

Features

  • Define structured data contracts in JSON format.
  • Generate Pydantic models automatically from contracts.
  • Enforce validation rules at the field level.
  • Support for rule references within the contract.

Installation

pip install data-sitter

Development and Deployment

CI/CD Pipeline

The project uses GitHub Actions for continuous integration and deployment:

  1. Pull Request Checks

    • Automatically checks if the version has been bumped in pyproject.toml
    • Fails if the version is the same as in the main branch
    • Ensures every PR includes a version update
  2. Automatic Releases

    • When code is merged to the main branch:
      • Builds the package
      • Publishes to PyPI automatically
    • Uses a PyPI API token for secure authentication

To set up the CI/CD pipeline:

  1. Create a PyPI API token:

  2. Add the token to GitHub:

    • Go to your repository's Settings > Secrets and variables > Actions
    • Create a new secret named PYPI_API_TOKEN
    • Paste your PyPI API token

Setting Up Development Environment

To set up a development environment with all the necessary tools, install the package with development dependencies:

pip install -e ".[dev]"

This will install:

  • The package in editable mode
  • Testing tools (pytest, pytest-cov, pytest-mock)
  • Build tools (build, twine)

Building the Package

To build the package, run:

python -m build

This will create a dist directory containing both a source distribution (.tar.gz) and a wheel (.whl).

Deploying to PyPI

To upload to PyPI:

twine upload dist/*

Twine prompts for credentials. PyPI no longer accepts account passwords for uploads, so authenticate with an API token: enter __token__ as the username and the token itself (including the pypi- prefix) as the password.
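
For non-interactive uploads, Twine also reads credentials from the TWINE_USERNAME and TWINE_PASSWORD environment variables. The sketch below shows one way to script that from Python. The helper names are hypothetical, and it assumes twine is installed and that the token is supplied by the caller (for example from a PYPI_API_TOKEN environment variable).

```python
import os
import subprocess
from pathlib import Path


def twine_environment(token, base_env=None):
    """Return a copy of the environment with Twine's credential variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TWINE_USERNAME"] = "__token__"  # literal username PyPI expects for API tokens
    env["TWINE_PASSWORD"] = token
    return env


def upload(token, dist_dir="dist"):
    """Upload every file in dist_dir with twine; raises if twine exits non-zero."""
    files = sorted(str(path) for path in Path(dist_dir).iterdir())
    subprocess.run(["twine", "upload", *files], env=twine_environment(token), check=True)
```

Calling upload(os.environ["PYPI_API_TOKEN"]) after python -m build would publish the freshly built artifacts without a prompt.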

Usage

Creating a Pydantic Model from a Contract

To convert a data contract into a Pydantic model, load the contract with Contract.from_dict and read its pydantic_model attribute:

from data_sitter import Contract

# Rule strings follow the patterns listed under "Rule Definitions".
contract_dict = {
    "name": "test",
    "fields": [
        {
            "name": "FID",
            "type": "Integer",
            "rules": ["Is positive"]
        },
        {
            "name": "SECCLASS",
            "type": "String",
            "rules": [
                "Is not null",
                "Is one of ['UNCLASSIFIED', 'CLASSIFIED']",
            ]
        }
    ],
}

# Parse the contract, then obtain the Pydantic model generated from it.
contract = Contract.from_dict(contract_dict)
pydantic_contract = contract.pydantic_model

Using Rule References

Data-Sitter allows you to define reusable values in the values key and reference them in field rules using $values.[key]. For example:

{
    "name": "example_contract",
    "fields": [
        {
            "name": "CATEGORY",
            "type": "String",
            "rules": ["Is one of $values.categories"]
        },
        {
            "name": "NAME",
            "type": "String",
            "rules": [
                "Has length between $values.min_length and $values.max_length"
            ]
        }
    ],
    "values": {"categories": ["A", "B", "C"], "min_length": 5, "max_length": 50}
}
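
To picture what a $values reference stands for, the sketch below expands each reference into the literal it points to. It is a conceptual illustration and not Data-Sitter's actual resolver: resolve_references is a hypothetical helper, and writing the substituted value as a Python literal is an assumption based on the inline form used in the earlier contract example.

```python
import re

# Matches "$values.<key>" and captures the key name.
REFERENCE = re.compile(r"\$values\.(\w+)")


def resolve_references(rule: str, values: dict) -> str:
    """Replace every $values.<key> in a rule with the literal stored under that key."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"Unknown reference: $values.{key}")
        return repr(values[key])

    return REFERENCE.sub(substitute, rule)


values = {"categories": ["A", "B", "C"], "min_length": 5, "max_length": 50}

print(resolve_references("Is one of $values.categories", values))
# Is one of ['A', 'B', 'C']
print(resolve_references(
    "Has length between $values.min_length and $values.max_length", values))
# Has length between 5 and 50
```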

Available Rules

The available validation rules can be retrieved programmatically:

from data_sitter import RuleRegistry

rules = RuleRegistry.get_rules_definition()
print(rules)

Rule Definitions

Below are the available rules grouped by field type:

Base

  • Is not null

String - (Inherits from Base)

  • Is not empty
  • Starts with {prefix:String}
  • Ends with {suffix:String}
  • Is not one of {possible_values:Strings}
  • Is one of {possible_values:Strings}
  • Has length between {min_val:Integer} and {max_val:Integer}
  • Has maximum length {max_len:Integer}
  • Has minimum length {min_len:Integer}
  • Is uppercase
  • Is lowercase
  • Matches regex {pattern:String}
  • Is valid email
  • Is valid URL
  • Has no digits

Numeric - (Inherits from Base)

  • Is not zero
  • Is positive
  • Is negative
  • Is at least {min_val:Number}
  • Is at most {max_val:Number}
  • Is greater than {threshold:Number}
  • Is less than {threshold:Number}
  • Is not between {min_val:Number} and {max_val:Number}
  • Is between {min_val:Number} and {max_val:Number}

Integer - (Inherits from Numeric)

Float - (Inherits from Numeric)

  • Has at most {decimal_places:Integer} decimal places
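
In the definitions above, each {name:Type} marks a parameter that a concrete rule string fills in. The sketch below shows one way such a pattern could be matched against a rule to extract its parameters. It is an assumption-laden illustration, not the library's parser: parse_rule and CONVERTERS are hypothetical names, only Integer and Number are converted, and every other type is returned as raw text.

```python
import re

# Matches a "{name:Type}" placeholder and captures both parts.
PLACEHOLDER = re.compile(r"\{(\w+):(\w+)\}")
# Hypothetical converters; types not listed here are kept as plain strings.
CONVERTERS = {"Integer": int, "Number": float}


def parse_rule(pattern: str, rule: str):
    """Return the parameters of `rule` if it matches `pattern`, else None."""
    # Splitting on two capture groups yields: literal, name, type, literal, ...
    parts = PLACEHOLDER.split(pattern)
    regex, types = "", {}
    for i in range(0, len(parts), 3):
        regex += re.escape(parts[i])
        if i + 2 < len(parts):
            name, type_name = parts[i + 1], parts[i + 2]
            types[name] = type_name
            regex += f"(?P<{name}>.+)"
    match = re.fullmatch(regex, rule)
    if match is None:
        return None
    # Conversion raises ValueError if the captured text is not a valid number.
    return {
        name: CONVERTERS.get(types[name], str)(text)
        for name, text in match.groupdict().items()
    }


print(parse_rule(
    "Has length between {min_val:Integer} and {max_val:Integer}",
    "Has length between 5 and 50",
))
# {'min_val': 5, 'max_val': 50}
print(parse_rule("Starts with {prefix:String}", "Is uppercase"))
# None
```

A rule without placeholders, such as Is uppercase, matches its own pattern and yields an empty parameter dictionary.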

Contributing

Contributions are welcome! Feel free to submit issues or pull requests in the GitHub repository.

License

Data-Sitter is licensed under the MIT License.
