A Python library that reads data contracts and generates Pydantic models for seamless data validation.
Data-Sitter
Overview
Data-Sitter is a Python library that converts data contracts into Pydantic models, so that structured data can be validated against the rules and constraints the contract declares.
Features
- Define structured data contracts in JSON format.
- Generate Pydantic models automatically from contracts.
- Enforce validation rules at the field level.
- Reuse shared values across rules through references within the contract.
Installation
pip install data-sitter
Development and Deployment
CI/CD Pipeline
The project uses GitHub Actions for continuous integration and deployment:
- Pull Request Checks
  - Automatically checks if the version has been bumped in pyproject.toml
  - Fails if the version is the same as in the main branch
  - Ensures every PR includes a version update
- Automatic Releases
  - When code is merged to the main branch:
    - Builds the package
    - Publishes to PyPI automatically
    - Uses PyPI API token for secure authentication
To set up the CI/CD pipeline:
- Create a PyPI API token:
  - Go to PyPI Account Settings
  - Create a new API token with "Upload" scope
  - Copy the token
- Add the token to GitHub:
  - Go to your repository's Settings > Secrets and variables > Actions
  - Create a new secret named PYPI_API_TOKEN
  - Paste your PyPI API token
Setting Up Development Environment
To set up a development environment with all the necessary tools, install the package with development dependencies:
pip install -e ".[dev]"
This will install:
- The package in editable mode
- Testing tools (pytest, pytest-cov, pytest-mock)
- Build tools (build, twine)
Building the Package
To build the package, run:
python -m build
This will create a dist directory containing both a source distribution (.tar.gz) and a wheel (.whl).
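Before uploading, it can be useful to confirm that the build produced both artifact types. The helper below is a hypothetical sketch (it is not part of Data-Sitter or of the build tool) that groups the files in an output directory by extension. The demonstration uses a temporary directory with empty placeholder files so that it runs without an actual build.

```python
import tempfile
from pathlib import Path


def classify_artifacts(dist_dir):
    """Group the files in a build output directory by distribution type."""
    dist = Path(dist_dir)
    return {
        "sdist": sorted(p.name for p in dist.glob("*.tar.gz")),
        "wheel": sorted(p.name for p in dist.glob("*.whl")),
    }


# Demonstration on a throwaway directory with empty placeholder files;
# after a real build, pass "dist" instead.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("data_sitter-0.1.7.tar.gz", "data_sitter-0.1.7-py3-none-any.whl"):
        (Path(tmp) / name).touch()
    artifacts = classify_artifacts(tmp)

print(artifacts)
```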
Deploying to PyPI
To upload to PyPI:
twine upload dist/*
twine prompts for your PyPI credentials. Use an API token rather than your account password: the token is entered as the password, with __token__ as the username.
Usage
Creating a Pydantic Model from a Contract
To convert a data contract into a Pydantic model, follow these steps:
from data_sitter import Contract

contract_dict = {
    "name": "test",
    "fields": [
        {
            "name": "FID",
            "type": "Integer",
            "rules": ["Positive"]
        },
        {
            "name": "SECCLASS",
            "type": "String",
            "rules": [
                "Validate Not Null",
                "Value In ['UNCLASSIFIED', 'CLASSIFIED']",
            ]
        }
    ],
}

contract = Contract.from_dict(contract_dict)
pydantic_contract = contract.pydantic_model
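Because contracts are defined in JSON, the dictionary passed to Contract.from_dict will often come from a JSON document rather than a Python literal. The sketch below uses only the standard json module to parse the same contract. Note that JSON, unlike Python, does not allow a trailing comma after the last list item. The final step is left as a comment so the example runs without data-sitter installed.

```python
import json

# The same contract as above, written as a JSON document.
contract_json = """
{
  "name": "test",
  "fields": [
    {"name": "FID", "type": "Integer", "rules": ["Positive"]},
    {
      "name": "SECCLASS",
      "type": "String",
      "rules": ["Validate Not Null", "Value In ['UNCLASSIFIED', 'CLASSIFIED']"]
    }
  ]
}
"""

contract_dict = json.loads(contract_json)
field_names = [field["name"] for field in contract_dict["fields"]]
print(field_names)  # ['FID', 'SECCLASS']

# With data-sitter installed, the parsed dictionary can be passed on:
#   from data_sitter import Contract
#   contract = Contract.from_dict(contract_dict)
```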
Using Rule References
Data-Sitter allows you to define reusable values in the values key and reference them in field rules using $values.[key]. For example:
{
    "name": "example_contract",
    "fields": [
        {
            "name": "CATEGORY",
            "type": "String",
            "rules": ["Value In $values.categories"]
        },
        {
            "name": "NAME",
            "type": "String",
            "rules": [
                "Length Between $values.min_length and $values.max_length"
            ]
        }
    ],
    "values": {"categories": ["A", "B", "C"], "min_length": 5, "max_length": 50}
}
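To make the effect of a reference concrete, the sketch below substitutes each $values.[key] token with the literal it points to. It is a plain-Python illustration of the substitution idea, not Data-Sitter's own resolution code. The function name and the choice to render values with repr are assumptions.

```python
import re

# Matches tokens such as $values.categories and captures the key.
REFERENCE = re.compile(r"\$values\.(\w+)")


def resolve_references(rule: str, values: dict) -> str:
    """Replace every $values.<key> token with the literal it refers to."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"Rule {rule!r} references unknown value {key!r}")
        return repr(values[key])
    return REFERENCE.sub(substitute, rule)


values = {"categories": ["A", "B", "C"], "min_length": 5, "max_length": 50}

print(resolve_references("Value In $values.categories", values))
# Value In ['A', 'B', 'C']
print(resolve_references(
    "Length Between $values.min_length and $values.max_length", values))
# Length Between 5 and 50
```

Raising an error for an unknown key is a design choice of this sketch: it surfaces a typo in a reference immediately instead of leaving an unresolved token in the rule.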
Available Rules
The available validation rules can be retrieved programmatically:
from data_sitter import RuleRegistry
rules = RuleRegistry.get_rules_definition()
print(rules)
Rule Definitions
Below are the available rules grouped by field type:
Base
- Is not null
String - (Inherits from Base)
- Is not empty
- Starts with {prefix:String}
- Ends with {suffix:String}
- Is not one of {possible_values:Strings}
- Is one of {possible_values:Strings}
- Has length between {min_val:Integer} and {max_val:Integer}
- Has maximum length {max_len:Integer}
- Has minimum length {min_len:Integer}
- Is uppercase
- Is lowercase
- Matches regex {pattern:String}
- Is valid email
- Is valid URL
- Has no digits
Numeric - (Inherits from Base)
- Is not zero
- Is positive
- Is negative
- Is at least {min_val:Number}
- Is at most {max_val:Number}
- Is greater than {threshold:Number}
- Is less than {threshold:Number}
- Is not between {min_val:Number} and {max_val:Number}
- Is between {min_val:Number} and {max_val:Number}
Integer - (Inherits from Numeric)
Float - (Inherits from Numeric)
- Has at most {decimal_places:Integer} decimal places
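Each definition above is a template in which {name:Type} marks a typed parameter. The sketch below shows one way a concrete rule string could be matched against such a template to recover its parameters. It is an assumption-laden illustration, not the library's parser: the converter table and function name are hypothetical, and the plural Strings type is left out for brevity.

```python
import re

# Hypothetical converters for the placeholder types used in the list above:
# a regex fragment that recognises the value and a function that converts it.
CONVERTERS = {
    "Integer": (r"-?\d+", int),
    "Number": (r"-?\d+(?:\.\d+)?", float),
    "String": (r".+", str),
}

PLACEHOLDER = re.compile(r"\{(\w+):(\w+)\}")


def match_rule(template: str, rule: str):
    """Return the parameters extracted from `rule`, or None if it does not fit."""
    pattern = ""
    types = {}
    position = 0
    for found in PLACEHOLDER.finditer(template):
        name, type_name = found.groups()
        regex, _ = CONVERTERS[type_name]
        # Literal text between placeholders must match exactly.
        pattern += re.escape(template[position:found.start()])
        pattern += f"(?P<{name}>{regex})"
        types[name] = type_name
        position = found.end()
    pattern += re.escape(template[position:])
    matched = re.fullmatch(pattern, rule)
    if matched is None:
        return None
    return {name: CONVERTERS[types[name]][1](text)
            for name, text in matched.groupdict().items()}


print(match_rule("Has maximum length {max_len:Integer}", "Has maximum length 20"))
# {'max_len': 20}
print(match_rule("Is between {min_val:Number} and {max_val:Number}",
                 "Is between 1 and 10.5"))
# {'min_val': 1.0, 'max_val': 10.5}
```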
Contributing
Contributions are welcome! Feel free to submit issues or pull requests in the GitHub repository.
License
Data-Sitter is licensed under the MIT License.
File details
Details for the file data_sitter-0.1.7.tar.gz.
File metadata
- Download URL: data_sitter-0.1.7.tar.gz
- Upload date:
- Size: 19.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3a5340f0d6b881b21b9a6d411cdb6e60c6b1bab6b2e89a183fd8d79901d22ba |
| MD5 | 3a989146f79355af507d336d3444cb3b |
| BLAKE2b-256 | d6ab67ef2a3e5fe5d1b3abec9730155105be6e7ff50a24ed36c241d4f6a7c438 |
File details
Details for the file data_sitter-0.1.7-py3-none-any.whl.
File metadata
- Download URL: data_sitter-0.1.7-py3-none-any.whl
- Upload date:
- Size: 20.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 31f7ab72079e6ba992308745656e421a6244befe25a493808b198f0461c25620 |
| MD5 | 1406a6ee4d64182bac77309f7f75912a |
| BLAKE2b-256 | 09460245d0f25722bbcafbb7ba1307dabd9c37a53d70cf7eb1c7dd8058d9a0b6 |
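A downloaded file can be checked against the SHA256 digests listed above with the standard hashlib module. The helper names below are hypothetical, and the demonstration hashes a small temporary file so that it runs without downloading anything.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and spaces."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demonstration with a temporary file standing in for a downloaded archive.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"example payload")
    expected = hashlib.sha256(b"example payload").hexdigest()
    ok = matches_published_hash(sample, expected)
    tampered = matches_published_hash(sample, "0" * 64)

print(ok, tampered)  # True False
```

For a real download, pass the path of the archive and the SHA256 value from the corresponding table.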