Git-backed data schema versioning and enforcement for Python
datalasi
Versioned data schema enforcement for Python.
Define data contracts as YAML files, version them in Git, and validate DataFrames against them. Think: pytest for your data schemas, with semantic versioning.
from datalasi import DataContract, Field, Int64, Float64, Enum
from datalasi.io import YAMLWriter, YAMLLoader

contract = DataContract(
    name="transactions",
    version="1.0.0",
    schema={
        "transaction_id": Field("transaction_id", Int64(), pk=True, nullable=False),
        "amount": Field("amount", Float64(min=0.01), nullable=False),
        "status": Field("status", Enum(["PENDING", "COMPLETED", "FAILED"]), nullable=False),
    },
    expectations=["amount > 0"],
    owner="data-eng@example.com",
)

# Save to YAML (Git-versioned)
YAMLWriter.write(contract, "contracts/transactions-v1.0.0.yaml")

# Load from YAML
loaded = YAMLLoader.load("contracts/transactions-v1.0.0.yaml")
assert loaded == contract
Installation
pip install datalasi
With optional adapters:
pip install "datalasi[pandas]" # Pandas DataFrame validation
pip install "datalasi[polars]" # Polars DataFrame validation
pip install "datalasi[all]" # All adapters
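The extras above pull in the corresponding DataFrame library. If calling code needs to adapt to whichever library is present, the standard library can report that without importing it. The helper name below is hypothetical and is not a datalasi API.

```python
from importlib.util import find_spec


def available_dataframe_libs(candidates=("pandas", "polars")):
    """Map each candidate package name to whether it can be imported."""
    return {name: find_spec(name) is not None for name in candidates}


print(available_dataframe_libs())
```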
Core Concepts
DataContract
A contract describes the expected structure of a dataset:
- name: unique identifier (e.g. transactions)
- version: semantic version (MAJOR.MINOR.PATCH)
- schema: column definitions with types, nullability, constraints
- expectations: data-quality rules (stored as strings, evaluated by adapters)
- breaking_changes: FAIL, WARN, or IGNORE
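Expectations are stored as plain strings such as "amount > 0" and evaluated by adapters. The adapter interface is not described here, so the following is only a minimal standard-library sketch of how a rule of the form "column operator number" could be checked against rows. The function check_expectation and its rule grammar are hypothetical, not part of datalasi.

```python
import operator
import re

# Hypothetical rule grammar: "<column> <op> <number>"; not datalasi's parser.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}
_RULE = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def check_expectation(rule, rows):
    """Return the indices of rows that violate a simple comparison rule."""
    match = _RULE.match(rule)
    if match is None:
        raise ValueError(f"unsupported expectation: {rule!r}")
    column, op, literal = match.groups()
    compare = _OPS[op]
    threshold = float(literal)
    return [i for i, row in enumerate(rows) if not compare(row[column], threshold)]


rows = [{"amount": 12.5}, {"amount": 0.0}, {"amount": 3.0}]
violations = check_expectation("amount > 0", rows)
print(violations)
```

Returning row indices rather than a single boolean keeps enough detail to report which records broke the rule.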
Field
Each field carries a DataType plus metadata:
Field("amount", Float64(min=0.01, max=1_000_000), nullable=False, description="USD amount")
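To make the constraint semantics concrete, here is a small standalone sketch of what a min/max and nullability check might look like. RangeRule and its violations method are hypothetical names used for illustration, and inclusive bounds are an assumption; datalasi's own Field and Float64 classes may behave differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeRule:
    """Illustrative stand-in for a numeric field; not datalasi's Field class."""

    name: str
    min: Optional[float] = None
    max: Optional[float] = None
    nullable: bool = True

    def violations(self, value):
        """Return a list of human-readable problems for one value."""
        if value is None:
            return [] if self.nullable else [f"{self.name}: null not allowed"]
        problems = []
        # Bounds are treated as inclusive here (an assumption).
        if self.min is not None and value < self.min:
            problems.append(f"{self.name}: {value} is below min {self.min}")
        if self.max is not None and value > self.max:
            problems.append(f"{self.name}: {value} is above max {self.max}")
        return problems


amount = RangeRule("amount", min=0.01, max=1_000_000, nullable=False)
print(amount.violations(25.0))  # a value inside the range
print(amount.violations(0.0))   # a value below the minimum
print(amount.violations(None))  # a null in a non-nullable field
```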
Supported Types
| Type | Description | Constraints |
|---|---|---|
| Int64 | 64-bit integer | min, max |
| Int32 | 32-bit integer | min, max |
| Float64 | 64-bit float | min, max |
| String | Text | max_length, pattern |
| Boolean | True/False | none |
| Date | YYYY-MM-DD string | none |
| Timestamp | ISO datetime string | timezone |
| Enum | Fixed value set | allowed_values |
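The table describes Date and Timestamp as string-encoded values, and String as optionally constrained by max_length and pattern. As a hedged illustration of those descriptions using only the standard library, the helpers below check individual values. Their names are hypothetical, and treating pattern as a whole-string match is an assumption; datalasi's validators may be stricter or looser.

```python
import re
from datetime import date, datetime


def is_date(value):
    """True if value is a YYYY-MM-DD string naming a real calendar date."""
    if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_timestamp(value):
    """True if value parses as an ISO 8601 datetime string."""
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_string(value, max_length=None, pattern=None):
    """Check a text value; pattern must match the whole string (an assumption)."""
    if not isinstance(value, str):
        return False
    if max_length is not None and len(value) > max_length:
        return False
    return pattern is None or re.fullmatch(pattern, value) is not None


print(is_date("2024-02-29"), is_date("2024-02-30"))
print(is_timestamp("2024-01-15T10:30:00+00:00"))
print(is_string("ZAR", max_length=3, pattern=r"[A-Z]{3}"))
```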
YAML Contract Format
name: transactions
version: 1.0.0
owner: data-eng@company.com
breaking_changes: FAIL
schema:
  transaction_id:
    type: Int64
    nullable: false
    pk: true
  amount:
    type: Float64
    nullable: false
    min: 0.01
    max: 1000000
  status:
    type: Enum
    allowed_values: [PENDING, COMPLETED, FAILED]
    nullable: false
expectations:
  - "amount > 0"
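A file in this format parses to a nested mapping. The sketch below runs a few sanity checks on such a mapping before it is used. The dict literal mirrors the YAML above, written out by hand. The function check_contract_doc and its rules (required keys, a MAJOR.MINOR.PATCH pattern, the three breaking_changes values, the type names from the Supported Types table) are illustrative assumptions, not datalasi's loader logic.

```python
import re

SEMVER = re.compile(r"^\d+\.\d+\.\d+$")
POLICIES = {"FAIL", "WARN", "IGNORE"}
KNOWN_TYPES = {"Int64", "Int32", "Float64", "String", "Boolean", "Date", "Timestamp", "Enum"}


def check_contract_doc(doc):
    """Return a list of problems found in a parsed contract mapping."""
    errors = [f"missing key: {key}" for key in ("name", "version", "schema") if key not in doc]
    if "version" in doc and not SEMVER.match(str(doc["version"])):
        errors.append(f"version is not MAJOR.MINOR.PATCH: {doc['version']!r}")
    if "breaking_changes" in doc and doc["breaking_changes"] not in POLICIES:
        errors.append(f"unknown breaking_changes policy: {doc['breaking_changes']!r}")
    for column, spec in doc.get("schema", {}).items():
        if spec.get("type") not in KNOWN_TYPES:
            errors.append(f"{column}: unknown type {spec.get('type')!r}")
        if spec.get("type") == "Enum" and not spec.get("allowed_values"):
            errors.append(f"{column}: Enum needs allowed_values")
    return errors


# Hand-written equivalent of the YAML contract shown above.
doc = {
    "name": "transactions",
    "version": "1.0.0",
    "owner": "data-eng@company.com",
    "breaking_changes": "FAIL",
    "schema": {
        "transaction_id": {"type": "Int64", "nullable": False, "pk": True},
        "amount": {"type": "Float64", "nullable": False, "min": 0.01, "max": 1000000},
        "status": {
            "type": "Enum",
            "allowed_values": ["PENDING", "COMPLETED", "FAILED"],
            "nullable": False,
        },
    },
    "expectations": ["amount > 0"],
}
print(check_contract_doc(doc))
```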
Schema Evolution
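from datalasi import Field, String  # String assumed to be exported like the other types

# v1 is an existing DataContract, e.g. the 1.0.0 transactions contract above
v2 = v1.evolve(
    version="1.1.0",
    schema_additions={"currency": Field("currency", String(), description="ISO 4217 code")},
)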
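The call to evolve takes the new version number explicitly. One way to reason about which number to pick is to compare the old and new schemas under semantic-versioning conventions. The following is a rough sketch with schemas reduced to {column: type name} mappings. The functions suggest_bump and bump, and the rules they apply (a removed or retyped column is major, an added column is minor), are assumptions for illustration and not datalasi's breaking-change detection, which the roadmap lists for 0.4.0.

```python
def suggest_bump(old, new):
    """Classify a schema change as 'major', 'minor', or 'patch'."""
    removed = old.keys() - new.keys()
    added = new.keys() - old.keys()
    retyped = {col for col in old.keys() & new.keys() if old[col] != new[col]}
    if removed or retyped:
        return "major"  # existing consumers may break
    if added:
        return "minor"  # backwards-compatible addition
    return "patch"


def bump(version, level):
    """Apply a bump level to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


v1_schema = {"transaction_id": "Int64", "amount": "Float64", "status": "Enum"}
v2_schema = {**v1_schema, "currency": "String"}
level = suggest_bump(v1_schema, v2_schema)
print(level, bump("1.0.0", level))
```

A real policy would likely also look at nullability: adding a non-nullable column can break existing producers even though it is only an addition.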
Development
git clone https://github.com/malodeity/datalasi
cd datalasi
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
# Run tests
pytest tests/unit/ -v --cov=datalasi
# Lint & format
ruff check datalasi tests
black datalasi tests
mypy datalasi
Building & Publishing to PyPI
pip install build twine
# Build distribution
python -m build
# Upload to PyPI
twine upload dist/*
Roadmap
- 0.1.0: Core type system, YAML I/O, contract model ✓
- 0.2.0: Pandas & Polars adapters, DataFrame validation
- 0.3.0: CLI (datalasi validate, datalasi infer, datalasi diff)
- 0.4.0: Contract registry, breaking-change detection, migration scripts
License
Apache 2.0 — see LICENSE.