
Git-backed data schema versioning and enforcement for Python

Project description

datalasi

Versioned data schema enforcement for Python.

Define data contracts as YAML files, version them in Git, and validate DataFrames against them. Think: pytest for your data schemas, with semantic versioning.

from datalasi import DataContract, Field, Int64, Float64, Enum
from datalasi.io import YAMLWriter, YAMLLoader

contract = DataContract(
    name="transactions",
    version="1.0.0",
    schema={
        "transaction_id": Field("transaction_id", Int64(), pk=True, nullable=False),
        "amount":         Field("amount", Float64(min=0.01), nullable=False),
        "status":         Field("status", Enum(["PENDING", "COMPLETED", "FAILED"]), nullable=False),
    },
    expectations=["amount > 0"],
    owner="data-eng@example.com",
)

# Save to YAML (Git-versioned)
YAMLWriter.write(contract, "contracts/transactions-v1.0.0.yaml")

# Load from YAML
loaded = YAMLLoader.load("contracts/transactions-v1.0.0.yaml")
assert loaded == contract

Installation

pip install datalasi

With optional adapters (the Pandas and Polars adapters are slated for 0.2.0 on the Roadmap):

pip install "datalasi[pandas]"   # Pandas DataFrame validation
pip install "datalasi[polars]"   # Polars DataFrame validation
pip install "datalasi[all]"      # All adapters

Core Concepts

DataContract

A contract describes the expected structure of a dataset:

  • name — unique identifier (e.g. transactions)
  • version — semantic version (MAJOR.MINOR.PATCH)
  • schema — column definitions with types, nullability, constraints
  • expectations — data-quality rules (stored as strings, evaluated by adapters)
  • breaking_changes — how breaking changes are handled: FAIL, WARN, or IGNORE
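Expectations are stored as plain strings and evaluated by adapters. The standard-library sketch below shows what evaluating one such string against rows could look like. It is not datalasi's evaluator: the tiny grammar (`<column> <operator> <number>`), the rule that missing values count as failures, and the names `parse_expectation` and `failing_rows` are all assumptions.

```python
import operator
import re
from typing import Any, Callable

# Hypothetical grammar: "<column> <op> <number>", e.g. "amount > 0".
_OPS: dict[str, Callable[[Any, Any], bool]] = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}
# Two-character operators are listed first so ">=" is not read as ">".
_PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_expectation(rule: str) -> tuple[str, Callable[[Any, Any], bool], float]:
    """Split an expectation string into (column, comparison function, threshold)."""
    match = _PATTERN.match(rule)
    if match is None:
        raise ValueError(f"unsupported expectation: {rule!r}")
    column, op, literal = match.groups()
    return column, _OPS[op], float(literal)


def failing_rows(rows: list[dict[str, Any]], rule: str) -> list[int]:
    """Return indices of rows that violate the rule; missing/None values fail."""
    column, compare, threshold = parse_expectation(rule)
    return [
        i
        for i, row in enumerate(rows)
        if row.get(column) is None or not compare(row[column], threshold)
    ]


rows = [{"amount": 12.5}, {"amount": 0}, {"amount": None}]
print(failing_rows(rows, "amount > 0"))  # [1, 2]
```

A real adapter would push the comparison down into the DataFrame library rather than loop over dicts; the loop keeps the sketch dependency-free.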

Field

Each Field pairs a column name with a DataType plus metadata such as nullability, a primary-key flag, and a description:

Field("amount", Float64(min=0.01, max=1_000_000), nullable=False, description="USD amount")

Supported Types

Type        Description            Constraints
Int64       64-bit integer         min, max
Int32       32-bit integer         min, max
Float64     64-bit float           min, max
String      Text                   max_length, pattern
Boolean     True/False             (none)
Date        YYYY-MM-DD string      (none)
Timestamp   ISO datetime string    timezone
Enum        Fixed value set        allowed_values
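To make the constraints column concrete, the sketch below checks one non-null value against the constraints listed for its type, using only the standard library. It is not datalasi's validator: the function name `check_value`, passing constraints as keyword arguments, inclusive `min`/`max`, treating `pattern` as a full-match regular expression, and ignoring `timezone` are all assumptions.

```python
import re
from datetime import date, datetime
from typing import Any, Callable

NUMERIC_TYPES = {"Int64", "Int32", "Float64"}


def _parses(parser: Callable[[str], object], text: str) -> bool:
    """Return True if `parser` accepts `text` without raising ValueError."""
    try:
        parser(text)
    except ValueError:
        return False
    return True


def check_value(type_name: str, value: Any, **constraints: Any) -> list[str]:
    """Return constraint violations for one non-null value (hypothetical semantics)."""
    # 1. Basic type check; bool is a subclass of int, so it is rejected explicitly.
    if type_name in ("Int64", "Int32"):
        if not isinstance(value, int) or isinstance(value, bool):
            return [f"expected integer, got {value!r}"]
        bits = 64 if type_name == "Int64" else 32
        if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
            return [f"outside {bits}-bit range"]
    elif type_name == "Float64":
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            return [f"expected number, got {value!r}"]
    elif type_name in ("String", "Date", "Timestamp"):
        if not isinstance(value, str):
            return [f"expected string, got {value!r}"]
    elif type_name == "Boolean":
        if not isinstance(value, bool):
            return [f"expected bool, got {value!r}"]
    elif type_name != "Enum":
        raise ValueError(f"unknown type: {type_name}")

    # 2. Type-specific constraints from the table above.
    errors: list[str] = []
    if type_name in NUMERIC_TYPES:
        # min/max are treated as inclusive bounds.
        if "min" in constraints and value < constraints["min"]:
            errors.append(f"below min {constraints['min']}")
        if "max" in constraints and value > constraints["max"]:
            errors.append(f"above max {constraints['max']}")
    elif type_name == "String":
        max_length = constraints.get("max_length")
        if max_length is not None and len(value) > max_length:
            errors.append(f"longer than {max_length} characters")
        pattern = constraints.get("pattern")
        if pattern is not None and re.fullmatch(pattern, value) is None:
            errors.append(f"does not match {pattern!r}")
    elif type_name == "Date":
        if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is None or not _parses(date.fromisoformat, value):
            errors.append(f"not a valid YYYY-MM-DD date: {value!r}")
    elif type_name == "Timestamp":
        # The `timezone` constraint is not modelled in this sketch.
        if not _parses(datetime.fromisoformat, value):
            errors.append(f"not an ISO datetime: {value!r}")
    elif type_name == "Enum" and value not in constraints["allowed_values"]:
        errors.append(f"{value!r} not in {constraints['allowed_values']}")
    return errors


print(check_value("Float64", 0.0, min=0.01, max=1_000_000))  # ['below min 0.01']
```

Returning a list of messages instead of raising on the first problem lets a caller report every violation for a value at once.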

YAML Contract Format

name: transactions
version: 1.0.0
owner: data-eng@example.com
breaking_changes: FAIL

schema:
  transaction_id:
    type: Int64
    nullable: false
    pk: true

  amount:
    type: Float64
    nullable: false
    min: 0.01
    max: 1000000

  status:
    type: Enum
    allowed_values: [PENDING, COMPLETED, FAILED]
    nullable: false

expectations:
  - "amount > 0"
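The YAML above maps naturally onto nested dictionaries. Assuming the file has already been parsed (for example with a YAML library) into the `SCHEMA` dict below, this standard-library sketch checks the structural rules the format expresses: `nullable: false` columns must be present and non-null, and a `pk: true` column must be unique. Treating `pk` as a uniqueness constraint, defaulting to nullable when the key is absent, flagging unexpected columns, and the name `structural_errors` are assumptions, not documented datalasi behavior.

```python
from collections import Counter
from typing import Any

# The `schema:` section of the YAML above, as a YAML parser would return it.
SCHEMA: dict[str, dict[str, Any]] = {
    "transaction_id": {"type": "Int64", "nullable": False, "pk": True},
    "amount": {"type": "Float64", "nullable": False, "min": 0.01, "max": 1000000},
    "status": {
        "type": "Enum",
        "allowed_values": ["PENDING", "COMPLETED", "FAILED"],
        "nullable": False,
    },
}


def structural_errors(schema: dict[str, dict[str, Any]], rows: list[dict[str, Any]]) -> list[str]:
    """Check unexpected columns, nullability and primary-key uniqueness (hypothetical rules)."""
    errors: list[str] = []
    for i, row in enumerate(rows):
        # sorted() keeps the report order stable across runs.
        for column in sorted(row.keys() - schema.keys()):
            errors.append(f"row {i}: unexpected column {column!r}")
        for column, spec in schema.items():
            # A column is treated as nullable when the YAML omits the key.
            if not spec.get("nullable", True) and row.get(column) is None:
                errors.append(f"row {i}: {column!r} is null or missing")
    for column, spec in schema.items():
        if spec.get("pk"):
            counts = Counter(row.get(column) for row in rows)
            for value, count in counts.items():
                if value is not None and count > 1:
                    errors.append(f"{column!r}: duplicate key {value!r} ({count} rows)")
    return errors


rows = [
    {"transaction_id": 1, "amount": 10.0, "status": "PENDING"},
    {"transaction_id": 1, "amount": None, "status": "FAILED"},
    {"transaction_id": 2, "amount": 3.5, "status": "COMPLETED", "note": "x"},
]
for message in structural_errors(SCHEMA, rows):
    print(message)
```

Per-value checks (type, `min`/`max`, `allowed_values`) are deliberately left out here so the sketch stays focused on row-level structure.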

Schema Evolution

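Use evolve() to derive a new contract version from an existing one. Here v1 is the contract saved to YAML in the first example:

from datalasi import Field, String
from datalasi.io import YAMLLoader

v1 = YAMLLoader.load("contracts/transactions-v1.0.0.yaml")
v2 = v1.evolve(
    version="1.1.0",
    schema_additions={"currency": Field("currency", String(), description="ISO 4217 code")},
)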
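The evolve() example moves to 1.1.0, a MINOR release, since it only adds a column. Automated breaking-change detection is listed on the Roadmap for 0.4.0; until then, the standard-library sketch below shows one way to pick a bump level by diffing two schema mappings and to apply a FAIL, WARN, or IGNORE policy. It is not datalasi's implementation: the rules (a removed or retyped column, or a column that stops being nullable, is MAJOR; added columns are MINOR; anything else is PATCH) and the names `classify_change`, `bump`, and `enforce` are assumptions.

```python
import warnings
from typing import Any

Schema = dict[str, dict[str, Any]]


def classify_change(old: Schema, new: Schema) -> str:
    """Return "MAJOR", "MINOR" or "PATCH" for a schema diff (hypothetical rules)."""
    for column, old_spec in old.items():
        new_spec = new.get(column)
        if new_spec is None or new_spec["type"] != old_spec["type"]:
            return "MAJOR"  # column removed or retyped
        if old_spec.get("nullable", True) and not new_spec.get("nullable", True):
            return "MAJOR"  # existing nulls would now fail validation
    if new.keys() - old.keys():
        return "MINOR"  # purely additive change
    return "PATCH"  # e.g. description-only edits


def bump(version: str, level: str) -> str:
    """Apply a semantic-version bump to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "MAJOR":
        return f"{major + 1}.0.0"
    if level == "MINOR":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


def enforce(policy: str, level: str) -> None:
    """Apply a FAIL/WARN/IGNORE policy to a detected change level (hypothetical semantics)."""
    if level != "MAJOR" or policy == "IGNORE":
        return
    message = "breaking schema change detected"
    if policy == "FAIL":
        raise ValueError(message)
    if policy == "WARN":
        warnings.warn(message, stacklevel=2)


v1_schema: Schema = {"amount": {"type": "Float64", "nullable": False}}
v2_schema: Schema = {**v1_schema, "currency": {"type": "String"}}
level = classify_change(v1_schema, v2_schema)
enforce("FAIL", level)  # additive change, so nothing is raised
print(level, bump("1.0.0", level))  # MINOR 1.1.0
```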

Development

git clone https://github.com/malodeity/datalasi
cd datalasi
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# Run tests
pytest tests/unit/ -v --cov=datalasi

# Lint & format
ruff check datalasi tests
black datalasi tests
mypy datalasi

Building & Publishing to PyPI

pip install build twine

# Build distribution
python -m build

# Upload to PyPI
twine upload dist/*

Roadmap

  • 0.1.0 — Core type system, YAML I/O, contract model ✓
  • 0.2.0 — Pandas & Polars adapters, DataFrame validation
  • 0.3.0 — CLI (datalasi validate, datalasi infer, datalasi diff)
  • 0.4.0 — Contract registry, breaking-change detection, migration scripts

License

Apache 2.0 — see LICENSE.

Download files

Download the file for your platform.

Source Distribution

datalasi-0.1.0.tar.gz (34.4 kB)

Uploaded Source

Built Distribution

datalasi-0.1.0-py3-none-any.whl (31.8 kB)

Uploaded Python 3

File details

Details for the file datalasi-0.1.0.tar.gz.

File metadata

  • Download URL: datalasi-0.1.0.tar.gz
  • Upload date:
  • Size: 34.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datalasi-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        1746cd9a1cdfcdc8d3aa4b3c687eb64fe04ca0bd3344e713efc916afe1608d14
MD5           b3dcd0e18accca761b769c99308a3e8e
BLAKE2b-256   f08491587500493750bbb80fb3e0082a435d42cb92f5ab8831419aab8accc4d8
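To confirm that a downloaded sdist matches the SHA256 digest listed above, hash it locally and compare. A minimal standard-library sketch; the helper name `sha256_of` is illustrative, and the script assumes the archive sits in the current directory:

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "1746cd9a1cdfcdc8d3aa4b3c687eb64fe04ca0bd3344e713efc916afe1608d14"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("datalasi-0.1.0.tar.gz")
if archive.exists():
    actual = sha256_of(archive)
    print("OK" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
```

pip can enforce the same check at install time in hash-checking mode, by pinning `datalasi==0.1.0` with `--hash=sha256:...` entries (one per published file) in a requirements file.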

Provenance

The following attestation bundles were made for datalasi-0.1.0.tar.gz:

Publisher: publish.yml on Malodeity/datalasi

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file datalasi-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: datalasi-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 31.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datalasi-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        e8402036d2d74f465dd08b70e96eee77a9bb9e164e9b42b6829214b1b3161f30
MD5           e2084967fd911526f915aa505ab1f9d9
BLAKE2b-256   5cb39eedfb1c693f8ddc4a3d3a327e4c4ea0e7489422e5dd0afb6ceb0dd64699

Provenance

The following attestation bundles were made for datalasi-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Malodeity/datalasi

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
