# LakeLogic

A Python-based data contract runtime for consistent quality across engines.

**Your data pipeline breaks silently. LakeLogic catches it.**

One YAML contract. Any engine. Every row validated, quarantined, or promoted — automatically.
## The Problem
You write quality checks in Spark. Then you need to run locally with Polars. Now you're maintaining two codebases. Your bronze layer has no validation. Your silver layer silently drops rows. Nobody knows which records failed or why.
## The Solution
```yaml
# contract.yaml — this is your entire quality gate
version: "1.0"

info:
  title: Silver Customers
  owner: data-team

model:
  fields:
    - name: customer_id
      type: integer
      required: true
    - name: email
      type: string
    - name: revenue
      type: float
    - name: status
      type: string

source:
  type: landing
  path: "data/customers/*.csv"
  load_mode: incremental

quality:
  row_rules:
    - sql: "customer_id IS NOT NULL AND email IS NOT NULL"
    - sql: "status IN ('active', 'churned', 'pending')"
    - sql: "revenue >= 0"
    - sql: "email LIKE '%@%.%'"

materialization:
  strategy: merge
  target_path: "silver/customers"
  format: parquet
  merge_keys: [customer_id]

quarantine:
  enabled: true
  target: "quarantine/customers"
```
```python
from lakelogic import DataProcessor

result = DataProcessor("contract.yaml").run_source()
print(f"✅ Valid: {len(result.good)} | ❌ Quarantined: {len(result.bad)}")
```
Same contract runs on Polars, Spark, DuckDB, or Pandas. Zero code changes.
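The split-and-tag behavior can be sketched in plain Python. This is an illustration of the semantics, not LakeLogic's internals, and the `_error_reasons` field name is invented for the sketch:

```python
# Illustrative sketch of a row-level quality gate: every record either
# passes all rules or is quarantined with the names of the rules it failed.

rows = [
    {"customer_id": 1, "email": "a@x.com", "revenue": 10.0, "status": "active"},
    {"customer_id": 2, "email": None,      "revenue": 5.0,  "status": "active"},
    {"customer_id": 3, "email": "c@x.com", "revenue": -1.0, "status": "unknown"},
]

# Each rule mirrors one `sql:` entry from the contract above.
rules = [
    ("required_fields", lambda r: r["customer_id"] is not None and r["email"] is not None),
    ("valid_status", lambda r: r["status"] in {"active", "churned", "pending"}),
    ("non_negative_revenue", lambda r: r["revenue"] >= 0),
]

good, bad = [], []
for row in rows:
    failed = [name for name, check in rules if not check(row)]
    if failed:
        # Quarantined rows carry every failed rule name; nothing is silently dropped.
        bad.append({**row, "_error_reasons": failed})
    else:
        good.append(row)

print(f"Valid: {len(good)} | Quarantined: {len(bad)}")
```

The same contract-driven rules are what each engine evaluates natively, which is why the semantics stay identical across backends.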
## Install

```bash
pip install lakelogic                   # Core + Polars
pip install "lakelogic[spark]"          # + PySpark
pip install "lakelogic[delta]"          # + Delta Lake (Spark-free)
pip install "lakelogic[notifications]"  # + Apprise + Jinja2 alerts
pip install "lakelogic[all]"            # Everything
```
## What You Get

### 🔒 Schema & Quality Gate
Define fields, types, required constraints, and SQL-based rules in YAML. Bad rows are quarantined with tagged error reasons — never silently dropped.
### 🔄 Engine Portability
One contract, four engines. Develop locally on Polars in milliseconds. Deploy to Spark at scale. Same validation semantics everywhere.
### 📊 Declarative Transformations
Rename, derive, deduplicate, pivot, unpivot, bucket, join, filter, JSON extract, date range explode — all in YAML, all engine-agnostic.
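A hypothetical sketch of how such a transform section might look in YAML. The key names below are invented for illustration; consult the Contract Reference for the actual schema:

```yaml
transformations:
  - type: rename
    mapping: {cust_id: customer_id}
  - type: derive
    name: revenue_band
    sql: "CASE WHEN revenue >= 1000 THEN 'high' ELSE 'low' END"
  - type: deduplicate
    keys: [customer_id]
    keep: last
  - type: filter
    sql: "status != 'deleted'"
```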
### 🔗 Automatic Lineage

Every row is stamped with `_lakelogic_source`, `_lakelogic_processed_at`, and `_lakelogic_run_id`. Upstream lineage columns are preserved with an `_upstream_*` prefix across layers.
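The stamping itself can be sketched in plain Python (an illustration of what the lineage columns contain, not LakeLogic's implementation, which applies them at the engine level):

```python
import uuid
from datetime import datetime, timezone

def stamp_lineage(row: dict, source: str, run_id: str) -> dict:
    """Attach the lineage columns described above to a single record."""
    return {
        **row,
        "_lakelogic_source": source,
        "_lakelogic_processed_at": datetime.now(timezone.utc).isoformat(),
        "_lakelogic_run_id": run_id,
    }

# One run_id is shared by every row written in the same run.
run_id = str(uuid.uuid4())
stamped = stamp_lineage({"customer_id": 1}, "data/customers/part-0.csv", run_id)
```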
### 📦 Incremental Processing
Watermark-based incremental loads, file-mtime tracking, run logs, and CDC support. Process only what's new.
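The watermark idea in miniature (an illustrative sketch, not the library's implementation): keep only rows newer than the last stored high-water mark, then advance the mark.

```python
from datetime import datetime

def incremental_rows(rows, watermark_col, last_watermark):
    """Return rows strictly newer than the stored watermark,
    plus the new high-water mark to persist for the next run."""
    new_rows = [r for r in rows if r[watermark_col] > last_watermark]
    new_watermark = max((r[watermark_col] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark

rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 3)},
]
fresh, wm = incremental_rows(rows, "updated_at", datetime(2024, 1, 2))
```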
### 🔔 Notifications
Slack, Teams, Email, Discord, and 90+ channels via Apprise. Built-in Jinja2 templates per event. Just add a target URL.
### 🏗️ Materialization
Write validated data to CSV, Parquet, Delta Lake, or Unity Catalog tables. Supports append, overwrite, merge, and SCD2 strategies.
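The `merge` strategy's upsert semantics, sketched in plain Python (illustrative only; the real writer operates on Parquet/Delta targets, not in-memory lists):

```python
def merge(target: list[dict], incoming: list[dict], key: str) -> list[dict]:
    """Upsert incoming rows into target, matching on the merge key:
    existing keys are updated in place, new keys are appended."""
    by_key = {row[key]: row for row in target}
    for row in incoming:
        by_key[row[key]] = row
    return list(by_key.values())

target = [{"customer_id": 1, "status": "active"},
          {"customer_id": 2, "status": "pending"}]
incoming = [{"customer_id": 2, "status": "churned"},
            {"customer_id": 3, "status": "active"}]
merged = merge(target, incoming, "customer_id")
```

SCD2 differs in that matched rows are closed out (end-dated) and the incoming version is appended as a new current row, rather than overwritten.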
### 🧪 Synthetic Data

Generate realistic test data from any contract: `lakelogic generate --contract contract.yaml --rows 1000`
### 🔌 dbt Import

Already using dbt? Convert your `schema.yml` in one command: `lakelogic import-dbt --schema models/schema.yml --output contracts/`
## Quick Start (5 Minutes)

### 1. Bootstrap a contract from your data

```bash
lakelogic bootstrap --landing data/ --output contracts/
```
This scans your files, infers schemas, detects PII, and generates ready-to-use contracts.
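The core of that inference step can be sketched as a toy: map sampled column values to contract field types. The real bootstrap is far richer (and also detects PII); this is just the idea:

```python
def infer_type(values):
    """Toy type inference: pick the narrowest contract type that fits all samples."""
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    if all(isinstance(v, int) for v in values):
        return "integer"
    if all(isinstance(v, (int, float)) for v in values):
        return "float"
    return "string"

sample = {
    "customer_id": [1, 2, 3],
    "revenue": [9.5, 0.0, 12.0],
    "email": ["a@x.com", "b@x.com", "c@x.com"],
}
fields = [{"name": col, "type": infer_type(vals)} for col, vals in sample.items()]
```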
### 2. Run the quality gate

```bash
lakelogic run --contract contracts/customers.yaml --source data/customers.csv
```
### 3. See the results

```text
✅ Good records: 847 → output/customers_good.parquet
❌ Quarantined:  23 → output/customers_quarantine.parquet
📊 Quality score: 97.4%
```
### 4. Check your environment

```bash
lakelogic doctor
```
```text
LakeLogic Doctor
═══════════════════════════════════════
Version : 0.2.0
Python  : 3.11.7
OS      : Windows 11

Engines
───────
✅ polars     1.18.0
✅ duckdb     1.1.3
✅ pandas     2.2.1
⬚ pyspark    not installed

Extras
──────
✅ deltalake  0.22.3
✅ jinja2     3.1.4
✅ apprise    1.9.0
⬚ dataprofiler  not installed
═══════════════════════════════════════
```
## Architecture
```text
┌──────────────────────────────────────────────────────────────────┐
│                          Contract YAML                           │
│    schema · SQL quality rules · transforms · lineage · target    │
└─────────────────────────────────┬────────────────────────────────┘
                                  │
                          ┌───────▼───────┐
                          │ DataProcessor │
                          └───────┬───────┘
                                  │
              ┌───────────┬───────┴───────┬───────────┐
              ▼           ▼               ▼           ▼
          ┌────────┐  ┌────────┐      ┌────────┐  ┌────────┐
          │ Polars │  │ Spark  │      │ DuckDB │  │ Pandas │
          └───┬────┘  └───┬────┘      └───┬────┘  └───┬────┘
              └───────────┴───────┬───────┴───────────┘
                                  │
                        ┌─────────▼─────────┐
                        │  Validated Data   │
                        │  ┌──────┐ ┌─────┐ │
                        │  │ Good │ │ Bad │ │
                        │  └───┬──┘ └──┬──┘ │
                        └──────┼───────┼────┘
                               │       │
                          ┌────▼───┐ ┌─▼──────────┐
                          │ Target │ │ Quarantine │
                          └────────┘ └────────────┘
```
## Explore the Examples

The `examples/` directory contains runnable notebooks across three learning tracks:
| Folder | What You'll Learn |
|---|---|
| `01_quickstart/` | Remote CSV ingestion, database governance, dbt + PII quality |
| `02_core_patterns/` | Bronze quality gate, medallion architecture, SCD2, deduplication, reference joins, soft deletes |
| `03_compliance_governance/` | HIPAA & GDPR Policy Packs, automated PII masking, audit-ready quarantine |
## Documentation
- Full Docs — Complete guides and API reference
- Quickstart — Get running in 5 minutes
- Contract Reference — Full YAML field reference
- CLI Reference — Command-line usage
## Contributing

See `CONTRIBUTING.md` to get started, or `docs/installation.md#developer-installation` for environment setup.
## License
Apache-2.0