A Python-based data contract runtime for consistent quality across engines.

LakeLogic

The Open-Source Runtime Engine for Data Contracts with Quarantine.

LakeLogic is a SQL-first, infrastructure-agnostic quality gate that ensures your business decisions are based on data you can trust. It scales your validation logic from local Polars to petabyte-scale Spark without rewriting a single rule.


The Core Value: Write Once. Run Anywhere

Stop paying the "Infrastructure Lock-In Tax." In a traditional stack, moving from a Warehouse (Snowflake) to a Lakehouse (Databricks) means months of rewriting validation rules. LakeLogic decouples your Business Logic from your Execution Engine.

  1. Cost Efficiency (The Spark Tax ROI): Run 80% of your maintenance checks on Polars or DuckDB for pennies, and reserve Spark for your largest production workloads.
  2. Risk Mitigation (100% Reconciliation): Ensure Source = Good + Quarantined. Mathematically prove that no record was lost or double-counted across your layers.
  3. Stakeholder Trust (Visual Traceability): Use aggregate roll-ups to give your business users a visual drill-down from board-level KPIs back to raw source records.
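
The reconciliation invariant in item 2 (Source = Good + Quarantined) can be checked independently of any engine. The following is a minimal sketch in plain Python, not LakeLogic's API: the `reconcile` helper and the `id` key are hypothetical names chosen for illustration.

```python
from collections import Counter


def reconcile(source, good, quarantined, key="id"):
    """Check Source = Good + Quarantined per key, counting duplicates.

    Hypothetical helper for illustration only; not part of LakeLogic.
    """
    expected = Counter(row[key] for row in source)
    actual = Counter(row[key] for row in good) + Counter(row[key] for row in quarantined)
    lost = expected - actual    # keys in the source that are missing downstream
    extra = actual - expected   # keys that appear more often downstream than in the source
    return {
        "balanced": not lost and not extra,
        "lost": dict(lost),
        "double_counted": dict(extra),
    }


source = [{"id": 1}, {"id": 2}, {"id": 3}]
good = [{"id": 1}, {"id": 3}]
quarantined = [{"id": 2}]

report = reconcile(source, good, quarantined)
print(report["balanced"])
```

Comparing per-key counts rather than only row totals catches the case where one record is lost and another is duplicated, which would leave the totals equal.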

Key Features

  • SQL-First Logic: Use the SQL expressions you already know for transformations and quality rules.
  • Schema Enforcement: Type casting, required fields, and unknown-field handling.
  • Intelligent Quarantine: Records that fail rules are detoured, tagged with error messages, and saved for correction.
  • Lineage Injection: Tag records with source path, run ID, and processing timestamp.
  • Materialization: Write validated data to local CSV/Parquet targets or Delta/Iceberg when running on Spark.
  • Referential Integrity: Validate keys against dimensions using local reference tables.
  • Notifications (Demo): Built-in adapters log alerts for quarantine and rule failures.
  • External Logic Hooks: Run dedicated Python modules or notebooks for advanced Gold processing.
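
To make the SQL-first rules, quarantine, and lineage features above concrete, here is a rough sketch of the general pattern. It uses the standard-library `sqlite3` module as a stand-in engine and is not LakeLogic's implementation: the rule names, the table layout, and the `_errors`, `_source_path`, `_run_id`, and `_processed_at` fields are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical rules: name -> SQL boolean expression that a valid row satisfies.
RULES = {
    "email_has_at": "email LIKE '%@%'",
    "age_in_range": "age BETWEEN 0 AND 120",
}


def split_with_quarantine(rows, source_path, run_id):
    """Split rows into (good, bad); bad rows carry the names of failed rules."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER, email TEXT, age INTEGER)")
    conn.executemany("INSERT INTO src VALUES (:id, :email, :age)", rows)

    # Collect the names of failed rules per record id.
    errors = {}
    for name, expr in RULES.items():
        # "IS NOT 1" also matches NULL results, so missing values fail the rule.
        query = f"SELECT id FROM src WHERE ({expr}) IS NOT 1"
        for (row_id,) in conn.execute(query):
            errors.setdefault(row_id, []).append(name)
    conn.close()

    # Lineage fields attached to every record, good or quarantined.
    lineage = {
        "_source_path": source_path,
        "_run_id": run_id,
        "_processed_at": datetime.now(timezone.utc).isoformat(),
    }

    good, bad = [], []
    for row in rows:
        tagged = {**row, **lineage}
        if row["id"] in errors:
            bad.append({**tagged, "_errors": errors[row["id"]]})
        else:
            good.append(tagged)
    return good, bad


rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "not-an-email", "age": 29},
    {"id": 3, "email": None, "age": 150},
]
good, bad = split_with_quarantine(rows, "bronze_crm_customers.csv", "run-001")
for record in bad:
    print(record["id"], record["_errors"])
```

Failing records are detoured rather than dropped, so `len(good) + len(bad)` always equals the number of input rows.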

Installation

# Get the full engine suite
uv pip install "lakelogic[all]"

# Or just use Polars for local speed
uv pip install "lakelogic[polars]"

# Profiling + PII detection (bootstrap)
uv pip install "lakelogic[profiling]"

See the full installation guide in docs/installation.md.

Quick Start

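# Import path assumed from the package name; confirm it against the docs
from lakelogic import DataProcessor

# 1. Run the Quality Gate (Automatic Engine Selection)
processor = DataProcessor(contract="silver_crm_customers.yaml")
source_df, good_df, bad_df = processor.run_source("bronze_crm_customers.csv")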

# good_df -> Ready for Silver Layer
# bad_df  -> Sent to Quarantine

Get Started

Run Your First Contract (5 Minutes)

# Clone the repo
git clone https://github.com/LineageLogic/LakeLogic.git
cd LakeLogic/examples/01_getting_started/basic_validation

# Run the example
lakelogic run --contract contract.yaml --source data/sample_customers.csv

You'll see:

  • ✅ Good records that passed validation
  • ❌ Quarantined records with error reasons
  • 📊 Quality metrics and health scores

Explore 90+ Examples

The examples/ directory contains runnable examples organized by skill level:

  • Getting Started - Your first contract in 5 minutes
  • Tutorials - Medallion architecture, reference joins, notifications
  • Patterns - Bronze quality gates, SCD2, deduplication, late-arriving data
  • Production - Complete insurance ELT pipeline with multi-entity contracts
  • Integrations - Airflow, Prefect, Dagster, Databricks job templates

Contributing

See docs/installation.md#developer-installation to get started.


License

Apache-2.0
