
A Python-based data contract runtime for consistent quality across engines.


LakeLogic

Trust Your Data. Scale Your Logic.

Write Once. Run Anywhere. — The open-source runtime for data contracts with quarantine.

LakeLogic is a SQL-first, infrastructure-agnostic quality gate that ensures your business decisions are based on data you can trust. It scales your validation logic from local Polars to petabyte-scale Spark without rewriting a single rule.



The Core Value: Write Once. Run Anywhere.

Stop paying the "Infrastructure Lock-In Tax." In a traditional stack, moving from a Warehouse (Snowflake) to a Lakehouse (Databricks) means months of rewriting validation rules. LakeLogic decouples your Business Logic from your Execution Engine.

  1. Cost Efficiency (The Spark Tax ROI): Run 80% of your routine checks on Polars or DuckDB for pennies, reserving Spark for your largest production workloads.
  2. Risk Mitigation (100% Reconciliation): Ensure Source = Good + Quarantined. Mathematically prove that no record was lost or double-counted across your layers.
  3. Stakeholder Trust (Visual Traceability): Use aggregate roll-ups to give your business users a visual drill-down from board-level KPIs back to raw source records.
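
The reconciliation guarantee in point 2 can be stated as a simple invariant. A minimal sketch in plain Python over record IDs (illustrative only, not LakeLogic's internal check):

```python
def reconcile(source_ids, good_ids, bad_ids):
    """Check Source = Good + Quarantined: every source record lands in
    exactly one output, with no loss and no double-counting."""
    source, good, bad = set(source_ids), set(good_ids), set(bad_ids)
    no_overlap = good.isdisjoint(bad)   # no record counted twice
    no_loss = (good | bad) == source    # nothing dropped or invented
    return no_overlap and no_loss

# Record 3 failed validation and was quarantined: still reconciles.
assert reconcile([1, 2, 3], good_ids=[1, 2], bad_ids=[3])
# Record 3 silently dropped: reconciliation fails.
assert not reconcile([1, 2, 3], good_ids=[1, 2], bad_ids=[])
```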

Key Features

  • SQL-First Logic: Use the SQL expressions you already know for transformations and quality rules.
  • Schema Enforcement: Type casting, required fields, and unknown-field handling.
  • Intelligent Quarantine: Records that fail rules are detoured, tagged with error messages, and saved for correction.
  • Lineage Injection: Tag records with source path, run ID, and processing timestamp.
  • Materialization: Write validated data to local CSV/Parquet targets or Delta/Iceberg when running on Spark.
  • Referential Integrity: Validate keys against dimensions using local reference tables.
  • Contract Inference: Auto-generate contracts from landing-zone files with lakelogic bootstrap.
  • dbt Import: Convert dbt schema.yml / sources.yml into LakeLogic contracts with lakelogic import-dbt.
  • Synthetic Data Generation: Generate realistic test data from any contract with DataGenerator.
  • External Logic Hooks: Run dedicated Python modules or notebooks for advanced Gold processing.
  • Policy Packs: Apply standardised rule sets and defaults across all contracts.
  • Notifications: Built-in adapters log alerts for quarantine and rule failures.
  • Observability: Prometheus metrics endpoint, summary tables, and execution tracing.
  • Delta Lake Support (Spark-Free): Read/write/merge Delta tables with Polars, DuckDB, or Pandas — no Spark required.
  • Catalog Table Names: Use Unity Catalog, Fabric LakeDB, and Synapse table names (catalog.schema.table) directly.
  • Streaming Ingestion: Kafka, WebSocket, SSE, Azure Service Bus, GCP Pub/Sub, AWS SQS.
  • Database CDC: Azure SQL, PostgreSQL, MySQL, MongoDB, Oracle, SQL Server change capture.
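
To illustrate the quarantine idea in isolation: a dependency-free sketch of splitting records on a rule and tagging failures with the reason and a timestamp. In LakeLogic itself, rules live in the contract YAML; this is only the underlying pattern:

```python
from datetime import datetime, timezone

def apply_rule(records, rule, error_msg):
    """Split records into (good, quarantined); failures are tagged with
    an error message and a processing timestamp for later correction."""
    good, bad = [], []
    ts = datetime.now(timezone.utc).isoformat()
    for rec in records:
        if rule(rec):
            good.append(rec)
        else:
            bad.append({**rec, "_error": error_msg, "_quarantined_at": ts})
    return good, bad

rows = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}]
good, bad = apply_rule(rows, lambda r: r["email"] is not None,
                       "email is required")
```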

Installation

# Get the full engine suite
uv pip install "lakelogic[all]"

# Or just use Polars for local speed
uv pip install "lakelogic[polars]"

# Delta Lake support (Spark-free)
uv pip install "lakelogic[delta]"

# Profiling + PII detection (bootstrap)
uv pip install "lakelogic[profiling]"

# Database CDC connectors
uv pip install "lakelogic[databases]"

# Streaming sources
uv pip install "lakelogic[streaming]"

See the full installation guide in docs/installation.md.

Quick Start

from lakelogic import DataProcessor

# 1. Run the Quality Gate (Automatic Engine Selection)
processor = DataProcessor(contract="silver_crm_customers.yaml")
good_df, bad_df = processor.run_source()

# good_df -> Ready for Silver Layer
# bad_df  -> Sent to Quarantine

run_source() automatically reads the source path from your contract. You can also pass an explicit path:

good_df, bad_df = processor.run_source("bronze_crm_customers.csv")

The return value is a ValidationResult that unpacks as two DataFrames. Access the raw (pre-validation) frame via result.raw:

result = processor.run_source()
print(f"Total: {len(result.raw)} | Valid: {len(result.good)} | Quarantined: {len(result.bad)}")
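
The tuple unpacking works because the result object iterates over (good, bad). A minimal sketch of that pattern (illustrative, not LakeLogic's actual class):

```python
from dataclasses import dataclass
from typing import Any, Iterator

@dataclass
class ValidationResult:
    raw: Any    # frame as read, before validation
    good: Any   # rows that passed every rule
    bad: Any    # rows routed to quarantine

    def __iter__(self) -> Iterator[Any]:
        # Enables: good_df, bad_df = result
        return iter((self.good, self.bad))

result = ValidationResult(raw=[1, 2, 3], good=[1, 2], bad=[3])
good, bad = result
assert (good, bad) == ([1, 2], [3])
# The reconciliation invariant holds: raw = good + bad
assert len(result.raw) == len(result.good) + len(result.bad)
```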

Delta Lake & Catalog Support (Spark-Free!)

Unity Catalog (Databricks)

from lakelogic import DataProcessor

# Use Unity Catalog table names directly (no Spark required!)
processor = DataProcessor(engine="polars", contract="contracts/customers.yaml")
good_df, bad_df = processor.run_source("main.default.customers")

# LakeLogic automatically:
# 1. Resolves table name to storage path
# 2. Uses Delta-RS for fast, Spark-free operations
# 3. Validates data with your contract rules

Fabric LakeDB (Microsoft)

processor = DataProcessor(engine="polars", contract="contracts/sales.yaml")
good_df, bad_df = processor.run_source("myworkspace.sales_lakehouse.customers")

Synapse Analytics (Azure)

processor = DataProcessor(engine="polars", contract="contracts/sales.yaml")
good_df, bad_df = processor.run_source("salesdb.dbo.customers")
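
All three platforms use the same three-part catalog.schema.table convention. A sketch of splitting such a name (a hypothetical helper, not LakeLogic's actual resolver, which also maps the name to a storage path):

```python
def parse_table_name(name: str) -> tuple[str, str, str]:
    """Split a three-part catalog.schema.table identifier."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    catalog, schema, table = parts
    return catalog, schema, table

assert parse_table_name("main.default.customers") == (
    "main", "default", "customers")
assert parse_table_name("salesdb.dbo.customers") == (
    "salesdb", "dbo", "customers")
```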

Learn more: Delta Lake Support | Catalog Table Names

dbt Integration

Import existing dbt projects directly — no rewrite needed:

# Convert a dbt model to a LakeLogic contract
lakelogic import-dbt --schema models/schema.yml --model customers --output contracts/

# Or use the Python API
from lakelogic import DataProcessor
proc = DataProcessor.from_dbt("models/schema.yml", model="customers")
good_df, bad_df = proc.run_source()
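
Conceptually, the import maps dbt column tests onto validation rules. A hypothetical sketch of that mapping for the two most common tests (the real converter handles far more, and the emitted rule syntax here is illustrative):

```python
def dbt_tests_to_rules(model: dict) -> list[str]:
    """Translate dbt-style column tests into SQL rule expressions."""
    rules = []
    for col in model.get("columns", []):
        name = col["name"]
        for test in col.get("tests", []):
            if test == "not_null":
                rules.append(f"{name} IS NOT NULL")
            elif test == "unique":
                rules.append(f"COUNT(*) OVER (PARTITION BY {name}) = 1")
    return rules

model = {"name": "customers",
         "columns": [{"name": "id", "tests": ["not_null", "unique"]},
                     {"name": "email", "tests": ["not_null"]}]}
rules = dbt_tests_to_rules(model)
```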

Get Started

📚 Read the Docs | 🚀 Quickstart Guide | 💬 Discussions

Run Your First Contract (5 Minutes)

# Clone the repo
git clone https://github.com/LineageLogic/LakeLogic.git
cd LakeLogic/examples/01_quickstart

# Run the example
lakelogic run --contract users_contract.yaml --source data/sample_customers.csv

You'll see:

  • ✅ Good records that passed validation
  • ❌ Quarantined records with error reasons
  • 📊 Quality metrics and health scores

Explore the Examples

The examples/ directory contains 24 runnable notebooks across 4 tested categories:

  • Quickstart (01_quickstart/): Your first contract in 5 minutes, database governance, dbt + PII
  • Core Patterns (02_core_patterns/): Medallion architecture, bronze quality gates, SCD2, deduplication, reference joins, soft deletes
  • Advanced Workflows (03_advanced_workflows/): Insurance ELT pipeline, GDPR compliance, late-arriving data, external Python logic, environment promotion, bootstrap, date dimensions, multi-tenant isolation, partitioned merge, payments lifecycle, streaming, synthetic data generation
  • Compliance (04_compliance_governance/): HIPAA PII masking

Looking for more? Additional examples for data sources, cloud platforms, orchestration, and production patterns are in examples/_archive/. These are functional but not yet fully tested.

Contributing

See CONTRIBUTING.md to get started, or docs/installation.md#developer-installation for environment setup.


License

Apache-2.0
