Local and Fabric lakehouse abstraction for modular, testable data code

laken

The missing local development workflow for Microsoft Fabric.


laken lets you develop Python code for Fabric locally, using the tools you already trust.

Write code on your machine, run it against real Fabric lakehouse data.

When you're ready, laken deploy packages your project, publishes it to Fabric, and makes it available to your Fabric notebooks.

Your code stays modular. Your notebooks stay thin. And your local workflow survives contact with the platform.

Why “laken”?

Laken, pronounced LAH-kuhn, is Dutch for “cloth.” If you're feeling generous, it's a pun on Fabric and data lakes.


Installation

Install uv if needed, then add laken to your project with uv:

uv add laken

Or install it with pip:

pip install laken

Deploy uses uv to build your wheel before publishing to a Fabric environment.


Develop against your Fabric lakehouse

Set your credentials, select your workspace and lakehouse:

AZURE_TENANT_ID=...
AZURE_CLIENT_ID=...
AZURE_CLIENT_SECRET=...
FABRIC_WORKSPACE_NAME=MyWorkspace
FABRIC_LAKEHOUSE_NAME=MyLakehouse
FABRIC_WORKSPACE_ID=...
FABRIC_LAKEHOUSE_ID=...

from laken import Lakehouse

lh = Lakehouse()
products = lh.read_table("marketing.products", as_="pandas")

lh.write_table(products, "staging.products_snapshot")

Lakehouse detects when it is running locally and when it is running inside Fabric.

Locally, the first read_table for a Fabric table pulls from OneLake and caches it under .laken/ as Delta; later reads use the cache. In a Fabric notebook, the same code reads from your attached lakehouse.

Local writes stay under .laken/ and do not sync to Fabric; in Fabric, writes persist to tables on the attached lakehouse.
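The local behavior described above is a read-through cache: the first read fetches from the remote source, later reads hit disk, and writes never leave the machine. The sketch below illustrates only that pattern. `ReadThroughCache` and `fake_remote` are hypothetical names, and the cache is stored as JSON for brevity, whereas laken stores Delta; this is not laken's implementation.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

Rows = list[dict]


class ReadThroughCache:
    """Hypothetical illustration: fetch once, then serve from local disk."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], Rows]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch  # stands in for the remote (OneLake) read

    def _path(self, table: str) -> Path:
        return self.cache_dir / f"{table}.json"

    def read_table(self, table: str) -> Rows:
        path = self._path(table)
        if not path.exists():
            # First read: pull from the remote source and persist it locally.
            self.cache_dir.mkdir(parents=True, exist_ok=True)
            path.write_text(json.dumps(self.fetch(table)))
        return json.loads(path.read_text())

    def write_table(self, rows: Rows, table: str) -> None:
        # Writes only touch the local cache; nothing is sent upstream.
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self._path(table).write_text(json.dumps(rows))


remote_calls: list[str] = []


def fake_remote(table: str) -> Rows:
    """Records each remote fetch so the caching effect is observable."""
    remote_calls.append(table)
    return [{"sku": "A1", "amount": 10}]


with tempfile.TemporaryDirectory() as tmp:
    cache = ReadThroughCache(Path(tmp) / ".laken", fake_remote)
    first = cache.read_table("marketing.products")
    second = cache.read_table("marketing.products")  # served from disk
    cache.write_table(first, "staging.products_snapshot")  # local only
```

After both reads, `remote_calls` holds a single entry, which is the property that keeps repeated local runs fast and offline-friendly.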


Deploy to Fabric

Structure your local code as a Python project using the standard src layout:

myapp/
├── pyproject.toml          # [project] name = "myapp"
├── src/
│   └── myapp/
│       ├── __init__.py
│       └── pipeline.py
└── .env

Add laken to your project dependencies.

See the Python packaging guide if you are setting this up for the first time.

# src/myapp/pipeline.py
import pandas as pd

from laken import Lakehouse


def run_pipeline(lh: Lakehouse) -> None:
    products = lh.read_table("marketing.products", as_="pandas")
    summary = products.groupby("category", as_index=False)["amount"].sum()
    lh.write_table(summary, "staging.product_summary")
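Because run_pipeline receives the lakehouse as an argument, it can be unit-tested without Fabric or a local cache. The sketch below uses a hypothetical `FakeLakehouse` that implements only the two methods the pipeline calls; it is a test double written for this example, not a class shipped by laken. The pipeline body is repeated so the block runs on its own.

```python
import pandas as pd


class FakeLakehouse:
    """Hypothetical in-memory stand-in exposing only the methods used below."""

    def __init__(self, tables: dict[str, pd.DataFrame]) -> None:
        self.tables = dict(tables)

    def read_table(self, name: str, as_: str = "pandas") -> pd.DataFrame:
        return self.tables[name].copy()

    def write_table(self, df: pd.DataFrame, name: str, mode: str = "overwrite") -> None:
        self.tables[name] = df.copy()


def run_pipeline(lh) -> None:
    # Same body as src/myapp/pipeline.py, repeated so the sketch is self-contained.
    products = lh.read_table("marketing.products", as_="pandas")
    summary = products.groupby("category", as_index=False)["amount"].sum()
    lh.write_table(summary, "staging.product_summary")


def test_run_pipeline_sums_amount_per_category() -> None:
    lh = FakeLakehouse(
        {
            "marketing.products": pd.DataFrame(
                {"category": ["a", "a", "b"], "amount": [1, 2, 5]}
            )
        }
    )
    run_pipeline(lh)
    summary = lh.tables["staging.product_summary"]
    totals = dict(zip(summary["category"], summary["amount"]))
    assert totals == {"a": 3, "b": 5}
```

In a real project the test would import run_pipeline from myapp.pipeline instead of redefining it, and could use LocalLakehouse where a file-backed backend is preferred.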

When you are ready, laken deploy builds your package and loads it into your specified Fabric Environment.

Deploy credentials (.env or shell):

AZURE_TENANT_ID=...
AZURE_CLIENT_ID=...
AZURE_CLIENT_SECRET=...
FABRIC_WORKSPACE_ID=...
FABRIC_ENVIRONMENT_ID=...

From the repo root:

laken deploy

In a Fabric notebook:

from laken import Lakehouse
from myapp.pipeline import run_pipeline

lh = Lakehouse()
run_pipeline(lh)

Reference

Lakehouse

from laken import Lakehouse

lh = Lakehouse()

For tests or scripts that must pin a backend:

from laken import FabricLakehouse, LocalLakehouse

Tables — use schema.table to target a schema; a bare name is passed through to Spark and Fabric resolves it (typically the default dbo schema on a schema-enabled lakehouse). mode is "overwrite" or "append".

lh.write_table(df, "products")
lh.write_table(df, "marketing.products", mode="append")

df = lh.read_table("products")                    # no as_: Spark DataFrame
df = lh.read_table("products", as_="pandas")
df = lh.read_table("marketing.products", as_="polars")

lh.list_tables()
lh.table_exists("marketing.products")
lh.drop_table("marketing.products")
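The naming rule above (schema.table versus a bare name) and the two allowed modes can be checked before any data moves. The helpers below are hypothetical and only illustrate the convention; laken's own validation may differ, and rejecting names with more than one dot is an assumption of this sketch.

```python
from typing import Optional

VALID_MODES = ("overwrite", "append")


def split_table_name(name: str) -> tuple[Optional[str], str]:
    """Split "schema.table" into (schema, table); a bare name has no schema."""
    parts = name.split(".")
    # Reject empty segments ("", ".t", "s.") and anything deeper than schema.table.
    if not all(parts) or len(parts) > 2:
        raise ValueError(f"invalid table name: {name!r}")
    if len(parts) == 1:
        return (None, parts[0])
    return (parts[0], parts[1])


def check_mode(mode: str) -> str:
    """Return the mode unchanged if it is one of the two documented values."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return mode


qualified = split_table_name("marketing.products")
bare = split_table_name("products")
```

A schema of None corresponds to the bare-name case, where resolution is left to Spark and Fabric.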

Files — local paths under .laken/workspace/Files; in Fabric, under the lakehouse Files/ area.

lh.write_file(df, "exports/summary.parquet")
lh.read_file("exports/summary.parquet", as_="pandas")
lh.list_files("exports")
lh.file_exists("exports/summary.parquet")
lh.delete_file("exports/summary.parquet")

Warehouse tables — Spark synapsesql in Fabric; local parquet stand-in for tests.

lh.load_table_from_warehouse("SalesOrderHeader", "SalesWarehouse", as_="pandas")

Other lakehouses — defaults come from notebook context in Fabric; override locally or in notebooks:

lh = Lakehouse(lakehouse="Sales_LH")
lh.read_table("marketing.products", as_="pandas")

CLI

laken deploy [--workspace-id <id>] [--environment-id <id>]
laken status
laken refresh <table>
laken reset <table>

laken deploy builds the wheel from your repo's pyproject.toml, uploads it to a Fabric Environment, and publishes it so notebooks can import your package.

laken status, laken refresh, and laken reset manage the local .laken/ cache on your laptop. They do not run inside Fabric notebooks.

laken status lists cached tables with state (mirror, sample, or local), the Fabric source version when known, and notes such as staleness or sample size.

laken refresh <table> re-downloads a table from Fabric when it was originally cached from Fabric. Local-only tables are left unchanged.

laken reset <table> discards local changes and re-fetches from Fabric. The table must have been cached from Fabric first.
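If deploy runs from a script or CI job, the command line can be assembled in Python. This sketch uses only the two flags shown in the synopsis above; `build_deploy_command` is a hypothetical helper, and the IDs are placeholders.

```python
import shlex
from typing import Optional


def build_deploy_command(
    workspace_id: Optional[str] = None,
    environment_id: Optional[str] = None,
) -> list[str]:
    """Build the argv list for `laken deploy`, adding flags only when given."""
    cmd = ["laken", "deploy"]
    if workspace_id:
        cmd += ["--workspace-id", workspace_id]
    if environment_id:
        cmd += ["--environment-id", environment_id]
    return cmd


cmd = build_deploy_command(workspace_id="ws-123", environment_id="env-456")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repo root. When a flag is omitted, the corresponding value is presumably taken from the environment variables listed below.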

Environment variables

| Variable | Purpose |
| --- | --- |
| AZURE_TENANT_ID | Auth (fetch + deploy) |
| AZURE_CLIENT_ID | Auth (fetch + deploy) |
| AZURE_CLIENT_SECRET | Auth (fetch + deploy) |
| FABRIC_WORKSPACE_NAME | Local table fetch |
| FABRIC_LAKEHOUSE_NAME | Local table fetch |
| FABRIC_WORKSPACE_ID | OneLake paths; required for deploy |
| FABRIC_LAKEHOUSE_ID | OneLake paths |
| FABRIC_ENVIRONMENT_ID | Deploy target |

AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET are credentials from an Azure service principal.
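A quick pre-flight check can report which variables are unset before a fetch or deploy. The grouping below follows the two example blocks earlier in this README; whether laken treats every listed variable as strictly required is an assumption, and `missing_vars` is a hypothetical helper.

```python
import os
from typing import Mapping

AUTH = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")

# Assumed groupings, mirroring the local-development and deploy examples.
REQUIRED = {
    "fetch": AUTH
    + (
        "FABRIC_WORKSPACE_NAME",
        "FABRIC_LAKEHOUSE_NAME",
        "FABRIC_WORKSPACE_ID",
        "FABRIC_LAKEHOUSE_ID",
    ),
    "deploy": AUTH + ("FABRIC_WORKSPACE_ID", "FABRIC_ENVIRONMENT_ID"),
}


def missing_vars(purpose: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names for `purpose` that are unset or empty in `env`."""
    return [name for name in REQUIRED[purpose] if not env.get(name)]


example_env = {
    "AZURE_TENANT_ID": "t",
    "AZURE_CLIENT_ID": "c",
    "AZURE_CLIENT_SECRET": "s",
    "FABRIC_WORKSPACE_ID": "w",
}
missing_for_deploy = missing_vars("deploy", example_env)
```

Calling missing_vars("deploy") with no second argument checks the current process environment.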

FABRIC_WORKSPACE_NAME, FABRIC_LAKEHOUSE_NAME, FABRIC_WORKSPACE_ID, FABRIC_LAKEHOUSE_ID, and FABRIC_ENVIRONMENT_ID can be read from a Fabric notebook with notebookutils:

import notebookutils

ctx = notebookutils.runtime.context
print(ctx.get("currentWorkspaceName"))
print(ctx.get("currentWorkspaceId"))
print(ctx.get("defaultLakehouseName"))
print(ctx.get("defaultLakehouseId"))
print(ctx.get("environmentId"))
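To turn those values into .env lines directly, the context keys can be mapped to the variable names used in this README. The mapping below is inferred from the matching names, and `context_to_env_lines` is a hypothetical helper that accepts any dict-like object, so it can be tried outside Fabric with sample data.

```python
# Context key -> environment variable, inferred from the names above.
CONTEXT_TO_ENV = {
    "currentWorkspaceName": "FABRIC_WORKSPACE_NAME",
    "defaultLakehouseName": "FABRIC_LAKEHOUSE_NAME",
    "currentWorkspaceId": "FABRIC_WORKSPACE_ID",
    "defaultLakehouseId": "FABRIC_LAKEHOUSE_ID",
    "environmentId": "FABRIC_ENVIRONMENT_ID",
}


def context_to_env_lines(ctx) -> list[str]:
    """Format the known context values as NAME=value lines, skipping blanks."""
    return [
        f"{env_name}={ctx.get(key)}"
        for key, env_name in CONTEXT_TO_ENV.items()
        if ctx.get(key)
    ]


# Sample data standing in for notebookutils.runtime.context.
sample_ctx = {
    "currentWorkspaceName": "MyWorkspace",
    "defaultLakehouseName": "MyLakehouse",
    "currentWorkspaceId": "ws-id",
    "defaultLakehouseId": "lh-id",
}
env_lines = context_to_env_lines(sample_ctx)
print("\n".join(env_lines))
```

In a Fabric notebook, pass notebookutils.runtime.context instead of sample_ctx and copy the printed lines into your local .env.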

Deploy expects pyproject.toml at the repo root, a buildable application wheel, and a Fabric environment with a compatible Python/Spark runtime.

Local vs Fabric

| Class | Where | Storage | Reads | Writes |
| --- | --- | --- | --- | --- |
| Lakehouse | Auto-detects notebook context | Fabric if available, else .laken/ Delta | Local: Fabric → cache; Fabric: attached lakehouse | Local: .laken/ only; Fabric: attached lakehouse |
| LocalLakehouse | Laptop / CI | .laken/workspace/ | Cached Delta and local tables | Local only; not pushed to Fabric |
| FabricLakehouse | Fabric notebook | Attached lakehouse | Spark/Delta on attached lakehouse | Delta tables on attached lakehouse |

The first local read of a Fabric table fetches it and caches it as Delta under .laken/. If the table later changes in Fabric, laken warns and keeps serving the cached copy until you run laken refresh <table>. Large tables may be cached as a fixed-size sample rather than in full.
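The warn-but-keep behavior can be pictured as a version comparison: Delta tables carry an increasing integer version, so a cache is stale when the source version has moved past the one recorded at fetch time. The sketch below is an illustration under that assumption; `CacheEntry`, `is_stale`, and `warn_if_stale` are hypothetical names, not laken internals.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    table: str
    cached_version: Optional[int]  # source version recorded when the table was fetched


def is_stale(entry: CacheEntry, current_version: Optional[int]) -> bool:
    """A cache is stale only when both versions are known and the source is newer."""
    if entry.cached_version is None or current_version is None:
        return False
    return current_version > entry.cached_version


def warn_if_stale(entry: CacheEntry, current_version: Optional[int]) -> bool:
    """Warn and report staleness; the cached data itself is left untouched."""
    stale = is_stale(entry, current_version)
    if stale:
        warnings.warn(
            f"{entry.table}: cached at version {entry.cached_version}, "
            f"source is at {current_version}; run `laken refresh {entry.table}`",
            stacklevel=2,
        )
    return stale


entry = CacheEntry(table="marketing.products", cached_version=3)
up_to_date = is_stale(entry, 3)
behind = is_stale(entry, 5)
```

Treating an unknown version as "not stale" keeps offline work quiet; a stricter policy could warn in that case instead.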


Development

Contributions are welcome. To work on this package:

uv sync
uv run pytest
uv run ruff check
