Local Fabric table cache, lakehouse read/write, and deploy for modular Python on Microsoft Fabric

laken

The missing local development workflow for Microsoft Fabric.

laken lets you develop Python code for Fabric locally, using the tools you already trust.

Write code on your machine, run it against real Fabric lakehouse data.

When you're ready, laken deploy packages your project, publishes it to Fabric, and makes it available to your Fabric notebooks.

Your code stays modular. Your notebooks stay thin. And your local workflow survives contact with the platform.

Why “laken”?

Laken, pronounced LAH-kuhn, is Dutch for “cloth.” If you're feeling generous, it's a pun on Fabric and data lakes.


Installation

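Install uv if needed, then add laken to your project with either uv or pip:

uv add laken
# or
pip install laken

laken deploy uses uv to build your wheel before publishing to a Fabric Environment, so uv must be available for deploys even if you installed laken with pip.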


Quickstart

Write lakehouse code on your laptop against real Fabric data, package it, and run the same code in a notebook.

1. Credentials — create a .env in your project root (see Environment variables for the full list):

AZURE_TENANT_ID=...
AZURE_CLIENT_ID=...
AZURE_CLIENT_SECRET=...
FABRIC_WORKSPACE_NAME=MyWorkspace
FABRIC_LAKEHOUSE_NAME=MyLakehouse
FABRIC_WORKSPACE_ID=...
FABRIC_LAKEHOUSE_ID=...

2. Develop — reads pull from Fabric and cache locally. In a Fabric notebook that same code runs against your attached lakehouse:

from laken import Lakehouse

lh = Lakehouse()
df = lh.read_table("customers", frame_type="pandas")
# ...
lh.write_table(df, "customer_analytics")

3. Package and deploy — move that code into a normal Python package and publish it to a Fabric Environment (FABRIC_ENVIRONMENT_ID in .env):

customer_analytics/
├── pyproject.toml
└── src/customer_analytics/
    └── pipeline.py

# src/customer_analytics/pipeline.py
from laken import Lakehouse


def create_analytics(lh: Lakehouse) -> None:
    df = lh.read_table("customers", frame_type="pandas")
    # ...
    lh.write_table(df, "customer_analytics")

laken deploy

4. Run in a Fabric notebook — after the publish finishes:

from laken import Lakehouse
from customer_analytics.pipeline import create_analytics

lh = Lakehouse()
create_analytics(lh)
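
Because create_analytics receives the lakehouse as an argument, the same function can also be exercised without Fabric or a local cache. The following is a minimal sketch under stated assumptions: FakeLakehouse is a hypothetical in-memory stand-in (it is not part of laken), tables are plain lists of dicts rather than DataFrames, and the "active" filter is only an illustrative transformation.

```python
class FakeLakehouse:
    """Hypothetical in-memory stand-in for tests; not part of laken."""

    def __init__(self, tables):
        # Copy the input so tests cannot mutate the caller's data.
        self.tables = {name: list(rows) for name, rows in tables.items()}

    def read_table(self, name, frame_type=None):
        # frame_type is accepted for call compatibility and ignored here.
        return list(self.tables[name])

    def write_table(self, rows, name):
        # Mirrors the documented default: a write replaces the table.
        self.tables[name] = list(rows)


def create_analytics(lh) -> None:
    rows = lh.read_table("customers", frame_type="pandas")
    # Illustrative transformation: keep only active customers.
    active = [row for row in rows if row["active"]]
    lh.write_table(active, "customer_analytics")


fake = FakeLakehouse({
    "customers": [
        {"id": 1, "active": True},
        {"id": 2, "active": False},
    ]
})
create_analytics(fake)
```

After the call, fake.tables["customer_analytics"] holds only the active row, and the source table is unchanged.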

Usage

Lakehouse

Lakehouse() detects whether your code is running locally or in a Fabric notebook and connects accordingly. The same read_table / write_table calls work in both places:

  • Locally: the first read of a Fabric table copies it into a .laken/ folder on disk; later reads use that copy. Writes update only your local copy; they do not change tables in Fabric.
  • In a Fabric notebook: reads and writes go to your attached lakehouse.

from laken import Lakehouse

lh = Lakehouse()

Use schema.table when you need a schema (marketing.products). A bare name (products) is resolved by Fabric/Spark, usually as dbo.products on a schema-enabled lakehouse.

df = lh.read_table("products")                         # pandas locally; Spark in Fabric
df = lh.read_table("products", frame_type="spark")
df = lh.read_table("marketing.products", frame_type="polars")

lh.write_table(df, "products")
lh.write_table(df, "marketing.products", mode="append")

write_table replaces a table by default; pass mode="append" to add rows.
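
The difference between the default and mode="append" can be pictured with a small in-memory model. This is only a sketch of the documented behavior: apply_write is a hypothetical helper, and tables are plain lists of dicts rather than DataFrames.

```python
def apply_write(tables, name, rows, mode=None):
    """Model the documented semantics: replace by default, add rows on append."""
    if mode == "append":
        # Append keeps existing rows and adds the new ones at the end.
        tables.setdefault(name, []).extend(rows)
    else:
        # Default: the table is replaced wholesale.
        tables[name] = list(rows)
    return tables


tables = {"products": [{"sku": "A"}]}
apply_write(tables, "products", [{"sku": "B"}])                 # replaces A with B
apply_write(tables, "products", [{"sku": "C"}], mode="append")  # adds C after B
```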

To use a different lakehouse than your .env or notebook default:

lh = Lakehouse(lakehouse="Sales_LH")

Fabric tables locally

The first time you read_table a Fabric table locally, laken downloads a copy into .laken/. Later reads use that copy.

write_table updates only that local copy — nothing is sent to Fabric. Run laken refresh <table> to discard local changes and download the table from Fabric again.
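
The read, write, and refresh behavior can be summarized as a toy model. The sketch below is not laken's implementation: LocalMirror is a hypothetical class, dictionaries stand in for Fabric and for the .laken/ folder, and the refresh rule for locally created tables follows the description in the CLI section.

```python
class LocalMirror:
    """Toy model of the local copy behavior; not laken's code."""

    def __init__(self, remote):
        self.remote = remote    # stands in for tables in Fabric
        self.local = {}         # stands in for the .laken/ folder
        self.mirrored = set()   # names that were copied from Fabric

    def read_table(self, name):
        if name not in self.local:
            # First read: copy the table from "Fabric".
            self.local[name] = list(self.remote[name])
            self.mirrored.add(name)
        return list(self.local[name])

    def write_table(self, rows, name):
        # Writes touch only the local copy, never the remote.
        self.local[name] = list(rows)

    def refresh(self, name):
        # Only tables that originally came from Fabric are downloaded again.
        if name in self.mirrored:
            self.local[name] = list(self.remote[name])


remote = {"customers": [{"id": 1}]}
mirror = LocalMirror(remote)
mirror.read_table("customers")                           # downloads a copy
mirror.write_table([{"id": 1}, {"id": 2}], "customers")  # local change only
mirror.write_table([{"id": 9}], "scratch")               # created locally
mirror.refresh("customers")                              # back to Fabric's data
mirror.refresh("scratch")                                # left alone
```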

Tables up to 100 MB in Fabric are copied in full. Larger tables copy only the first 10,000 rows — enough to develop against without downloading the whole table. You can change both limits with max_mirror_mb and max_sample_rows on Lakehouse(...) or on a single read_table call:

lh = Lakehouse(max_mirror_mb=200, max_sample_rows=5_000)
lh.read_table("dbo.big_fact", max_mirror_mb=500)
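
The size rule above amounts to a single comparison. As a rough sketch (plan_copy is a hypothetical helper, and how laken measures table size is not specified here), the decision could be expressed as:

```python
def plan_copy(table_mb, max_mirror_mb=100, max_sample_rows=10_000):
    """Return ("full", None) or ("sample", row_limit) for a table of table_mb megabytes."""
    if table_mb <= max_mirror_mb:
        # Small enough: copy the whole table.
        return ("full", None)
    # Too large: copy only the first max_sample_rows rows.
    return ("sample", max_sample_rows)


plan_small = plan_copy(80)                        # under the default 100 MB limit
plan_large = plan_copy(350)                       # over the limit, sampled
plan_raised = plan_copy(350, max_mirror_mb=500)   # limit raised for this call
```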

CLI

laken deploy [--workspace-id <id>] [--environment-id <id>]
laken refresh <table>

laken deploy builds your project wheel from pyproject.toml, uploads it to a Fabric Environment, and starts a publish. Fabric rebuilds the environment in the background; import your package once that finishes.
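
Since the publish finishes in the background, a notebook can check whether the package is visible before importing it. This is a minimal sketch using only the standard library; is_importable is a hypothetical helper, and customer_analytics is the example package name from the Quickstart.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if the top-level package can be found on the current path."""
    return importlib.util.find_spec(package_name) is not None


# In a Fabric notebook, this stays False until the environment has the package.
ready = is_importable("customer_analytics")
```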

laken refresh <table> replaces your local copy with the current table from Fabric. Use it when Fabric has newer data or when you want to undo local write_table changes. Tables you created locally that were never copied from Fabric are left alone.
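
When deploys are scripted, for example from CI, it can help to assemble the command as an argument list. The sketch below uses only the two flags shown above; deploy_command is a hypothetical helper and the IDs are placeholders.

```python
def deploy_command(workspace_id=None, environment_id=None):
    """Build the argv list for `laken deploy` with the documented optional flags."""
    argv = ["laken", "deploy"]
    if workspace_id:
        argv += ["--workspace-id", workspace_id]
    if environment_id:
        argv += ["--environment-id", environment_id]
    return argv


argv = deploy_command(environment_id="<environment-guid>")
```

The resulting list can be passed to subprocess.run(argv, check=True) so that a failed deploy raises an error in the calling script.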

Environment variables

When you create a Lakehouse or run a laken command, laken loads a .env file from your project root. Variables already set in your shell or CI take precedence. Call load_environment() yourself only if you need those values earlier.

Variable                Description
AZURE_TENANT_ID         Azure AD tenant ID for your service principal
AZURE_CLIENT_ID         Application ID of the service principal
AZURE_CLIENT_SECRET     Client secret for the service principal
FABRIC_WORKSPACE_NAME   Fabric workspace name
FABRIC_LAKEHOUSE_NAME   Fabric Lakehouse name
FABRIC_WORKSPACE_ID     Fabric Workspace GUID
FABRIC_LAKEHOUSE_ID     Fabric Lakehouse GUID
FABRIC_ENVIRONMENT_ID   Fabric Environment GUID that laken deploy publishes to

AZURE_* values come from an Azure service principal.
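
The precedence rule (values already set in the shell or CI win over .env) can be illustrated with a small model. This is a sketch only: parse_env_text and merge_env are hypothetical helpers, and laken's actual .env parsing (quoting, export prefixes, and so on) may differ.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def merge_env(file_values, existing):
    """Existing (shell or CI) values take precedence over values from the file."""
    merged = dict(file_values)
    merged.update(existing)
    return merged


file_values = parse_env_text(
    "FABRIC_WORKSPACE_NAME=MyWorkspace\n"
    "FABRIC_LAKEHOUSE_NAME=MyLakehouse\n"
)
# The shell already defines the lakehouse name, so that value is kept.
merged = merge_env(file_values, {"FABRIC_LAKEHOUSE_NAME": "Sales_LH"})
```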

In a Fabric notebook you can copy the Fabric variables from context:

import notebookutils

context = notebookutils.runtime.context

FABRIC_WORKSPACE_NAME = context['currentWorkspaceName']
FABRIC_LAKEHOUSE_NAME = context.get('defaultLakehouseName')
FABRIC_WORKSPACE_ID = context['currentWorkspaceId']
FABRIC_LAKEHOUSE_ID = context.get('defaultLakehouseId')
FABRIC_ENVIRONMENT_ID = context.get('environmentId')

print(f"FABRIC_WORKSPACE_NAME={FABRIC_WORKSPACE_NAME}")
print(f"FABRIC_LAKEHOUSE_NAME={FABRIC_LAKEHOUSE_NAME}")
print(f"FABRIC_WORKSPACE_ID={FABRIC_WORKSPACE_ID}")
print(f"FABRIC_LAKEHOUSE_ID={FABRIC_LAKEHOUSE_ID}")
print(f"FABRIC_ENVIRONMENT_ID={FABRIC_ENVIRONMENT_ID}")
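
The .get(...) lookups above return None when a notebook has no default lakehouse or environment, which would print lines such as FABRIC_LAKEHOUSE_ID=None. A variant that skips missing values might look like the sketch below; env_lines is a hypothetical helper, and a plain dict with placeholder values stands in for notebookutils.runtime.context.

```python
# Maps each .env variable to the context key used in the snippet above.
FABRIC_KEYS = {
    "FABRIC_WORKSPACE_NAME": "currentWorkspaceName",
    "FABRIC_LAKEHOUSE_NAME": "defaultLakehouseName",
    "FABRIC_WORKSPACE_ID": "currentWorkspaceId",
    "FABRIC_LAKEHOUSE_ID": "defaultLakehouseId",
    "FABRIC_ENVIRONMENT_ID": "environmentId",
}


def env_lines(context):
    """Return KEY=VALUE lines for the values that are present in the context."""
    lines = []
    for variable, key in FABRIC_KEYS.items():
        value = context.get(key)
        if value is not None:
            lines.append(f"{variable}={value}")
    return lines


# Placeholder context with no default lakehouse attached.
sample_context = {
    "currentWorkspaceName": "MyWorkspace",
    "currentWorkspaceId": "<workspace-guid>",
    "environmentId": "<environment-guid>",
}
lines = env_lines(sample_context)
```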

Logging

laken logs to stderr when you use Lakehouse or the CLI. The default level is INFO. To see more detail:

import logging

logging.getLogger("laken").setLevel(logging.DEBUG)
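
To raise the level only around one section of code, the same logger can be wrapped in a context manager. This is a minimal standard-library sketch; laken_log_level is a hypothetical helper, and only the logger name "laken" comes from the package.

```python
import logging
from contextlib import contextmanager


@contextmanager
def laken_log_level(level):
    """Temporarily set the "laken" logger level, restoring it afterwards."""
    logger = logging.getLogger("laken")
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore the earlier level even if the body raised.
        logger.setLevel(previous)
```

Usage would be a block such as `with laken_log_level(logging.DEBUG): ...` around the read or deploy call you want to inspect.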

Development

Contributions are welcome. To work on this package:

uv sync
uv run pytest
uv run ruff check
