Local Fabric table cache, lakehouse read/write, and deploy for modular Python on Microsoft Fabric

laken

The missing local development workflow for Microsoft Fabric.

laken lets you develop Python code for Fabric locally, using the tools you already trust.

Write code on your machine, run it against real Fabric lakehouse data.

When you're ready, laken deploy packages your project, publishes it to Fabric, and makes it available to your Fabric notebooks.

Your code stays modular. Your notebooks stay thin. And your local workflow survives contact with the platform.

Why “laken”?

Laken, pronounced LAH-kuhn, is Dutch for “cloth.” If you're feeling generous, it's a pun on Fabric and data lakes.


Installation

Install uv if needed, then add laken to your project:

uv add laken

Or install it with pip:

pip install laken

laken deploy uses uv to build your wheel before publishing it to a Fabric Environment.


Quickstart

Write lakehouse code on your laptop against real Fabric data, package it, and run the same code in a notebook.

1. Credentials — create a .env in your project root (see Environment variables for the full list):

AZURE_TENANT_ID=...
AZURE_CLIENT_ID=...
AZURE_CLIENT_SECRET=...
FABRIC_WORKSPACE_NAME=MyWorkspace
FABRIC_LAKEHOUSE_NAME=MyLakehouse
FABRIC_WORKSPACE_ID=...
FABRIC_LAKEHOUSE_ID=...
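Before running anything locally, it can help to confirm that the values above are actually set. The following is a minimal sketch, not part of laken: `missing_variables` and `REQUIRED_LOCAL_VARS` are hypothetical names, and the sketch assumes all seven values from the example are needed for local reads.

```python
import os

# Variable names copied from the .env example above.
# Assumption: all seven are needed for local development.
REQUIRED_LOCAL_VARS = (
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "FABRIC_WORKSPACE_NAME",
    "FABRIC_LAKEHOUSE_NAME",
    "FABRIC_WORKSPACE_ID",
    "FABRIC_LAKEHOUSE_ID",
)


def missing_variables(env=None):
    """Return the required names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_LOCAL_VARS if not env.get(name)]
```

Calling `missing_variables()` with no argument checks the current process environment; passing a dict makes the check easy to unit test.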

2. Develop — reads pull from Fabric and cache locally. In a Fabric notebook that same code runs against your attached lakehouse:

from laken import Lakehouse

lh = Lakehouse()
df = lh.read_table("customers", frame_type="pandas")
# ...
lh.write_table(df, "customer_analytics")

3. Package and deploy — move that code into a normal Python package and publish it to a Fabric Environment (FABRIC_ENVIRONMENT_ID in .env):

customer_analytics/
├── pyproject.toml
└── src/customer_analytics/
    └── pipeline.py

# src/customer_analytics/pipeline.py
from laken import Lakehouse


def create_analytics(lh: Lakehouse) -> None:
    df = lh.read_table("customers", frame_type="pandas")
    # ...
    lh.write_table(df, "customer_analytics")

laken deploy

4. Run in a Fabric notebook — after the publish finishes:

from laken import Lakehouse
from customer_analytics.pipeline import create_analytics

lh = Lakehouse()
create_analytics(lh)
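Because the pipeline function receives the lakehouse as an argument, it can also be exercised without Fabric at all. The sketch below is an illustration under assumptions: `FakeLakehouse` is a hypothetical in-memory stand-in (not something laken ships), tables are plain lists of dicts instead of DataFrames, and the filtering logic is a toy placeholder for the elided pipeline body.

```python
class FakeLakehouse:
    """Hypothetical in-memory stand-in exposing only read_table and write_table."""

    def __init__(self, tables=None):
        self.tables = {name: list(rows) for name, rows in (tables or {}).items()}

    def read_table(self, name, frame_type=None):
        # frame_type is accepted for call compatibility and ignored here.
        return list(self.tables[name])

    def write_table(self, rows, name, mode=None):
        # Any mode other than "append" replaces the table in this sketch.
        if mode == "append":
            self.tables.setdefault(name, []).extend(rows)
        else:
            self.tables[name] = list(rows)


def create_analytics(lh):
    """Toy pipeline: keep customers with at least one order."""
    rows = lh.read_table("customers", frame_type="pandas")
    active = [row for row in rows if row["orders"] > 0]
    lh.write_table(active, "customer_analytics")


fake = FakeLakehouse({"customers": [{"id": 1, "orders": 3}, {"id": 2, "orders": 0}]})
create_analytics(fake)
```

After the call, `fake.tables["customer_analytics"]` holds only the first customer, which a unit test can assert on directly.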

Usage

Lakehouse

Lakehouse() detects whether your code is running locally or in a Fabric notebook and connects accordingly. The same read_table / write_table calls work in both places:

  • Locally: the first read of a Fabric table copies it into a .laken/ folder on disk; later reads use that copy. Writes update only your local copy; they do not change tables in Fabric.
  • In a Fabric notebook: reads and writes go to your attached lakehouse.

from laken import Lakehouse

lh = Lakehouse()
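How laken decides between local and notebook mode is not described here. One plausible approach, shown purely as a sketch, is to check whether the `notebookutils` module can be found; the helper names below are hypothetical, and the assumption that `notebookutils` is only importable inside a Fabric notebook is exactly that, an assumption.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def running_in_fabric_notebook():
    # Assumption: notebookutils is only importable inside a Fabric notebook.
    return module_available("notebookutils")
```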

Use schema.table when you need a schema (marketing.products). A bare name (products) is resolved by Fabric/Spark, usually as dbo.products on a schema-enabled lakehouse.

df = lh.read_table("products")                         # pandas locally; Spark in Fabric
df = lh.read_table("products", frame_type="spark")
df = lh.read_table("marketing.products", frame_type="polars")

lh.write_table(df, "products")
lh.write_table(df, "marketing.products", mode="append")

write_table replaces a table by default; pass mode="append" to add rows.
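The schema.table naming rule described above can be illustrated with a small parser. This is a sketch only: `split_table_name` is a hypothetical helper, and it hard-codes dbo as the fallback schema, whereas in practice a bare name is resolved by Fabric/Spark and is only usually dbo.

```python
def split_table_name(name, default_schema="dbo"):
    """Split 'schema.table' into (schema, table); bare names get default_schema."""
    parts = name.split(".")
    if len(parts) == 1 and parts[0]:
        return default_schema, parts[0]
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    raise ValueError(f"expected 'table' or 'schema.table', got {name!r}")
```

For example, `split_table_name("marketing.products")` yields the schema `marketing` and the table `products`, while `split_table_name("products")` falls back to `dbo`.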

To use a different lakehouse than your .env or notebook default:

lh = Lakehouse(lakehouse="Sales_LH")

Fabric tables locally

The first time you read_table a Fabric table locally, laken downloads a copy into .laken/. Later reads use that copy.

write_table updates only that local copy — nothing is sent to Fabric. Run laken refresh <table> to discard local changes and download the table from Fabric again.

Tables up to 100 MB in Fabric are copied in full. Larger tables copy only the first 10,000 rows — enough to develop against without downloading the whole table. You can change both limits with max_mirror_mb and max_sample_rows on Lakehouse(...) or on a single read_table call:

lh = Lakehouse(max_mirror_mb=200, max_sample_rows=5_000)
lh.read_table("dbo.big_fact", max_mirror_mb=500)
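The size rule above amounts to a simple decision: copy everything when the table fits under the megabyte limit, otherwise take a row sample. The sketch below models that decision with hypothetical helpers (`effective_limits`, `plan_local_copy`); it assumes a per-call value overrides the value set on Lakehouse(...), and that the megabyte limit is inclusive.

```python
# Defaults as stated in the text above.
DEFAULT_LIMITS = {"max_mirror_mb": 100, "max_sample_rows": 10_000}


def effective_limits(instance=None, call=None):
    """Merge defaults, instance-level values, then per-call values (assumed order)."""
    limits = dict(DEFAULT_LIMITS)
    limits.update(instance or {})
    limits.update(call or {})
    return limits


def plan_local_copy(size_mb, limits):
    """Return ('full', None) or ('sample', row_limit) for a table of size_mb."""
    if size_mb <= limits["max_mirror_mb"]:
        return ("full", None)
    return ("sample", limits["max_sample_rows"])


# Mirrors the two calls shown above: instance limits, then one call-level override.
limits = effective_limits(
    instance={"max_mirror_mb": 200, "max_sample_rows": 5_000},
    call={"max_mirror_mb": 500},
)
```

With these merged limits, a 300 MB table would be copied in full, while the same table under the defaults would be sampled to 10,000 rows.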

CLI

laken deploy [--workspace-id <id>] [--environment-id <id>]
laken refresh <table>

laken deploy builds your project wheel from pyproject.toml, uploads it to a Fabric Environment, and starts a publish. Fabric rebuilds the environment in the background; import your package once that finishes.

laken refresh <table> replaces your local copy with the current table from Fabric. Use it when Fabric has newer data or when you want to undo local write_table changes. Tables you created locally that were never copied from Fabric are left alone.
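The refresh behavior described above can be modeled with a small in-memory cache. This is a toy illustration, not laken's implementation: `LocalCacheSketch` is hypothetical, tables are plain lists, and refreshing a table that was never read simply downloads it, which is an assumption the text does not confirm.

```python
class LocalCacheSketch:
    """Toy model of a local table cache with refresh semantics."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable: table name -> rows from the remote side
        self._tables = {}        # table name -> local rows
        self._mirrored = set()   # names that were copied from the remote side

    def read(self, name):
        # First read downloads a copy; later reads use the local copy.
        if name not in self._tables:
            self._tables[name] = list(self._fetch(name))
            self._mirrored.add(name)
        return list(self._tables[name])

    def write(self, name, rows):
        # Writes touch only the local copy.
        self._tables[name] = list(rows)

    def refresh(self, name):
        # Tables created locally and never mirrored are left alone.
        if name in self._tables and name not in self._mirrored:
            return False
        self._tables[name] = list(self._fetch(name))
        self._mirrored.add(name)
        return True


remote = {"customers": [1, 2]}
cache = LocalCacheSketch(lambda name: remote[name])
```

Reading `customers`, overwriting it locally, and then calling `cache.refresh("customers")` restores the remote rows, whereas a table that exists only locally is untouched by refresh.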

Environment variables

When you create a Lakehouse or run a laken command, laken loads a .env file from your project root. Variables already set in your shell or CI take precedence. Call load_environment() yourself only if you need those values earlier.
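The precedence rule above (values already in the shell or CI win over .env) can be sketched with the standard library alone. The helpers `parse_env_text` and `apply_env` are hypothetical and are not laken's load_environment(); the parser is deliberately simplistic and handles only KEY=VALUE lines, blank lines, and # comments.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values, environ=None):
    """Set each value only if the key is not already present in environ."""
    environ = os.environ if environ is None else environ
    for key, value in values.items():
        environ.setdefault(key, value)
    return environ


# A value already set "in the shell" survives; a missing one is filled from the file.
env = {"FABRIC_WORKSPACE_NAME": "FromShell"}
file_text = "FABRIC_WORKSPACE_NAME=FromFile\n# comment\nFABRIC_LAKEHOUSE_NAME=MyLakehouse\n"
apply_env(parse_env_text(file_text), env)
```

Using `setdefault` is what gives existing variables priority: the file only fills in names that are not yet defined.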

Variable                 Description
AZURE_TENANT_ID          Azure AD tenant ID for your service principal
AZURE_CLIENT_ID          Application (client) ID of the service principal
AZURE_CLIENT_SECRET      Client secret for the service principal
FABRIC_WORKSPACE_NAME    Fabric workspace display name (required locally, with the other three name/ID vars)
FABRIC_LAKEHOUSE_NAME    Lakehouse display name to read from locally
FABRIC_WORKSPACE_ID      Workspace GUID for OneLake paths and deploy
FABRIC_LAKEHOUSE_ID      Lakehouse GUID for OneLake paths when reading locally
FABRIC_ENVIRONMENT_ID    Fabric Environment that laken deploy publishes to

AZURE_* values come from an Azure service principal. In a Fabric notebook you can copy the Fabric variables from context:

import notebookutils

context = notebookutils.runtime.context

FABRIC_WORKSPACE_NAME = context['currentWorkspaceName']
FABRIC_LAKEHOUSE_NAME = context.get('defaultLakehouseName')
FABRIC_WORKSPACE_ID = context['currentWorkspaceId']
FABRIC_LAKEHOUSE_ID = context.get('defaultLakehouseId')
FABRIC_ENVIRONMENT_ID = context.get('environmentId')

print(f"FABRIC_WORKSPACE_NAME={FABRIC_WORKSPACE_NAME}")
print(f"FABRIC_LAKEHOUSE_NAME={FABRIC_LAKEHOUSE_NAME}")
print(f"FABRIC_WORKSPACE_ID={FABRIC_WORKSPACE_ID}")
print(f"FABRIC_LAKEHOUSE_ID={FABRIC_LAKEHOUSE_ID}")
print(f"FABRIC_ENVIRONMENT_ID={FABRIC_ENVIRONMENT_ID}")

Logging

laken logs to stderr when you use Lakehouse or the CLI. The default level is INFO. To see more detail:

import logging

logging.getLogger("laken").setLevel(logging.DEBUG)

Development

Contributions are welcome. To work on this package:

uv sync
uv run pytest
uv run ruff check
