Local Fabric table cache, lakehouse read/write, and deploy for modular Python on Microsoft Fabric
The missing local development workflow for Microsoft Fabric.
laken lets you develop Python code for Fabric locally, using the tools you already trust.
Write code on your machine, run it against real Fabric lakehouse data.
When you're ready, laken deploy packages your project, publishes it to Fabric, and makes it
available to your Fabric notebooks.
Your code stays modular. Your notebooks stay thin. And your local workflow survives contact with the platform.
Why “laken”?
Laken, pronounced LAH-kuhn, is Dutch for “cloth.” If you're feeling generous, it's a pun on Fabric and data lakes.
Installation
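Install uv if needed, then add laken to your project:
uv add laken
Or install it with pip:
pip install laken
laken deploy uses uv to build your wheel before publishing it to a Fabric Environment, so you need uv for deploy even if you install laken with pip.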
Quickstart
Write lakehouse code on your laptop against real Fabric data, package it, and run the same code in a notebook.
1. Credentials: create a .env file in your project root (see Environment variables for the full list):
AZURE_TENANT_ID=...
AZURE_CLIENT_ID=...
AZURE_CLIENT_SECRET=...
FABRIC_WORKSPACE_NAME=MyWorkspace
FABRIC_LAKEHOUSE_NAME=MyLakehouse
FABRIC_WORKSPACE_ID=...
FABRIC_LAKEHOUSE_ID=...
2. Develop: locally, reads pull tables from Fabric and cache them on disk. In a Fabric notebook, the same code runs against your attached lakehouse:
from laken import Lakehouse
lh = Lakehouse()
df = lh.read_table("customers", frame_type="pandas")
# ...
lh.write_table(df, "customer_analytics")
3. Package and deploy: move that code into a normal Python package and publish it to a Fabric Environment (set FABRIC_ENVIRONMENT_ID in .env):
customer_analytics/
├── pyproject.toml
└── src/customer_analytics/
└── pipeline.py
# src/customer_analytics/pipeline.py
from laken import Lakehouse
def create_analytics(lh: Lakehouse) -> None:
df = lh.read_table("customers", frame_type="pandas")
# ...
lh.write_table(df, "customer_analytics")
laken deploy
4. Run in a Fabric notebook once the publish finishes:
from laken import Lakehouse
from customer_analytics.pipeline import create_analytics
lh = Lakehouse()
create_analytics(lh)
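Because create_analytics receives the Lakehouse as a parameter, you can exercise it without Fabric at all. The sketch below assumes only that the pipeline calls read_table and write_table; FakeLakehouse is a hypothetical in-memory stand-in (not part of laken), the filter replaces the elided "# ..." purely for illustration, and rows are plain dicts rather than DataFrames so the example runs without pandas:

```python
from typing import Any


class FakeLakehouse:
    """Hypothetical in-memory stand-in exposing read_table/write_table."""

    def __init__(self, tables: dict[str, list[dict[str, Any]]]) -> None:
        self.tables = tables

    def read_table(self, name: str, frame_type: str = "pandas") -> list[dict[str, Any]]:
        # frame_type is accepted for signature compatibility but ignored here.
        return list(self.tables[name])

    def write_table(self, df: list[dict[str, Any]], name: str) -> None:
        # Replace the whole table with the new rows.
        self.tables[name] = list(df)


def create_analytics(lh) -> None:
    # Same shape as the pipeline above; the transformation is illustrative only.
    df = lh.read_table("customers", frame_type="pandas")
    active = [row for row in df if row.get("active")]
    lh.write_table(active, "customer_analytics")


lh = FakeLakehouse({"customers": [{"id": 1, "active": True}, {"id": 2, "active": False}]})
create_analytics(lh)
print(lh.tables["customer_analytics"])  # [{'id': 1, 'active': True}]
```

Keeping the Lakehouse out of module-level code is what makes this substitution possible: the same function runs locally, in a unit test, and in a notebook.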
Usage
Lakehouse
Lakehouse() detects whether your code is running locally or in a Fabric notebook and
connects accordingly. The same read_table / write_table calls work in both places:
- Locally: the first read of a Fabric table copies it into a .laken/ folder on disk; later reads use that copy. Writes update only your local copy; they do not change tables in Fabric.
- In a Fabric notebook: reads and writes go to your attached lakehouse.
from laken import Lakehouse
lh = Lakehouse()
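This README does not say how Lakehouse() tells the two environments apart. One plausible approach, shown here purely as an assumption, is to check whether the Fabric-only notebookutils module is importable; running_in_fabric is a hypothetical helper, not laken's API:

```python
import importlib.util


def running_in_fabric(probe_module: str = "notebookutils") -> bool:
    """Return True if the (top-level) probe module can be imported.

    Assumption: notebookutils is available inside Fabric notebooks but is
    not normally installed on a developer laptop.
    """
    return importlib.util.find_spec(probe_module) is not None


mode = "fabric" if running_in_fabric() else "local"
print(f"Lakehouse would run in {mode} mode")
```

The probe name is a parameter so the check can be tested against any module, present or absent.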
Use schema.table when you need a schema (marketing.products). A bare name
(products) is resolved by Fabric/Spark, usually as dbo.products on a schema-enabled
lakehouse.
df = lh.read_table("products") # pandas locally; Spark in Fabric
df = lh.read_table("products", frame_type="spark")
df = lh.read_table("marketing.products", frame_type="polars")
lh.write_table(df, "products")
lh.write_table(df, "marketing.products", mode="append")
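The schema.table rule can be sketched as a small parser. This assumes dbo as the default schema for bare names, which the text above describes only as the usual behavior on a schema-enabled lakehouse; split_table_name is hypothetical:

```python
def split_table_name(name: str, default_schema: str = "dbo") -> tuple[str, str]:
    """Split 'schema.table' into its parts; bare names get default_schema."""
    parts = name.split(".")
    if len(parts) > 2 or not all(parts):
        raise ValueError(f"expected 'table' or 'schema.table', got {name!r}")
    if len(parts) == 1:
        return default_schema, parts[0]
    return parts[0], parts[1]


print(split_table_name("marketing.products"))  # ('marketing', 'products')
print(split_table_name("products"))            # ('dbo', 'products')
```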
write_table replaces a table by default; pass mode="append" to add rows.
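The difference between the default replace behavior and mode="append" can be pictured with plain lists. write_rows and the "overwrite" mode name are assumptions for illustration; the text above only names mode="append":

```python
def write_rows(store: dict[str, list], name: str, rows: list, mode: str = "overwrite") -> None:
    """Replace the table by default; add rows when mode='append'."""
    if mode == "append":
        store.setdefault(name, []).extend(rows)
    elif mode == "overwrite":
        store[name] = list(rows)
    else:
        raise ValueError(f"unsupported mode: {mode!r}")


store: dict[str, list] = {}
write_rows(store, "products", [1, 2])
write_rows(store, "products", [3], mode="append")  # products -> [1, 2, 3]
write_rows(store, "products", [9])                 # replaced -> [9]
print(store["products"])  # [9]
```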
To use a different lakehouse than your .env or notebook default:
lh = Lakehouse(lakehouse="Sales_LH")
Fabric tables locally
The first time you read_table a Fabric table locally, laken downloads a copy into
.laken/. Later reads use that copy.
write_table updates only that local copy — nothing is sent to Fabric. Run
laken refresh <table> to discard local changes and download the table from Fabric again.
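A read-through cache captures the behavior described above: the first read fetches from the source, later reads and local writes touch only the cached copy, and a refresh re-fetches tables that originally came from the source. TableCache and its fetch callable are hypothetical; the real .laken/ storage format is not documented here:

```python
from typing import Callable


class TableCache:
    """Hypothetical read-through cache mimicking the local workflow."""

    def __init__(self, fetch: Callable[[str], list]) -> None:
        self._fetch = fetch                # downloads a table from the remote source
        self._local: dict[str, list] = {}
        self._mirrored: set[str] = set()   # tables originally copied from the source

    def read(self, name: str) -> list:
        if name not in self._local:        # first read: copy from the source
            self._local[name] = list(self._fetch(name))
            self._mirrored.add(name)
        return self._local[name]

    def write(self, name: str, rows: list) -> None:
        self._local[name] = list(rows)     # local only; the source is untouched

    def refresh(self, name: str) -> None:
        # Re-download only tables that came from the source; tables that
        # were created locally are left alone.
        if name in self._mirrored:
            self._local[name] = list(self._fetch(name))


remote = {"customers": [1, 2, 3]}
calls: list[str] = []


def fetch(name: str) -> list:
    calls.append(name)
    return remote[name]


cache = TableCache(fetch)
cache.read("customers")
cache.read("customers")           # served from the cache, no second fetch
cache.write("customers", [99])    # local edit
cache.write("scratch", [7])       # table that exists only locally
cache.refresh("customers")        # back to [1, 2, 3]
cache.refresh("scratch")          # unchanged
```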
Tables up to 100 MB in Fabric are copied in full. Larger tables copy only the first
10,000 rows — enough to develop against without downloading the whole table. You can
change both limits with max_mirror_mb and max_sample_rows on Lakehouse(...) or on a
single read_table call:
lh = Lakehouse(max_mirror_mb=200, max_sample_rows=5_000)
lh.read_table("dbo.big_fact", max_mirror_mb=500)
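The sizing rule above reduces to a small decision. mirror_plan and MirrorLimits are hypothetical; the defaults (100 MB, 10,000 rows) come from the text, and the sketch assumes that a value passed to a single read_table call takes precedence over the Lakehouse(...) setting:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MirrorLimits:
    max_mirror_mb: float = 100       # copy tables up to this size in full
    max_sample_rows: int = 10_000    # rows copied from larger tables


def mirror_plan(table_mb: float, limits: MirrorLimits,
                max_mirror_mb: Optional[float] = None,
                max_sample_rows: Optional[int] = None) -> tuple[str, Optional[int]]:
    """Return ('full', None) or ('sample', n_rows) for a table of table_mb."""
    # Per-call values, when given, override the instance-level limits.
    mb_limit = limits.max_mirror_mb if max_mirror_mb is None else max_mirror_mb
    row_limit = limits.max_sample_rows if max_sample_rows is None else max_sample_rows
    if table_mb <= mb_limit:
        return "full", None
    return "sample", row_limit


defaults = MirrorLimits()
print(mirror_plan(80, defaults))                      # ('full', None)
print(mirror_plan(350, defaults))                     # ('sample', 10000)
print(mirror_plan(350, defaults, max_mirror_mb=500))  # ('full', None)
```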
CLI
laken deploy [--workspace-id <id>] [--environment-id <id>]
laken refresh <table>
laken deploy builds your project wheel from pyproject.toml, uploads it to a Fabric
Environment, and starts a publish. Fabric rebuilds the environment in the background;
import your package once that finishes.
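By default, uv build writes its output to dist/ in the project directory. A deploy step then has to pick the freshly built wheel out of that folder. The sketch below assumes the standard wheel file name convention ({name}-{version}-...whl, with the project name normalized); find_wheel is hypothetical and does not call uv or Fabric:

```python
import re
import tempfile
from pathlib import Path


def normalize_dist_name(name: str) -> str:
    # Wheel file names use the project name lowercased, with runs of -, _ or . replaced by _.
    return re.sub(r"[-_.]+", "_", name).lower()


def find_wheel(dist_dir: Path, project_name: str, version: str) -> Path:
    """Return the single wheel in dist_dir that matches project_name and version."""
    prefix = f"{normalize_dist_name(project_name)}-{version}-"
    matches = sorted(p for p in dist_dir.glob("*.whl") if p.name.startswith(prefix))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected one wheel starting with {prefix!r} in {dist_dir}, found {len(matches)}"
        )
    return matches[0]


# Demo with a throwaway dist/ folder instead of a real uv build.
with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp)
    (dist / "customer_analytics-0.1.0-py3-none-any.whl").touch()
    (dist / "customer_analytics-0.1.0.tar.gz").touch()
    wheel_name = find_wheel(dist, "customer-analytics", "0.1.0").name

print(wheel_name)  # customer_analytics-0.1.0-py3-none-any.whl
```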
laken refresh <table> replaces your local copy with the current table from Fabric. Use
it when Fabric has newer data or when you want to undo local write_table changes.
Tables you created locally that were never copied from Fabric are left alone.
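The optional flags on laken deploy presumably fall back to FABRIC_WORKSPACE_ID and FABRIC_ENVIRONMENT_ID when omitted. The sketch below shows that pattern with argparse; build_parser and resolve_deploy_ids are hypothetical, and this is an assumption about behavior, not laken's actual CLI code:

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="laken")
    sub = parser.add_subparsers(dest="command", required=True)

    deploy = sub.add_parser("deploy", help="build, upload and publish the project wheel")
    deploy.add_argument("--workspace-id")
    deploy.add_argument("--environment-id")

    refresh = sub.add_parser("refresh", help="re-download a table from Fabric")
    refresh.add_argument("table")
    return parser


def resolve_deploy_ids(args: argparse.Namespace,
                       env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    source = os.environ if env is None else env
    # Command-line flags win; otherwise fall back to the environment (.env).
    workspace_id = args.workspace_id or source.get("FABRIC_WORKSPACE_ID")
    environment_id = args.environment_id or source.get("FABRIC_ENVIRONMENT_ID")
    if not workspace_id or not environment_id:
        raise SystemExit("set FABRIC_WORKSPACE_ID and FABRIC_ENVIRONMENT_ID or pass the flags")
    return workspace_id, environment_id


args = build_parser().parse_args(["deploy", "--environment-id", "env-123"])
print(resolve_deploy_ids(args, {"FABRIC_WORKSPACE_ID": "ws-456"}))  # ('ws-456', 'env-123')
```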
Environment variables
When you create a Lakehouse or run a laken command, laken loads a .env file from
your project root. Variables already set in your shell or CI take precedence. Call
load_environment() yourself only if you need those values earlier.
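The precedence rule can be shown with a tiny loader that never overwrites variables that are already set. load_env_text is a hypothetical helper (laken's own load_environment() may work differently, for example through python-dotenv), and it handles only simple KEY=VALUE lines:

```python
import os
from typing import MutableMapping, Optional


def load_env_text(text: str, environ: Optional[MutableMapping[str, str]] = None) -> dict[str, str]:
    """Apply KEY=VALUE lines to environ without overriding existing keys.

    Returns the variables that were actually set from the text.
    """
    target = os.environ if environ is None else environ
    applied: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"').strip("'")
        if key and key not in target:  # shell / CI values take precedence
            target[key] = value
            applied[key] = value
    return applied


shell = {"FABRIC_LAKEHOUSE_NAME": "CI_Lakehouse"}
applied = load_env_text("FABRIC_WORKSPACE_NAME=MyWorkspace\nFABRIC_LAKEHOUSE_NAME=MyLakehouse\n", shell)
print(shell["FABRIC_LAKEHOUSE_NAME"])  # CI_Lakehouse
print(applied)  # {'FABRIC_WORKSPACE_NAME': 'MyWorkspace'}
```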
| Variable | Description |
|---|---|
| AZURE_TENANT_ID | Azure AD tenant ID for your service principal |
| AZURE_CLIENT_ID | Application (client) ID of the service principal |
| AZURE_CLIENT_SECRET | Client secret for the service principal |
| FABRIC_WORKSPACE_NAME | Fabric workspace display name (required locally, along with FABRIC_LAKEHOUSE_NAME, FABRIC_WORKSPACE_ID, and FABRIC_LAKEHOUSE_ID) |
| FABRIC_LAKEHOUSE_NAME | Lakehouse display name to read from locally |
| FABRIC_WORKSPACE_ID | Workspace GUID, used for OneLake paths and by laken deploy |
| FABRIC_LAKEHOUSE_ID | Lakehouse GUID, used for OneLake paths when reading locally |
| FABRIC_ENVIRONMENT_ID | Fabric Environment that laken deploy publishes to |
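To catch a missing setting before the first read or deploy, you could check the variables up front. missing_vars is a hypothetical helper; the names come from the table above, and including the AZURE_* credentials in the local set is an assumption based on the Quickstart putting them in .env for local development:

```python
from typing import Mapping

# Assumption: local reads need the service principal plus the four name/ID variables.
REQUIRED_LOCAL = (
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "FABRIC_WORKSPACE_NAME",
    "FABRIC_LAKEHOUSE_NAME",
    "FABRIC_WORKSPACE_ID",
    "FABRIC_LAKEHOUSE_ID",
)
# laken deploy uses the workspace ID and the target environment ID.
REQUIRED_DEPLOY = ("FABRIC_WORKSPACE_ID", "FABRIC_ENVIRONMENT_ID")


def missing_vars(env: Mapping[str, str], required: tuple[str, ...]) -> list[str]:
    """Return required variable names that are unset or empty, in order."""
    return [name for name in required if not env.get(name)]


env = {"FABRIC_WORKSPACE_ID": "ws-1", "FABRIC_ENVIRONMENT_ID": ""}
print(missing_vars(env, REQUIRED_DEPLOY))  # ['FABRIC_ENVIRONMENT_ID']
```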
AZURE_* values come from an Azure service principal. For the Fabric variables, run the following in a Fabric notebook and copy the printed lines into your .env:
import notebookutils
context = notebookutils.runtime.context
FABRIC_WORKSPACE_NAME = context['currentWorkspaceName']
FABRIC_LAKEHOUSE_NAME = context.get('defaultLakehouseName')
FABRIC_WORKSPACE_ID = context['currentWorkspaceId']
FABRIC_LAKEHOUSE_ID = context.get('defaultLakehouseId')
FABRIC_ENVIRONMENT_ID = context.get('environmentId')
print(f"FABRIC_WORKSPACE_NAME={FABRIC_WORKSPACE_NAME}")
print(f"FABRIC_LAKEHOUSE_NAME={FABRIC_LAKEHOUSE_NAME}")
print(f"FABRIC_WORKSPACE_ID={FABRIC_WORKSPACE_ID}")
print(f"FABRIC_LAKEHOUSE_ID={FABRIC_LAKEHOUSE_ID}")
print(f"FABRIC_ENVIRONMENT_ID={FABRIC_ENVIRONMENT_ID}")
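If you prefer output you can paste straight into .env, the same context keys can be turned into KEY=value lines, skipping values that are None (for example when no default lakehouse is attached). to_env_lines is hypothetical, and a plain dict stands in for notebookutils.runtime.context so the sketch runs anywhere:

```python
from typing import Any, Mapping

# Context keys used above, mapped to the .env variable names laken reads.
CONTEXT_KEYS = {
    "FABRIC_WORKSPACE_NAME": "currentWorkspaceName",
    "FABRIC_LAKEHOUSE_NAME": "defaultLakehouseName",
    "FABRIC_WORKSPACE_ID": "currentWorkspaceId",
    "FABRIC_LAKEHOUSE_ID": "defaultLakehouseId",
    "FABRIC_ENVIRONMENT_ID": "environmentId",
}


def to_env_lines(context: Mapping[str, Any]) -> list[str]:
    """Build KEY=value lines, skipping context keys that are missing or None."""
    lines = []
    for env_name, context_key in CONTEXT_KEYS.items():
        value = context.get(context_key)
        if value is not None:
            lines.append(f"{env_name}={value}")
    return lines


# In Fabric you would pass notebookutils.runtime.context instead of this dict.
fake_context = {
    "currentWorkspaceName": "MyWorkspace",
    "currentWorkspaceId": "ws-1",
    "defaultLakehouseName": None,
}
print("\n".join(to_env_lines(fake_context)))
```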
Logging
laken logs to stderr when you use Lakehouse or the CLI. Default level is INFO. To see
more detail:
import logging
logging.getLogger("laken").setLevel(logging.DEBUG)
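Python loggers form a hierarchy, so setting the level on "laken" also applies to any child loggers whose own level is unset. The sketch below uses a child named laken.cache purely as a hypothetical example and captures records with a small in-memory handler so you can see which messages pass:

```python
import logging


class ListHandler(logging.Handler):
    """Collect log records in memory (for demonstration or tests)."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = ListHandler()
parent = logging.getLogger("laken")
parent.addHandler(handler)
parent.setLevel(logging.DEBUG)

child = logging.getLogger("laken.cache")  # hypothetical child logger name
child.debug("cache hit for %s", "dbo.customers")  # propagates to the parent's handler

print(child.getEffectiveLevel() == logging.DEBUG)  # True
print([r.getMessage() for r in handler.records])   # ['cache hit for dbo.customers']
```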
Development
Contributions are welcome. To work on this package:
uv sync
uv run pytest
uv run ruff check
Download files
File details
Details for the file laken-0.2.2.tar.gz.
File metadata
- Download URL: laken-0.2.2.tar.gz
- Size: 14.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 08c23a2210aa147bdbde1f14d794b007df05d94735b29bfad940f92c0744030a |
| MD5 | 10aa1702fb937a440729850616c634c1 |
| BLAKE2b-256 | 65587b1b8dcb4285fef12f87e4b1a725614013996bd4807856e99b5f9bc97880 |
File details
Details for the file laken-0.2.2-py3-none-any.whl.
File metadata
- Download URL: laken-0.2.2-py3-none-any.whl
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e133422e93dab7558eaf931f084c32cd1436e6883b845ae2e27f728ab49e7a1b |
| MD5 | 332a72950e7586140d47404ea6354e8d |
| BLAKE2b-256 | ed313ed94fbbb27fea4653f22d27d7f3ed57974ef8dd445a9eae939085253866 |