Local and Fabric lakehouse abstraction for modular, testable data code
Project description
laken lets you develop Python code for Fabric locally, using the tools you already trust.
Write code on your machine, run it against real Fabric lakehouse data.
When you're ready, laken deploy packages your project, publishes it to Fabric, and makes it
available to your Fabric notebooks.
Your code stays modular. Your notebooks stay thin. And your local workflow survives contact with the platform.
Why “laken”?
Laken, pronounced LAH-kuhn, is Dutch for “cloth.” If you're feeling generous, it's a pun on Fabric and data lakes.
Installation
Install uv if needed, then add laken to your project:
uv add laken
Or install it with pip:
pip install laken
Deploy uses uv to build your wheel before publishing to a Fabric environment.
Develop against your Fabric lakehouse
Set your credentials and select your workspace and lakehouse in a .env file at your
project root (or export the variables in your shell). Importing laken loads that file
automatically; variables already set in the environment are not overwritten.
AZURE_TENANT_ID=...
AZURE_CLIENT_ID=...
AZURE_CLIENT_SECRET=...
FABRIC_WORKSPACE_NAME=MyWorkspace
FABRIC_LAKEHOUSE_NAME=MyLakehouse
FABRIC_WORKSPACE_ID=...
FABRIC_LAKEHOUSE_ID=...
from laken import Lakehouse
lh = Lakehouse()
products = lh.read_table("marketing.products", as_="pandas")
lh.write_table(products, "staging.products_snapshot")
Lakehouse detects when it is running locally and when it is running inside Fabric.
Locally, the first read_table for a Fabric table pulls from OneLake and caches it under
.laken/ as Delta; later reads use the cache. In a Fabric notebook, the same code reads
from your attached lakehouse.
Local writes stay under .laken/ and do not sync to Fabric; in Fabric, writes persist to
tables on the attached lakehouse.
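The local read path described above is a fetch-once, read-from-cache pattern. The sketch below illustrates that pattern only; it is not laken's implementation. `CachedTableReader` and `fake_fetch` are hypothetical names, and JSON files stand in for the Delta tables laken keeps under .laken/.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

Rows = list[dict]


class CachedTableReader:
    """Hypothetical stand-in: fetch a table once, then serve it from disk."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], Rows]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch  # stands in for the remote (OneLake) read

    def read_table(self, name: str) -> Rows:
        path = self.cache_dir / f"{name}.json"
        if path.exists():
            # Cache hit: serve the local copy without a remote call.
            return json.loads(path.read_text())
        # Cache miss: fetch remotely, then persist for later reads.
        rows = self.fetch(name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(rows))
        return rows


calls: list[str] = []


def fake_fetch(name: str) -> Rows:
    """Records each remote call so the caching behaviour is observable."""
    calls.append(name)
    return [{"id": 1, "category": "toys"}]


with tempfile.TemporaryDirectory() as tmp:
    reader = CachedTableReader(Path(tmp) / ".laken", fake_fetch)
    first = reader.read_table("marketing.products")   # fetches and caches
    second = reader.read_table("marketing.products")  # served from the cache
```

Only the first call reaches `fake_fetch`; the second is answered from disk, which is the behaviour the paragraph above describes for local reads.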
Deploy to Fabric
Structure your local code as a Python project using the standard src layout:
myapp/
├── pyproject.toml # [project] name = "myapp"
├── src/
│ └── myapp/
│ ├── __init__.py
│ └── pipeline.py
└── .env
Add laken to your project dependencies.
See the Python packaging guide if you are setting this up for the first time.
# src/myapp/pipeline.py
import pandas as pd
from laken import Lakehouse
def run_pipeline(lh: Lakehouse) -> None:
products = lh.read_table("marketing.products", as_="pandas")
summary = products.groupby("category", as_index=False)["amount"].sum()
lh.write_table(summary, "staging.product_summary")
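Because a pipeline like this receives the lakehouse as an argument, it can be exercised in a unit test with a small stand-in object. The sketch below assumes nothing about laken itself: `InMemoryLakehouse`, `TableStore`, and `summarize_by_category` are hypothetical, and plain lists of dicts replace pandas so the example has no dependencies. The real `read_table` also accepts `as_=`, which this stand-in leaves out.

```python
from collections import defaultdict
from typing import Protocol

Rows = list[dict]


class TableStore(Protocol):
    """The two methods the pipeline needs; any object providing them will do."""

    def read_table(self, name: str) -> Rows: ...
    def write_table(self, rows: Rows, name: str) -> None: ...


class InMemoryLakehouse:
    """Hypothetical test double; not part of laken."""

    def __init__(self, tables: dict[str, Rows]) -> None:
        self.tables = dict(tables)

    def read_table(self, name: str) -> Rows:
        return list(self.tables[name])

    def write_table(self, rows: Rows, name: str) -> None:
        self.tables[name] = list(rows)


def summarize_by_category(lh: TableStore) -> None:
    """Same shape as run_pipeline: read, aggregate amount per category, write."""
    totals: dict[str, float] = defaultdict(float)
    for row in lh.read_table("marketing.products"):
        totals[row["category"]] += row["amount"]
    summary = [{"category": c, "amount": a} for c, a in sorted(totals.items())]
    lh.write_table(summary, "staging.product_summary")


fake = InMemoryLakehouse({
    "marketing.products": [
        {"category": "toys", "amount": 5.0},
        {"category": "books", "amount": 2.0},
        {"category": "toys", "amount": 1.5},
    ]
})
summarize_by_category(fake)
```

After the call, `fake.tables["staging.product_summary"]` holds one row per category, so a test can assert on it directly.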
When you are ready, laken deploy builds your package and loads it into your specified
Fabric Environment.
Deploy uses the same .env (or shell variables):
AZURE_TENANT_ID=...
AZURE_CLIENT_ID=...
AZURE_CLIENT_SECRET=...
FABRIC_WORKSPACE_ID=...
FABRIC_ENVIRONMENT_ID=...
From the repo root:
laken deploy
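A pre-flight check can catch a missing variable before a deploy is attempted, for example in CI. This is a sketch, not something laken provides: `missing_deploy_vars` is a hypothetical helper, and the list of names is taken from the deploy .env shown above.

```python
import os
from typing import Mapping, Optional

# Variables listed in the deploy .env example above.
DEPLOY_VARS = (
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "FABRIC_WORKSPACE_ID",
    "FABRIC_ENVIRONMENT_ID",
)


def missing_deploy_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the deploy variables that are unset or empty in `env`."""
    if env is None:
        env = os.environ
    return [name for name in DEPLOY_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {
    "AZURE_TENANT_ID": "t",
    "AZURE_CLIENT_ID": "c",
    "AZURE_CLIENT_SECRET": "s",
    "FABRIC_WORKSPACE_ID": "w",
}
missing = missing_deploy_vars(example_env)
```

The check only looks at environment variables. If you pass IDs on the command line instead, the corresponding entries can be ignored.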
In a Fabric notebook:
from laken import Lakehouse
from myapp.pipeline import run_pipeline
lh = Lakehouse()
run_pipeline(lh)
Reference
Lakehouse
from laken import Lakehouse
lh = Lakehouse()
For tests or scripts that must pin a backend:
from laken import FabricLakehouse, LocalLakehouse
Tables — use schema.table to target a schema; a bare name is passed through to Spark
and Fabric resolves it (typically the default dbo schema on a schema-enabled lakehouse).
mode is "overwrite" or "append".
lh.write_table(df, "products")
lh.write_table(df, "marketing.products", mode="append")
df = lh.read_table("products") # pandas locally, Spark in Fabric
df = lh.read_table("products", as_="spark") # Spark (Fabric runtime)
df = lh.read_table("marketing.products", as_="polars")
lh.list_tables()
lh.table_exists("marketing.products")
lh.drop_table("marketing.products")
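The naming rule above (schema.table versus a bare name) and the two allowed modes can be checked before any call is made. The helpers below are hypothetical and show one way to validate inputs in your own code; how laken parses names internally is not documented here, and rejecting names with more than one dot is an assumption of this sketch.

```python
from typing import Optional

VALID_MODES = ("overwrite", "append")


def split_table_name(name: str) -> tuple[Optional[str], str]:
    """Split 'schema.table' into (schema, table); a bare name gets schema None."""
    parts = name.split(".")
    # Assumption: at most one dot, and no empty parts such as '.products'.
    if len(parts) > 2 or not all(parts):
        raise ValueError(f"invalid table name: {name!r}")
    schema = parts[0] if len(parts) == 2 else None
    return schema, parts[-1]


def check_mode(mode: str) -> str:
    """Accept only the two write modes named in the reference above."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return mode


qualified = split_table_name("marketing.products")
bare = split_table_name("products")
```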
Files — local paths under .laken/workspace/Files; in Fabric, under the lakehouse
Files/ area.
lh.write_file(df, "exports/summary.parquet")
lh.read_file("exports/summary.parquet", as_="pandas")
lh.list_files("exports")
lh.file_exists("exports/summary.parquet")
lh.delete_file("exports/summary.parquet")
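For orientation, this is how a relative file path would map onto the local Files area named above. `local_file_path` is a hypothetical helper; laken may resolve paths differently, and rejecting absolute paths and ".." segments is a choice made for this sketch.

```python
from pathlib import Path, PurePosixPath

# Local Files area as described above.
LOCAL_FILES_ROOT = Path(".laken") / "workspace" / "Files"


def local_file_path(relative: str, root: Path = LOCAL_FILES_ROOT) -> Path:
    """Map a lakehouse-style relative path onto the local Files directory."""
    rel = PurePosixPath(relative)
    # Assumption: keep every path inside the Files area.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"path must stay inside Files/: {relative!r}")
    return root.joinpath(*rel.parts)


target = local_file_path("exports/summary.parquet")
```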
Warehouse tables — Spark synapsesql in Fabric; local parquet stand-in for tests.
lh.load_table_from_warehouse("SalesOrderHeader", "SalesWarehouse", as_="pandas")
Other lakehouses — defaults come from notebook context in Fabric; override locally or in notebooks:
lh = Lakehouse(lakehouse="Sales_LH")
lh.read_table("marketing.products", as_="pandas")
CLI
laken deploy [--workspace-id <id>] [--environment-id <id>]
laken status
laken refresh <table>
laken reset <table>
laken deploy builds the wheel from your repo's pyproject.toml, uploads it to a Fabric
Environment, and publishes it so notebooks can import your package.
laken status, laken refresh, and laken reset manage the local .laken/ cache on your
laptop. They do not run inside Fabric notebooks.
laken status lists cached tables with state (mirror, sample, or local), the Fabric
source version when known, and notes such as staleness or sample size.
laken refresh <table> re-downloads a table from Fabric when it was originally cached
from Fabric. Local-only tables are left unchanged.
laken reset <table> discards local changes and re-fetches from Fabric. The table must
have been cached from Fabric first.
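The refresh and reset rules above amount to a small decision table over the cache state. The sketch below restates them as code. It assumes that the mirror and sample states both mean "cached from Fabric" and that local means "local-only"; the function names and return values are hypothetical.

```python
# Assumption: these two states mark tables that were cached from Fabric.
FABRIC_BACKED_STATES = frozenset({"mirror", "sample"})
KNOWN_STATES = FABRIC_BACKED_STATES | {"local"}


def _check_state(state: str) -> None:
    if state not in KNOWN_STATES:
        raise ValueError(f"unknown cache state: {state!r}")


def plan_refresh(state: str) -> str:
    """Refresh re-downloads Fabric-backed tables and leaves local-only ones alone."""
    _check_state(state)
    return "download" if state in FABRIC_BACKED_STATES else "skip"


def plan_reset(state: str) -> str:
    """Reset needs a Fabric origin; local changes are discarded before the re-fetch."""
    _check_state(state)
    if state not in FABRIC_BACKED_STATES:
        raise ValueError("reset needs a table that was first cached from Fabric")
    return "discard-and-download"
```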
Environment variables
The .env file at your project root is loaded when you import laken or run the laken CLI.
Variables already set in your shell or CI take precedence over it. Set
PYTHON_DOTENV_DISABLED=1 to skip loading.
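The loading rules can be pictured with a few lines of standard-library Python. This is a simplified sketch of the non-overriding behaviour, not laken's loader: `load_env_text` is hypothetical, it ignores quoting and `export` prefixes, and it treats only the value 1 as "disabled".

```python
from typing import MutableMapping


def load_env_text(text: str, env: MutableMapping[str, str]) -> None:
    """Copy KEY=VALUE lines into `env` without replacing existing keys."""
    if env.get("PYTHON_DOTENV_DISABLED") == "1":
        return  # loading is switched off
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        # setdefault keeps any value that was already set (shell or CI).
        env.setdefault(key.strip(), value.strip())


dotenv_text = "FABRIC_WORKSPACE_NAME=MyWorkspace\nFABRIC_LAKEHOUSE_NAME=MyLakehouse\n"

env = {"FABRIC_WORKSPACE_NAME": "FromShell"}
load_env_text(dotenv_text, env)

disabled_env = {"PYTHON_DOTENV_DISABLED": "1"}
load_env_text(dotenv_text, disabled_env)
```

Here `env` keeps the workspace name that was already set and gains only the lakehouse name, while `disabled_env` is left untouched.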
| Variable | Purpose |
|---|---|
| AZURE_TENANT_ID | Auth (fetch + deploy) |
| AZURE_CLIENT_ID | Auth (fetch + deploy) |
| AZURE_CLIENT_SECRET | Auth (fetch + deploy) |
| FABRIC_WORKSPACE_NAME | Local table fetch |
| FABRIC_LAKEHOUSE_NAME | Local table fetch |
| FABRIC_WORKSPACE_ID | OneLake paths; required for deploy |
| FABRIC_LAKEHOUSE_ID | OneLake paths |
| FABRIC_ENVIRONMENT_ID | Deploy target |
AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET are credentials from an
Azure service principal.
FABRIC_WORKSPACE_NAME, FABRIC_LAKEHOUSE_NAME, FABRIC_WORKSPACE_ID,
FABRIC_LAKEHOUSE_ID, and FABRIC_ENVIRONMENT_ID can be read from a Fabric notebook with
notebookutils:
import notebookutils
ctx = notebookutils.runtime.context
print(ctx.get("currentWorkspaceName"))
print(ctx.get("currentWorkspaceId"))
print(ctx.get("defaultLakehouseName"))
print(ctx.get("defaultLakehouseId"))
print(ctx.get("environmentId"))
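To turn those values into .env lines, a small formatter is enough. The sketch below works on any mapping, so it can be tried outside Fabric; in a notebook you would pass `notebookutils.runtime.context` instead of the sample dict. The key-to-variable pairing is inferred from the names above, `env_lines` is a hypothetical helper, and the sample values are placeholders.

```python
from typing import Mapping

# Context key -> .env variable, paired by name (an inference, not a laken API).
CONTEXT_TO_ENV = {
    "currentWorkspaceName": "FABRIC_WORKSPACE_NAME",
    "currentWorkspaceId": "FABRIC_WORKSPACE_ID",
    "defaultLakehouseName": "FABRIC_LAKEHOUSE_NAME",
    "defaultLakehouseId": "FABRIC_LAKEHOUSE_ID",
    "environmentId": "FABRIC_ENVIRONMENT_ID",
}


def env_lines(ctx: Mapping[str, object]) -> list[str]:
    """Format the known context values as KEY=VALUE lines, skipping empty ones."""
    return [
        f"{env_name}={ctx[key]}"
        for key, env_name in CONTEXT_TO_ENV.items()
        if ctx.get(key)
    ]


# Placeholder values; no environment is attached in this sample.
sample_ctx = {
    "currentWorkspaceName": "MyWorkspace",
    "currentWorkspaceId": "workspace-id-placeholder",
    "defaultLakehouseName": "MyLakehouse",
    "defaultLakehouseId": "lakehouse-id-placeholder",
    "environmentId": None,
}
lines = env_lines(sample_ctx)
```

Printing `"\n".join(lines)` in the notebook gives text you can paste into your local .env.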
Deploy expects pyproject.toml at the repo root, a buildable application wheel, and a
Fabric environment with a compatible Python/Spark runtime.
Local vs Fabric
| Class | Where | Storage | Reads | Writes |
|---|---|---|---|---|
| Lakehouse | Auto-detects notebook context | Fabric if available, else .laken/ Delta | Local: Fabric → cache; Fabric: attached lakehouse | Local: .laken/ only; Fabric: attached lakehouse |
| LocalLakehouse | Laptop / CI | .laken/workspace/ | Cached Delta and local tables | Local only; not pushed to Fabric |
| FabricLakehouse | Fabric notebook | Attached lakehouse | Spark/Delta on attached lakehouse | Delta tables on attached lakehouse |
First local read of a Fabric table fetches and caches Delta under .laken/. If Fabric
changes, laken warns and keeps the cache until you run laken refresh <table>. Large
tables may cache as a fixed-size sample.
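The "warn and keep the cache" behaviour can be sketched as a version comparison. This illustrates the idea only: `warn_if_stale` is hypothetical, and representing the cached and source versions as integers is an assumption made for the example.

```python
import warnings
from typing import Optional


def warn_if_stale(
    table: str,
    cached_version: Optional[int],
    source_version: Optional[int],
) -> bool:
    """Warn when the source has moved on; the cached copy is left as it is."""
    if cached_version is None or source_version is None:
        return False  # version unknown: nothing to compare
    if source_version != cached_version:
        warnings.warn(
            f"{table}: cache is at version {cached_version}, source is at "
            f"{source_version}; run `laken refresh {table}` to update",
            stacklevel=2,
        )
        return True
    return False
```

The function never touches the cached data, mirroring the rule that the cache stays as it is until you refresh explicitly.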
Development
Contributions are welcome. To work on this package:
uv sync
uv run pytest
uv run ruff check
Download files
File details
Details for the file laken-0.1.3.tar.gz.
File metadata
- Download URL: laken-0.1.3.tar.gz
- Upload date:
- Size: 15.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ed7d7adc84921850203724dec1bb69571b8b10c183352fc9825a6cf6535b689d |
| MD5 | ba243c45c76005cb89017eeb171e99ed |
| BLAKE2b-256 | 6b947aaae87832b06ffc609daa1c479d5442ac137ee10763d1ae7c312afa454e |
File details
Details for the file laken-0.1.3-py3-none-any.whl.
File metadata
- Download URL: laken-0.1.3-py3-none-any.whl
- Upload date:
- Size: 22.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d6ba04e302daf2c2458d993b7e211166f929683f16b4b5b78f95a8576be67050 |
| MD5 | 14a6d37c0edc541539c0d9058617c380 |
| BLAKE2b-256 | feeb28de87817424fdf22c7e5cd68b9e60f771fecf464706cc31b3b4e0270122 |