Laktory
An open-source DataOps and dataframe-centric ETL framework for building lakehouses.
Laktory is your all-in-one solution for defining both data transformations and Databricks resources. Imagine Terraform, Databricks Asset Bundles, and dbt combining forces and adding support for the DataFrame API: that is essentially Laktory.
This open-source framework streamlines the creation, deployment, and execution of data pipelines while adhering to essential DevOps practices such as version control, code reviews, and CI/CD integration. Powered by Narwhals, Laktory enables seamless transitions between Apache Spark, Polars, and other frameworks to perform data transformations reliably and at scale. Its modular and flexible design allows you to effortlessly combine SQL statements with DataFrame operations, reducing complexity and enhancing productivity.
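The mixing of SQL statements and programmatic, DataFrame-style steps within a single chain is the core idea. As a minimal, library-free illustration of that pattern (using stdlib sqlite3 purely for demonstration, not Laktory's actual Narwhals-backed engine), a pipeline might run a SQL aggregation followed by a programmatic filter:

```python
import sqlite3

# Sample input rows: (symbol, traded volume)
rows = [("AAPL", 100), ("MSFT", 80), ("AAPL", 50)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, volume INTEGER)")
conn.executemany("INSERT INTO trades VALUES (?, ?)", rows)

# Step 1 -- SQL transformation: total volume per symbol
totals = conn.execute(
    "SELECT symbol, SUM(volume) FROM trades GROUP BY symbol ORDER BY symbol"
).fetchall()

# Step 2 -- programmatic (DataFrame-style) transformation: keep heavy traders
result = [(symbol, volume) for symbol, volume in totals if volume > 100]
print(result)  # [('AAPL', 150)]
```

In Laktory, both steps would instead be declared as transformer nodes and executed against Spark, Polars, or another Narwhals-supported backend.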
Since Laktory pipelines are built on top of Narwhals, they can run in any environment that supports Python—from your local machine to a Kubernetes cluster. Pipelines can be orchestrated using tools like Apache Airflow or deployed directly as Databricks Jobs or Declarative Pipelines, offering both flexible and fully managed execution options.
But Laktory goes beyond data pipelines. It empowers you to define and deploy your entire Databricks data platform—from Unity Catalog and access grants to compute and quality monitoring—providing a complete, modern solution for data platform management. This empowers your data team to take full ownership of the solution, eliminating the need to juggle multiple technologies. Say goodbye to relying on external Terraform experts to handle compute, workspace configuration, and Unity Catalog, while your data engineers and analysts try to combine Databricks Asset Bundles and dbt to build data pipelines. Laktory consolidates these functions, simplifying the entire process and reducing the overall cost.
Help
See documentation for more details.
Installation
Install using
pip install laktory
For more installation options, see the Install section in the documentation.
A Basic Example
from laktory import models

node_brz = models.PipelineNode(
    name="brz_stock_prices",
    source={
        "format": "PARQUET",
        "path": "./data/brz_stock_prices/",
    },
    transformer={
        "nodes": [],
    },
)

node_slv = models.PipelineNode(
    name="slv_stock_prices",
    source={
        "node_name": "brz_stock_prices",
    },
    sinks=[{
        "path": "./data/slv_stock_prices",
        "mode": "OVERWRITE",
        "format": "PARQUET",
    }],
    transformer={
        "nodes": [
            # SQL transformation
            {
                "expr": """
                SELECT
                    data.created_at AS created_at,
                    data.symbol AS symbol,
                    data.open AS open,
                    data.close AS close,
                    data.high AS high,
                    data.low AS low,
                    data.volume AS volume
                FROM
                    {df}
                """
            },
            # Spark (DataFrame API) transformation
            {
                "func_name": "drop_duplicates",
                "func_kwargs": {
                    "subset": ["created_at", "symbol"]
                }
            },
        ],
    },
)

pipeline = models.Pipeline(
    name="stock_prices",
    nodes=[node_brz, node_slv],
)

# Assumes an active SparkSession named `spark` (e.g. in a Databricks notebook)
pipeline.execute(spark=spark)
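For intuition, the drop_duplicates step above keeps a single row per (created_at, symbol) pair. A plain-Python sketch of that subset-based deduplication (independent of Laktory and Spark, shown only to clarify the semantics):

```python
# Deduplicate rows on a subset of keys, keeping the first occurrence --
# the same semantics as Spark's drop_duplicates(subset=[...]).
def drop_duplicates(rows, subset):
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[k] for k in subset)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [
    {"created_at": "2023-01-01", "symbol": "AAPL", "close": 125.1},
    {"created_at": "2023-01-01", "symbol": "AAPL", "close": 125.1},  # duplicate
    {"created_at": "2023-01-02", "symbol": "AAPL", "close": 126.4},
]
deduped = drop_duplicates(rows, subset=["created_at", "symbol"])
print(len(deduped))  # 2
```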
To get started with a more useful example, jump into the Quickstart.
Get Involved
Laktory is growing rapidly, and we'd love for you to be part of our journey! Here's how you can get involved:
- Join the Community: Connect with fellow Laktory users and contributors on our Slack. Share ideas, ask questions, and collaborate!
- Suggest Features or Report Issues: Have an idea for a new feature or encountering an issue? Let us know on GitHub Issues. Your feedback helps shape the future of Laktory!
- Contribute to Laktory: Check out our contributing guide to learn how you can tackle issues and add value to the project.
A Lakehouse DataOps Template
A comprehensive template on how to deploy a lakehouse as code using Laktory is maintained here: https://github.com/okube-ai/lakehouse-as-code
In this template, four Pulumi projects are used to:
- {cloud_provider}_infra: deploy the required resources on your cloud provider
- unity-catalog: set up users, groups, catalogs, and schemas, and manage grants
- workspace: set up secrets, clusters, warehouses, and common files/notebooks
- workflows: define the data workflows that build your lakehouse
Okube Company
Okube is dedicated to building open-source frameworks, known as the kubes, that empower businesses to build, deploy, and operate highly scalable data platforms and AI models.