Laktory
A DataOps framework for building a Databricks lakehouse.
Laktory makes it possible to express and bring to life your data vision, from raw data to enriched, analytics-ready datasets and finely tuned AI models, while adhering to DevOps best practices such as source control, code reviews, and CI/CD.
Using a declarative approach, you define your datasets and transformations, validate them, and deploy them into Databricks workspaces. Once deployed, you can continue to leverage Laktory for debugging and monitoring.
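Because models are validated as soon as they are instantiated, configuration errors surface locally rather than after deployment. A minimal sketch, assuming Laktory models follow standard pydantic validation behavior (the "COPPER" layer below is deliberately invalid):

from pydantic import ValidationError

from laktory import models

try:
    # An invalid layer name fails validation immediately,
    # long before any resource reaches a Databricks workspace.
    models.Table(
        name="brz_stock_prices",
        builder={"layer": "COPPER"},
    )
except ValidationError as e:
    print(e)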
Help
See the documentation for more details.
Installation
Install using

pip install laktory[{cloud_provider}]

where {cloud_provider} is azure, aws or gcp.
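For example, for a workspace hosted on Azure:

pip install laktory[azure]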
For more installation options, see the Install section in the documentation.
A Basic Example
from laktory import models

table = models.Table(
    name="brz_stock_prices",
    catalog_name="prod",
    schema_name="finance",
    timestamp_key="data.created_at",
    builder={
        "layer": "SILVER",
        "table_source": {
            "name": "brz_stock_prices",
        },
        "spark_chain": {
            "nodes": [
                {
                    "column": {"name": "symbol"},
                    "type": "string",
                    "sql_expression": "data.symbol",
                }
            ]
        },
    },
)
print(table)
#> catalog_name='prod' columns=[Column(catalog_name='prod', comment=None, name='symbol', pii=None, schema_name='finance', spark_func_args=[], spark_func_kwargs={}, spark_func_name=None, sql_expression='data.symbol', table_name='brz_stock_prices', type='string', unit=None)] comment=None data=None grants=None name='brz_stock_prices' primary_key=None schema_name='finance' timestamp_key='data.created_at' builder=TableBuilder(drop_source_columns=True, drop_duplicates=None, event_source=None, joins=[], pipeline_name=None, table_source=TableDataSource(read_as_stream=True, catalog_name='prod', cdc=None, selects=None, filter=None, from_pipeline=True, name='brz_stock_prices', schema_name='finance', watermark=None), layer='SILVER')
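Since the table is a plain declarative model, it can also be serialized and kept under source control alongside the rest of your code. A minimal sketch, assuming Laktory models expose the standard pydantic v2 serialization interface (model_dump):

import yaml

# Dump the validated model to a dictionary and render it as YAML,
# e.g. to store the definition in source control for a CI/CD pipeline.
print(yaml.safe_dump(table.model_dump(exclude_none=True)))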
To get started with a more useful example, jump into the Quickstart.
A full DataOps template
A comprehensive template showing how to deploy a lakehouse as code using Laktory is maintained here: https://github.com/okube-ai/lakehouse-as-code.
In this template, 4 Pulumi projects are used to:

- {cloud_provider}_infra: Deploy the required resources on your cloud provider
- unity-catalog: Set up users, groups, catalogs, schemas and manage grants (sketched below)
- workspace-conf: Set up secrets, clusters and warehouses
- workspace: The data workflows to build your lakehouse
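As an illustration of what the unity-catalog project declares, a minimal Pulumi program in Python could create the prod catalog and finance schema used in the example above. This is a sketch based on the Pulumi Databricks provider, not the template's actual code:

import pulumi_databricks as databricks

# Hypothetical sketch: declare the catalog and schema referenced
# by the brz_stock_prices table example above.
catalog = databricks.Catalog("prod-catalog", name="prod")
schema = databricks.Schema(
    "finance-schema",
    catalog_name=catalog.name,
    name="finance",
)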
Okube Company
Okube is dedicated to building open-source frameworks, known as the kubes, that empower businesses to build, deploy and operate highly scalable data platforms and AI models.