Laktory
A DataOps framework for building a Databricks lakehouse.
Okube Company
Okube is committed to developing open-source data and ML engineering tools. This is an open space: contributions are more than welcome.
Help
TODO: Build full help documentation
Installation
Install with pip:
pip install laktory
TODO: Full installation instructions
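Optionally, you can confirm the installation and check which version was resolved. This check uses only the Python standard library, so it does not assume any Laktory API.

from importlib.metadata import version

# Optional post-install check using the standard library only.
print(version("laktory"))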
A Basic Example
This example demonstrates how to send data events to a data lake and how to define a data pipeline that specifies the transformation layers of the tables.
Generate data events
A data event class defines the specifications of an event and provides the methods for writing that event to storage.
from laktory import models
from datetime import datetime


class StockPriceData(models.DataEvent):
    # Event specifications: the event name and the producer emitting it
    name: str = "stock_price"
    producer: models.Producer = models.Producer(name="yahoo-finance")


# Two stock price events for the same symbol on consecutive days
events = [
    StockPriceData(
        data={
            "created_at": datetime(2023, 8, 23),
            "symbol": "GOOGL",
            "open": 130.25,
            "close": 132.33,
        },
    ),
    StockPriceData(
        data={
            "created_at": datetime(2023, 8, 24),
            "symbol": "GOOGL",
            "open": 132.00,
            "close": 134.12,
        },
    ),
]
These events may now be sent to your cloud storage of choice.
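As an illustration only, here is a minimal sketch that lands each event's payload as a JSON file. The local landing folder, the file names, and the use of json.dumps are assumptions made for this sketch; in practice you would target your cloud storage (e.g. S3 or ADLS), typically through the writer methods exposed by the data event class.

import json
from pathlib import Path

# Illustrative sketch: land each event's payload as a JSON file in a local
# folder. The folder layout and file naming below are assumptions.
landing_dir = Path("./landing/stock_price")
landing_dir.mkdir(parents=True, exist_ok=True)

for i, event in enumerate(events):
    filepath = landing_dir / f"stock_price_{i:03d}.json"
    filepath.write_text(json.dumps(event.data, default=str))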
Define data pipeline and data tables
A pipeline class defines how raw data events are transformed into curated (silver) and consumption (gold) layers.
from laktory import models


class StockPricesPipeline(models.Pipeline):
    name: str = "pl-stock-prices"
    tables: list[models.Table] = [
        # Bronze layer: raw stock price events, keyed by event timestamp
        models.Table(
            name="brz_stock_prices",
            timestamp_key="data.created_at",
            event_source=models.EventDataSource(
                name="stock_price",
                producer=models.Producer(
                    name="yahoo-finance",
                ),
            ),
            zone="BRONZE",
        ),
        # Silver layer: curated columns built from the bronze table
        models.Table(
            name="slv_stock_prices",
            table_source=models.TableSource(
                name="brz_stock_prices",
            ),
            zone="SILVER",
            columns=[
                {
                    "name": "created_at",
                    "type": "timestamp",
                    "func_name": "coalesce",
                    "input_cols": ["_created_at"],
                },
                {
                    "name": "low",
                    "type": "double",
                    "func_name": "coalesce",
                    "input_cols": ["data.low"],
                },
                {
                    "name": "high",
                    "type": "double",
                    "func_name": "coalesce",
                    "input_cols": ["data.high"],
                },
            ],
        ),
    ]
Laktory will provide the required framework for deploying this pipeline as a Delta Live Tables pipeline in Databricks, along with all the associated notebooks and jobs. TODO: link to help
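As a rough sketch of the declarative workflow (not the deployment API itself), the pipeline specification can be serialized and reviewed before it is handed to your deployment tooling. This assumes Laktory's models are pydantic-based and expose model_dump(); the actual deployment commands are covered by the documentation referenced above.

import json

# Rough sketch, assuming pydantic-style models exposing model_dump():
# serialize the declarative pipeline specification and review it, e.g. as
# part of a CI/CD step, before deployment.
pipeline = StockPricesPipeline()

spec = pipeline.model_dump(exclude_none=True)
print(json.dumps(spec, indent=2, default=str))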
Contributing
TODO