Laktory
A DataOps framework for building a Databricks lakehouse.
Okube Company
Okube is committed to developing open-source data and ML engineering tools. This is an open space, and contributions are more than welcome.
Help
TODO: Build full help documentation
Installation
Install with pip:
pip install laktory
TODO: Full installation instructions
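As a quick sanity check, the install can be verified from a Python shell (a sketch only, assuming the package exposes a __version__ attribute, as most Python packages do):

import laktory

# Assumption: a __version__ attribute is exposed; confirms the package imports cleanly
print(laktory.__version__)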
pyspark
Optionally, you can also install Spark locally to test your custom functions; a minimal sketch is given below.
TODO: Add pyspark instructions https://www.machinelearningplus.com/pyspark/install-pyspark-on-mac/
On macOS with Homebrew, for example:
- JAVA_HOME=/opt/homebrew/opt/java
- SPARK_HOME=/opt/homebrew/Cellar/apache-spark/3.5.0/libexec
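A minimal sketch of such a local test, assuming pyspark and a local Java runtime are installed (the column names and the greatest-of-open/close function are illustrative only, not part of Laktory):

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# Start a local Spark session for quick, cluster-free testing
spark = SparkSession.builder.appName("laktory-local-test").getOrCreate()

# Illustrative stock-price rows, mirroring the example events further below
df = spark.createDataFrame(
    [("GOOGL", 130.25, 132.33), ("GOOGL", 132.00, 134.12)],
    ["symbol", "open", "close"],
)

# A custom transformation under test: daily high as the greater of open and close
df = df.withColumn("high", F.greatest("open", "close"))
df.show()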
A Basic Example
This example demonstrates how to send data events to a data lake and how to define a data pipeline that builds the tables' transformation layers.
Generate data events
A data event class defines the specifications of an event and provides methods for writing that event directly to cloud storage or through a Databricks volume or mount.
from laktory import models
from datetime import datetime

events = [
    models.DataEvent(
        name="stock_price",
        producer={
            "name": "yahoo-finance",
        },
        data={
            "created_at": datetime(2023, 8, 23),
            "symbol": "GOOGL",
            "open": 130.25,
            "close": 132.33,
        },
    ),
    models.DataEvent(
        name="stock_price",
        producer={
            "name": "yahoo-finance",
        },
        data={
            "created_at": datetime(2023, 8, 24),
            "symbol": "GOOGL",
            "open": 132.00,
            "close": 134.12,
        },
    ),
]

for event in events:
    event.to_databricks()
Define data pipeline and data tables
A YAML file defines the configuration for a data pipeline, including the transformations of raw data events into curated (silver) and consumption (gold) layers.
name: pl-stock-prices
catalog: ${var.env}
target: default

clusters:
  - name: default
    node_type_id: Standard_DS3_v2
    autoscale:
      min_workers: 1
      max_workers: 2

libraries:
  - notebook:
      path: /pipelines/dlt_template_brz.py
  - notebook:
      path: /pipelines/dlt_template_slv.py

permissions:
  - group_name: account users
    permission_level: CAN_VIEW
  - group_name: role-engineers
    permission_level: CAN_RUN

# --------------------------------------------------------------------------- #
# Tables                                                                      #
# --------------------------------------------------------------------------- #

tables:
  - name: brz_stock_prices
    timestamp_key: data.created_at
    event_source:
      name: stock_price
      producer:
        name: yahoo-finance
    zone: BRONZE

  - name: slv_stock_prices
    table_source:
      catalog_name: ${var.env}
      schema_name: finance
      name: brz_stock_prices
    zone: SILVER
    columns:
      - name: created_at
        type: timestamp
        spark_func_name: coalesce
        spark_func_args:
          - data._created_at
      - name: open
        type: double
        spark_func_name: coalesce
        spark_func_args:
          - data.open
      - name: close
        type: double
        spark_func_name: coalesce
        spark_func_args:
          - data.close
      - name: high
        type: double
        sql_expression: GREATEST(data.open, data.close)
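For intuition, the slv_stock_prices column definitions above translate to Spark expressions roughly like the following sketch (the intent only, not the code Laktory generates; the df_brz argument name is an assumption standing in for the bronze table):

import pyspark.sql.functions as F

def to_silver(df_brz):
    # Sketch only: approximate Spark equivalent of the silver column definitions
    return (
        df_brz
        .withColumn("created_at", F.coalesce("data._created_at").cast("timestamp"))
        .withColumn("open", F.coalesce("data.open").cast("double"))
        .withColumn("close", F.coalesce("data.close").cast("double"))
        .withColumn("high", F.expr("GREATEST(data.open, data.close)"))
    )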
Deploy your configuration
Laktory currently supports Pulumi for cloud deployment; more engines (Terraform, Databricks CLI, etc.) will be added in the future.
import os
import pulumi
from laktory import models

# Read configuration file
with open("pipeline.yaml", "r") as fp:
    pipeline = models.Pipeline.model_validate_yaml(fp)

# Set variables
pipeline.vars = {
    "env": os.getenv("ENV"),
}

# Deploy
pipeline.deploy_with_pulumi()
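Assuming the script above is used as the entry point of a Pulumi project (for example its __main__.py), the deployment is then triggered with pulumi up, with the ENV environment variable selecting the target catalog.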
A full DataOps template
A comprehensive template on how to deploy a lakehouse as code using Laktory is maintained here: https://github.com/okube-ai/lakehouse-as-code.
In this template, four Pulumi projects are used to:
- {cloud_provider}_infra: deploy the required resources on your cloud provider
- unity-catalog: set up users, groups, catalogs and schemas, and manage grants
- workspace-conf: set up secrets, clusters and warehouses
- workspace: define the data workflows to build your lakehouse