A DataOps framework for building a lakehouse

Laktory

A DataOps framework for building a Databricks lakehouse.

Okube Company

Okube is committed to developing open source data and ML engineering tools. This is an open space. Contributions are more than welcome.

Help

TODO: Build full help documentation

Installation

Install using pip install laktory
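After installing, one quick way to confirm that the package is visible to the current interpreter is to query its installed version. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of Laktory.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if laktory is installed, otherwise None.
print(installed_version("laktory"))
```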

TODO: Full installation instructions

A Basic Example

This example demonstrates how to send data events to a data lake and how to set up a data pipeline that defines the table transformation layers.

Generate data events

A data event class defines the specifications of an event and provides the methods for writing that event to a Databricks mount or to cloud storage.

from laktory import models
from datetime import datetime


events = [
    models.DataEvent(
        name="stock_price",
        producer={
            "name": "yahoo-finance",
        },
        data={
            "created_at": datetime(2023, 8, 23),
            "symbol": "GOOGL",
            "open": 130.25,
            "close": 132.33,
        },
    ),
    models.DataEvent(
        name="stock_price",
        producer={
            "name": "yahoo-finance",
        },
        data={
            "created_at": datetime(2023, 8, 24),
            "symbol": "GOOGL",
            "open": 132.00,
            "close": 134.12,
        },
    )
]

# Write each event to the Databricks mount
for event in events:
    event.to_databricks_mount()

These events may now be sent to your cloud storage of choice.
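To make the idea of "writing an event" more concrete, here is a minimal standard-library sketch of what such a class could do: derive a storage path from the producer, event name and timestamp, then serialize the payload as JSON. This is not Laktory's implementation. The class `SketchEvent`, its `write` method, and the `<producer>/<name>/<timestamp>.json` path layout are all assumptions made for illustration.

```python
import json
import tempfile
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class SketchEvent:
    """Hypothetical stand-in for a data event; not part of Laktory."""

    name: str
    producer: str
    data: dict

    def relative_path(self) -> Path:
        # Assumed layout: <producer>/<event name>/<timestamp>.json
        stamp = self.data["created_at"].strftime("%Y%m%dT%H%M%S")
        return Path(self.producer) / self.name / f"{stamp}.json"

    def write(self, root: Path) -> Path:
        # Serialize the event, converting datetimes to ISO 8601 strings.
        target = root / self.relative_path()
        target.parent.mkdir(parents=True, exist_ok=True)
        payload = {"name": self.name, "producer": self.producer, "data": self.data}
        target.write_text(json.dumps(payload, default=lambda v: v.isoformat()))
        return target


event = SketchEvent(
    name="stock_price",
    producer="yahoo-finance",
    data={
        "created_at": datetime(2023, 8, 23),
        "symbol": "GOOGL",
        "open": 130.25,
        "close": 132.33,
    },
)

# A temporary directory stands in for the mount or storage container.
with tempfile.TemporaryDirectory() as tmp:
    path = event.write(Path(tmp))
    stored = json.loads(path.read_text())
```

Keeping the producer and event name in the path means a downstream reader can locate all events of one kind by prefix, which is the property a bronze table source relies on.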

Define data pipeline and data tables

A pipeline class defines the transformations of a raw data event into curated (silver) and consumption (gold) layers.

from laktory import models

pl = models.Pipeline(
    name="pl-stock-prices",
    tables=[
        models.Table(
            name="brz_stock_prices",
            timestamp_key="data.created_at",
            event_source=models.EventDataSource(
                name="stock_price",
                producer=models.Producer(
                    name="yahoo-finance",
                )
            ),
            zone="BRONZE",
        ),
        models.Table(
            name="slv_stock_prices",
            table_source=models.TableSource(
                name="brz_stock_prices",
            ),
            zone="SILVER",
            columns=[
                {
                    "name": "created_at",
                    "type": "timestamp",
                    "func_name": "coalesce",
                    "input_cols": ["_created_at"],
                },
                # The columns below match the fields sent in the events above
                {
                    "name": "open",
                    "type": "double",
                    "func_name": "coalesce",
                    "input_cols": ["data.open"],
                },
                {
                    "name": "close",
                    "type": "double",
                    "func_name": "coalesce",
                    "input_cols": ["data.close"],
                },
            ]
        ),
    ]
)

Laktory will provide the required framework for deploying this pipeline as Delta Live Tables in Databricks, along with all the associated notebooks and jobs. TODO: link to help
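Each column definition in the pipeline can be read as a small recipe: take the values at `input_cols`, pass them to the function named by `func_name`, and cast the result to `type`. The pure-Python sketch below illustrates that reading on a single record. It is an assumption about the semantics, not Laktory's code; in Databricks the same logic would be expressed as Spark column expressions. The helpers `get_nested`, `coalesce`, `build_row` and the `FUNCS` and `CASTS` registries are hypothetical.

```python
from typing import Any


def get_nested(row: dict, dotted: str) -> Any:
    """Resolve a dotted path such as "data.open"; return None if missing."""
    value: Any = row
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def coalesce(*values: Any) -> Any:
    """Return the first value that is not None, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)


# Hypothetical registries mapping spec strings to Python callables.
FUNCS = {"coalesce": coalesce}
CASTS = {"double": float, "string": str}


def build_row(raw: dict, columns: list) -> dict:
    """Apply every column spec to one raw record."""
    out = {}
    for col in columns:
        args = [get_nested(raw, path) for path in col["input_cols"]]
        value = FUNCS[col["func_name"]](*args)
        out[col["name"]] = None if value is None else CASTS[col["type"]](value)
    return out


raw_event = {"data": {"symbol": "GOOGL", "open": 130.25, "close": 132.33}}
columns = [
    {"name": "open", "type": "double", "func_name": "coalesce", "input_cols": ["data.open"]},
    {"name": "close", "type": "double", "func_name": "coalesce", "input_cols": ["data.close"]},
    {"name": "low", "type": "double", "func_name": "coalesce", "input_cols": ["data.low"]},
]

silver_row = build_row(raw_event, columns)
```

Note that a column whose input path is absent from the record (here `data.low`) resolves to a null value rather than raising an error, which is why the column names in a table definition should match the fields the producer actually sends.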

Contributing

TODO
