
Laktory


A DataOps framework for building a Databricks lakehouse.

What is Laktory

Laktory makes it possible to express and bring to life your data vision, from raw data to enriched, analytics-ready datasets and finely tuned AI models, while adhering to DevOps best practices such as source control, code reviews, and CI/CD.

Using a declarative approach, you define your datasets and transformations, validate them, and deploy them into Databricks workspaces. Once deployed, you can again leverage Laktory for debugging and monitoring.
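The define-validate-deploy loop above can be sketched in plain Python. This is a simplified illustration only, not Laktory's actual API: the `TableSpec` class, its fields, and its `validate` method are hypothetical stand-ins for Laktory's much richer models.

```python
from dataclasses import dataclass, field

@dataclass
class TableSpec:
    # Hypothetical stand-in for a declarative table definition.
    name: str
    catalog_name: str
    schema_name: str
    columns: list = field(default_factory=list)

    def validate(self):
        # Reject empty identifiers before anything reaches a workspace.
        for attr in ("name", "catalog_name", "schema_name"):
            if not getattr(self, attr):
                raise ValueError(f"{attr} must not be empty")
        return self

# Declare the dataset once; validation runs before deployment.
spec = TableSpec(
    name="brz_stock_prices",
    catalog_name="prod",
    schema_name="finance",
    columns=[{"name": "symbol", "type": "string"}],
).validate()
print(spec.name)  # brz_stock_prices
```

The point of the pattern is that an invalid declaration fails fast at validation time, long before any cloud resource is touched.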

Help

See the documentation for more details.

Installation

Install using

pip install laktory[{cloud_provider}]

where {cloud_provider} is one of azure, aws, or gcp.

For more installation options, see the Install section in the documentation.

A Basic Example

from laktory import models

table = models.Table(
    name="brz_stock_prices",
    catalog_name="prod",
    schema_name="finance",
    timestamp_key="data.created_at",
    builder={
        "layer": "SILVER",
        "table_source": {
            "name": "brz_stock_prices",
        }
    },
    columns=[
        {
            "name": "symbol",
            "type": "string",
            "sql_expression": "data.symbol"
        }
    ]
)

print(table)
#> catalog_name='prod' columns=[Column(catalog_name='prod', comment=None, name='symbol', pii=None, schema_name='finance', spark_func_args=[], spark_func_kwargs={}, spark_func_name=None, sql_expression='data.symbol', table_name='brz_stock_prices', type='string', unit=None)] comment=None data=None grants=None name='brz_stock_prices' primary_key=None schema_name='finance' timestamp_key='data.created_at' builder=TableBuilder(drop_source_columns=True, drop_duplicates=None, event_source=None, joins=[], pipeline_name=None, table_source=TableDataSource(read_as_stream=True, catalog_name='prod', cdc=None, selects=None, filter=None, from_pipeline=True, name='brz_stock_prices', schema_name='finance', watermark=None), layer='SILVER')
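Because a declaration like the one above is plain data, it can also live as JSON or YAML under source control, so that what gets reviewed in a pull request is exactly what gets deployed. A minimal stdlib sketch of that round trip (the dict mirrors the example's fields; the workflow shown is illustrative, not Laktory's own serialization API):

```python
import json

# The same table definition as reviewable, version-controlled data.
table_spec = {
    "name": "brz_stock_prices",
    "catalog_name": "prod",
    "schema_name": "finance",
    "timestamp_key": "data.created_at",
    "columns": [
        {"name": "symbol", "type": "string", "sql_expression": "data.symbol"},
    ],
}

serialized = json.dumps(table_spec, indent=2, sort_keys=True)
# Round-trip check: the committed text and the deployed spec agree.
assert json.loads(serialized) == table_spec
```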

To get started with a more useful example, jump into the Quickstart.

A full DataOps template

A comprehensive template showing how to deploy a lakehouse as code using Laktory is maintained here: https://github.com/okube-ai/lakehouse-as-code.

In this template, four Pulumi projects are used to:

  • {cloud_provider}_infra: Deploy the required resources on your cloud provider
  • unity-catalog: Set up users, groups, catalogs, and schemas, and manage grants
  • workspace-conf: Set up secrets, clusters, and warehouses
  • workspace: Deploy the data workflows that build your lakehouse
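The list order suggests a dependency chain, with each project building on the previous one. In a CI pipeline this could be captured as a job that deploys the stacks in sequence; the fragment below is a hypothetical GitHub Actions sketch, with `azure_infra` standing in for `{cloud_provider}_infra` and directory names assumed to match the project names:

```yaml
# Hypothetical CI job: deploy the four Pulumi projects in dependency order.
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          for project in azure_infra unity-catalog workspace-conf workspace; do
            (cd "$project" && pulumi up --yes)
          done
```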

Okube Company

Okube is dedicated to building open-source frameworks, the kubes, that empower businesses to build and deploy highly scalable data platforms and AI models. Contributions are more than welcome.

