
A library providing a Lakehouse framework

Project description

Lakehouse-NS gives you a simple framework to implement your lakehouse based on the Medallion Architecture.

  • Currently, the framework supports the Bronze and Silver layers
  • Spark is the supported engine (tested with Spark 4.0); more engines such as Daft or Polars are on the backlog
  • Delta Lake is the supported lakehouse format
  • The framework will also be extended step by step with more baseline logic


1. Set-Up

Requires one of the following to be installed:

  • pyspark and delta-spark
  • Databricks Connect
  • Spark and Delta Connect
  • default spark session on Databricks or Fabric

pip install lakehouse-ns

You also need a catalog set up, along with your bronze and silver schema(s).
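For a local setup with pyspark and delta-spark, creating the session and the layer schemas might look like the sketch below. The builder configuration follows the standard delta-spark quickstart; the catalog and schema names are placeholders, not names the library requires.

```python
def build_session_and_schemas(catalog="spark_catalog"):
    """Create a local Delta-enabled Spark session and the bronze/silver schemas.

    Imports are deferred so the sketch can be read without pyspark installed.
    """
    from pyspark.sql import SparkSession
    from delta import configure_spark_with_delta_pip

    builder = (
        SparkSession.builder.appName("lakehouse-ns-quickstart")
        # Standard delta-spark configuration (see the Delta Lake quickstart)
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config(
            "spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        )
    )
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    # One schema per Medallion layer
    for schema in ("bronze", "silver"):
        spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}")
    return spark
```

On Databricks or Fabric the default session already exists, so only the schema-creation step applies.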

That's already it!

2. Get Started

Just import the Bronze and Silver classes and override the load or transform functions. That's it.

from lakehouse import bronze, silver

spark = <Your Spark Session>

# Create your schemas
spark.sql("CREATE SCHEMA IF NOT EXISTS <catalog>.<schema>")

options = {
    "catalog": "<catalog>",
    "target_schema": "<schema>",
}


class StarWarsBronze(bronze.BronzeOverwrite):
    def load(self, table):
        return spark.read.format("SWAPI").load(table)
    
bronze_instance = StarWarsBronze(spark, **options)
bronze_instance.execute_one("people")
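A Silver layer would mirror this pattern with a transform function. The sketch below is hypothetical: the class name silver.SilverOverwrite is assumed by symmetry with bronze.BronzeOverwrite, and the selected columns are made up; check the samples repository for the actual API.

```python
# Hypothetical Silver example, mirroring the Bronze one above.
silver_options = {
    "catalog": "<catalog>",
    "source_schema": "<bronze schema>",
    "target_schema": "<silver schema>",
}

def build_silver(spark):
    # Deferred import so the sketch reads without the library installed
    from lakehouse import silver

    class StarWarsSilver(silver.SilverOverwrite):
        def transform(self, df):
            # Clean the bronze data, e.g. keep only the columns you need
            return df.select("name", "height", "mass")

    return StarWarsSilver(spark, **silver_options)

# silver_instance = build_silver(spark)
# silver_instance.execute_one("people")
```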

See detailed samples here: https://github.com/datanikkthegreek/lakehouse-docu/tree/main/samples

3. Options

You can or must pass the following options to the Bronze and Silver classes. In addition, you can specify any custom options, which you can access via self.options inside your class.

  • catalog (String, no default): The name of the catalog, e.g. spark_catalog, hive_metastore or any other custom catalog. Required for Bronze and Silver.
  • source_schema (String, no default): The schema from which the data is loaded. Not required for Bronze, required for Silver.
  • target_schema (String, no default): The schema to which the data is written. Required for Bronze and Silver.
  • merge_schema (Boolean, default FALSE): Whether the schema should be automatically evolved/merged. Optional for Bronze and Silver.
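An illustrative sketch of combining documented options with a custom one. The self.options attribute comes from the text above; the custom key "api_base_url" and the .option("url", ...) reader call are made-up examples, not part of the library.

```python
options = {
    "catalog": "<catalog>",
    "target_schema": "<schema>",
    "merge_schema": True,                      # documented optional flag
    "api_base_url": "https://swapi.dev/api",   # custom option (made up)
}

def build_bronze(spark):
    # Deferred import so the sketch reads without the library installed
    from lakehouse import bronze

    class StarWarsBronze(bronze.BronzeOverwrite):
        def load(self, table):
            # Custom options are available via self.options
            url = self.options["api_base_url"]
            return spark.read.format("SWAPI").option("url", url).load(table)

    return StarWarsBronze(spark, **options)
```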

Project details


Download files

Download the file for your platform.

Source Distributions

No source distribution files available for this release.

Built Distribution


lakehouse_ns-0.1.1-py311-none-any.whl (24.7 kB)

Uploaded Python 3.11

File details

Details for the file lakehouse_ns-0.1.1-py311-none-any.whl.

File metadata

  • Download URL: lakehouse_ns-0.1.1-py311-none-any.whl
  • Upload date:
  • Size: 24.7 kB
  • Tags: Python 3.11
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.10

File hashes

Hashes for lakehouse_ns-0.1.1-py311-none-any.whl:

  • SHA256: 185120e0ac473ad614b377eb4a1f4490c0c856d3ce0bea03b0285e56dff5a633
  • MD5: 5d1ac7f6861868b9543dccc378b1eacf
  • BLAKE2b-256: a642533f727f6091d9ab0a7691c21b452dd5eb4f7b7dff5fea595d3b1f662b7c

