Lakehouse-NS: a library providing the Lakehouse Framework
Project description
Lakehouse-NS gives you a simple framework to implement your lakehouse based on the Medallion Architecture.
- Currently, the framework supports the Bronze and Silver layers
- It currently supports Spark (tested with Spark 4.0); more engines such as Daft or Polars are on the backlog
- Currently, it supports Delta Lake as the lakehouse format
- The framework will also be extended step by step with more baseline logic
Some important links:
- "Homepage" = "https://github.com/datanikkthegreek/lakehouse-docu"
- "API Reference" = "https://datanikkthegreek.github.io/lakehouse-docu/"
- "Samples" = "https://github.com/datanikkthegreek/lakehouse-docu/tree/main/samples"
- "Source" = "https://github.com/datanikkthegreek/lakehouse"
- "Issues" = "https://github.com/datanikkthegreek/lakehouse-docu/issues"
- "Project Planning" = "https://github.com/users/datanikkthegreek/projects/1/views/1"
- "Get in touch" = "https://www.linkedin.com/in/dr-nikolaos-servos-nikk-the-greek-a29137b3/"
1. Set-Up
Requires one of the following to be installed:
- pyspark and delta-spark
- Databricks Connect
- Spark and Delta Connect
- default spark session on Databricks or Fabric
```shell
pip install lakehouse-ns
```
You also need a catalog set up, along with your bronze and silver schema(s).
That's already it!
2. Get Started
Just import the Bronze and Silver classes and override the load or transform functions. That's it.
```python
from lakehouse import bronze, silver

spark = <Your Spark Session>

# Create your schemas
spark.sql("CREATE SCHEMA IF NOT EXISTS <catalog>.<schema>")

options = {
    "catalog": "<catalog>",
    "target_schema": "<schema>",
}

class StarWarsBronze(bronze.BronzeOverwrite):
    def load(self, table):
        return spark.read.format("SWAPI").load(table)

bronze_instance = StarWarsBronze(spark, **options)
bronze_instance.execute_one("people")
```
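The Silver layer follows the same pattern, overriding transform instead of (or in addition to) load. Since the exact Silver API is not shown here, the following is a minimal sketch using a hypothetical stand-in base class to illustrate the override flow; it is NOT the real lakehouse-ns class hierarchy, so consult the API reference for the actual names:

```python
# Stand-in base class illustrating the override pattern only (NOT the
# real lakehouse-ns API): the framework loads a table, applies the
# transform, and (in the real library) writes the result to the target.
class SilverBaseSketch:
    def __init__(self, spark, **options):
        self.spark = spark
        self.options = options  # required and custom options land here

    def load(self, table):
        raise NotImplementedError

    def transform(self, df):
        return df  # default: pass the data through unchanged

    def execute_one(self, table):
        # load the bronze data, then apply the transform
        return self.transform(self.load(table))


class StarWarsSilver(SilverBaseSketch):
    def load(self, table):
        # The real framework would read <catalog>.<source_schema>.<table>;
        # toy rows stand in for a Spark DataFrame here.
        return [{"name": "Luke Skywalker"}, {"name": None}]

    def transform(self, rows):
        # Typical silver-layer cleanup: drop rows without a name
        return [row for row in rows if row["name"] is not None]


silver_instance = StarWarsSilver(
    None, catalog="spark_catalog", source_schema="bronze", target_schema="silver"
)
people = silver_instance.execute_one("people")
print(people)  # → [{'name': 'Luke Skywalker'}]
```

The point of the sketch is the division of labor: Bronze classes override load (how raw data enters the lakehouse), Silver classes override transform (how it is cleaned), and the framework drives the rest.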
See detailed samples here: https://github.com/datanikkthegreek/lakehouse-docu/tree/main/samples
3. Options
You can/must pass the following options to the Bronze and Silver classes. In addition, you can specify any custom options, which you can access via self.options in your class.
| Option | Description | Type | Default | Bronze | Silver |
|---|---|---|---|---|---|
| catalog | The name of the catalog, e.g. spark_catalog, hive_metastore or any other custom catalog | String | To be defined | Required | Required |
| source_schema | The schema from which the data is loaded | String | To be defined | Not Required | Required |
| target_schema | The schema to which the data are written | String | To be defined | Required | Required |
| merge_schema | Whether the schema should be automatically evolved/merged | Boolean | FALSE | Optional | Optional |
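For example, a Silver run might be configured as follows. All values are placeholders, and `page_size` is a hypothetical custom option used only to illustrate the pass-through behavior, not part of the framework:

```python
options = {
    # required by the framework
    "catalog": "spark_catalog",
    "source_schema": "bronze",
    "target_schema": "silver",
    # optional: evolve/merge the target schema automatically
    "merge_schema": True,
    # any extra key is a custom option, later readable as
    # self.options["page_size"] inside your class
    "page_size": 50,
}
```

Per the description above, custom keys are not special-cased: they ride along with the required options and come back out of self.options.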
File details
Details for the file lakehouse_ns-0.1.1-py311-none-any.whl.
File metadata
- Download URL: lakehouse_ns-0.1.1-py311-none-any.whl
- Upload date:
- Size: 24.7 kB
- Tags: Python 3.11
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 185120e0ac473ad614b377eb4a1f4490c0c856d3ce0bea03b0285e56dff5a633 |
| MD5 | 5d1ac7f6861868b9543dccc378b1eacf |
| BLAKE2b-256 | a642533f727f6091d9ab0a7691c21b452dd5eb4f7b7dff5fea595d3b1f662b7c |