A package for implementing hybrid SCD1 and SCD2 operations using Delta Tables in Databricks
Hybrid SCD1 and SCD2 Implementation
This package provides a hybrid implementation of Slowly Changing Dimensions (SCD) Type 1 and Type 2 for Delta tables in Databricks. SCD2 is applied when a change occurs in the columns you specify, and SCD1 is applied to changes in all other columns.
Features
- Hybrid SCD1 and SCD2: A single call applies both SCD1 and SCD2 logic to the same target table.
- Column-based SCD2: SCD2 will be applied if any value changes in the specified SCD2 columns.
- Column-based SCD1: SCD1 will be applied if any value changes in columns other than the specified SCD2 columns.
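The column-based rule above can be illustrated with a minimal plain-Python sketch. It is not the package's code: classify_change is a hypothetical helper, and treating an SCD2 change as taking precedence when both kinds of column change is an assumption.

```python
from typing import Any, Mapping, Sequence


def classify_change(
    current: Mapping[str, Any],
    incoming: Mapping[str, Any],
    scd2_cols: Sequence[str],
) -> str:
    """Return 'scd2', 'scd1', or 'none' for two rows that share a primary key.

    Hypothetical helper for illustration only.
    """
    # A difference in any tracked column calls for a new historical version.
    if any(current.get(c) != incoming.get(c) for c in scd2_cols):
        return "scd2"
    # Otherwise, a difference in any remaining column is overwritten in place.
    other_cols = [c for c in incoming if c not in scd2_cols]
    if any(current.get(c) != incoming.get(c) for c in other_cols):
        return "scd1"
    return "none"


current = {"id": 1, "stock_name": "Google", "balance": 0, "platform": "Kite"}
print(classify_change(current, {**current, "balance": 50}, ["balance"]))        # scd2
print(classify_change(current, {**current, "platform": "Binance"}, ["balance"]))  # scd1
print(classify_change(current, dict(current), ["balance"]))                     # none
```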
Usage
apply_scd Function
The apply_scd function applies SCD1 or SCD2 to incoming rows, depending on which columns changed. It is designed for Delta tables in Databricks and requires the target table to have the following columns: record_status, effective_from, effective_to, dw_inserted_at, dw_updated_at, scd_key, and upd_key.
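Before calling apply_scd, it may help to confirm that the target table has these columns. The sketch below is a hypothetical pre-flight check, not part of the package; missing_scd_columns and REQUIRED_SCD_COLUMNS are names introduced here, and the case-insensitive comparison is an assumption.

```python
from typing import Iterable, List

# Column names taken from the requirement stated above.
REQUIRED_SCD_COLUMNS = (
    "record_status",
    "effective_from",
    "effective_to",
    "dw_inserted_at",
    "dw_updated_at",
    "scd_key",
    "upd_key",
)


def missing_scd_columns(table_columns: Iterable[str]) -> List[str]:
    """Return the required SCD columns that are absent from table_columns.

    Hypothetical helper; names are compared case-insensitively (an assumption).
    """
    present = {c.lower() for c in table_columns}
    return [c for c in REQUIRED_SCD_COLUMNS if c not in present]


# In a Databricks notebook the column list could come from
# spark.table(target_table).columns; a literal list is used here instead.
print(missing_scd_columns(["id", "stock_name", "balance", "record_status"]))
```

An empty list means every required column is present.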
SCD Handler Example
This example demonstrates how to use scd_handler from the delta_hybrid_scd module to apply the hybrid SCD1 and SCD2 logic with PySpark. It assumes a Databricks notebook, where the spark session is already defined.
1. Prepare Data
    from datetime import datetime

    from delta_hybrid_scd import scd_handler

    # Incremental batch: (id, stock_name, balance, platform, last_modify_ts)
    incremental_data = [
        (1, "Google", 0, "Kite", datetime(2015, 12, 25, 10, 5, 30)),
        (1, "BTC", 0, "Binance", datetime(2016, 12, 25, 11, 5, 30)),
        (3, "ETH", 20, "Binance", datetime(2016, 12, 26, 12, 7, 35)),
    ]
    schema = ["id", "stock_name", "balance", "platform", "last_modify_ts"]

    # spark is the SparkSession that Databricks notebooks provide
    df = spark.createDataFrame(incremental_data, schema)
2. Apply SCD
    # catalog_name and silver_schema must be defined before this line
    target_table = f"{catalog_name}.{silver_schema}.account_scd2"

    pk_col = ["id", "stock_name"]          # Primary key columns
    skey_col = ["balance"]                 # Columns to track SCD2 changes on
    effective_from_col = "last_modify_ts"  # Timestamp column to log changes
    select_col_list = ["id", "stock_name", "balance", "platform"]

    scd_handler.apply_scd(
        df,
        skey_col,
        pk_col,
        target_table,
        select_col_list,
        effective_from_col,
    )
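To make the effect of an SCD2 change concrete, here is a minimal plain-Python sketch of closing the open version of a row and appending a new one. It is an illustration under stated assumptions, not the package's implementation: apply_scd2_change is a hypothetical helper, the record_status values "active" and "inactive" are invented, an open version is assumed to have effective_to set to None, and the earlier balance of 10 is made-up history.

```python
from datetime import datetime
from typing import Any, Dict, List


def apply_scd2_change(
    history: List[Dict[str, Any]],
    incoming: Dict[str, Any],
    effective_from_col: str,
) -> List[Dict[str, Any]]:
    """Close the open version in history and append incoming as the new one.

    Hypothetical helper; the input list and its rows are not mutated.
    """
    change_ts = incoming[effective_from_col]
    updated = []
    for row in history:
        if row["effective_to"] is None:
            # Assumed convention: the open version ends when the change arrives.
            row = {**row, "effective_to": change_ts, "record_status": "inactive"}
        updated.append(row)
    new_version = {k: v for k, v in incoming.items() if k != effective_from_col}
    new_version.update(
        effective_from=change_ts, effective_to=None, record_status="active"
    )
    updated.append(new_version)
    return updated


# Made-up existing version of the (3, "ETH") row with a different balance.
history = [{
    "id": 3, "stock_name": "ETH", "balance": 10, "platform": "Binance",
    "effective_from": datetime(2016, 1, 1), "effective_to": None,
    "record_status": "active",
}]
incoming = {
    "id": 3, "stock_name": "ETH", "balance": 20, "platform": "Binance",
    "last_modify_ts": datetime(2016, 12, 26, 12, 7, 35),
}
for row in apply_scd2_change(history, incoming, "last_modify_ts"):
    print(row["balance"], row["record_status"], row["effective_from"], row["effective_to"])
```

An SCD1 change, by contrast, would overwrite the changed column on the existing row without adding a new version.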
File details
Details for the file delta_hybrid_scd-0.1.1.tar.gz.
File metadata
- Download URL: delta_hybrid_scd-0.1.1.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.11.11 Linux/5.15.0-1075-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a1a07cbf189d6379d843c5d1f0aa45862ce17f199cd6c6d5275136903332e6a1 |
| MD5 | cbf8eef7a3951662a761833aca44807e |
| BLAKE2b-256 | 7a41b239f02b122e8bb33ea6dbbe5eef5e2cdcf8c427170ab16b0fdf6939d4b8 |
File details
Details for the file delta_hybrid_scd-0.1.1-py3-none-any.whl.
File metadata
- Download URL: delta_hybrid_scd-0.1.1-py3-none-any.whl
- Upload date:
- Size: 4.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.11.11 Linux/5.15.0-1075-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2fc283c2699150707e4bf97e8eacd31b9228b8fccc2a6aeeac47a2966f6eecc |
| MD5 | b252ca1e8b6f621354d8c6a38a89ed29 |
| BLAKE2b-256 | 7687cbdff6a038baac8c3faf4ab6011c9077d66e63c0bcea543686ab786d4f59 |