Package for Fabric Engineers


FabricEngineer Package

Description

FabricEngineer is a Python package for Microsoft Fabric developers. It streamlines data transformation workflows and automates recurring ETL tasks, such as silver-layer ingestion (insert-only and SCD Type 2) and materialized lake view management, so you can build robust data pipelines with minimal boilerplate code.

Key Features

🚀 Silver Layer Data Ingestion Services

Insert-Only Pattern: Efficient data ingestion with support for schema evolution and historization
SCD Type 2 (Slowly Changing Dimensions): Complete implementation of Type 2 SCD with automatic history tracking
Delta Load Support: Optimized incremental data processing with broadcast join capabilities
Schema Evolution: Automatic handling of schema changes with backward compatibility
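Additive schema evolution usually works like this: new source columns are added to the target, older rows read them as nulls, and existing columns are never dropped. The sketch below illustrates that idea with plain Python rows under exactly those assumptions. The package's actual rules (for example, for type changes) are not described here, and `evolve_schema` and `align_rows` are hypothetical helpers. In PySpark, `DataFrame.unionByName(other, allowMissingColumns=True)` (Spark 3.1+) has a similar effect.

```python
from typing import Any

Row = dict[str, Any]


def evolve_schema(existing_columns: list[str], incoming_columns: list[str]) -> list[str]:
    """Merged column list: existing columns keep their order, new ones are appended."""
    merged = list(existing_columns)
    for col in incoming_columns:
        if col not in merged:
            merged.append(col)  # new column; existing columns are never removed
    return merged


def align_rows(rows: list[Row], columns: list[str]) -> list[Row]:
    """Fill columns a row does not have with None so all rows share one schema."""
    return [{col: row.get(col) for col in columns} for row in rows]


existing = ["id", "name"]
incoming = ["id", "name", "budget"]
columns = evolve_schema(existing, incoming)
old_rows = align_rows([{"id": 1, "name": "A"}], columns)
print(columns)   # ['id', 'name', 'budget']
print(old_rows)  # [{'id': 1, 'name': 'A', 'budget': None}]
```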

📊 Materialized Lake Views (MLV)

Automated MLV Generation: Create and manage materialized views with SQL generation
Schema-aware Operations: Intelligent handling of schema changes and column evolution
Lakehouse Integration: Seamless integration with Microsoft Fabric Lakehouse architecture

🔧 Advanced Data Engineering Features

Configurable Transformations: Flexible transformation pipelines with custom business logic
Data Quality Controls: Built-in validation and data quality checks
Performance Optimization: Broadcast joins, partition strategies, and optimized query patterns
Comprehensive Logging: Integrated logging and performance monitoring with TimeLogger
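The usage examples later in this document pass a `transformations` dict. In it, `"*"` maps to a function that is applied to every table, and a table name maps to a function that is applied only to that table. The sketch below shows one way such a dict could be resolved. It assumes the wildcard runs before the table-specific function (the package's actual order is not documented here) and uses lists of dicts in place of Spark DataFrames. `apply_transformations` is a hypothetical name.

```python
from datetime import datetime
from typing import Any, Callable, Optional

Row = dict[str, Any]
# Mirrors the (df, etl) signature from the examples, with row lists instead of DataFrames
Transform = Callable[[list[Row], Optional[object]], list[Row]]


def apply_transformations(
    rows: list[Row],
    table: str,
    transformations: dict[str, Transform],
    etl: Optional[object] = None,
) -> list[Row]:
    """Apply the "*" transform (if any), then the table-specific one (if any)."""
    # Assumed order: wildcard first, then table-specific. dict.fromkeys removes
    # duplicate keys, so a table literally named "*" is not transformed twice.
    for key in dict.fromkeys(("*", table)):
        fn = transformations.get(key)
        if fn is not None:
            rows = fn(rows, etl)
    return rows


def transform_all(rows: list[Row], etl: Optional[object]) -> list[Row]:
    return [{**r, "data": "values"} for r in rows]


def transform_projects(rows: list[Row], etl: Optional[object]) -> list[Row]:
    # Parse ISO-8601 strings, similar in spirit to F.to_timestamp("dtime")
    return [{**r, "dtime": datetime.fromisoformat(r["dtime"])} for r in rows]


transformations = {"*": transform_all, "projects": transform_projects}
rows = [{"id": 1, "dtime": "2025-08-07T10:00:00"}]
print(apply_transformations(rows, "projects", transformations))  # both transforms
print(apply_transformations(rows, "users", transformations))     # only "*"
```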

Installation

pip install fabricengineer-py

Quick Start Guide

Prerequisites

Microsoft Fabric workspace with Lakehouse
PySpark environment
Python 3.11+

Usage Examples

Silver Layer Data Ingestion

Insert-Only Pattern

The Insert-Only service is ideal for append-only scenarios where you need to track all changes while maintaining performance.

from pyspark.sql import DataFrame, functions as F
from fabricengineer.logging import TimeLogger
from fabricengineer.transform.lakehouse import LakehouseTable
from fabricengineer.transform import SilverIngestionInsertOnly


def transform_projects(df: DataFrame, etl) -> DataFrame:
    df = df.withColumn("dtime", F.to_timestamp("dtime"))
    return df


def transform_all(df: DataFrame, etl) -> DataFrame:
    df = df.withColumn("data", F.lit("values"))
    return df


# Initialize performance monitoring
timer = TimeLogger()

# Define table-specific transformations
transformations = {
    "*": transform_all,             # Applied to all tables
    "projects": transform_projects  # Applied only to projects table
}

# Configure source and destination tables
source_table = LakehouseTable(
    lakehouse="BronzeLakehouse",
    schema="schema",
    table="projects"
)
destination_table = LakehouseTable(
    lakehouse="SilverLakehouse",
    schema=source_table.schema,
    table=source_table.table
)

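# spark and notebookutils are provided by the Fabric notebook runtime.
# NK_COLUMNS, CONSTANT_COLUMNS, IS_DELTA_LOAD, etc. are configuration
# constants that you define in the notebook before this cell.
# Initialize and configure the ETL service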
etl = SilverIngestionInsertOnly()
etl.init(
    spark_=spark,
    notebookutils_=notebookutils,
    source_table=source_table,
    destination_table=destination_table,
    nk_columns=NK_COLUMNS,
    constant_columns=CONSTANT_COLUMNS,
    is_delta_load=IS_DELTA_LOAD,
    delta_load_use_broadcast=DELTA_LOAD_USE_BROADCAST,
    transformations=transformations,
    exclude_comparing_columns=EXCLUDE_COLUMNS_FROM_COMPARING,
    include_comparing_columns=INCLUDE_COLUMNS_AT_COMPARING,
    historize=HISTORIZE,
    partition_by_columns=PARTITION_BY_COLUMNS,
    df_bronze=None,
    create_historized_mlv=True
)


timer.start().log()
etl.run()
timer.end().log()
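Conceptually, an insert-only load with historization never updates or deletes rows. It appends an incoming row only when its natural key is new, or when at least one compared column differs from the latest stored version. The sketch below shows that logic on plain Python rows. It is an assumption-based illustration, not the package's implementation: `insert_only` and the `_loaded_at` technical column are hypothetical, while `nk_columns` and `exclude_comparing_columns` mirror the parameters of `etl.init` above.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, Optional

Row = dict[str, Any]
LOADED_AT = "_loaded_at"  # hypothetical technical column name


def insert_only(
    history: list[Row],
    incoming: list[Row],
    nk_columns: list[str],
    exclude_comparing_columns: Iterable[str] = (),
    loaded_at: Optional[datetime] = None,
) -> list[Row]:
    """Append new or changed rows; existing rows are never updated or deleted."""
    loaded_at = loaded_at or datetime.now(timezone.utc)
    excluded = set(exclude_comparing_columns)
    # Latest stored version per natural key (history is assumed to be in load order)
    latest: dict[tuple, Row] = {}
    for row in history:
        latest[tuple(row[c] for c in nk_columns)] = row

    appended: list[Row] = []
    for row in incoming:
        key = tuple(row[c] for c in nk_columns)
        previous = latest.get(key)
        compare = [c for c in row if c not in excluded]
        if previous is None or any(previous.get(c) != row[c] for c in compare):
            new_row = {**row, LOADED_AT: loaded_at}
            appended.append(new_row)
            latest[key] = new_row  # later rows in the same batch compare against it
    return history + appended  # new list; the input history is left untouched


t1 = datetime(2025, 8, 7, tzinfo=timezone.utc)
t2 = datetime(2025, 8, 8, tzinfo=timezone.utc)
h1 = insert_only([], [{"id": 1, "name": "A"}], ["id"], loaded_at=t1)
h2 = insert_only(h1, [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}], ["id"], loaded_at=t2)
print(len(h1), len(h2))  # 1 2  (unchanged id 1 is not appended again)
```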

SCD Type 2 (Slowly Changing Dimensions)

The SCD2 service implements Type 2 Slowly Changing Dimensions with automatic history tracking and current record management.

from pyspark.sql import DataFrame, functions as F
from fabricengineer.logging import TimeLogger
from fabricengineer.transform.lakehouse import LakehouseTable
from fabricengineer.transform import SilverIngestionSCD2Service


def transform_projects(df: DataFrame, etl) -> DataFrame:
    df = df.withColumn("dtime", F.to_timestamp("dtime"))
    return df


def transform_all(df: DataFrame, etl) -> DataFrame:
    df = df.withColumn("data", F.lit("values"))
    return df


# Initialize performance monitoring
timer = TimeLogger()

# Define table-specific transformations
transformations = {
    "*": transform_all,             # Applied to all tables
    "projects": transform_projects  # Applied only to projects table
}

# Configure source and destination tables
source_table = LakehouseTable(
    lakehouse="BronzeLakehouse",
    schema="schema",
    table="projects"
)
destination_table = LakehouseTable(
    lakehouse="SilverLakehouse",
    schema=source_table.schema,
    table=source_table.table
)

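# spark and notebookutils are provided by the Fabric notebook runtime.
# NK_COLUMNS, CONSTANT_COLUMNS, IS_DELTA_LOAD, etc. are configuration
# constants that you define in the notebook before this cell.
# Initialize and configure the ETL service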
etl = SilverIngestionSCD2Service()
etl.init(
    spark_=spark,
    notebookutils_=notebookutils,
    source_table=source_table,
    destination_table=destination_table,
    nk_columns=NK_COLUMNS,
    constant_columns=CONSTANT_COLUMNS,
    is_delta_load=IS_DELTA_LOAD,
    delta_load_use_broadcast=DELTA_LOAD_USE_BROADCAST,
    transformations=transformations,
    exclude_comparing_columns=EXCLUDE_COLUMNS_FROM_COMPARING,
    include_comparing_columns=INCLUDE_COLUMNS_AT_COMPARING,
    historize=HISTORIZE,
    partition_by_columns=PARTITION_BY_COLUMNS,
    df_bronze=None
)


timer.start().log()
etl.run()
timer.end().log()
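Type 2 SCD keeps every version of a record. When a compared attribute changes, the current version is closed (its validity end is set and it is no longer flagged as current), and a new open version is appended. Below is a minimal pure-Python sketch of that merge step. It assumes hypothetical `valid_from`, `valid_to` and `is_current` columns and uses a far-future date as the open end. The package's real column names and merge strategy may differ.

```python
from datetime import datetime
from typing import Any

Row = dict[str, Any]
OPEN_END = datetime(9999, 12, 31)  # hypothetical "still valid" marker for valid_to


def scd2_merge(
    dimension: list[Row],
    incoming: list[Row],
    nk_columns: list[str],
    compare_columns: list[str],
    now: datetime,
) -> list[Row]:
    """Close changed current versions and append new ones (Type 2 SCD)."""
    result = [dict(r) for r in dimension]  # copy so the input is not mutated
    current = {tuple(r[c] for c in nk_columns): r for r in result if r["is_current"]}
    for row in incoming:
        key = tuple(row[c] for c in nk_columns)
        existing = current.get(key)
        if existing is not None and all(
            existing.get(c) == row.get(c) for c in compare_columns
        ):
            continue  # unchanged: the current version stays open
        if existing is not None:
            existing["valid_to"] = now  # close the previous version
            existing["is_current"] = False
        new_version = {**row, "valid_from": now, "valid_to": OPEN_END, "is_current": True}
        result.append(new_version)
        current[key] = new_version
    return result


t1, t2 = datetime(2025, 8, 1), datetime(2025, 8, 7)
dim = scd2_merge([], [{"id": 1, "city": "Berlin"}], ["id"], ["city"], t1)
dim = scd2_merge(dim, [{"id": 1, "city": "Hamburg"}], ["id"], ["city"], t2)
for version in dim:
    print(version["city"], version["valid_from"].date(), version["valid_to"].date(), version["is_current"])
# Berlin 2025-08-01 2025-08-07 False
# Hamburg 2025-08-07 9999-12-31 True
```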

Materialized Lake Views Management

Prerequisites

Attach a utility Lakehouse (for example, one named "Utils") as the notebook's default Lakehouse. The generated view SQL is saved as a .sql.txt file in that lakehouse under /Files/mlv/{lakehouse}/{schema}/{table}.sql.txt.

from fabricengineer.mlv import MaterializedLakeView

# Initialize the Materialized Lake View manager
mlv = MaterializedLakeView(
    lakehouse="SilverBusinessLakehouse",
    schema="schema",
    table="projects"
)
print(mlv.to_dict())

# Define your custom SQL query
sql = """
SELECT
    p.id
    ,p.projectname
    ,p.budget
    ,u.name AS projectlead
FROM dbo.projects p
LEFT JOIN users u
ON p.projectlead_id = u.id
"""

# Create or replace the materialized view
result = mlv.create_or_replace(sql)
display(result)
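The prerequisite note above defines where the view SQL is stored. As a hypothetical illustration (not the package's code), the helpers below build that path. They also sketch one plausible use of a stored definition: checking whether the SQL actually changed before recreating a view. `mlv_sql_path`, `normalize_sql` and `sql_changed` are assumed names.

```python
from pathlib import PurePosixPath
from typing import Optional


def mlv_sql_path(lakehouse: str, schema: str, table: str) -> str:
    """Path layout from the prerequisite note: /Files/mlv/{lakehouse}/{schema}/{table}.sql.txt"""
    return str(PurePosixPath("/Files/mlv", lakehouse, schema, f"{table}.sql.txt"))


def normalize_sql(sql: str) -> str:
    # Collapses every whitespace run, including inside string literals (a simplification)
    return " ".join(sql.split())


def sql_changed(stored_sql: Optional[str], new_sql: str) -> bool:
    """Hypothetical check: recreate only if nothing was stored or the SQL differs."""
    return stored_sql is None or normalize_sql(stored_sql) != normalize_sql(new_sql)


print(mlv_sql_path("SilverBusinessLakehouse", "schema", "projects"))
# /Files/mlv/SilverBusinessLakehouse/schema/projects.sql.txt
```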

Remote Module Import for Fabric Notebooks

Import individual package modules directly into a Fabric notebook from a tagged release on GitHub, without installing the package:

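# Cell 1 - Download the loader that provides import_module
import requests

VERSION = "0.1.0"
url = f"https://raw.githubusercontent.com/enricogoerlitz/fabricengineer-py/refs/tags/{VERSION}/src/fabricengineer/import_module/import_module.py"
resp = requests.get(url, timeout=30)
resp.raise_for_status()  # Fail fast on HTTP errors (e.g. an unknown tag)
code = resp.text

# Sanity-check the payload before executing it
assert code.startswith("import requests")
exec(code, globals())  # This provides the 'import_module' function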

# Cell 2 - Load the modules for the pinned VERSION
mlv_module = import_module("transform.mlv", VERSION)
scd2_module = import_module("transform.silver.scd2", VERSION)
insertonly_module = import_module("transform.silver.insertonly", VERSION)

# Cell 3 - Use mlv module
exec(mlv_module, globals())  # Provides MaterializedLakeView class and mlv instance

mlv.init(
    lakehouse="SilverBusinessLakehouse",
    schema="schema",
    table="projects"
)
print(mlv.to_dict())

# Cell 4 - Use scd2 module
exec(scd2_module, globals())  # Provides an instantiated etl object

etl.init(...)
print(str(etl))

# Cell 5 - Use insertonly module
exec(insertonly_module, globals())  # Provides an instantiated etl object

etl.init(...)
print(str(etl))
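`exec(code, globals())` puts everything the downloaded code defines into the notebook namespace and runs it without any integrity check. For more isolation, one option is to execute the source in a fresh module object and, optionally, pin it to a known SHA-256 digest. This is a sketch, not part of the package, and `load_module_from_source` is a hypothetical helper. Names passed via `inject` (for example, `spark`) become globals of that module, because code executed in a separate namespace cannot see notebook globals.

```python
import hashlib
import types
from typing import Any, Optional


def verify_sha256(source: str, expected_hex: str) -> None:
    """Raise if the UTF-8 encoded source does not match the pinned digest."""
    actual = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if actual != expected_hex.lower():
        raise ValueError(f"SHA-256 mismatch: expected {expected_hex}, got {actual}")


def load_module_from_source(
    name: str,
    source: str,
    expected_sha256: Optional[str] = None,
    inject: Optional[dict[str, Any]] = None,
) -> types.ModuleType:
    """Execute source in a fresh module namespace instead of the notebook's globals()."""
    if expected_sha256 is not None:
        verify_sha256(source, expected_sha256)
    module = types.ModuleType(name)
    module.__dict__.update(inject or {})  # e.g. {"spark": spark}
    exec(compile(source, f"<remote:{name}>", "exec"), module.__dict__)
    return module


# Usage with an inline source string instead of a download
src = "def greet(who):\n    return f'hello {who}'\n"
demo = load_module_from_source("demo", src, hashlib.sha256(src.encode("utf-8")).hexdigest())
print(demo.greet("fabric"))  # hello fabric
```

In a notebook, you might call it as `mlv_mod = load_module_from_source("mlv", mlv_module, inject={"spark": spark})`. This assumes that `import_module` returns the module source as a string, as the `exec(mlv_module, globals())` calls above suggest.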

Advanced Features

Performance Optimization

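Broadcast Joins: Optionally broadcast the join during delta loads (delta_load_use_broadcast) to avoid a shuffle when one side is small
Partition Strategies: Partition destination tables by configurable columns (partition_by_columns) for better query performance
Schema Evolution: Handle schema changes without breaking existing pipelines
Delta Load Processing: Incremental processing of changed source data (is_delta_load)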
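A broadcast join copies the small side of a join to every executor, so the large side can be joined locally without a shuffle. The pure-Python sketch below mimics what happens inside one partition: it builds a hash table from the small side, then streams the large side through it. `broadcast_hash_join` is a hypothetical illustration. In PySpark, the broadcast is requested with `pyspark.sql.functions.broadcast`, e.g. `large_df.join(F.broadcast(small_df), "lead_id", "left")`.

```python
from typing import Any

Row = dict[str, Any]


def broadcast_hash_join(
    large: list[Row], small: list[Row], key: str, how: str = "inner"
) -> list[Row]:
    """Join `large` to `small` on `key` by hashing the small side (inner or left)."""
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {how}")
    # Build phase: the small side becomes an in-memory hash table.
    # In Spark, this table is what gets shipped ("broadcast") to every executor.
    table: dict[Any, list[Row]] = {}
    for row in small:
        table.setdefault(row[key], []).append(row)
    # Probe phase: stream the large side once and look up each key.
    out: list[Row] = []
    for row in large:
        matches = table.get(row[key], [])
        if matches:
            # On column-name clashes, values from the small side win in this sketch
            out.extend({**row, **match} for match in matches)
        elif how == "left":
            out.append(dict(row))  # unmatched rows keep only their own columns here
    return out


projects = [{"id": 1, "lead_id": 10}, {"id": 2, "lead_id": 99}]
users = [{"lead_id": 10, "name": "Ann"}]
print(broadcast_hash_join(projects, users, "lead_id"))
# [{'id': 1, 'lead_id': 10, 'name': 'Ann'}]
print(broadcast_hash_join(projects, users, "lead_id", how="left"))
# [{'id': 1, 'lead_id': 10, 'name': 'Ann'}, {'id': 2, 'lead_id': 99}]
```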

Data Quality & Validation

Automatic Validation: Built-in checks for data consistency and quality
Type Safety: Comprehensive type annotations for better development experience
Error Handling: Robust error handling and recovery mechanisms
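The built-in checks themselves are not described here. The hypothetical helper below illustrates one check that matters for both the insert-only and SCD2 services, which match rows on `nk_columns`. It reports null natural keys and natural keys that occur more than once in a single batch. Either condition would make version matching ambiguous.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]


def validate_natural_keys(rows: list[Row], nk_columns: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the batch passed."""
    problems: list[str] = []
    keys: list[tuple] = []
    for i, row in enumerate(rows):
        key = tuple(row.get(c) for c in nk_columns)
        if any(value is None for value in key):
            problems.append(f"row {i}: null in natural key {dict(zip(nk_columns, key))}")
        else:
            keys.append(key)  # only non-null keys take part in the duplicate check
    for key, count in Counter(keys).items():
        if count > 1:
            problems.append(f"duplicate natural key {key} appears {count} times")
    return problems


batch = [{"id": 1}, {"id": 1}, {"id": None}]
for problem in validate_natural_keys(batch, ["id"]):
    print(problem)
# row 2: null in natural key {'id': None}
# duplicate natural key (1,) appears 2 times
```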

Monitoring & Logging

from fabricengineer.logging import TimeLogger, logger

# Performance monitoring
timer = TimeLogger()
timer.start().log()

# Your ETL operations here, e.g. the etl service configured in the examples above
etl.run()

timer.end().log()

# Custom fabricengineer logging
logger.info("Custom log message")
logger.error("Error occurred during processing")
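`timer.start().log()` and `timer.end().log()` work because each method returns the timer itself. The hypothetical `SimpleTimeLogger` below is a stand-in, not the package's `TimeLogger`. It shows this fluent pattern using `time.perf_counter` and the standard `logging` module.

```python
import logging
import time
from typing import Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")


class SimpleTimeLogger:
    """Hypothetical stand-in that mimics start().log() / end().log() chaining."""

    def __init__(self) -> None:
        self._start: Optional[float] = None
        self._end: Optional[float] = None
        self._last_event = "created"

    def start(self) -> "SimpleTimeLogger":
        self._start = time.perf_counter()
        self._end = None
        self._last_event = "started"
        return self  # returning self is what enables chaining

    def end(self) -> "SimpleTimeLogger":
        if self._start is None:
            raise RuntimeError("end() called before start()")
        self._end = time.perf_counter()
        self._last_event = "ended"
        return self

    @property
    def elapsed(self) -> Optional[float]:
        """Seconds between start() and end(), or None if not both were called."""
        if self._start is None or self._end is None:
            return None
        return self._end - self._start

    def log(self) -> "SimpleTimeLogger":
        if self.elapsed is None:
            log.info("timer %s", self._last_event)
        else:
            log.info("timer ended after %.3f s", self.elapsed)
        return self


timer = SimpleTimeLogger()
timer.start().log()
total = sum(range(100_000))  # stand-in for etl.run()
timer.end().log()
```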

Project details

Release history

1.1.0 (Aug 26, 2025)
1.0.6 (Aug 26, 2025)
1.0.5 (Aug 26, 2025)
1.0.4 (Aug 26, 2025)
1.0.3 (Aug 15, 2025)
1.0.2 (Aug 15, 2025)
1.0.1 (Aug 14, 2025)
0.2.0 (Aug 11, 2025)
0.1.13 (Aug 11, 2025)
0.1.12 (Aug 11, 2025)
0.1.11 (Aug 11, 2025)
0.1.10 (Aug 11, 2025)
0.1.9 (Aug 8, 2025)
0.1.8 (Aug 8, 2025)
0.1.7 (Aug 8, 2025)
0.1.6 (Aug 8, 2025)
0.1.5 (Aug 8, 2025)
0.1.4 (Aug 7, 2025), this version
0.1.3 (Aug 7, 2025)
0.1.2 (Aug 7, 2025)
0.1.1 (Aug 7, 2025)
0.1.0 (Aug 7, 2025)
0.0.10 (Aug 5, 2025)
0.0.9 (Aug 5, 2025)
0.0.8 (Aug 4, 2025)
0.0.7 (Aug 4, 2025)
0.0.6 (Aug 4, 2025)
0.0.5 (Aug 4, 2025)
0.0.4 (Aug 4, 2025)
0.0.3 (Jul 24, 2025)
0.0.2 (Jul 24, 2025)

Download files

Source Distribution

fabricengineer_py-0.1.4.tar.gz (86.1 kB)

Uploaded Aug 7, 2025 Source

Built Distribution

fabricengineer_py-0.1.4-py3-none-any.whl (24.1 kB)

Uploaded Aug 7, 2025 Python 3

File details

Details for the file fabricengineer_py-0.1.4.tar.gz.

File metadata

Download URL: fabricengineer_py-0.1.4.tar.gz
Upload date: Aug 7, 2025
Size: 86.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for fabricengineer_py-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`fac56fa6d4c5406ff686aec980963650868be7a0b68e2449f786555bf4a01389`
MD5	`16d90efceeedf7c160d85edac54792e8`
BLAKE2b-256	`942eb69913febedc4699d3d854784e21145526af0875d7ee89c39dbdc1316839`

Provenance

The following attestation bundles were made for fabricengineer_py-0.1.4.tar.gz:

Publisher: release.yml on enricogoerlitz/fabricengineer-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fabricengineer_py-0.1.4.tar.gz
- Subject digest: fac56fa6d4c5406ff686aec980963650868be7a0b68e2449f786555bf4a01389
- Sigstore transparency entry: 362309373
- Sigstore integration time: Aug 7, 2025
Source repository:
- Permalink: enricogoerlitz/fabricengineer-py@e7df2b0d686d84029b133ca81e66d68ad6e1e639
- Branch / Tag: refs/tags/0.1.4
- Owner: https://github.com/enricogoerlitz
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e7df2b0d686d84029b133ca81e66d68ad6e1e639
- Trigger Event: push

File details

Details for the file fabricengineer_py-0.1.4-py3-none-any.whl.

File metadata

Download URL: fabricengineer_py-0.1.4-py3-none-any.whl
Upload date: Aug 7, 2025
Size: 24.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for fabricengineer_py-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9757b26e3ff4f14593e06ab1da115e465fc4546317e4268af966b93208e28d09`
MD5	`443357a85161a72379605b3cd92bc81c`
BLAKE2b-256	`ad88a7a257a049ddbb74503843530ad91bb89861a62669eab6abdd745b7b7a3d`

Provenance

The following attestation bundles were made for fabricengineer_py-0.1.4-py3-none-any.whl:

Publisher: release.yml on enricogoerlitz/fabricengineer-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fabricengineer_py-0.1.4-py3-none-any.whl
- Subject digest: 9757b26e3ff4f14593e06ab1da115e465fc4546317e4268af966b93208e28d09
- Sigstore transparency entry: 362309377
- Sigstore integration time: Aug 7, 2025
Source repository:
- Permalink: enricogoerlitz/fabricengineer-py@e7df2b0d686d84029b133ca81e66d68ad6e1e639
- Branch / Tag: refs/tags/0.1.4
- Owner: https://github.com/enricogoerlitz
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e7df2b0d686d84029b133ca81e66d68ad6e1e639
- Trigger Event: push
