A library for data quality checks in Microsoft Fabric using Great Expectations
FabricDataGuard
FabricDataGuard is a Python library that simplifies data quality checks in Microsoft Fabric using Great Expectations. It provides an easy-to-use interface for data scientists and engineers to perform data quality checks without the need for extensive Great Expectations setup.
Purpose
The main purpose of FabricDataGuard is to:
- Streamline the process of setting up and running data quality checks in Microsoft Fabric
- Provide a wrapper around Great Expectations for easier integration with Fabric workflows
- Enable quick and efficient data validation with minimal setup
Installation
To install FabricDataGuard, use pip:
pip install fabric-data-guard
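To confirm the install from inside a notebook or script, a minimal sketch using only the standard library is shown below. The helper name `installed_version` is an assumption made for illustration and is not part of FabricDataGuard.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("fabric-data-guard"))
```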
Usage
Here's a basic example of how to use FabricDataGuard:
from fabric_data_guard import FabricDataGuard
import great_expectations as gx
# Initialize FabricDataGuard
fdg = FabricDataGuard(
datasource_name="MyDataSourceName",
data_asset_name="MyDataAssetName",
#project_root_dir="/lakehouse/default/Files" # Optional parameter. Defaults to your lakehouse file store
)
# Define data quality checks
fdg.add_expectation([
gx.expectations.ExpectColumnValuesToNotBeNull(column="UserId"),
gx.expectations.ExpectColumnPairValuesAToBeGreaterThanB(
column_A="UpdateDatetime",
column_B="CreationDatetime"
),
gx.expectations.ExpectColumnValueLengthsToEqual(
column="PostalCode",
value=5
),
])
# Read your data from your lakehouse as a PySpark DataFrame
df = spark.sql("SELECT * FROM MyLakehouseName.MyDataAssetName")
# Run validation
results = fdg.run_validation(df, unexpected_identifiers=['UserId'])
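To make concrete what the three expectations above assert, here is a plain-Python sketch on a few hypothetical sample rows. It is not how Great Expectations or FabricDataGuard evaluates them (those run against a Spark DataFrame). The helper `failing_rows`, the sample data, and the report keys are assumptions made for illustration.

```python
from datetime import datetime


def failing_rows(rows, predicate):
    """Return the index of every row for which the predicate is False."""
    return [i for i, row in enumerate(rows) if not predicate(row)]


# Hypothetical sample rows, not real data.
rows = [
    {"UserId": 1, "CreationDatetime": datetime(2024, 1, 1),
     "UpdateDatetime": datetime(2024, 1, 5), "PostalCode": "75001"},
    {"UserId": None, "CreationDatetime": datetime(2024, 2, 1),
     "UpdateDatetime": datetime(2024, 2, 3), "PostalCode": "13002"},
    {"UserId": 3, "CreationDatetime": datetime(2024, 3, 10),
     "UpdateDatetime": datetime(2024, 3, 1), "PostalCode": "6900"},
]

report = {
    # Mirrors ExpectColumnValuesToNotBeNull(column="UserId")
    "UserId not null": failing_rows(rows, lambda r: r["UserId"] is not None),
    # Mirrors ExpectColumnPairValuesAToBeGreaterThanB (update after creation)
    "update after creation": failing_rows(
        rows, lambda r: r["UpdateDatetime"] > r["CreationDatetime"]
    ),
    # Mirrors ExpectColumnValueLengthsToEqual(column="PostalCode", value=5)
    "PostalCode length is 5": failing_rows(rows, lambda r: len(r["PostalCode"]) == 5),
}

for check, bad in report.items():
    print(f"{check}: failing row indexes {bad}")
```

In the library itself, the `unexpected_identifiers` argument plays the role of the row indexes here: it names the columns used to identify rows that fail a check.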
Customizing Validation Run
The run_validation function accepts several keyword arguments that customize its behavior:
1. Display HTML Results:
results = fdg.run_validation(df, display_html=True)
Set display_html=False to suppress the HTML output (default is True).
2. Custom Target Table:
results = fdg.run_validation(df, table_name="MyCustomResultsTable")
Specify a custom name for the table where results will be stored.
3. Custom Workspace and Lakehouse:
results = fdg.run_validation(df, workspace_name="MyWorkspace", lakehouse_name="MyLakehouse")
By default, it uses the workspace and lakehouse attached to the running notebook. Use these parameters to specify different locations.
4. Notification Settings:
Below is an example. See checkpoint.py for the arguments required by each channel (Microsoft Teams, Slack, or email).
results = fdg.run_validation(df,
slack_notification=True,
slack_webhook="https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
email_notification=True,
email_to="user@example.com",
teams_notification=True,
teams_webhook="https://outlook.office.com/webhook/YOUR/TEAMS/WEBHOOK")
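Webhook URLs are secrets, so you may prefer not to hard-code them in a notebook. The sketch below builds the notification keyword arguments shown above from environment-style settings. The helper `notification_kwargs` and the `FDG_*` variable names are hypothetical, and a channel such as email may need further arguments listed in checkpoint.py.

```python
import os


def notification_kwargs(env=None):
    """Build run_validation notification kwargs from environment-style settings.

    Only channels whose setting is present and non-empty are enabled.
    """
    env = os.environ if env is None else env
    kwargs = {}

    slack = env.get("FDG_SLACK_WEBHOOK")  # hypothetical variable name
    if slack:
        kwargs.update(slack_notification=True, slack_webhook=slack)

    teams = env.get("FDG_TEAMS_WEBHOOK")  # hypothetical variable name
    if teams:
        kwargs.update(teams_notification=True, teams_webhook=teams)

    email = env.get("FDG_EMAIL_TO")  # hypothetical variable name
    if email:
        kwargs.update(email_notification=True, email_to=email)

    return kwargs


# In a Fabric notebook, with fdg and df defined as in the basic example:
# results = fdg.run_validation(df, **notification_kwargs())
```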
You can combine these options as needed:
results = fdg.run_validation(df,
display_html=True,
table_name="MyCustomResultsTable",
workspace_name="MyWorkspace",
lakehouse_name="MyLakehouse",
slack_notification=True,
slack_webhook="https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
unexpected_identifiers=['UserId', 'TransactionId'])
This flexibility allows you to tailor the validation process to your specific needs and integrate it seamlessly with your existing data quality workflows.
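When several notebooks share the same settings, one way to combine options is to keep shared defaults in a dictionary and override them per run. This is a minimal sketch: `merge_options` and `DEFAULTS` are hypothetical names, and only keyword arguments already shown above are used.

```python
def merge_options(defaults, **overrides):
    """Return a new dict in which per-run overrides win over shared defaults."""
    return {**defaults, **overrides}


# Shared settings, using keyword arguments from the examples above.
DEFAULTS = {
    "display_html": False,
    "workspace_name": "MyWorkspace",
    "lakehouse_name": "MyLakehouse",
    "unexpected_identifiers": ["UserId"],
}

# Per-run settings: store results in a custom table and show the HTML report.
options = merge_options(DEFAULTS, table_name="MyCustomResultsTable", display_html=True)

# In a Fabric notebook, with fdg and df defined as in the basic example:
# results = fdg.run_validation(df, **options)
```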
Contributing
Contributions to FabricDataGuard are welcome! If you'd like to contribute:
- Fork the repository
- Create a new branch for your feature
- Implement your changes
- Write or update tests as necessary
- Submit a pull request
Please ensure your code adheres to the project's coding standards and includes appropriate tests.
File details
Details for the file fabric_data_guard-0.0.3.tar.gz.
File metadata
- Download URL: fabric_data_guard-0.0.3.tar.gz
- Upload date:
- Size: 9.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.10.2 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 46feedd66650bb8f185482b575eecf291e7bbdd4f443f6fbf577444ff9f35c80
MD5 | 7d0e522f906ba95b66a14409df1e121e
BLAKE2b-256 | 475f690a5937909b51a95bdadc1478c85cfd809d7b4b4d3585826cbc426efad7
File details
Details for the file fabric_data_guard-0.0.3-py3-none-any.whl.
File metadata
- Download URL: fabric_data_guard-0.0.3-py3-none-any.whl
- Upload date:
- Size: 10.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.10.2 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | f96540f370e9bcaf1ff2dac695ed4dd5df6810cb8442177b32219af5d4bc7cdb
MD5 | 84035b3ef1e7ef154490dcda878845cc
BLAKE2b-256 | 2ef46353b124b5871f148e9d8ca1727f8b0a5122f8d8fc353f9ba4154ce9cc3e