A library providing standardised data quality tracking in lakehouse environments. Designed for use within Databricks, where PySpark is already available.

sharepointlib_dq

Package Description

A library that provides data quality control for the SharePointLib library. It is designed to be used within the Databricks platform, where PySpark is already available and does not need to be installed as an additional dependency.

Usage

Lakehouse DQ Logs

From a SQL script:

-- Create the Delta lakehouse table for tracking SharePoint file uploads
-- Includes metadata such as file details, user info, and processing status
-- Table (example): workspace.default.sharepoint_uploader_monitoring_logs
DROP TABLE IF EXISTS workspace.default.sharepoint_uploader_monitoring_logs;
CREATE TABLE workspace.default.sharepoint_uploader_monitoring_logs (
  target STRING,
  key STRING,
  input_file_name STRING,
  file_name STRING,
  user_name STRING,
  user_email STRING,
  modify_date STRING,     -- Should be timestamp
  file_size STRING,       -- Should be INT
  file_row_count STRING,  -- Should be INT
  status STRING,
  rejection_reason STRING,
  file_web_url STRING
)
USING DELTA;
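
The column comments above note that modify_date, file_size and file_row_count are stored as strings even though they represent a timestamp and two integers. A minimal sketch of converting a row back to typed values in plain Python follows. It assumes modify_date holds an ISO 8601 string (like the last_modified_date_time value used later in this README) and that the size and row count are plain decimal digits. parse_log_row is a hypothetical helper, not part of sharepointlib_dq.

```python
from datetime import datetime


def parse_log_row(row: dict) -> dict:
    """Return a copy of a log row with typed values (hypothetical helper)."""
    typed = dict(row)
    # Assumption: timestamps are ISO 8601 strings; a trailing "Z" means UTC.
    typed["modify_date"] = datetime.fromisoformat(
        row["modify_date"].replace("Z", "+00:00")
    )
    # Assumption: size and row count are stored as plain decimal digits.
    typed["file_size"] = int(row["file_size"])
    typed["file_row_count"] = int(row["file_row_count"])
    return typed


# Example row using the STRING-only layout of the table above
row = {
    "file_name": "Customer_Report.xlsx",
    "modify_date": "2025-11-24T10:30:00Z",
    "file_size": "204800",
    "file_row_count": "1500",
}
typed_row = parse_log_row(row)
print(typed_row["modify_date"].isoformat(), typed_row["file_size"], typed_row["file_row_count"])
```

Converting on read leaves the existing table definition unchanged while still allowing numeric and date comparisons downstream.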

From a Python script:

# Error code descriptions
from sharepointlib_dq.core import DQStatusCode

print(DQStatusCode.get_description("SchemaMismatch"))  # DQ FAIL: SCHEMA MISMATCH
print(DQStatusCode.get_description("UnknownCode"))     # UNKNOWN STATUS CODE
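
The lookup behaviour shown above (a known code returns its description, an unknown code returns a fallback) can be illustrated with a plain dictionary. This is only a sketch built from the status code table in this README. STATUS_DESCRIPTIONS and describe are hypothetical names, and the real DQStatusCode implementation may differ.

```python
# Hypothetical stand-in for a status-code lookup; not the library's code.
STATUS_DESCRIPTIONS = {
    "NA": "NOT APPLICABLE",
    "EmptyFile": "DQ FAIL: EMPTY FILE",
    "SchemaMismatch": "DQ FAIL: SCHEMA MISMATCH",
    "SchemaMismatchAndEmptyFile": "DQ FAIL: SCHEMA MISMATCH AND EMPTY FILE",
    "InvalidNumericFormat": "DQ FAIL: INVALID NUMERIC FORMAT",
    "InvalidDateFormat": "DQ FAIL: INVALID DATE FORMAT",
}


def describe(code: str) -> str:
    """Return the description for a code, or a fallback for unknown codes."""
    return STATUS_DESCRIPTIONS.get(code, "UNKNOWN STATUS CODE")


print(describe("SchemaMismatch"))  # DQ FAIL: SCHEMA MISMATCH
print(describe("UnknownCode"))     # UNKNOWN STATUS CODE
```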

Status Code Table

Code                        Description
NA                          NOT APPLICABLE
EmptyFile                   DQ FAIL: EMPTY FILE
SchemaMismatch              DQ FAIL: SCHEMA MISMATCH
SchemaMismatchAndEmptyFile  DQ FAIL: SCHEMA MISMATCH AND EMPTY FILE
InvalidNumericFormat        DQ FAIL: INVALID NUMERIC FORMAT
InvalidDateFormat           DQ FAIL: INVALID DATE FORMAT

# Metadata and DQ Writer
from sharepointlib_dq.core import SPFileMetadata
from sharepointlib_dq.core import DQWriter

# Simulated SharePoint metadata for a Customer report
meta_dict = {
    "alias": "Customer_Report.xlsx",
    "name": "Customer_Report.xlsx",
    "size": 204800,
    "path": "/SharePoint/Customers",
    "web_url": "https://contoso.sharepoint.com/sites/customers/Customer_Report.xlsx",
    "last_modified_date_time": "2025-11-24T10:30:00Z",
    "last_modified_by_name": "Peter Parker",
    "last_modified_by_email": "peter.parker@contoso.com",
    "id": "file_987"
}

# Init metadata object
metadata = SPFileMetadata(alias="My_File.xlsx", sheet="Customers")

# Build the metadata object (meta_dict -> metadata)
metadata.from_dict(d=meta_dict)

# Add the row_count (number of rows in the file)
metadata.row_count = 1500

# Initialise the writer with the existing STRING-only Delta table
dq_writer = DQWriter(table_name="workspace.default.sharepoint_uploader_monitoring_logs")

# Success case (no rejection reason -> status = SUCCESS)
dq_writer.write_metadata(metadata=metadata)

# Failure case (rejection reason provided -> status = FAIL)
dq_writer.write_metadata(metadata=metadata, reason="DQ FAIL: EMPTY FILE")
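
The two calls above differ only in whether a rejection reason is supplied. The sketch below shows one way a caller might choose that reason before writing. It assumes an "empty" file means zero data rows and that a schema matches only when the column names are identical and in the same order. pick_reason and derive_status are hypothetical helpers, not part of sharepointlib_dq.

```python
from typing import Optional, Sequence


def pick_reason(
    row_count: int, expected: Sequence[str], actual: Sequence[str]
) -> Optional[str]:
    """Return a rejection reason from the status code table, or None."""
    empty = row_count == 0                     # assumption: empty = zero data rows
    mismatch = list(expected) != list(actual)  # assumption: exact, ordered match
    if empty and mismatch:
        return "DQ FAIL: SCHEMA MISMATCH AND EMPTY FILE"
    if mismatch:
        return "DQ FAIL: SCHEMA MISMATCH"
    if empty:
        return "DQ FAIL: EMPTY FILE"
    return None


def derive_status(reason: Optional[str]) -> str:
    """Mirror the rule in the comments above: no reason -> SUCCESS, else FAIL."""
    return "SUCCESS" if reason is None else "FAIL"


reason = pick_reason(1500, ["id", "name"], ["id", "name"])
print(derive_status(reason))
```

When pick_reason returns a string, it could be passed as reason= to write_metadata as in the failure case; when it returns None, the call without a reason applies.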

Email Notifications

from sharepointlib_dq.notifier import EmailNotifier

# Init email notifications
notifier = EmailNotifier(
    client_id=client_id,
    client_secret=client_secret,
    tenant_id=tenant_id,
    sender_email="sender@example.com"
)

# Send email notification
status_code = notifier.send_email(
    recipients=["peter.parker@example.com"],
    subject="Notification X",
    message="This is<br>a test...",
    attachments=None
)
if status_code in (200, 202):
    print("Email sent")
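
Because send_email returns a status code, a caller can wrap it in a small retry loop. The sketch below is one possible approach, not a feature of the library. send_with_retry is a hypothetical helper, the 500 code is only an illustrative failure value, and the stand-in sender replaces the real notifier so the example runs on its own.

```python
import time
from typing import Callable

SUCCESS_CODES = (200, 202)  # the codes treated as success in the example above


def send_with_retry(
    send: Callable[[], int], attempts: int = 3, delay_seconds: float = 0.0
) -> bool:
    """Call send() until it returns a success code or attempts run out."""
    for attempt in range(1, attempts + 1):
        if send() in SUCCESS_CODES:
            return True
        if attempt < attempts:
            time.sleep(delay_seconds)  # wait before the next attempt
    return False


# Stand-in sender that fails once, then succeeds. In real use this would be
# a lambda wrapping notifier.send_email(...).
responses = iter([500, 202])
print(send_with_retry(lambda: next(responses)))
```
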

# Using pre-defined templates
from sharepointlib_dq.notifier import EmailTemplates

# Parameters
recipient_name = "Peter Parker"
error_message = "The web is crashing."

# Generate an email using the technical template
message = EmailTemplates.technical_error(
    recipient_name,
    error_message
)

# Print result
print(message)  # Dear Peter Parker...

Installation

Install Python and pip if you have not already.

Then run:

pip install pip --upgrade

For production:

pip install sharepointlib_dq

This will install the package and all of its Python dependencies.
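
To confirm that the installation worked, the installed version can be read with the standard library's importlib.metadata. This is a small sketch; installed_version is a hypothetical helper name, and it returns None when the distribution is not found.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string when the package is installed, otherwise None
print(installed_version("sharepointlib_dq"))
```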

If you want to install the project for development:

git clone https://github.com/aghuttun/sharepointlib_dq.git
cd sharepointlib_dq
pip install -e ".[dev]"

Docstring

The package's docstrings follow the numpydoc style.

License

BSD License (see license file)
