
SDMF - Standard Data Management Framework

This project has been archived by its maintainers. No new releases are expected.

Project description

Standard Data Management Framework (SDMF)

A modular, scalable, and Python-based Data Management Framework designed to standardize data ingestion, validation, transformation, metadata handling, and storage across enterprise workflows.

This framework eliminates repetitive boilerplate and provides a consistent structure for building reliable, maintainable data pipelines.


Key Features

  • Modular Design – Plug-and-play components for ingestion, validation, transformation, and storage.
  • Schema Alignment & Partitioning – Built-in support for CDC (Change Data Capture) and MERGE operations (a sketch follows this list).
  • Metadata Management – Centralized handling of feed specifications and lineage.
  • Scalable – Works seamlessly with Spark, Delta Lake, and distributed environments like Databricks.
  • Logging & Monitoring – Custom logging with retention and rotation policies.
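
SDMF's own CDC/MERGE implementation is not shown on this page; as an illustration, the pattern above maps onto the Delta Lake Python API roughly as follows. The table path, the updates_df DataFrame, the id key, and the op change-type column are all hypothetical:

from delta.tables import DeltaTable

# `spark` is an active Delta-enabled SparkSession (see Dependencies below);
# `updates_df` is a hypothetical DataFrame carrying the CDC feed.
target = DeltaTable.forPath(spark, "/dbfs/FileStore/sdmf/target")

(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.id = s.id")  # join on the business key
    .whenMatchedDelete(condition="s.op = 'D'")    # apply CDC deletes
    .whenMatchedUpdateAll()                       # apply CDC updates
    .whenNotMatchedInsertAll()                    # apply CDC inserts
    .execute()
)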

📂 Project Structure

sdmf/
├── cli/                # Command-line interface for orchestration
├── config/             # Configurations (logging, paths, retention)
├── orchestrator/       # Pipeline orchestration logic
├── result_generator/   # Excel/Report generation utilities
├── utils/              # Helper functions
└── ...

⚙️ Installation

Recommended: Editable Install

From the project root (where pyproject.toml is located):

pip install -e .

Then run:

python -m sdmf.cli.main

To build the sdist and wheel for distribution instead of installing in editable mode, run python -m build from the same directory.

🔗 Dependencies

Install required packages:

pip install pyspark==3.5.1 delta-spark==3.1.0
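
On Databricks, a Delta-enabled Spark session is already provided. Outside Databricks, the standard delta-spark pattern for creating one looks like this (the app name is illustrative):

from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("sdmf-local")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)

# Resolves the matching Delta Lake JARs and builds the session
spark = configure_spark_with_delta_pip(builder).getOrCreate()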

🚀 Usage

Run the main orchestrator:

python -m sdmf.cli.main --config config/config.ini --run_id <unique_run_id>
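
The expected format of run_id is not documented here; any string that is unique per invocation should do. A hypothetical way to generate one and launch the CLI from Python:

import subprocess
import uuid

# Hypothetical id format; uniqueness is the only stated requirement
run_id = f"run_{uuid.uuid4().hex[:12]}"

subprocess.run(
    ["python", "-m", "sdmf.cli.main",
     "--config", "config/config.ini",
     "--run_id", run_id],
    check=True,
)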

🛠 Configuration

Update config.ini:

[DEFAULT]
outbound_directory_name=sdmf_outbound
log_directory_name=sdmf_logs
temp_log_location=/tmp/
file_hunt_path=/dbfs/FileStore/sdmf/
log_retention_policy_in_days=7

[FILES]
master_spec_name=master_specs.xlsx
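
How SDMF reads these values internally is not shown here; for reference, the keys above can be loaded with the standard library configparser:

import configparser

cfg = configparser.ConfigParser()
cfg.read("config/config.ini")

file_hunt_path = cfg["DEFAULT"]["file_hunt_path"]                       # /dbfs/FileStore/sdmf/
retention_days = cfg["DEFAULT"].getint("log_retention_policy_in_days")  # 7
master_spec = cfg["FILES"]["master_spec_name"]                          # master_specs.xlsx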

Logging

  • Logs are first written to /tmp/sdmf_logs for speed.
  • After job completion, logs are moved to the final directory (file_hunt_path).
  • Logs older than 7 days (log_retention_policy_in_days) are cleaned up automatically; this pattern is sketched below.
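
SDMF's logger itself is not shown on this page; the following stdlib-only sketch reproduces the temp-then-relocate pattern just described, with the 7-day retention expressed as backupCount (paths come from the sample config above):

import logging
import logging.handlers
import os
import shutil

temp_dir = "/tmp/sdmf_logs"          # temp_log_location + log_directory_name
final_dir = "/dbfs/FileStore/sdmf/"  # file_hunt_path
os.makedirs(temp_dir, exist_ok=True)

# Rotate daily and keep 7 files, mirroring log_retention_policy_in_days=7
handler = logging.handlers.TimedRotatingFileHandler(
    os.path.join(temp_dir, "sdmf.log"), when="D", backupCount=7
)
logging.basicConfig(level=logging.INFO, handlers=[handler])

def relocate_logs():
    """Move logs to the final directory after job completion."""
    handler.close()  # release the file before moving it
    for name in os.listdir(temp_dir):
        shutil.move(os.path.join(temp_dir, name), os.path.join(final_dir, name))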

Best Practices

  • Use editable install for development.
  • Keep configs modular for different environments (Dev, QA, Prod).
  • Use DBFS or Unity Catalog (UC) volumes for persistent storage on Databricks.

📌 Next Steps

  • Add unit tests for core modules.
  • Integrate structured logging (JSON) for ELK/Splunk (one possible approach is sketched after this list).
  • Enable compression for archived logs.
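
For the structured-logging item, one stdlib-only possibility (not part of SDMF today) is a formatter that emits one JSON object per record, which ELK and Splunk can ingest directly:

import json
import logging

class JsonFormatter(logging.Formatter):
    # Serialize each record as a single JSON line
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

logging.getLogger("sdmf").info("pipeline started")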

Download files

Download the file for your platform.

Source Distribution

sdmf-0.1.0.tar.gz (37.8 kB)

Built Distribution

sdmf-0.1.0-py3-none-any.whl (59.2 kB)

File details

Details for the file sdmf-0.1.0.tar.gz.

File metadata

  • Download URL: sdmf-0.1.0.tar.gz
  • Upload date:
  • Size: 37.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for sdmf-0.1.0.tar.gz

  • SHA256: 585f665df8f0ff9dbbd3adb2f24cede3312ff1dde882308fdefa9b2c29dcd880
  • MD5: c11e8eeddd6296358e61252496018bec
  • BLAKE2b-256: 6b9bf425e87d5c7d7756590f15786914de1248d0abc589c9e419f3bd2373e77c

These hashes can be used to verify the integrity of a downloaded file.
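
For example, the sdist's SHA256 can be checked after download with the standard library (the path assumes the file sits in the current directory):

import hashlib

EXPECTED = "585f665df8f0ff9dbbd3adb2f24cede3312ff1dde882308fdefa9b2c29dcd880"

with open("sdmf-0.1.0.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()

# A mismatch means the download is corrupted or has been tampered with
assert actual == EXPECTED, f"hash mismatch: {actual}"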

File details

Details for the file sdmf-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: sdmf-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 59.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for sdmf-0.1.0-py3-none-any.whl

  • SHA256: aa41e3e57a235374783e6973e41620078a034866fe9d1beb1036606fa36c3971
  • MD5: 966ab870cfef53b75852f1dfc1da4964
  • BLAKE2b-256: ad3c34b5bab4276d309c683757cb8b96306ea5f7f66e9ae3a44423ac6db09f55

