SDMF - Standard Data Management Framework
This project has been archived by its maintainers. No new releases are expected.
A modular, scalable, and Python-based Data Management Framework designed to standardize data ingestion, validation, transformation, metadata handling, and storage across enterprise workflows.
This framework eliminates repetitive boilerplate and provides a consistent structure for building reliable, maintainable data pipelines.
✅ Key Features
- Modular Design – Plug-and-play components for ingestion, validation, transformation, and storage.
- Schema Alignment & Partitioning – Built-in support for CDC (Change Data Capture) and MERGE operations.
- Metadata Management – Centralized handling of feed specifications and lineage.
- Scalable – Works seamlessly with Spark, Delta Lake, and distributed environments like Databricks.
- Logging & Monitoring – Custom logging with retention and rotation policies.
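The plug-and-play design described above can be sketched as a chain of small, single-purpose stages. This is a hypothetical illustration of the pattern, not SDMF's actual API (the framework's real class and function names are not shown on this page):

```python
from typing import Callable

# Each stage is just a callable that takes and returns a list of records.
Stage = Callable[[list[dict]], list[dict]]

def ingest(records: list[dict]) -> list[dict]:
    # A real pipeline would read from a feed here; this sketch passes through.
    return records

def validate(records: list[dict]) -> list[dict]:
    # Drop records missing the required "id" field.
    return [r for r in records if "id" in r]

def transform(records: list[dict]) -> list[dict]:
    # Normalize a field value.
    return [{**r, "name": r.get("name", "").strip()} for r in records]

def run_pipeline(records: list[dict], stages: list[Stage]) -> list[dict]:
    # Stages are interchangeable: swap in a different validator or
    # transformer without touching the rest of the pipeline.
    for stage in stages:
        records = stage(records)
    return records

result = run_pipeline(
    [{"id": 1, "name": " alice "}, {"name": "no-id"}],
    [ingest, validate, transform],
)
# result == [{"id": 1, "name": "alice"}]
```

Because every stage shares one signature, components can be added, removed, or reordered per feed, which is the essence of the modular design listed above.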
📂 Project Structure
```
sdmf/
├── cli/               # Command-line interface for orchestration
├── config/            # Configurations (logging, paths, retention)
├── orchestrator/      # Pipeline orchestration logic
├── result_generator/  # Excel/report generation utilities
├── utils/             # Helper functions
└── ...
```
⚙️ Installation
Option 1 (Recommended): Editable Install
From the project root (where pyproject.toml is located):

```bash
pip install -e .
```

Option 2: Build a Wheel

```bash
python -m build
```

This writes a wheel to dist/, which can then be installed with pip.

Then run:

```bash
python -m sdmf.cli.main
```
🔗 Dependencies
Install required packages:
```bash
pip install pyspark==3.5.1 delta-spark==3.1.0
```
🚀 Usage
Run the main orchestrator:
```bash
python -m sdmf.cli.main --config config/config.ini --run_id <unique_run_id>
```
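A minimal sketch of how a CLI entry point might parse these two flags with the standard library's `argparse` (hypothetical; the real `sdmf.cli.main` implementation may differ):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Both flags are required, matching the invocation shown above.
    parser = argparse.ArgumentParser(prog="sdmf")
    parser.add_argument("--config", required=True, help="Path to config.ini")
    parser.add_argument("--run_id", required=True, help="Unique identifier for this run")
    return parser

args = build_parser().parse_args(
    ["--config", "config/config.ini", "--run_id", "run-001"]
)
# args.config == "config/config.ini", args.run_id == "run-001"
```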
🛠 Configuration
Update config.ini:
```ini
[DEFAULT]
outbound_directory_name=sdmf_outbound
log_directory_name=sdmf_logs
temp_log_location=/tmp/
file_hunt_path=/dbfs/FileStore/sdmf/
log_retention_policy_in_days=7

[FILES]
master_spec_name=master_specs.xlsx
```
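These settings can be read with the standard library's `configparser`; the following is a sketch of one plausible approach using the exact keys shown above, not necessarily how SDMF loads them internally:

```python
import configparser

# The same keys shown in the sample config.ini above.
SAMPLE = """
[DEFAULT]
outbound_directory_name=sdmf_outbound
log_directory_name=sdmf_logs
temp_log_location=/tmp/
file_hunt_path=/dbfs/FileStore/sdmf/
log_retention_policy_in_days=7

[FILES]
master_spec_name=master_specs.xlsx
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# getint() converts the retention setting to an integer for date math.
retention_days = config.getint("DEFAULT", "log_retention_policy_in_days")
master_spec = config.get("FILES", "master_spec_name")
# retention_days == 7, master_spec == "master_specs.xlsx"
```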
✅ Logging
- Logs are first written to `/tmp/sdmf_logs` for speed.
- After job completion, logs are moved to the final directory (`file_hunt_path`).
- Logs older than the retention policy (7 days by default) are cleaned up automatically.
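The move-then-clean behavior described above could be implemented roughly as follows. This is an illustrative sketch of the described policy, not SDMF's actual code; `finalize_logs` is a hypothetical helper name:

```python
import os
import shutil
import time

def finalize_logs(temp_dir: str, final_dir: str, retention_days: int = 7) -> None:
    """Move logs from the fast temp location to the final directory,
    then delete final-directory logs older than the retention window."""
    os.makedirs(final_dir, exist_ok=True)
    # Move every completed log file out of the temp location.
    for name in os.listdir(temp_dir):
        shutil.move(os.path.join(temp_dir, name), os.path.join(final_dir, name))
    # Remove files whose modification time exceeds the retention policy.
    cutoff = time.time() - retention_days * 86400
    for name in os.listdir(final_dir):
        path = os.path.join(final_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
```

Writing to local `/tmp` first avoids slow per-line writes to distributed storage; the single move at job completion amortizes that cost.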
✅ Best Practices
- Use editable install for development.
- Keep configs modular for different environments (Dev, QA, Prod).
- Use DBFS or Unity Catalog (UC) volumes for persistent storage in Databricks.
📌 Next Steps
- Add unit tests for core modules.
- Integrate structured logging (JSON) for ELK/Splunk.
- Enable compression for archived logs.
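For the structured-logging item above, a JSON formatter for the standard `logging` module might look like this (a sketch assuming plain `logging` is in use; field names are illustrative choices, not an SDMF or ELK requirement):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line for ELK/Splunk ingestion."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "timestamp": self.formatTime(record),
        })

# Format one record directly to show the resulting JSON line.
record = logging.LogRecord(
    "sdmf", logging.INFO, __file__, 1, "pipeline finished", None, None
)
line = JsonFormatter().format(record)
```

Attaching this formatter to a handler yields one JSON object per line, which log shippers can ingest without custom parsing rules.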
Download files
Source Distribution
Built Distribution
File details
Details for the file sdmf-0.1.0.tar.gz.
File metadata
- Download URL: sdmf-0.1.0.tar.gz
- Upload date:
- Size: 37.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 585f665df8f0ff9dbbd3adb2f24cede3312ff1dde882308fdefa9b2c29dcd880 |
| MD5 | c11e8eeddd6296358e61252496018bec |
| BLAKE2b-256 | 6b9bf425e87d5c7d7756590f15786914de1248d0abc589c9e419f3bd2373e77c |

File details
Details for the file sdmf-0.1.0-py3-none-any.whl.
File metadata
- Download URL: sdmf-0.1.0-py3-none-any.whl
- Upload date:
- Size: 59.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | aa41e3e57a235374783e6973e41620078a034866fe9d1beb1036606fa36c3971 |
| MD5 | 966ab870cfef53b75852f1dfc1da4964 |
| BLAKE2b-256 | ad3c34b5bab4276d309c683757cb8b96306ea5f7f66e9ae3a44423ac6db09f55 |