A Python-based library implementing data mesh concepts.
PyDataMesh
PyDataMesh is a Python-based library designed to implement core concepts of data mesh architecture. It facilitates decentralised data ownership, robust data governance, and a unified catalog for managing metadata. This library enables teams to manage their data domains, create data products, and enforce policies while adhering to data mesh principles.
Key Features
1. Data Domains
- Encapsulate data ownership within logical boundaries.
- Maintain collections of data products specific to a domain.
2. Data Products
- Self-contained, discoverable datasets.
- Query datasets using flexible filters.
- Provide metadata to enhance discoverability and governance.
3. Unity Catalog Integration
- Manage metadata for data products.
- Register, retrieve, and list metadata for enhanced context.
4. Data Platform
- Facilitate the addition and management of domains and data products.
- Provide a centralised interface for accessing decentralised data.
5. Federated Governance
- Apply governance policies across domains to ensure compliance.
- Example: Masking sensitive data or enforcing specific access rules.
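The features above can be pictured with a small, dependency-free sketch. This is not PyDataMesh's implementation: the names `SketchDataProduct`, `SketchDomain`, `mask_large_amounts`, and `catalog` are hypothetical, rows are plain dictionaries rather than DataFrames, and the equality-based filter is an assumption about what "flexible filters" could mean.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Policy = Callable[[List[Record]], List[Record]]


@dataclass
class SketchDataProduct:
    """A named dataset plus the metadata that makes it discoverable."""
    name: str
    records: List[Record]
    metadata: Dict[str, str] = field(default_factory=dict)

    def query(self, filters: Dict[str, Any]) -> List[Record]:
        # Keep rows whose fields equal every filter value (assumed semantics).
        return [r for r in self.records
                if all(r.get(k) == v for k, v in filters.items())]


@dataclass
class SketchDomain:
    """A logical boundary that owns products and the policies applied to them."""
    name: str
    products: Dict[str, SketchDataProduct] = field(default_factory=dict)
    policies: List[Policy] = field(default_factory=list)

    def get_with_policies(self, product_name: str) -> List[Record]:
        # Copy each row so policies never modify the stored records.
        rows = [dict(r) for r in self.products[product_name].records]
        for policy in self.policies:
            rows = policy(rows)
        return rows


def mask_large_amounts(rows: List[Record]) -> List[Record]:
    # Example governance policy: hide amounts above 150.
    return [{**r, "amount": "MASKED"} if r["amount"] > 150 else r for r in rows]


sales = SketchDomain("Sales")
sales.products["SalesData"] = SketchDataProduct(
    "SalesData",
    [{"order_id": 1, "amount": 100}, {"order_id": 2, "amount": 200}],
    metadata={"owner": "Sales Team"},
)
sales.policies.append(mask_large_amounts)

# A catalog can be as simple as a mapping from (domain, product) to metadata.
catalog = {(sales.name, p.name): p.metadata for p in sales.products.values()}

matching = sales.products["SalesData"].query({"amount": 200})
governed = sales.get_with_policies("SalesData")
```

The domain owns both its products and its policies, so each team keeps control of its data, while the catalog gives one place to look up metadata across domains.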
Installation
To install PyDataMesh, clone the repository and add it to your project:
# Clone the repository
git clone https://github.com/vpdeva/PyDataMesh.git
# Navigate to the folder
cd PyDataMesh
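One possible way to "add it to your project" is to put the cloned folder on Python's import path at runtime. The sketch below assumes the clone lives at `./PyDataMesh`, which is a hypothetical location, and `add_repo_to_path` is an illustrative helper, not part of the library.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir: str) -> str:
    """Put the cloned repository root on sys.path so its packages can be imported."""
    repo_str = str(Path(repo_dir).expanduser().resolve())
    if repo_str not in sys.path:
        sys.path.insert(0, repo_str)
    return repo_str


# Hypothetical clone location; adjust to wherever the repository was cloned.
repo_path = add_repo_to_path("./PyDataMesh")

# Once the repository root is on the path, the import used in the
# usage example should resolve:
# from core.data_mesh import PyDataMesh
```

Setting the `PYTHONPATH` environment variable to the repository root would have the same effect without any code changes.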
Example Usage
Here is an example of how to use PyDataMesh:
import pandas as pd
from core.data_mesh import PyDataMesh

if __name__ == "__main__":
    data_mesh = PyDataMesh()

    # Create domains
    data_mesh.create_domain("Sales")
    data_mesh.create_domain("HR")

    # Add data products
    sales_data = pd.DataFrame({"order_id": [1, 2, 3], "amount": [100, 200, 300]})
    hr_data = pd.DataFrame({"employee_id": [1, 2], "name": ["Alice", "Bob"]})
    data_mesh.add_data_product(
        "Sales", "SalesData", sales_data,
        metadata={"owner": "Sales Team", "description": "Sales transaction data."},
    )
    data_mesh.add_data_product(
        "HR", "EmployeeData", hr_data,
        metadata={"owner": "HR Team", "description": "Employee information."},
    )

    # Query data products
    print(data_mesh.query_data_product("Sales", "SalesData", {"amount": 200}))

    # Add governance policy
    def mask_amount(data):
        data["amount"] = data["amount"].apply(lambda x: "MASKED" if x > 150 else x)
        return data

    data_mesh.add_governance_policy("Sales", mask_amount)

    # Get data with policies
    print(data_mesh.get_data_with_policies("Sales", "SalesData"))

    # Retrieve metadata
    print(data_mesh.get_product_metadata("Sales", "SalesData"))

    # List all registered products
    print(data_mesh.list_registered_products())
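The `mask_amount` policy in the example assigns to the DataFrame it receives. Whether PyDataMesh hands policies a copy or the stored DataFrame is not stated here; if it is the stored object, the original amounts would be overwritten. A more defensive variant, sketched below with the hypothetical name `mask_amount_copy`, works on a copy so the input is left unchanged either way.

```python
import pandas as pd


def mask_amount_copy(data: pd.DataFrame) -> pd.DataFrame:
    """Return a masked copy; the DataFrame passed in is not modified."""
    masked = data.copy()
    masked["amount"] = masked["amount"].apply(
        lambda x: "MASKED" if x > 150 else x
    )
    return masked


sales_data = pd.DataFrame({"order_id": [1, 2, 3], "amount": [100, 200, 300]})
masked = mask_amount_copy(sales_data)
```

The function takes and returns a DataFrame just like `mask_amount`, so it could presumably be passed to `add_governance_policy` in the same way.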
Folder Structure
PyDataMesh/
├── core/
│ └── data_mesh.py # Core library implementation
├── examples/
│ └── example_usage.py # Example usage of the library
└── README.md # Library documentation
Contributing
Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or fixes.
License
This project is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file pydatamesh-1.0.0.tar.gz.
File metadata
- Download URL: pydatamesh-1.0.0.tar.gz
- Upload date:
- Size: 3.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a00c0c46c09b31286e05a661ac569aa871e6bfeaf4d3f55409907a24f9a3d736 |
| MD5 | 3b9e2b8eb073bc6f1cb501cf54f8b520 |
| BLAKE2b-256 | e05578314217cb675591c225a64814ac01c0592fd966835c9d7a54435926d177 |
File details
Details for the file PyDataMesh-1.0.0-py3-none-any.whl.
File metadata
- Download URL: PyDataMesh-1.0.0-py3-none-any.whl
- Upload date:
- Size: 3.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0aec3641f01fc00e003c280ae331234eb4a50530f8ce7f8a70f4d8f4be1db4e9 |
| MD5 | cfe4f05829386b30dbb9251df1e6c28d |
| BLAKE2b-256 | dd772f3121469f330ff55cb3d0139fb83898fec4cdb1c8eade582d0345d1d114 |