A Python-based library implementing data mesh concepts.

PyDataMesh

PyDataMesh is a Python-based library designed to implement core concepts of data mesh architecture. It facilitates decentralised data ownership, robust data governance, and a unified catalog for managing metadata. This library enables teams to manage their data domains, create data products, and enforce policies while adhering to data mesh principles.

Key Features

1. Data Domains

  • Encapsulate data ownership within logical boundaries.
  • Maintain collections of data products specific to a domain.
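
As a rough illustration of the idea above, a domain can be modelled as a named container with one owner and a collection of products. This is a minimal sketch, not the library's implementation: the `DataDomain` class, its fields, and `add_product` are hypothetical names, and plain lists of dicts stand in for DataFrames to keep it dependency-free.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataDomain:
    """Hypothetical container: one owning team, many named data products."""

    name: str
    owner: str
    products: dict[str, Any] = field(default_factory=dict)

    def add_product(self, product_name: str, data: Any) -> None:
        # Refuse silent overwrites so each product name has one clear owner.
        if product_name in self.products:
            raise ValueError(
                f"{product_name!r} already exists in domain {self.name!r}"
            )
        self.products[product_name] = data


sales = DataDomain(name="Sales", owner="Sales Team")
sales.add_product("SalesData", [{"order_id": 1, "amount": 100}])
print(sorted(sales.products))  # ['SalesData']
```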

2. Data Products

  • Self-contained, discoverable datasets.
  • Query datasets using flexible filters.
  • Provide metadata to enhance discoverability and governance.
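
The filter semantics are not spelled out here. One plausible reading, suggested by the `{"amount": 200}` filter in the usage example, is column-equals-value matching where every condition must hold. The sketch below assumes that reading; `query_rows` is a hypothetical helper, and rows are plain dicts rather than DataFrame rows.

```python
from typing import Any

Row = dict[str, Any]


def query_rows(rows: list[Row], filters: dict[str, Any]) -> list[Row]:
    """Return the rows whose value equals the filter value for every filtered column."""
    return [
        row
        for row in rows
        if all(row.get(column) == value for column, value in filters.items())
    ]


sales_rows = [
    {"order_id": 1, "amount": 100},
    {"order_id": 2, "amount": 200},
    {"order_id": 3, "amount": 300},
]
matches = query_rows(sales_rows, {"amount": 200})
print(matches)  # [{'order_id': 2, 'amount': 200}]
```

An empty filter dict returns every row, since there are no conditions to fail.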

3. Unity Catalog Integration

  • Manage metadata for data products.
  • Register, retrieve, and list metadata for enhanced context.
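
The register, retrieve, and list operations can be pictured as a small in-memory store keyed by domain and product name. This is a sketch under assumptions: `MetadataCatalog` and its method names are hypothetical and are not taken from the library's source.

```python
from typing import Any


class MetadataCatalog:
    """Hypothetical in-memory catalog keyed by (domain, product)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], dict[str, Any]] = {}

    def register(self, domain: str, product: str, metadata: dict[str, Any]) -> None:
        # Store a copy so later changes by the caller do not alter the catalog.
        self._entries[(domain, product)] = dict(metadata)

    def get(self, domain: str, product: str) -> dict[str, Any]:
        # Return a copy for the same reason; raises KeyError if unregistered.
        return dict(self._entries[(domain, product)])

    def list_products(self) -> list[tuple[str, str]]:
        return sorted(self._entries)


catalog = MetadataCatalog()
catalog.register("Sales", "SalesData", {"owner": "Sales Team"})
catalog.register("HR", "EmployeeData", {"owner": "HR Team"})
print(catalog.list_products())  # [('HR', 'EmployeeData'), ('Sales', 'SalesData')]
```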

4. Data Platform

  • Facilitate the addition and management of domains and data products.
  • Provide a centralised interface for accessing decentralised data.

5. Federated Governance

  • Apply governance policies across domains to ensure compliance.
  • Example: Masking sensitive data or enforcing specific access rules.
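
A governance policy can be thought of as a function from data to data, with a domain's policies applied in sequence when data is read. The sketch below assumes that model; `apply_policies` and `mask_large_amounts` are hypothetical names, and the threshold of 150 mirrors the masking rule in the usage example.

```python
from typing import Any, Callable

Row = dict[str, Any]
Policy = Callable[[list[Row]], list[Row]]


def mask_large_amounts(rows: list[Row]) -> list[Row]:
    # Build new rows instead of mutating the input, so raw data stays intact.
    return [
        {**row, "amount": "MASKED" if row["amount"] > 150 else row["amount"]}
        for row in rows
    ]


def apply_policies(rows: list[Row], policies: list[Policy]) -> list[Row]:
    """Run each policy in order, feeding its output to the next."""
    for policy in policies:
        rows = policy(rows)
    return rows


raw = [{"order_id": 1, "amount": 100}, {"order_id": 2, "amount": 200}]
governed = apply_policies(raw, [mask_large_amounts])
print(governed)  # [{'order_id': 1, 'amount': 100}, {'order_id': 2, 'amount': 'MASKED'}]
```

Returning new rows from each policy keeps the stored product unchanged, so consumers with different access rules can be served from the same source.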

Installation

To install PyDataMesh, clone the repository and run your code from the repository root (or add that folder to your Python path) so that the core package can be imported:

# Clone the repository
git clone https://github.com/vpdeva/PyDataMesh.git

# Navigate to the folder
cd PyDataMesh
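
The import used in the usage example, `from core.data_mesh import PyDataMesh`, resolves only when the repository root is on Python's import path. If your own script lives elsewhere, one option is to add the clone location to `sys.path` before importing. This is a sketch, assuming the repository was cloned into `./PyDataMesh`; adjust the path to your setup.

```python
import sys
from pathlib import Path

# Assumption: the repository was cloned into ./PyDataMesh relative to the
# current working directory. Change this path if you cloned it elsewhere.
repo_root = Path("PyDataMesh").resolve()

# Put the repository root on the import path so the `core` package is findable.
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))

# After this, `from core.data_mesh import PyDataMesh` can be attempted.
```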

Example Usage

Here is an example of how to use PyDataMesh:

import pandas as pd
from core.data_mesh import PyDataMesh

if __name__ == "__main__":
    data_mesh = PyDataMesh()

    # Create domains
    data_mesh.create_domain("Sales")
    data_mesh.create_domain("HR")

    # Add data products
    sales_data = pd.DataFrame({"order_id": [1, 2, 3], "amount": [100, 200, 300]})
    hr_data = pd.DataFrame({"employee_id": [1, 2], "name": ["Alice", "Bob"]})

    data_mesh.add_data_product("Sales", "SalesData", sales_data, metadata={"owner": "Sales Team", "description": "Sales transaction data."})
    data_mesh.add_data_product("HR", "EmployeeData", hr_data, metadata={"owner": "HR Team", "description": "Employee information."})

    # Query data products
    print(data_mesh.query_data_product("Sales", "SalesData", {"amount": 200}))

    # Add governance policy
    def mask_amount(data):
        # Work on a copy so the stored data product is not modified in place
        data = data.copy()
        data["amount"] = data["amount"].apply(lambda x: "MASKED" if x > 150 else x)
        return data

    data_mesh.add_governance_policy("Sales", mask_amount)

    # Get data with policies
    print(data_mesh.get_data_with_policies("Sales", "SalesData"))

    # Retrieve metadata
    print(data_mesh.get_product_metadata("Sales", "SalesData"))

    # List all registered products
    print(data_mesh.list_registered_products())

Folder Structure

PyDataMesh/
├── core/
│   └── data_mesh.py  # Core library implementation
├── examples/
│   └── example_usage.py  # Example usage of the library
└── README.md  # Library documentation

Contributing

Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or fixes.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Download files

Source Distribution

pydatamesh-1.0.0.tar.gz (3.3 kB)

Uploaded Source

Built Distribution

PyDataMesh-1.0.0-py3-none-any.whl (3.3 kB)

Uploaded Python 3

File details

Details for the file pydatamesh-1.0.0.tar.gz.

File metadata

  • Download URL: pydatamesh-1.0.0.tar.gz
  • Upload date:
  • Size: 3.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.6

File hashes

Hashes for pydatamesh-1.0.0.tar.gz
Algorithm    Hash digest
SHA256       a00c0c46c09b31286e05a661ac569aa871e6bfeaf4d3f55409907a24f9a3d736
MD5          3b9e2b8eb073bc6f1cb501cf54f8b520
BLAKE2b-256  e05578314217cb675591c225a64814ac01c0592fd966835c9d7a54435926d177

File details

Details for the file PyDataMesh-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: PyDataMesh-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 3.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.6

File hashes

Hashes for PyDataMesh-1.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       0aec3641f01fc00e003c280ae331234eb4a50530f8ce7f8a70f4d8f4be1db4e9
MD5          cfe4f05829386b30dbb9251df1e6c28d
BLAKE2b-256  dd772f3121469f330ff55cb3d0139fb83898fec4cdb1c8eade582d0345d1d114
