
A library to simplify and enhance data governance and compliance processes


Data Governance Library

The Data Governance Library is a Python package designed to help organizations maintain data compliance, security, and governance standards. It provides automated compliance checks for regulatory frameworks such as GDPR, HIPAA, ISO 27001, and CCPA, along with metadata management, data lineage tracking, role-based access control (RBAC) auditing, and data masking.


Features

  1. Automated Compliance Checks

    • Support for frameworks like GDPR, HIPAA, ISO 27001, and CCPA.
    • Customizable rules for new compliance standards.
  2. Data Lineage Tracking

    • Monitor data flow and identify its origin, transformation, and destination.
  3. Role-Based Access Control (RBAC) Auditing

    • Ensure that data access policies are adhered to.
  4. Metadata Management and Cataloging

    • Store, manage, and query metadata associated with your datasets.
  5. Data Masking and Anonymization

    • Protect sensitive data with masking and anonymization techniques.

Installation

Install the library via pip:

pip install data_governance_checkup

Usage

Example: Running Compliance Checks

from data_governance_checkup.compliance import ComplianceFrameworkValidator

# Initialize the compliance framework validator
validator = ComplianceFrameworkValidator()

# GDPR Data Example
gdpr_data = {"name": "John Doe", "email": "john.doe@example.com"}
print("GDPR Compliance:", validator.validate("GDPR", gdpr_data))

# HIPAA Data Example
hipaa_data = {
    "security_log": [
        {"user_id": "12345", "access_time": "2025-01-14T12:00:00Z"},
        {"user_id": "67890", "access_time": "2025-01-14T13:00:00Z"}
    ]
}
print("HIPAA Compliance:", validator.validate("HIPAA", hipaa_data))

# CCPA Data Example
ccpa_opt_out_data = {"data_sales_opt_out": True}
ccpa_deletion_log = [
    {"request_id": "1", "status": "completed"},
    {"request_id": "2", "status": "completed"}
]
# Validate CCPA opt-out compliance
print("CCPA Compliance (Opt-Out):", validator.validate("CCPA", ccpa_opt_out_data, validation_type="opt_out"))
# Validate CCPA deletion request compliance
print("CCPA Compliance (Deletion Requests):", validator.validate("CCPA", ccpa_deletion_log, validation_type="deletion_requests"))

# ISO 27001 Data Example
iso27001_access_logs = [
    {"user_id": "12345", "access_time": "2025-01-14T12:00:00Z"},
    {"user_id": "67890", "access_time": "2025-01-14T13:00:00Z"}
]
iso27001_risk_assessment_report = [
    {"risk_id": "R1", "description": "Risk 1", "mitigation_plan": "Plan A"},
    {"risk_id": "R2", "description": "Risk 2", "mitigation_plan": "Plan B"}
]
print("ISO27001 Compliance (Access Control):", validator.validate("ISO27001", iso27001_access_logs, validation_type="access_control"))
print("ISO27001 Compliance (Risk Assessment):", validator.validate("ISO27001", iso27001_risk_assessment_report, validation_type="risk_assessment"))
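The README does not document how `ComplianceFrameworkValidator` evaluates data internally. As a hedged illustration of the "customizable rules" feature, the sketch below shows one way a rule registry keyed by framework and `validation_type` could work. `RuleRegistry`, `register`, `check`, and both rule functions are hypothetical names, not part of `data_governance_checkup`. The two rules (every access-log entry has a user ID and a parseable ISO 8601 timestamp; every CCPA deletion request is marked completed) are assumptions chosen to mirror the sample data above.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical rule signature: take the data, return violation messages (empty list = pass).
Rule = Callable[[Any], List[str]]


class RuleRegistry:
    """Hypothetical registry mapping (framework, validation_type) to a list of rules."""

    def __init__(self) -> None:
        self._rules: Dict[Tuple[str, str], List[Rule]] = {}

    def register(self, framework: str, validation_type: str, rule: Rule) -> None:
        self._rules.setdefault((framework, validation_type), []).append(rule)

    def check(self, framework: str, data: Any, validation_type: str = "default") -> Dict[str, Any]:
        key = (framework, validation_type)
        if key not in self._rules:
            raise KeyError(f"No rules registered for {framework}/{validation_type}")
        violations: List[str] = []
        for rule in self._rules[key]:
            violations.extend(rule(data))
        return {"compliant": not violations, "violations": violations}


def access_log_rule(entries: List[dict]) -> List[str]:
    """Assumed rule: every entry needs a user_id and an ISO 8601 access_time."""
    problems = []
    for i, entry in enumerate(entries):
        if not entry.get("user_id"):
            problems.append(f"entry {i}: missing user_id")
        raw = str(entry.get("access_time", ""))
        try:
            # fromisoformat() accepts a trailing 'Z' only from Python 3.11 on,
            # so rewrite it as an explicit UTC offset first.
            datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"entry {i}: invalid access_time {raw!r}")
    return problems


def deletion_rule(log: List[dict]) -> List[str]:
    """Assumed rule: every CCPA deletion request must be marked completed."""
    return [
        f"request {r.get('request_id')}: status {r.get('status')!r}"
        for r in log
        if r.get("status") != "completed"
    ]


registry = RuleRegistry()
registry.register("ISO27001", "access_control", access_log_rule)
registry.register("CCPA", "deletion_requests", deletion_rule)

logs = [{"user_id": "12345", "access_time": "2025-01-14T12:00:00Z"}]
print(registry.check("ISO27001", logs, validation_type="access_control"))
# {'compliant': True, 'violations': []}
```

Returning a list of violations instead of a bare boolean keeps the reason for a failure visible, which is usually what an auditor needs.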

Data Lineage Tracking

import json
from data_governance_checkup.lineage.lineage import DataLineageTracker

# Initialize the lineage tracker
lineage_tracker = DataLineageTracker()

# Sample dataset
sample_data = [
    {"order_id": 1, "customer_id": 101, "status": "completed", "amount": 250},
    {"order_id": 2, "customer_id": 102, "status": "pending", "amount": 300},
    {"order_id": 3, "customer_id": 101, "status": "completed", "amount": 400},
    {"order_id": 4, "customer_id": 103, "status": "cancelled", "amount": 150},
]

customer_data = [
    {"customer_id": 101, "name": "Alice"},
    {"customer_id": 102, "name": "Bob"},
    {"customer_id": 103, "name": "Charlie"},
]

# Step 1: Add source details
lineage_tracker.add_source(
    data_id="order_data",
    source_details={"type": "JSON", "description": "Order details JSON data"}
)

# Step 2: Apply a transformation - Filter completed orders
filtered_data = [row for row in sample_data if row["status"] == "completed"]
lineage_tracker.add_transformation(
    data_id="order_data",
    transformation="Filtered rows where 'status' = 'completed'"
)

# Step 3: Apply a transformation - Join with customer data
joined_data = [
    {
        "order_id": row["order_id"],
        "customer_name": next(
            (customer["name"] for customer in customer_data if customer["customer_id"] == row["customer_id"]), 
            None
        ),
        "amount": row["amount"]
    }
    for row in filtered_data
]
lineage_tracker.add_transformation(
    data_id="order_data",
    transformation="Joined with 'customer_data' on 'customer_id'"
)

# Step 4: Set destination
lineage_tracker.set_destination(
    data_id="order_data",
    destination_details={"type": "JSON", "description": "Processed data"}
)

# Print the processed data
print("Processed Data:")
print(json.dumps(joined_data, indent=4))

# Retrieve and print the data lineage
lineage = lineage_tracker.get_lineage("order_data")
print("\nData Lineage:")
print(json.dumps(lineage, indent=4))

# Export lineage to a file
lineage_tracker.export_lineage("lineage_output.json")
print("\nLineage data exported to 'lineage_output.json'")
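The README does not show how lineage is stored. Assuming the tracker keeps one record per `data_id` (a source, an ordered list of transformations, and a destination), a minimal stand-in could look like the following. `MiniLineageTracker` is a hypothetical class that reuses the method names from the example above; the record layout and the per-step UTC timestamps are assumptions, not the library's implementation.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


class MiniLineageTracker:
    """Hypothetical lineage store: one record per data_id."""

    def __init__(self) -> None:
        self._lineage: Dict[str, Dict[str, Any]] = {}

    def _record(self, data_id: str) -> Dict[str, Any]:
        # Create an empty record the first time a data_id is seen.
        return self._lineage.setdefault(
            data_id, {"source": None, "transformations": [], "destination": None}
        )

    def add_source(self, data_id: str, source_details: Dict[str, Any]) -> None:
        self._record(data_id)["source"] = source_details

    def add_transformation(self, data_id: str, transformation: str) -> None:
        # Append in application order so the history reads top to bottom.
        self._record(data_id)["transformations"].append(
            {"step": transformation, "timestamp": datetime.now(timezone.utc).isoformat()}
        )

    def set_destination(self, data_id: str, destination_details: Dict[str, Any]) -> None:
        self._record(data_id)["destination"] = destination_details

    def get_lineage(self, data_id: str) -> Dict[str, Any]:
        return self._lineage[data_id]

    def export_lineage(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self._lineage, fh, indent=4)


tracker = MiniLineageTracker()
tracker.add_source("order_data", {"type": "JSON"})
tracker.add_transformation("order_data", "Filtered rows where 'status' = 'completed'")
tracker.add_transformation("order_data", "Joined with 'customer_data' on 'customer_id'")
tracker.set_destination("order_data", {"type": "JSON"})
print([t["step"] for t in tracker.get_lineage("order_data")["transformations"]])
```

Because the tracker only records descriptions of each step, lineage stays accurate only if every transformation in the pipeline is paired with a matching `add_transformation` call.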

Metadata Management

from data_governance_checkup.metadata import MetadataManager

# Initialize the metadata manager
metadata_manager = MetadataManager()

# Add metadata
metadata_manager.add_metadata("resource_1", {"owner": "Alice", "created_at": "2025-01-14"})
metadata_manager.add_metadata("resource_2", {"owner": "Bob", "created_at": "2025-01-13"})

# Update metadata
metadata_manager.update_metadata("resource_1", {"last_accessed": "2025-01-14"})

# Retrieve metadata
print("Metadata for resource_1:", metadata_manager.get_metadata("resource_1"))

# List all metadata
print("All metadata:", metadata_manager.list_all_metadata())

# Save metadata to a file
metadata_manager.save_metadata_to_file("metadata.json")

# Load metadata from a file
metadata_manager.load_metadata_from_file("metadata.json")
print("Metadata after loading from file:", metadata_manager.list_all_metadata())
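Assuming metadata is a per-resource dictionary and updates merge new keys into the existing entry, the add, update, save, and load cycle can be sketched with the standard library alone. `SimpleMetadataStore` is hypothetical. In particular, whether the library's `update_metadata` merges or replaces, and whether `load_metadata_from_file` overwrites in-memory entries, are assumptions here.

```python
import json
import os
import tempfile
from typing import Any, Dict


class SimpleMetadataStore:
    """Hypothetical in-memory metadata catalog with JSON persistence."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = {}

    def add_metadata(self, resource_id: str, metadata: Dict[str, Any]) -> None:
        # Refuse silent overwrites; callers must use update_metadata instead.
        if resource_id in self._store:
            raise ValueError(f"{resource_id} already exists; use update_metadata")
        self._store[resource_id] = dict(metadata)

    def update_metadata(self, resource_id: str, changes: Dict[str, Any]) -> None:
        # Merge: existing keys are overwritten, new keys are added.
        self._store[resource_id].update(changes)

    def get_metadata(self, resource_id: str) -> Dict[str, Any]:
        return self._store[resource_id]

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self._store, fh, indent=2)

    def load(self, path: str) -> None:
        # Replace the in-memory catalog with the file contents.
        with open(path, encoding="utf-8") as fh:
            self._store = json.load(fh)


store = SimpleMetadataStore()
store.add_metadata("resource_1", {"owner": "Alice", "created_at": "2025-01-14"})
store.update_metadata("resource_1", {"last_accessed": "2025-01-14"})

# Round-trip through a temporary file so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata.json")
    store.save(path)
    restored = SimpleMetadataStore()
    restored.load(path)

print(restored.get_metadata("resource_1"))
# {'owner': 'Alice', 'created_at': '2025-01-14', 'last_accessed': '2025-01-14'}
```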

Role-Based Access Control (RBAC) Auditing

from data_governance_checkup.rbac import RBACManager

# Initialize RBAC Manager
rbac = RBACManager()

# Create roles
rbac.create_role("Admin")
rbac.create_role("Editor")
rbac.create_role("Viewer")

# Assign permissions to roles
rbac.assign_permission_to_role("Admin", "delete_data")
rbac.assign_permission_to_role("Admin", "edit_data")
rbac.assign_permission_to_role("Editor", "edit_data")
rbac.assign_permission_to_role("Viewer", "view_data")

# Create users
rbac.create_user("alice")
rbac.create_user("bob")

# Assign roles to users
rbac.assign_role_to_user("alice", "Admin")
rbac.assign_role_to_user("bob", "Viewer")

# Check permissions
print(f"Alice has 'edit_data' permission: {rbac.has_permission('alice', 'edit_data')}")
print(f"Bob has 'delete_data' permission: {rbac.has_permission('bob', 'delete_data')}")

# Get all permissions for a user
print(f"Alice's permissions: {rbac.get_user_permissions('alice')}")
print(f"Bob's permissions: {rbac.get_user_permissions('bob')}")

# Revoke a role
rbac.revoke_role_from_user("alice", "Admin")
print(f"Alice's permissions after revoking Admin role: {rbac.get_user_permissions('alice')}")
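The permission checks above reduce to a set union: a user's effective permissions are the union of the permissions of every role they hold, so revoking one role removes only the permissions that no remaining role still grants. The sketch below is a hypothetical minimal model of that rule (`MiniRBAC`, `grant`, `assign`, `revoke`, and `audit` are illustrative names, not `RBACManager`'s code). The `audit` method is an assumed example of RBAC auditing: it flags logged actions that the user's roles do not permit.

```python
from typing import Dict, List, Set, Tuple


class MiniRBAC:
    """Hypothetical RBAC model: users -> roles -> permissions."""

    def __init__(self) -> None:
        self.role_permissions: Dict[str, Set[str]] = {}
        self.user_roles: Dict[str, Set[str]] = {}

    def grant(self, role: str, permission: str) -> None:
        self.role_permissions.setdefault(role, set()).add(permission)

    def assign(self, user: str, role: str) -> None:
        if role not in self.role_permissions:
            raise KeyError(f"Unknown role: {role}")
        self.user_roles.setdefault(user, set()).add(role)

    def revoke(self, user: str, role: str) -> None:
        self.user_roles.get(user, set()).discard(role)

    def permissions(self, user: str) -> Set[str]:
        # Effective permissions = union over all roles the user currently holds.
        roles = self.user_roles.get(user, set())
        return set().union(*(self.role_permissions[r] for r in roles))

    def has_permission(self, user: str, permission: str) -> bool:
        return permission in self.permissions(user)

    def audit(self, access_log: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
        """Return the (user, action) pairs that the user's roles do not permit."""
        return [(u, a) for u, a in access_log if not self.has_permission(u, a)]


model = MiniRBAC()
model.grant("Admin", "delete_data")
model.grant("Admin", "edit_data")
model.grant("Editor", "edit_data")
model.assign("alice", "Admin")
model.assign("alice", "Editor")
model.revoke("alice", "Admin")

print(model.permissions("alice"))  # {'edit_data'}: still granted by Editor
print(model.audit([("alice", "edit_data"), ("alice", "delete_data")]))
# [('alice', 'delete_data')]
```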

Data Masking and Anonymization

from data_governance_checkup.masking import DataMasking

# Example usage
data = {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "ssn": "123-45-6789",
}

mask_fields = ["email", "ssn"]
masker = DataMasking()
masked_data = masker.mask_data(data, mask_fields)

print("Original Data:", data)
print("Masked Data:", masked_data)
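The README does not say which technique `mask_data` applies. As an assumption-labeled illustration, the sketch below contrasts two common approaches: masking, which irreversibly replaces characters while keeping some of the value's shape, and keyed pseudonymization with HMAC-SHA256, which maps the same input to the same token so records can still be joined. `mask_value`, `mask_email`, and `pseudonymize` are hypothetical helpers, and `SECRET_KEY` is a placeholder that would need proper secret management. Under GDPR, pseudonymized data still counts as personal data.

```python
import hashlib
import hmac


def mask_value(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Mask all but the last `keep_last` alphanumeric characters, keeping separators."""
    to_mask = max(sum(ch.isalnum() for ch in value) - keep_last, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def mask_email(value: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, at, domain = value.partition("@")
    if not at or not local:
        return mask_value(value)  # not a well-formed address; fall back to generic masking
    return local[0] + "***@" + domain


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministic keyed token: the same input and key always give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical placeholder

record = {"name": "John Doe", "email": "john.doe@example.com", "ssn": "123-45-6789"}
masked = {
    "name": pseudonymize(record["name"], SECRET_KEY),
    "email": mask_email(record["email"]),
    "ssn": mask_value(record["ssn"]),
}
print(masked["email"])  # j***@example.com
print(masked["ssn"])    # ***-**-6789
```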

Contributing

Contributions are welcome! Please submit pull requests or open issues for any enhancements, bugs, or additional compliance frameworks you'd like to see.


License

This project is licensed under the MIT License. See the LICENSE file for more details.


Roadmap

  1. Add support for more compliance standards (e.g., SOC 2, PCI DSS).
  2. Build visualization dashboards for compliance status.
  3. Integrate with real-time data pipelines for live compliance checks.

Contact

For questions or support, please contact pratik.lahudkar@gmail or open an issue on the GitHub repository.
