A dependency-free audit trail system for data transformations.

CleanCore Python

A lightweight, dependency-free audit trail system for your data pipelines.

CleanCore automatically creates immutable, row-level audit logs for every data transformation. It's a simple way to add compliance support, debuggability, and provenance tracking to your data cleaning scripts.


✨ Why CleanCore?

  • 🔍 Full Audit Trail — Automatically logs what changed, which rows were affected, and why
  • 🚫 Zero Dependencies — Pure Python, works anywhere
  • ⚖️ Compliance-Ready — JSON logs for GDPR, HIPAA, and internal audits
  • 🐛 Debugging Superpower — Trace errors to exact rows and steps
  • 🔧 Simple Integration — One decorator audits any function (lists, dicts, CSVs)

🚀 Quick Start

Installation

pip install cleancore-python

Basic Usage: Audit a Single Function

from cleancore import audit_trail, ProvenaLogger, generate_terminal_report

@audit_trail(rule_id="GDPR_EMAIL_MASKING")
def clean_emails(data):
    result = []
    for row in data:
        new_row = row.copy()
        if '@' in new_row.get('email', ''):
            new_row['email'] = '***@***.***'
        result.append(new_row)
    return result

logger = ProvenaLogger("Single_Transformation")

sample_data = [
    {'id': 1, 'email': 'test@example.com'},
    {'id': 2, 'email': 'user'}
]

cleaned_data = clean_emails(sample_data, provena_logger=logger)

print(generate_terminal_report(logger))
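
To make the row-level idea concrete, here is a minimal pure-Python sketch of how a decorator could compare a function's input and output rows. It is not CleanCore's implementation: `audit_sketch`, `diff_rows`, and the fields of each log entry are hypothetical names chosen for illustration, and the sketch assumes the function returns the same number of rows in the same order.

```python
import copy
import functools


def diff_rows(before, after):
    """Return (row_index, column, old, new) for every changed cell.

    Assumes both inputs are equal-length lists of dicts in the same order.
    """
    changes = []
    for index, (old_row, new_row) in enumerate(zip(before, after)):
        for column in sorted(set(old_row) | set(new_row)):
            old, new = old_row.get(column), new_row.get(column)
            if old != new:
                changes.append((index, column, old, new))
    return changes


def audit_sketch(rule_id, log):
    """Hypothetical decorator: appends one entry to `log` per call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data):
            # Snapshot the input so in-place edits cannot hide changes.
            snapshot = copy.deepcopy(data)
            result = func(data)
            log.append({
                "step": func.__name__,
                "rule_id": rule_id,
                "changes": diff_rows(snapshot, result),
            })
            return result
        return wrapper
    return decorator


audit_log = []


@audit_sketch("GDPR_EMAIL_MASKING", audit_log)
def mask_emails(rows):
    return [
        {**row, "email": "***@***.***"} if "@" in row.get("email", "") else dict(row)
        for row in rows
    ]


masked = mask_emails([
    {"id": 1, "email": "test@example.com"},
    {"id": 2, "email": "user"},
])
```

After the call, `audit_log` holds one entry whose `changes` list names the single masked cell in row 0. Row 1 is untouched because its `email` value contains no `@`.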

Advanced Usage: Audit a Complete Pipeline

import csv

from cleancore import audit_pipeline, audit_trail

def load_data(filepath):
    # newline="" is what the csv module docs recommend for file objects
    with open(filepath, newline="") as f:
        return list(csv.DictReader(f))

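@audit_trail(rule_id="STANDARDIZE_NAMES")
def standardize_names(data):
    # Placeholder: put your name-standardization logic here
    return data

@audit_trail(rule_id="FILL_MISSING_VALUES")
def fill_missing(data):
    # Placeholder: put your missing-value logic here
    return data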

with audit_pipeline("Customer_Onboarding_Pipeline") as logger:
    data = load_data("customers.csv")
    data = standardize_names(data, provena_logger=logger)
    data = fill_missing(data, provena_logger=logger)

logger.export_json("customer_pipeline_audit.json")
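
The two step functions above return their input unchanged. As a rough sketch, their bodies might look like the following, mirroring the kinds of changes shown in the sample report (whitespace and case normalization, then filling a missing `age`). The column names `name` and `age`, the `defaults` parameter, and the default value are assumptions for illustration. The `@audit_trail` decorators are left out so the snippet runs without CleanCore installed.

```python
def standardize_names(data):
    """Collapse whitespace and lowercase the (assumed) 'name' column."""
    cleaned = []
    for row in data:
        new_row = dict(row)  # copy so the input rows are left untouched
        name = new_row.get("name")
        if isinstance(name, str):
            new_row["name"] = " ".join(name.split()).lower()
        cleaned.append(new_row)
    return cleaned


def fill_missing(data, defaults=None):
    """Fill absent, None, or empty-string values from per-column defaults."""
    defaults = defaults if defaults is not None else {"age": 0}
    filled = []
    for row in data:
        new_row = dict(row)
        for column, default in defaults.items():
            if new_row.get(column) in (None, ""):
                new_row[column] = default
        filled.append(new_row)
    return filled


rows = [{"name": "  JOHN DOE  ", "age": None}, {"name": "Ann", "age": "41"}]
result = fill_missing(standardize_names(rows))
```

Both functions build new row dicts instead of mutating their input, which keeps the "before" state available for any comparison an audit layer needs to make.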

📋 Example Audit Report Output

🚀 PROVENA AUDIT REPORT: Customer_Onboarding_Pipeline
======================================================================
📊 SUMMARY
    Steps: 2
    Total Changes: 150 rows
    Started: 2024-01-15T10:30:00
----------------------------------------------------------------------
[1]  standardize_names
    Status: SUCCESS
    Rule: STANDARDIZE_NAMES
    Rows: 10,000 → 10,000
    Changed: 120 rows
    Sample: Row 42: '  JOHN DOE  ' → 'john doe'

[2]  fill_missing
    Status: SUCCESS
    Rule: FILL_MISSING_VALUES
    Rows: 10,000 → 10,000
    Changed: 30 rows
    Sample: Row 101: 'age' = None → 34
======================================================================
📁 Export: provena export Customer_Onboarding_Pipeline.json
======================================================================

📁 Project Structure & API

  • audit_trail: Decorator for tracking transformations
  • ProvenaLogger: Core audit logger
  • audit_pipeline: Context manager for pipelines
  • generate_terminal_report(): Human-readable console report
  • logger.export_json("audit.json"): Persist audit logs
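
The structure of the file written by `export_json` is not documented on this page, so a cautious first step is to load it with the standard `json` module and look at its top-level shape. The helper `describe_json` below is hypothetical and makes no assumption about CleanCore's schema. The demo writes a stand-in file so the snippet runs on its own.

```python
import json
import os
import tempfile


def describe_json(path):
    """Return a short description of a JSON file's top-level structure."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    if isinstance(payload, dict):
        return {"type": "object", "keys": sorted(payload)}
    if isinstance(payload, list):
        return {"type": "array", "length": len(payload)}
    return {"type": type(payload).__name__}


# Stand-in file for demonstration only; not CleanCore's real output format.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "audit.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"pipeline": "demo", "steps": []}, f)
    description = describe_json(demo_path)
```

Pointing `describe_json` at a real export, for example `audit.json`, shows which keys are available before you write any code that depends on them.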

🤝 Contributing & Support
CleanCore is fully open source and welcomes contributions.

GitHub Repository & Issues
https://github.com/Sidra-009/cleancore-python-library

Found a bug or have an idea?
Open an issue on GitHub.

Want to contribute?
Fork the repo and submit a pull request.

📄 License
This project is licensed under the MIT License.

Download files

Source Distribution

cleancore_python-0.1.0.tar.gz (9.5 kB)

Uploaded Source

Built Distribution

cleancore_python-0.1.0-py3-none-any.whl (10.9 kB)

Uploaded Python 3

File details

Details for the file cleancore_python-0.1.0.tar.gz.

File metadata

  • Download URL: cleancore_python-0.1.0.tar.gz
  • Upload date:
  • Size: 9.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for cleancore_python-0.1.0.tar.gz
Algorithm Hash digest
SHA256 53861d51a8b64be47d9c0458d57a6766a1421b64ef8f42b6aa4a9dde7cf24771
MD5 705e14376d4b9fd292b8e94a5fedc802
BLAKE2b-256 7946041750bf4b76d0abd5c9d434d42567c6b49de4e5726e36c3a671deb83526

File details

Details for the file cleancore_python-0.1.0-py3-none-any.whl.

File hashes

Hashes for cleancore_python-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 817ce7b8eebb5d7c8ed12a1ea09d360f0f5f814ff07a37b7815491ce939e59c7
MD5 1bd8b1377bf71def62c49a189c9d5788
BLAKE2b-256 b53a0e0dca669dca219415e706cb818498c246af134cc61cedff7db06984b85a
