
A dependency-free audit trail system for Python data transformations.

Project description

CleanCore Python

A lightweight, dependency-free audit trail system for Python data pipelines.

CleanCore automatically creates immutable, row-level audit logs for every data transformation. It helps with debugging, compliance, and understanding how your data changes across cleaning steps.


Features

  • Automatic row-level audit logs
  • Zero external dependencies (pure Python)
  • JSON-based audit output for compliance and record keeping
  • Works with lists, dictionaries, and CSV-style data
  • Simple decorator-based API
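
To make the decorator-based idea concrete, here is a minimal sketch of how a row-level audit decorator could work in pure Python. It is not CleanCore's implementation: `sketch_audit_trail`, the `events` keyword, and the event keys are all hypothetical names chosen for illustration, and the comparison assumes the step preserves row order and count.

```python
import copy
import functools
from datetime import datetime, timezone


def sketch_audit_trail(rule_id):
    """Hypothetical stand-in for an audit decorator (not CleanCore's code)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data, events=None):
            # Snapshot the input so in-place mutation cannot hide changes.
            before = copy.deepcopy(data)
            result = func(data)
            # Compare rows pairwise; assumes row order and count are preserved.
            changed = sum(1 for old, new in zip(before, result) if old != new)
            if events is not None:
                events.append({
                    "transformation": func.__name__,
                    "rule_id": rule_id,
                    "rows_before": len(before),
                    "rows_after": len(result),
                    "rows_changed": changed,
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                })
            return result
        return wrapper
    return decorator


@sketch_audit_trail(rule_id="UPPERCASE_NAMES")
def uppercase_names(rows):
    return [{**row, "name": row["name"].upper()} for row in rows]


events = []
rows = [{"id": 1, "name": "ada"}, {"id": 2, "name": "GRACE"}]
cleaned = uppercase_names(rows, events=events)
print(events[0]["rows_changed"])  # 1: only the first row differs after the step
```

Passing the collector as a keyword argument mirrors the `provena_logger=logger` call style used in the Quick Start, but how CleanCore handles that argument internally is not documented here.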

Installation

pip install cleancore-python

Quick Start

Audit a Single Function

from cleancore import audit_trail, ProvenaLogger, generate_terminal_report

@audit_trail(rule_id="GDPR_EMAIL_MASKING")
def clean_emails(data):
    result = []
    for row in data:
        new_row = row.copy()
        if '@' in new_row.get('email', ''):
            new_row['email'] = '***@***.***'
        result.append(new_row)
    return result

logger = ProvenaLogger("Single_Transformation")

data = [
    {'id': 1, 'email': 'test@example.com'},
    {'id': 2, 'email': 'user'}
]

cleaned = clean_emails(data, provena_logger=logger)

print(generate_terminal_report(logger))
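
To check what the transformation itself does, independent of any auditing, the same function body can be run without the decorator. This sketch uses only the code shown above and no CleanCore imports; it says nothing about the format of the report the library prints.

```python
def clean_emails(data):
    # Same body as the Quick Start function, without the audit decorator.
    result = []
    for row in data:
        new_row = row.copy()
        if '@' in new_row.get('email', ''):
            new_row['email'] = '***@***.***'
        result.append(new_row)
    return result


data = [
    {'id': 1, 'email': 'test@example.com'},
    {'id': 2, 'email': 'user'},
]

cleaned = clean_emails(data)
print(cleaned)
# [{'id': 1, 'email': '***@***.***'}, {'id': 2, 'email': 'user'}]
print(data[0]['email'])  # test@example.com (input rows are copied, not mutated)
```

Only the first row contains an `@`, so one of the two rows changes. Because each row is copied before editing, the original list stays available for a before/after comparison.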

Pipeline Usage

from cleancore import audit_pipeline, audit_trail
import csv

def load_data(path):
    # newline="" lets the csv module handle line endings itself
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

@audit_trail(rule_id="STANDARDIZE_NAMES")
def standardize_names(data):
    # Placeholder step: returns the rows unchanged
    return data

@audit_trail(rule_id="FILL_MISSING_VALUES")
def fill_missing(data):
    # Placeholder step: returns the rows unchanged
    return data

with audit_pipeline("Customer_Onboarding_Pipeline") as logger:
    data = load_data("customers.csv")
    data = standardize_names(data, provena_logger=logger)
    data = fill_missing(data, provena_logger=logger)

logger.export_json("customer_pipeline_audit.json")
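
The two step functions in the pipeline example return their input unchanged. As a sketch of what real bodies might look like, the version below fills them in with plain Python. The column names `name` and `country`, the `default` parameter, and the sample CSV text are assumptions made for illustration. The functions are shown undecorated so the block runs without CleanCore installed.

```python
import csv
import io


def standardize_names(data):
    # Hypothetical rule: trim whitespace and title-case the "name" column.
    return [
        {**row, "name": (row.get("name") or "").strip().title()}
        for row in data
    ]


def fill_missing(data, default="UNKNOWN"):
    # csv.DictReader yields "" for empty cells and None for absent ones;
    # replace both with a default value.
    return [
        {key: (value if value not in ("", None) else default)
         for key, value in row.items()}
        for row in data
    ]


# In-memory stand-in for customers.csv (contents are made up for the demo).
sample = io.StringIO("id,name,country\n1,  ada LOVELACE ,UK\n2,grace hopper,\n")
rows = list(csv.DictReader(sample))
rows = fill_missing(standardize_names(rows))
print(rows)
```

Both functions build new row dictionaries instead of editing rows in place, which keeps the input intact for any before/after comparison.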

Output

CleanCore generates a human-readable terminal report and a machine-readable JSON audit log containing:

  • Transformation name
  • Rule ID
  • Rows before and after
  • Number of changed rows
  • Sample value changes
  • Execution timestamps

API Overview

  • audit_trail: Decorator for auditing functions
  • ProvenaLogger: Collects audit events
  • audit_pipeline: Context manager for multi-step pipelines
  • generate_terminal_report(): Builds the terminal summary for a logger (the Quick Start passes the result to print())
  • export_json(): Logger method that saves the audit log to a JSON file
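
The context-manager and export pieces can be pictured with a small standard-library sketch. Everything here is hypothetical: `SketchLogger`, `sketch_pipeline`, the `record` method, and the JSON layout are illustrative names and choices, not CleanCore's `ProvenaLogger` or `audit_pipeline`.

```python
import json
import os
import tempfile
from contextlib import contextmanager
from datetime import datetime, timezone


class SketchLogger:
    """Hypothetical event collector; not CleanCore's ProvenaLogger."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def record(self, **fields):
        self.events.append(fields)

    def export_json(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"pipeline": self.name, "events": self.events}, f, indent=2)


@contextmanager
def sketch_pipeline(name):
    logger = SketchLogger(name)
    logger.record(event="pipeline_started",
                  at=datetime.now(timezone.utc).isoformat())
    try:
        yield logger
    finally:
        # Runs even if a step raises, so the log always gets a closing entry.
        logger.record(event="pipeline_finished",
                      at=datetime.now(timezone.utc).isoformat())


with sketch_pipeline("Demo_Pipeline") as logger:
    logger.record(event="step", rule_id="EXAMPLE_RULE", rows_changed=0)

# The name bound by "as" stays in scope after the block, so exporting
# afterwards works.
out_path = os.path.join(tempfile.mkdtemp(), "audit.json")
logger.export_json(out_path)

with open(out_path, encoding="utf-8") as f:
    saved = json.load(f)
print([e["event"] for e in saved["events"]])
# ['pipeline_started', 'step', 'pipeline_finished']
```

Exporting after the `with` block, as the pipeline example does, means the closing entry is already recorded by the time the file is written.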


Source Code

GitHub Repository: https://github.com/Sidra-009/cleancore-python-library

Issues, feature requests, and pull requests are welcome.

License

MIT License. See the LICENSE file for details.

Download files

Source Distribution

cleancore_python-0.1.1.tar.gz (8.9 kB view details)

Uploaded Source

Built Distribution

cleancore_python-0.1.1-py3-none-any.whl (10.4 kB view details)

Uploaded Python 3

File details

Details for the file cleancore_python-0.1.1.tar.gz.

File metadata

  • Download URL: cleancore_python-0.1.1.tar.gz
  • Upload date:
  • Size: 8.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for cleancore_python-0.1.1.tar.gz
Algorithm Hash digest
SHA256 7e98e24ef966bfe05d5e4c40265a5f46066b999039c6f02a0b63b0385def3048
MD5 4bd7dd8dac66d95180406964d9a919f5
BLAKE2b-256 5f4921a77e0037cc499889766a23a19536388c35841d41d6d2ce97a74c1df818

File details

Details for the file cleancore_python-0.1.1-py3-none-any.whl.

File hashes

Hashes for cleancore_python-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a7ff24a0e2b8436e8ecd6d0ef56ba5ca5e6893ca21b91c5aeeac936c58a11780
MD5 837a1844bf361618b9f8360578448e5f
BLAKE2b-256 2dbc0e14b634f200923525cad9fa11af58cc62b297112cc271e545883c364c18
