
Zero-overhead data observability with row-level immutability

Project description

CleanCore Python

A lightweight, dependency-free audit trail system for Python data pipelines.

CleanCore automatically creates immutable, row-level audit logs for every data transformation you audit with it. These logs help with debugging, compliance, and understanding how your data changes across cleaning steps.
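One way to make an audit record immutable in pure Python is a frozen dataclass. The sketch below illustrates that idea under this assumption only. `RowChange` and its fields are hypothetical names and are not CleanCore's internal types.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class RowChange:
    """Hypothetical row-level audit record; frozen=True blocks attribute assignment."""
    rule_id: str
    row_index: int
    column: str
    old_value: object
    new_value: object


change = RowChange("GDPR_EMAIL_MASKING", 0, "email", "test@example.com", "***@***.***")

try:
    change.new_value = "tampered"  # reassignment raises FrozenInstanceError
except FrozenInstanceError:
    print("audit record is read-only")  # prints: audit record is read-only
```

Freezing is shallow. A mutable value such as a dict stored in a field can still be changed in place, so a record like this should hold strings, numbers, or tuples.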


Features

  • Automatic row-level audit logs
  • Zero external dependencies (pure Python)
  • JSON-based audit output for compliance and record keeping
  • Works with lists, dictionaries, and CSV-style data
  • Simple decorator-based API

Installation

pip install cleancore-python

Quick Start

Audit a Single Function

```python
from cleancore import audit_trail, ProvenaLogger, generate_terminal_report

@audit_trail(rule_id="GDPR_EMAIL_MASKING")
def clean_emails(data):
    result = []
    for row in data:
        new_row = row.copy()  # copy each row so the input rows are left unmodified
        # Mask anything that looks like an email; a missing or None email counts as empty
        if '@' in (new_row.get('email') or ''):
            new_row['email'] = '***@***.***'
        result.append(new_row)
    return result

# Collects the audit events for this run
logger = ProvenaLogger("Single_Transformation")

data = [
    {'id': 1, 'email': 'test@example.com'},
    {'id': 2, 'email': 'user'}
]

# provena_logger is passed at call time; it is not a parameter of clean_emails itself
cleaned = clean_emails(data, provena_logger=logger)

print(generate_terminal_report(logger))
```

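The call above passes `provena_logger` to a function that does not declare that parameter, which suggests the decorator consumes it. Below is a minimal, hypothetical sketch of that pattern. `SimpleLogger`, `audit_trail_sketch`, and the recorded fields are illustrative assumptions, not CleanCore's implementation.

```python
import functools
from datetime import datetime, timezone


class SimpleLogger:
    """Hypothetical stand-in for an audit logger: it only collects event dicts."""

    def __init__(self, name):
        self.name = name
        self.events = []


def audit_trail_sketch(rule_id):
    """Decorator factory: records one event per call when a logger is supplied."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(data, *args, provena_logger=None, **kwargs):
            # provena_logger is consumed here and never forwarded to func
            result = func(data, *args, **kwargs)
            if provena_logger is not None:
                provena_logger.events.append({
                    "transformation": func.__name__,
                    "rule_id": rule_id,
                    "rows_before": len(data),
                    "rows_after": len(result),
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                })
            return result

        return wrapper

    return decorator


@audit_trail_sketch(rule_id="DROP_EMPTY_ROWS")
def drop_empty(data):
    # Keep rows that have at least one truthy value
    return [row for row in data if any(row.values())]


log = SimpleLogger("demo")
kept = drop_empty([{"a": 1}, {"a": ""}], provena_logger=log)
print(log.events[0]["rows_before"], log.events[0]["rows_after"])  # 2 1
```

Making the logger keyword-only keeps it from colliding with the wrapped function's own positional arguments, and calling the function without a logger simply skips the audit.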
Pipeline Usage

```python
import csv

from cleancore import audit_pipeline, audit_trail

def load_data(path):
    # newline='' is the mode the csv module recommends for reading files
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.DictReader(f))

@audit_trail(rule_id="STANDARDIZE_NAMES")
def standardize_names(data):
    # Placeholder: put your name-standardization logic here
    return data

@audit_trail(rule_id="FILL_MISSING_VALUES")
def fill_missing(data):
    # Placeholder: put your missing-value handling here
    return data

# Each step receives the same pipeline logger
with audit_pipeline("Customer_Onboarding_Pipeline") as logger:
    data = load_data("customers.csv")
    data = standardize_names(data, provena_logger=logger)
    data = fill_missing(data, provena_logger=logger)

logger.export_json("customer_pipeline_audit.json")
```

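`audit_pipeline` is used as a context manager that yields a logger. A generator-based context manager built with `contextlib.contextmanager` is one plausible way to get that behavior. The sketch assumes the pipeline only needs start and end timestamps plus a shared event list; `PipelineLog` and `audit_pipeline_sketch` are hypothetical names.

```python
import time
from contextlib import contextmanager


class PipelineLog:
    """Hypothetical pipeline-level log shared by every step in the with-block."""

    def __init__(self, name):
        self.name = name
        self.events = []
        self.started_at = None
        self.finished_at = None


@contextmanager
def audit_pipeline_sketch(name):
    log = PipelineLog(name)
    log.started_at = time.time()
    try:
        yield log  # the body of the with-block runs here
    finally:
        # Record the end time even if a step raises, so failed runs stay traceable
        log.finished_at = time.time()


with audit_pipeline_sketch("Customer_Onboarding_Pipeline") as log:
    log.events.append({"rule_id": "STANDARDIZE_NAMES"})
    log.events.append({"rule_id": "FILL_MISSING_VALUES"})

print(len(log.events), log.finished_at >= log.started_at)  # 2 True
```

Because `log` stays bound after the `with` block ends, exporting it afterwards still sees the full event list.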
Output

CleanCore generates a human-readable terminal report and a machine-readable JSON audit log containing:

  • Transformation name
  • Rule ID
  • Rows before and after
  • Number of changed rows
  • Sample value changes
  • Execution timestamps
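The fields listed above can be derived by comparing a step's input rows with its output rows. The sketch assumes rows are dicts matched by position and that a step does not reorder them. `summarize_step`, its key names, and the sample limit are hypothetical and need not match CleanCore's actual JSON schema.

```python
import json
from datetime import datetime, timezone


def summarize_step(name, rule_id, before, after, max_samples=3):
    """Summarize one transformation as a JSON-serializable dict (hypothetical schema)."""
    changed_rows = 0
    samples = []
    # Rows are matched by position, so this assumes the step does not reorder rows
    for index, (old, new) in enumerate(zip(before, after)):
        diffs = {
            key: {"old": old.get(key), "new": new.get(key)}
            for key in sorted(old.keys() | new.keys())
            if old.get(key) != new.get(key)
        }
        if diffs:
            changed_rows += 1
            if len(samples) < max_samples:
                samples.append({"row": index, "changes": diffs})
    return {
        "transformation": name,
        "rule_id": rule_id,
        "rows_before": len(before),
        "rows_after": len(after),
        "changed_rows": changed_rows,
        "sample_changes": samples,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


before = [{"id": 1, "email": "test@example.com"}, {"id": 2, "email": "user"}]
after = [{"id": 1, "email": "***@***.***"}, {"id": 2, "email": "user"}]
report = summarize_step("clean_emails", "GDPR_EMAIL_MASKING", before, after)
print(json.dumps(report, indent=2))
```

`zip` stops at the shorter list, so a step that drops or adds rows would need a stable key such as `id` to pair rows correctly.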

API Overview

  • audit_trail: decorator that audits a transformation function under a rule ID
  • ProvenaLogger: collects audit events
  • audit_pipeline: context manager for multi-step pipelines
  • generate_terminal_report(): produces a terminal summary of a logger's audit events
  • export_json(): logger method that saves the audit log to a JSON file

Source Code

GitHub repository: https://github.com/Sidra-009/cleancore-python-library

Issues, feature requests, and pull requests are welcome.

License
MIT License. See the LICENSE file for details.



Download files

Download the file for your platform.

Source Distribution

cleancore-1.0.0.tar.gz (12.8 kB)

Built Distribution

cleancore-1.0.0-py3-none-any.whl (13.6 kB)

File details

Details for the file cleancore-1.0.0.tar.gz.

File metadata

  • Download URL: cleancore-1.0.0.tar.gz
  • Upload date:
  • Size: 12.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for cleancore-1.0.0.tar.gz
Algorithm Hash digest
SHA256 4aa64483c559641cc6275eb8ecd71db9038164879e92689aed028c61690c6a0e
MD5 746a25dc9b21d3959b743025d27fa5d8
BLAKE2b-256 f36c3ac818ae38631ff86c5820a193583697713fe39205eb20be1588ba3de8b7


File details

Details for the file cleancore-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: cleancore-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 13.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for cleancore-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8722228732fdfefefd5f158397ffeafb89ebaad068b03af9847edb6481ffdb98
MD5 7c50609662ae39e9cabaa3fc141d5ea1
BLAKE2b-256 e5a902c6efb64888503347c1d2de5738b495a2a697997aee5707326503f07943
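To check a downloaded file against the SHA256 digests listed above, Python's standard `hashlib` module is enough. The helper name `sha256_of` is illustrative. The expected value below is the digest published above for the source distribution.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_SDIST_SHA256 = "4aa64483c559641cc6275eb8ecd71db9038164879e92689aed028c61690c6a0e"

# Usage, assuming the sdist was downloaded to the current directory:
# assert sha256_of("cleancore-1.0.0.tar.gz") == EXPECTED_SDIST_SHA256
```

Reading in chunks keeps memory use flat for large files. From the command line, `pip hash` reports the same SHA256 digest for a local file.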

