A dependency-free audit trail system for data transformations.
CleanCore Python
A lightweight, dependency-free audit trail system for your data pipelines.
CleanCore automatically creates immutable, row-level audit logs for every transformation you decorate. It offers a simple way to add audit evidence for compliance, easier debugging, and provenance tracking to your data cleaning scripts.
✨ Why CleanCore?
- 🔍 Full Audit Trail: automatically logs what changed, which rows were affected, and which rule (rule_id) drove the change
- 🚫 Zero Dependencies: pure Python, runs wherever Python runs
- ⚖️ Compliance-Ready: JSON audit logs that can serve as documentation for GDPR, HIPAA, and internal audits
- 🐛 Debugging Superpower: trace errors back to the exact rows and steps
- 🔧 Simple Integration: one decorator audits any function that processes lists, dicts, or rows loaded from CSV
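The "which rows were affected" idea can be illustrated with a minimal, stdlib-only sketch of a row-level diff between a function's input and output. This is not CleanCore's implementation: `RowChange` and `diff_rows` are hypothetical names, and the sketch assumes the transformation keeps row order and row count (a 1:1 mapping between input and output rows). The frozen dataclass mirrors the "immutable" property in spirit, since a recorded change cannot be edited after creation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RowChange:
    """One changed field in one row; frozen, so it cannot be edited after creation."""
    row_index: int
    field: str
    before: Any
    after: Any


def diff_rows(before: list[dict], after: list[dict]) -> list[RowChange]:
    """Compare rows position by position and report every changed field."""
    if len(before) != len(after):
        raise ValueError("row count changed; a positional diff is not meaningful")
    changes: list[RowChange] = []
    for i, (old, new) in enumerate(zip(before, after)):
        # Union of keys so that added or removed fields are reported too
        for key in sorted(old.keys() | new.keys()):
            if old.get(key) != new.get(key):
                changes.append(RowChange(i, key, old.get(key), new.get(key)))
    return changes


before = [{'id': 1, 'email': 'test@example.com'}, {'id': 2, 'email': 'user'}]
after = [{'id': 1, 'email': '***@***.***'}, {'id': 2, 'email': 'user'}]
for change in diff_rows(before, after):
    print(change)
```

Positional comparison is the simplest choice; a pipeline that filters or reorders rows would need to match rows on a stable key (such as `id`) instead.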
🚀 Quick Start
Installation
```bash
pip install cleancore-python
```
Basic Usage: Audit a Single Function
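```python
from cleancore import audit_trail, ProvenaLogger, generate_terminal_report


@audit_trail(rule_id="GDPR_EMAIL_MASKING")
def clean_emails(data):
    """Mask every email that contains '@'; other rows pass through unchanged."""
    result = []
    for row in data:
        new_row = row.copy()  # copy so the caller's original rows are not mutated
        # `or ''` also guards against an email value of None
        if '@' in (new_row.get('email') or ''):
            new_row['email'] = '***@***.***'
        result.append(new_row)
    return result


logger = ProvenaLogger("Single_Transformation")

sample_data = [
    {'id': 1, 'email': 'test@example.com'},
    {'id': 2, 'email': 'user'},  # no '@', so this row is left as-is
]

# Pass the logger through the provena_logger keyword so the step is recorded
cleaned_data = clean_emails(sample_data, provena_logger=logger)
print(generate_terminal_report(logger))
```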
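The decorator-plus-keyword pattern above can be approximated in plain Python. The sketch below is a hypothetical illustration, not CleanCore's source: `ToyLogger` and `toy_audit_trail` are invented names, and the only detail borrowed from the example is that the logger arrives through a `provena_logger` keyword argument, which the wrapper removes before calling the wrapped function. It assumes output rows line up positionally with input rows.

```python
import copy
import functools
from typing import Any, Callable, Optional


class ToyLogger:
    """Hypothetical stand-in for an audit logger: one dict per audited call."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.steps: list[dict[str, Any]] = []


def toy_audit_trail(rule_id: str) -> Callable:
    """Hypothetical decorator factory that records a rule ID and changed rows."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(data, *args, provena_logger: Optional[ToyLogger] = None, **kwargs):
            # Snapshot the input so in-place mutation by func is still detected
            before = copy.deepcopy(data) if provena_logger is not None else None
            result = func(data, *args, **kwargs)
            if provena_logger is not None:
                # Positional comparison: assumes output rows line up with input rows
                changed = [i for i, (old, new) in enumerate(zip(before, result)) if old != new]
                provena_logger.steps.append({
                    "step": func.__name__,
                    "rule": rule_id,
                    "rows_in": len(before),
                    "rows_out": len(result),
                    "changed_rows": changed,
                })
            return result
        return wrapper
    return decorator


@toy_audit_trail(rule_id="GDPR_EMAIL_MASKING")
def clean_emails(data):
    result = []
    for row in data:
        new_row = row.copy()
        if '@' in (new_row.get('email') or ''):
            new_row['email'] = '***@***.***'
        result.append(new_row)
    return result


logger = ToyLogger("Single_Transformation")
sample_data = [{'id': 1, 'email': 'test@example.com'}, {'id': 2, 'email': 'user'}]
cleaned_data = clean_emails(sample_data, provena_logger=logger)
print(logger.steps)
```

Making the logger an optional keyword keeps the decorated function callable without auditing, which is convenient in tests.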
Advanced Usage: Audit a Complete Pipeline
```python
import csv

from cleancore import audit_pipeline, audit_trail


def load_data(filepath):
    """Read a CSV file into a list of dicts (one dict per row)."""
    # The csv module expects files to be opened with newline=''
    with open(filepath, newline='') as f:
        return list(csv.DictReader(f))


@audit_trail(rule_id="STANDARDIZE_NAMES")
def standardize_names(data):
    # Placeholder: put your name-normalization logic here
    return data


@audit_trail(rule_id="FILL_MISSING_VALUES")
def fill_missing(data):
    # Placeholder: put your missing-value imputation logic here
    return data


with audit_pipeline("Customer_Onboarding_Pipeline") as logger:
    data = load_data("customers.csv")
    data = standardize_names(data, provena_logger=logger)
    data = fill_missing(data, provena_logger=logger)
    logger.export_json("customer_pipeline_audit.json")
```
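A pipeline-scoped context manager like `audit_pipeline` can be sketched with `contextlib.contextmanager`. This is an assumption-laden illustration, not CleanCore's API: `toy_audit_pipeline`, `PipelineLog`, its `record` method, and the JSON layout written by its `export_json` are all hypothetical. What it shows is that one log object lives for the duration of the `with` block, every step appends to it, and the final status depends on whether the block raised.

```python
import json
import os
import tempfile
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Any, Iterator


class PipelineLog:
    """Hypothetical pipeline log: ordered step records plus an overall status."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.started = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.status = "RUNNING"
        self.steps: list[dict[str, Any]] = []

    def record(self, step: str, rule: str, changed: int) -> None:
        self.steps.append({"step": step, "rule": rule, "changed": changed})

    def export_json(self, path: str) -> None:
        payload = {
            "pipeline": self.name,
            "started": self.started,
            "status": self.status,
            "total_changes": sum(s["changed"] for s in self.steps),
            "steps": self.steps,
        }
        with open(path, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2)


@contextmanager
def toy_audit_pipeline(name: str) -> Iterator[PipelineLog]:
    """Share one log across the with-block; mark it FAILED if the block raises."""
    log = PipelineLog(name)
    try:
        yield log
    except Exception:
        log.status = "FAILED"
        raise  # never swallow the pipeline's own error
    log.status = "SUCCESS"


with toy_audit_pipeline("Customer_Onboarding_Pipeline") as log:
    log.record("standardize_names", "STANDARDIZE_NAMES", changed=120)
    log.record("fill_missing", "FILL_MISSING_VALUES", changed=30)

# Exporting after the block means the final status is included in the file
out_path = os.path.join(tempfile.mkdtemp(), "customer_pipeline_audit.json")
log.export_json(out_path)
```

In this sketch, exporting inside the `with` block (as the example above does) would write a status of `RUNNING`; exporting after the block captures `SUCCESS` or `FAILED`.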
📋 Example Audit Report Output
```text
🚀 PROVENA AUDIT REPORT: Customer_Onboarding_Pipeline
======================================================================
📊 SUMMARY
• Steps: 2
• Total Changes: 150 rows
• Started: 2024-01-15T10:30:00
----------------------------------------------------------------------
[1] ✅ standardize_names
• Status: SUCCESS
• Rule: STANDARDIZE_NAMES
• Rows: 10,000 → 10,000
• Changed: 120 rows
• Sample: Row 42: ' JOHN DOE ' → 'john doe'
[2] ✅ fill_missing
• Status: SUCCESS
• Rule: FILL_MISSING_VALUES
• Rows: 10,000 → 10,000
• Changed: 30 rows
• Sample: Row 101: 'age' = None → 34
======================================================================
📁 Export: provena export Customer_Onboarding_Pipeline.json
======================================================================
```
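The summary's total is the sum of the per-step counts (120 + 30 = 150 changed rows). The sketch below renders a report with a similar layout from plain step records; `render_report` and the record keys are hypothetical, and it reproduces only the visible layout of the sample above, not the actual behavior of `generate_terminal_report()`.

```python
from typing import Any


def render_report(pipeline: str, started: str, steps: list[dict[str, Any]]) -> str:
    """Rebuild the sample report's layout from step records (hypothetical keys)."""
    heavy, light = "=" * 70, "-" * 70
    lines = [
        f"PROVENA AUDIT REPORT: {pipeline}",
        heavy,
        "SUMMARY",
        f"• Steps: {len(steps)}",
        # The summary total is just the sum of the per-step change counts
        f"• Total Changes: {sum(s['changed'] for s in steps)} rows",
        f"• Started: {started}",
        light,
    ]
    for n, s in enumerate(steps, start=1):
        lines += [
            f"[{n}] {s['step']}",
            f"• Status: {s['status']}",
            f"• Rule: {s['rule']}",
            f"• Rows: {s['rows_in']:,} → {s['rows_out']:,}",  # ',' adds thousands separators
            f"• Changed: {s['changed']} rows",
        ]
    lines.append(heavy)
    return "\n".join(lines)


steps = [
    {"step": "standardize_names", "status": "SUCCESS", "rule": "STANDARDIZE_NAMES",
     "rows_in": 10_000, "rows_out": 10_000, "changed": 120},
    {"step": "fill_missing", "status": "SUCCESS", "rule": "FILL_MISSING_VALUES",
     "rows_in": 10_000, "rows_out": 10_000, "changed": 30},
]
report = render_report("Customer_Onboarding_Pipeline", "2024-01-15T10:30:00", steps)
print(report)
```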
📁 Project Structure & API
- `audit_trail`: decorator for tracking transformations
- `ProvenaLogger`: core audit logger
- `audit_pipeline`: context manager for pipelines
- `generate_terminal_report()`: human-readable console report
- `logger.export_json("audit.json")`: persist audit logs
🤝 Contributing & Support
CleanCore is fully open source and welcomes contributions.
- GitHub repository and issue tracker: https://github.com/Sidra-009/cleancore-python-library
- Found a bug or have an idea? Open an issue on GitHub.
- Want to contribute? Fork the repo and submit a pull request.
📄 License
This project is licensed under the MIT License.
Download files
File details
Details for the file cleancore_python-0.1.0.tar.gz.
File metadata
- Download URL: cleancore_python-0.1.0.tar.gz
- Upload date:
- Size: 9.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 53861d51a8b64be47d9c0458d57a6766a1421b64ef8f42b6aa4a9dde7cf24771 |
| MD5 | 705e14376d4b9fd292b8e94a5fedc802 |
| BLAKE2b-256 | 7946041750bf4b76d0abd5c9d434d42567c6b49de4e5726e36c3a671deb83526 |
File details
Details for the file cleancore_python-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cleancore_python-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 817ce7b8eebb5d7c8ed12a1ea09d360f0f5f814ff07a37b7815491ce939e59c7 |
| MD5 | 1bd8b1377bf71def62c49a189c9d5788 |
| BLAKE2b-256 | b53a0e0dca669dca219415e706cb818498c246af134cc61cedff7db06984b85a |
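To check a downloaded distribution against the SHA256 digests in the tables above, compute the file's digest locally and compare. A minimal stdlib sketch; `sha256_of` and `matches` are illustrative helper names, and the path is wherever you saved the download.

```python
import hashlib

# SHA256 digests copied from the "File hashes" tables above
EXPECTED_SHA256 = {
    "cleancore_python-0.1.0.tar.gz":
        "53861d51a8b64be47d9c0458d57a6766a1421b64ef8f42b6aa4a9dde7cf24771",
    "cleancore_python-0.1.0-py3-none-any.whl":
        "817ce7b8eebb5d7c8ed12a1ea09d360f0f5f814ff07a37b7815491ce939e59c7",
}


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files are never fully loaded."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare case-insensitively, since published digests may use either case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For an intact download, `matches("cleancore_python-0.1.0.tar.gz", EXPECTED_SHA256["cleancore_python-0.1.0.tar.gz"])` should return `True`. pip can also enforce this itself in hash-checking mode (`pip install --require-hashes -r requirements.txt`), where a requirement line such as `cleancore-python==0.1.0 --hash=sha256:<digest>` listing the digests above makes installation fail on a mismatch.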