Synthetic Finance Data Auditor & Optimizer
SFDAO - Synthetic Finance Data Auditor & Optimizer
Financial Compliance & Synthetic Data Quality Assurance Platform
Overview
SFDAO is an integrated tool for generating synthetic data, applying constraints, and auditing the result, designed specifically for the financial industry. It covers roadmap Phases 1 to 3: beyond auditing, it handles generation, guardrail checking, scenario injection, and ML utility evaluation.
Key Features
- Statistical Quality Evaluation: Distribution comparison using KS test and Jensen-Shannon Divergence.
- Finance-Specific Evaluation: Fat Tail detection, Volatility Clustering verification.
- Privacy Evaluation: Re-identification risk, Distance to Closest Record (DCR).
- Automated Type Detection: Automatic classification of Numeric, Categorical, Datetime, and PII (Personally Identifiable Information).
- Generation Workflow: Batch execution of synthetic data generation and auditing via generate/run.
- Constraints & Scenarios: Guardrail rule application, scenario injection (scale/shift/clip/outlier, etc.).
- ML Utility Evaluation (optional): Model performance assessment using TSTR (Train on Synthetic, Test on Real), reported as AUC/F1.
- Report Generation: Detailed reports in HTML/PDF formats.
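As a rough illustration of the comparison metrics named in the list above, the sketch below computes a two-sample KS statistic, a Jensen-Shannon divergence over shared histogram bins, and the Distance to Closest Record using only the Python standard library. The function names (`ks_statistic`, `js_divergence`, `histogram`, `distance_to_closest_record`) are hypothetical and are not taken from SFDAO's API; the package's own implementation may differ, for example by relying on `scipy.stats.ks_2samp`.

```python
import math
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    for x in a + b:  # the gap can only peak at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def histogram(values, lo, hi, bins):
    """Normalised histogram over shared bin edges, so two samples are comparable."""
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp values on or outside the edges
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    def kl(u, v):
        return sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def distance_to_closest_record(synthetic_rows, real_rows):
    """For each synthetic row, the Euclidean distance to its nearest real row."""
    return [min(math.dist(s, r) for r in real_rows) for s in synthetic_rows]


# Illustrative values only (not taken from any SFDAO dataset).
real = [0.1, 0.4, 0.5, 0.9, 1.3]
synthetic = [0.2, 0.3, 0.6, 1.0, 1.1]
print("KS statistic:", ks_statistic(real, synthetic))
print("JS divergence:", js_divergence(histogram(real, 0.0, 1.5, 3),
                                      histogram(synthetic, 0.0, 1.5, 3)))
print("DCR:", distance_to_closest_record([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0)]))
```

A DCR of exactly 0 means a synthetic row duplicates a real one, which is the kind of case a privacy check would typically flag.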
Installation
Quick Install (PyPI)
# Install via pip
pip install sfdao
# Or use pipx for isolated installation (recommended)
pipx install sfdao
# With optional deep learning support (CTGAN); quotes keep shells such as zsh from expanding the brackets
pip install "sfdao[deep]"
Prerequisites
- Python 3.10 - 3.12
- macOS: WeasyPrint dependencies for PDF generation
brew install cairo pango gdk-pixbuf libffi
Development Setup
# Clone the repository
git clone https://github.com/takurot/sfdao.git
cd sfdao
# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
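# Enable virtual environment (Poetry 1.x; Poetry 2.x dropped the built-in `shell` command, so use `poetry env activate` or install poetry-plugin-shell)
poetry shell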
Quick Start
# Run a basic audit
sfdao audit --real data/real.csv --synthetic data/synthetic.csv --output report.html
# Output format is automatically detected by extension (.txt/.html/.pdf)
sfdao audit --real data/real.csv --synthetic data/synthetic.csv --output report.txt
sfdao audit --real data/real.csv --synthetic data/synthetic.csv --output report.pdf
# Generate simple synthetic data for testing
poetry run python -m sfdao.scripts.generate_test_synthetic_data \
example/data/creditcard_real_sample.csv \
example/output/creditcard_synthetic.csv \
--n-samples 500 \
--random-state 42
# Audit the generated synthetic data
poetry run sfdao audit \
--real example/data/creditcard_real_sample.csv \
--synthetic example/output/creditcard_synthetic.csv \
--output example/output/report.html
# Phase 2: Batch execution of Generation -> Guardrails -> Audit
poetry run sfdao run --config example/config/phase2.yaml --outdir example/output
Development
TDD (Test-Driven Development)
This project is developed using TDD. Follow this cycle when adding new features:
- Red: Write a failing test.
- Green: Write the minimum code to pass the test.
- Refactor: Clean up and optimize the code.
Testing
# Run all tests
pytest
# Run with coverage report
pytest --cov=sfdao --cov-report=html
# Run specific test file
pytest tests/unit/ingestion/test_loader.py
Code Quality
# Check formatting
black --check .
# Apply formatting
black .
# Lint check
flake8 .
# Type check
mypy sfdao
# Security check
bandit -r sfdao
Project Structure
sfdao/
├── sfdao/ # Main package
│ ├── ingestion/ # Data ingestion and type detection
│ ├── config/ # Configuration schema/loader
│ ├── generator/ # Synthetic data generation
│ ├── guard/ # Rule-based constraint checking
│ ├── scenario/ # Scenario injection
│ ├── evaluator/ # Metric calculation
│ ├── reporter/ # Report generation
│ └── cli/ # CLI interface
├── tests/ # Test code
│ ├── unit/ # Unit tests
│ ├── integration/ # Integration tests
│ └── e2e/ # End-to-End tests
├── docs/ # Documentation
└── prompt/ # Specifications
Documentation
- Implementation Plan
- Product Specifications
- Example
- Usage Guide
- Architecture
- Python API
- Metrics Guide
Roadmap
Phase 1: "The Auditor" (MVP)
- Project structure and CI/CD setup
- Basic Data Ingestion features
- Auto-Type Detection
- Finance Domain definitions
- Basic Evaluator (statistical tests)
- Financial Stylized Facts evaluation
- Privacy evaluation
- Evaluation scoring integration
- CLI interface
- Report generation feature
- Integration testing and documentation
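The "Financial Stylized Facts evaluation" item above refers to properties such as fat tails and volatility clustering. A minimal sketch of how these two are commonly measured follows, assuming excess kurtosis as the fat-tail indicator and the lag-k autocorrelation of squared returns as the clustering indicator. The function names are hypothetical, and SFDAO may use different estimators or thresholds.

```python
from statistics import fmean


def excess_kurtosis(returns):
    """Sample excess kurtosis; values well above 0 suggest fatter tails than a normal."""
    mu = fmean(returns)
    m2 = fmean([(r - mu) ** 2 for r in returns])
    m4 = fmean([(r - mu) ** 4 for r in returns])
    return m4 / (m2 ** 2) - 3.0


def autocorrelation(series, lag):
    """Lag-k sample autocorrelation of a series."""
    mu = fmean(series)
    var = sum((x - mu) ** 2 for x in series)
    cov = sum((series[i] - mu) * (series[i + lag] - mu)
              for i in range(len(series) - lag))
    return cov / var


def volatility_clustering(returns, lag=1):
    """Autocorrelation of squared returns; clearly positive values indicate clustering."""
    return autocorrelation([r * r for r in returns], lag)


# Toy series: a calm regime followed by a turbulent one (illustrative values only).
returns = [0.01 * (-1) ** i for i in range(50)] + [0.05 * (-1) ** i for i in range(50)]
print("excess kurtosis:", excess_kurtosis(returns))
print("squared-return autocorrelation:", volatility_clustering(returns))
```

Comparing these numbers between the real and the synthetic series indicates whether the generator preserved the behaviour, which is more informative than either value on its own.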
Phase 2: "The Generator & Logic"
- Config schema/loader and CLI integration (generate/run)
- Baseline Generator (statistical sampling)
- Constraint & Logic Guard (rule detection/exclusion/correction)
- Scenario Injection (scale/shift/clip/outlier, etc.)
- E2E workflow (generate -> guard -> audit)
- Benchmark and Privacy sampling
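To make the guard and scenario items above concrete, here is a small sketch of what rule-based exclusion/correction and the scale/shift/clip/outlier operations could look like on plain Python data. Everything here is an assumption for illustration: the function names, the row format (dicts), and the parameters are hypothetical and do not reflect SFDAO's actual config schema or API.

```python
import random


def apply_guard(rows, rule, fix=None):
    """Keep rows that pass `rule`; violators are dropped, or corrected if `fix` is given."""
    kept = []
    for row in rows:
        if rule(row):
            kept.append(row)
        elif fix is not None:
            kept.append(fix(row))
    return kept


def scale(values, factor):
    """Multiply every value by a constant factor."""
    return [v * factor for v in values]


def shift(values, offset):
    """Add a constant offset to every value."""
    return [v + offset for v in values]


def clip(values, lower, upper):
    """Limit every value to the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def inject_outliers(values, fraction, magnitude, seed=0):
    """Multiply a random `fraction` of the values by `magnitude` (seeded for repeatability)."""
    rng = random.Random(seed)
    out = list(values)
    for i in rng.sample(range(len(out)), int(len(out) * fraction)):
        out[i] *= magnitude
    return out


# Guard: transaction amounts must be non-negative (hypothetical rule).
rows = [{"amount": 10.0}, {"amount": -5.0}]
non_negative = lambda r: r["amount"] >= 0
print(apply_guard(rows, non_negative))                                # exclusion
print(apply_guard(rows, non_negative, lambda r: {**r, "amount": 0.0}))  # correction

# Scenario: stress the amounts, then cap them.
amounts = [10.0, 20.0, 30.0, 40.0]
print(clip(shift(scale(amounts, 1.5), 5.0), 0.0, 50.0))
```

Seeding the outlier injection keeps a stress scenario reproducible across runs, which matters when the audited report is compared between releases.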
Phase 3: "The Professional"
- CI/CD optimization and Release workflow
- Advanced Generator (CTGAN, optional)
- ML Utility evaluation (TSTR: AUC/F1)
- PyPI metadata/CHANGELOG/README maintenance
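The TSTR item above follows a simple protocol: fit a model on synthetic data only, then score it on held-out real data. The sketch below shows that protocol with a deliberately tiny nearest-centroid classifier so that it runs on the standard library alone. The classifier, the function names, and the toy data are all assumptions made for illustration; SFDAO's actual model choice is not stated here, and in practice a library such as scikit-learn would normally supply the model and the AUC/F1 metrics.

```python
import math


def fit_centroids(rows, labels):
    """Toy classifier: store the mean feature vector of each class (0 and 1)."""
    centroids = {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def score(centroids, row):
    """Higher score = closer to the positive centroid than to the negative one."""
    return math.dist(row, centroids[0]) - math.dist(row, centroids[1])


def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(scores, labels, threshold=0.0):
    """F1 of the positive class after thresholding the scores."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tstr(synth_rows, synth_labels, real_rows, real_labels):
    """Train on Synthetic, Test on Real."""
    model = fit_centroids(synth_rows, synth_labels)   # train on synthetic only
    scores = [score(model, r) for r in real_rows]     # evaluate on real only
    return {"auc": roc_auc(scores, real_labels), "f1": f1_score(scores, real_labels)}


# Toy data: label 1 marks the "fraud-like" cluster (illustrative values only).
synth_rows = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
synth_labels = [0, 0, 1, 1]
real_rows = [(0.2, 0.1), (4.8, 5.5), (0.5, 0.9), (5.1, 4.9)]
real_labels = [0, 1, 0, 1]
print(tstr(synth_rows, synth_labels, real_rows, real_labels))
```

A TSTR score close to the score obtained by training on real data suggests the synthetic data preserves the signal a downstream model needs.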
Future Ideas
- Rule Learning Engine (Reinforcement Learning based)
- Auto-Tuning Mode (Autonomous quality improvement)
Contributing
Contributions are welcome! Please follow these steps:
- Fork this repository.
- Create a feature branch (git checkout -b feature/amazing-feature).
- Write tests before implementing (TDD).
- Commit your changes (git commit -m 'Add amazing feature').
- Push to the branch (git push origin feature/amazing-feature).
- Create a Pull Request.
License
MIT License - see the LICENSE file for details.
Contact
For questions or suggestions regarding the project, please create an Issue.
Acknowledgments
- SDV (Synthetic Data Vault)
- CTGAN
- Kaggle Credit Card Fraud Detection Dataset
File details
Details for the file sfdao-0.1.1.tar.gz.
File metadata
- Download URL: sfdao-0.1.1.tar.gz
- Upload date:
- Size: 37.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c90fdf9a1152d951750491d9170850b9d25996019a2061481fa30b660d2a92d |
| MD5 | 0e882023a65866c98842db375e75f696 |
| BLAKE2b-256 | e991f64360b050c3ff364abf0bb32d498b0c482b2fe0480fdd16ca4547895467 |
Provenance
The following attestation bundles were made for sfdao-0.1.1.tar.gz:
Publisher: release.yml on takurot/sfdao
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sfdao-0.1.1.tar.gz
- Subject digest: 9c90fdf9a1152d951750491d9170850b9d25996019a2061481fa30b660d2a92d
- Sigstore transparency entry: 1131364390
- Sigstore integration time:
- Permalink: takurot/sfdao@7f8e0e12efb21198078168ac4ff57f8cffd400a3
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/takurot
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7f8e0e12efb21198078168ac4ff57f8cffd400a3
- Trigger Event: push
File details
Details for the file sfdao-0.1.1-py3-none-any.whl.
File metadata
- Download URL: sfdao-0.1.1-py3-none-any.whl
- Upload date:
- Size: 50.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3fef32b5f7b904bafa40a87ba766c38233275803ed6f15c515c462e15bb6d1ff |
| MD5 | e8852892334389e4a1b0fb83e885ec71 |
| BLAKE2b-256 | ed3aa74ddb95bd3606421b3bf48a8ffeb641fdcf5d570affc42b91bd50904b76 |
Provenance
The following attestation bundles were made for sfdao-0.1.1-py3-none-any.whl:
Publisher: release.yml on takurot/sfdao
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sfdao-0.1.1-py3-none-any.whl
- Subject digest: 3fef32b5f7b904bafa40a87ba766c38233275803ed6f15c515c462e15bb6d1ff
- Sigstore transparency entry: 1131364418
- Sigstore integration time:
- Permalink: takurot/sfdao@7f8e0e12efb21198078168ac4ff57f8cffd400a3
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/takurot
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7f8e0e12efb21198078168ac4ff57f8cffd400a3
- Trigger Event: push