Synthetic Finance Data Auditor & Optimizer

SFDAO - Synthetic Finance Data Auditor & Optimizer

Financial Compliance & Synthetic Data Quality Assurance Platform

日本語版 (Japanese)

Overview

SFDAO is an integrated tool for generating synthetic data, applying constraints, and auditing the results, designed specifically for the financial industry. It covers roadmap Phases 1 to 3: in addition to auditing, it handles generation, guardrail checking, scenario injection, and ML utility evaluation.

Key Features

  • Statistical Quality Evaluation: Compares real and synthetic distributions using the Kolmogorov-Smirnov (KS) test and Jensen-Shannon divergence.
  • Finance-Specific Evaluation: Detects fat tails and verifies volatility clustering.
  • Privacy Evaluation: Measures re-identification risk and Distance to Closest Record (DCR).
  • Automated Type Detection: Classifies each column automatically as Numeric, Categorical, Datetime, or PII (Personally Identifiable Information).
  • Generation Workflow: Runs synthetic data generation and auditing as a batch via the generate and run commands.
  • Constraints & Scenarios: Applies guardrail rules and injects scenarios (scale, shift, clip, outlier, etc.).
  • ML Utility Evaluation (optional): Assesses model performance using TSTR (Train on Synthetic, Test on Real), reported as AUC and F1.
  • Report Generation: Produces detailed reports in HTML, PDF, or plain-text format.
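
To make the statistical quality metrics concrete, here is a minimal standard-library sketch of a two-sample KS statistic and a histogram-based Jensen-Shannon divergence. It is an illustration only, not SFDAO's implementation: the function names ks_statistic and js_divergence and the bins parameter are assumptions made for this example.

```python
import math
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    for x in a + b:  # the gap can only peak at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def js_divergence(real, synthetic, bins=10):
    """Jensen-Shannon divergence (log base 2, so within [0, 1]) over shared histogram bins."""
    lo = min(min(real), min(synthetic))
    hi = max(max(real), max(synthetic))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal

    def hist(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [c / len(values) for c in counts]

    p, q = hist(real), hist(synthetic)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing; v_i > 0 whenever u_i > 0 here.
        return sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Both values are 0 for identical samples and approach 1 as the samples stop overlapping, which is why metrics of this kind are convenient inputs to a quality score.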

Installation

Quick Install (PyPI)

# Install via pip
pip install sfdao

# Or use pipx for isolated installation (recommended)
pipx install sfdao

# With optional deep learning support (CTGAN)
# (the quotes keep shells such as zsh from expanding the brackets)
pip install "sfdao[deep]"

Prerequisites

  • Python 3.10 - 3.12
  • macOS: WeasyPrint dependencies for PDF generation
    brew install cairo pango gdk-pixbuf libffi
    

Development Setup

# Clone the repository
git clone https://github.com/takurot/sfdao.git
cd sfdao

# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install

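# Activate the virtual environment
# (on Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin;
# prefixing commands with `poetry run` works without it)
poetry shell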

Quick Start

# Run a basic audit
sfdao audit --real data/real.csv --synthetic data/synthetic.csv --output report.html

# The output format is inferred from the file extension (.txt, .html, or .pdf)
sfdao audit --real data/real.csv --synthetic data/synthetic.csv --output report.txt
sfdao audit --real data/real.csv --synthetic data/synthetic.csv --output report.pdf

# Generate simple synthetic data for testing
poetry run python -m sfdao.scripts.generate_test_synthetic_data \
  example/data/creditcard_real_sample.csv \
  example/output/creditcard_synthetic.csv \
  --n-samples 500 \
  --random-state 42

# Audit the generated synthetic data
poetry run sfdao audit \
  --real example/data/creditcard_real_sample.csv \
  --synthetic example/output/creditcard_synthetic.csv \
  --output example/output/report.html

# Phase 2: Batch execution of Generation -> Guardrails -> Audit
poetry run sfdao run --config example/config/phase2.yaml --outdir example/output
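
The Generation -> Guardrails -> Audit flow run by the last command can be pictured with a toy pipeline over a single numeric column. This is a conceptual sketch only: the functions generate, guard, and audit below are hypothetical stand-ins, the normal-distribution fit is an assumed baseline, and nothing here reflects the schema of phase2.yaml or SFDAO's internal API.

```python
import random
import statistics


def generate(real_amounts, n_samples, seed=42):
    """Assumed baseline generator: sample from a normal fitted to the real column."""
    rng = random.Random(seed)
    mu = statistics.fmean(real_amounts)
    sigma = statistics.stdev(real_amounts)
    return [rng.gauss(mu, sigma) for _ in range(n_samples)]


def guard(amounts, lower=0.0):
    """Guardrail: exclude rows that violate the rule amount >= lower."""
    return [a for a in amounts if a >= lower]


def audit(real_amounts, synthetic_amounts):
    """Toy audit: relative error of the synthetic mean against the real mean."""
    real_mean = statistics.fmean(real_amounts)
    return abs(statistics.fmean(synthetic_amounts) - real_mean) / abs(real_mean)


real = [12.5, 40.0, 7.25, 99.0, 15.0, 63.5]
synthetic = guard(generate(real, n_samples=500))
report = {"rows_kept": len(synthetic), "mean_relative_error": audit(real, synthetic)}
```

Note that excluding rows changes the distribution being audited, which is one reason to run the audit after the guardrails rather than before.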

Development

TDD (Test-Driven Development)

This project is developed using TDD. Follow this cycle when adding new features:

  1. Red: Write a failing test.
  2. Green: Write the minimum code to pass the test.
  3. Refactor: Clean up and optimize the code.
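
As an illustration of the cycle, the sketch below walks through one iteration for a hypothetical helper, clip_amount, which is invented for this example and is not part of SFDAO.

```python
# Step 1 (Red): write the test first. Running it before clip_amount exists
# fails with a NameError, which is the expected "red" state.
def test_clip_amount_caps_values_at_the_limit():
    assert clip_amount(150.0, limit=100.0) == 100.0
    assert clip_amount(40.0, limit=100.0) == 40.0


# Step 2 (Green): add the minimum implementation that makes the test pass.
def clip_amount(value, limit):
    return min(value, limit)


# Step 3 (Refactor): with the test green, rename, add type hints, or tidy up,
# rerunning pytest after each change to confirm nothing broke.
```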

Testing

# Run all tests
pytest

# Run with coverage report
pytest --cov=sfdao --cov-report=html

# Run specific test file
pytest tests/unit/ingestion/test_loader.py

Code Quality

# Check formatting
black --check .

# Apply formatting
black .

# Lint check
flake8 .

# Type check
mypy sfdao

# Security check
bandit -r sfdao

Project Structure

sfdao/
├── sfdao/                  # Main package
│   ├── ingestion/          # Data ingestion and type detection
│   ├── config/             # Configuration schema/loader
│   ├── generator/          # Synthetic data generation
│   ├── guard/              # Rule-based constraint checking
│   ├── scenario/           # Scenario injection
│   ├── evaluator/          # Metric calculation
│   ├── reporter/           # Report generation
│   └── cli/                # CLI interface
├── tests/                  # Test code
│   ├── unit/               # Unit tests
│   ├── integration/        # Integration tests
│   └── e2e/                # End-to-End tests
├── docs/                   # Documentation
└── prompt/                 # Specifications
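
As one example of the kind of metric an evaluator module could compute, the sketch below implements Distance to Closest Record (DCR) from the Key Features list using only the standard library. The function name and the tuple-per-row layout are assumptions made for this illustration, and real data would normally need its columns scaled to comparable ranges first.

```python
import math


def distance_to_closest_record(synthetic_rows, real_rows):
    """For each synthetic row, the Euclidean distance to its nearest real row."""
    return [min(math.dist(s, r) for r in real_rows) for s in synthetic_rows]


real_rows = [(0.0, 0.0), (3.0, 4.0)]
synthetic_rows = [(0.0, 0.0), (6.0, 8.0)]

dcr = distance_to_closest_record(synthetic_rows, real_rows)
# A distance of zero means a synthetic row duplicates a real record exactly.
exact_copies = sum(d == 0.0 for d in dcr)
```

Very small distances suggest the generator may be memorizing real records, so a privacy check would typically look at the low end of the DCR distribution.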

Documentation

Roadmap

Phase 1: "The Auditor" (MVP)

  • Project structure and CI/CD setup
  • Basic Data Ingestion features
  • Auto-Type Detection
  • Finance Domain definitions
  • Basic Evaluator (statistical tests)
  • Financial Stylized Facts evaluation
  • Privacy evaluation
  • Evaluation scoring integration
  • CLI interface
  • Report generation feature
  • Integration testing and documentation
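
Two of the stylized facts mentioned in this README, fat tails and volatility clustering, are commonly checked with excess kurtosis and the autocorrelation of squared returns. The sketch below shows those two textbook measures; the function names are assumptions for illustration, and SFDAO's own evaluator may use different statistics or thresholds.

```python
import statistics


def excess_kurtosis(returns):
    """Population excess kurtosis: about 0 for normal data, above 0 for fat tails."""
    mean = statistics.fmean(returns)
    var = statistics.pvariance(returns, mu=mean)
    fourth = statistics.fmean([(r - mean) ** 4 for r in returns])
    return fourth / var**2 - 3.0


def autocorrelation(values, lag=1):
    """Sample autocorrelation at the given lag (0.0 for a constant series)."""
    mean = statistics.fmean(values)
    den = sum((v - mean) ** 2 for v in values)
    if den == 0.0:
        return 0.0
    num = sum(
        (values[i] - mean) * (values[i - lag] - mean)
        for i in range(lag, len(values))
    )
    return num / den


def volatility_clustering(returns, lag=1):
    """Autocorrelation of squared returns; clearly positive values suggest clustering."""
    return autocorrelation([r * r for r in returns], lag)
```

A synthetic series that matches the real data's mean and variance can still miss both properties, which is why they are evaluated separately from the distribution tests.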

Phase 2: "The Generator & Logic"

  • Config schema/loader and CLI integration (generate/run)
  • Baseline Generator (statistical sampling)
  • Constraint & Logic Guard (rule detection/exclusion/correction)
  • Scenario Injection (scale/shift/clip/outlier, etc.)
  • E2E workflow (generate -> guard -> audit)
  • Benchmark and Privacy sampling
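
The scenario operations named above (scale, shift, clip, outlier) can be sketched as simple transformations of a numeric column. The function names, parameters, and the multiplicative outlier rule below are assumptions made for illustration and do not describe SFDAO's configuration format.

```python
import random


def scale(values, factor):
    return [v * factor for v in values]


def shift(values, offset):
    return [v + offset for v in values]


def clip(values, lower, upper):
    return [min(max(v, lower), upper) for v in values]


def inject_outliers(values, fraction, magnitude, seed=0):
    """Multiply a randomly chosen `fraction` of the values by `magnitude`."""
    rng = random.Random(seed)
    n = round(len(values) * fraction)
    chosen = set(rng.sample(range(len(values)), n))
    return [v * magnitude if i in chosen else v for i, v in enumerate(values)]


amounts = [10.0, 20.0, 30.0, 40.0]
# A stress scenario: amounts grow 50%, gain a flat 5.0 fee, and are capped at 60.0.
stressed = clip(shift(scale(amounts, 1.5), 5.0), 0.0, 60.0)
shocked = inject_outliers(amounts, fraction=0.25, magnitude=10.0)
```

Keeping each operation a pure function of the column makes scenarios easy to compose and to reproduce from a fixed seed.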

Phase 3: "The Professional"

  • CI/CD optimization and Release workflow
  • Advanced Generator (CTGAN, optional)
  • ML Utility evaluation (TSTR: AUC/F1)
  • PyPI metadata/CHANGELOG/README maintenance
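
TSTR (Train on Synthetic, Test on Real) fits a model on synthetic rows and scores it on real rows. The sketch below shows the idea with a one-feature nearest-centroid classifier and hand-written AUC and F1 functions; the classifier, the toy data, and all names are assumptions for illustration, not the models or API that SFDAO uses.

```python
import statistics


def fit_centroids(features, labels):
    """Train a one-feature nearest-centroid classifier: (centroid_0, centroid_1)."""
    c0 = statistics.fmean([x for x, y in zip(features, labels) if y == 0])
    c1 = statistics.fmean([x for x, y in zip(features, labels) if y == 1])
    return c0, c1


def score(x, centroids):
    """Higher score means x is closer to the class-1 centroid."""
    c0, c1 = centroids
    return abs(x - c0) - abs(x - c1)


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1(predictions, labels):
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy data: one feature (transaction amount) and a binary label.
synthetic_x, synthetic_y = [10.0, 12.0, 11.0, 95.0, 102.0, 99.0], [0, 0, 0, 1, 1, 1]
real_x, real_y = [9.0, 14.0, 60.0, 97.0, 110.0, 13.0], [0, 0, 1, 1, 1, 0]

centroids = fit_centroids(synthetic_x, synthetic_y)      # train on synthetic
real_scores = [score(x, centroids) for x in real_x]      # test on real
real_pred = [1 if s > 0 else 0 for s in real_scores]
tstr_auc = auc(real_scores, real_y)
tstr_f1 = f1(real_pred, real_y)
```

If a model trained this way scores close to one trained on real data, the synthetic set has preserved the signal that matters for the downstream task.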

Future Ideas

  • Rule Learning Engine (Reinforcement Learning based)
  • Auto-Tuning Mode (Autonomous quality improvement)

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork this repository.
  2. Create a feature branch (git checkout -b feature/amazing-feature).
  3. Write tests before implementing (TDD).
  4. Commit your changes (git commit -m 'Add amazing feature').
  5. Push to the branch (git push origin feature/amazing-feature).
  6. Create a Pull Request.

License

MIT License - see the LICENSE file for details.

Contact

For questions or suggestions about the project, please open an issue on the GitHub repository.

Acknowledgments
