Synthetic Finance Data Auditor & Optimizer

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

takurot

These details have not been verified by PyPI

Project description

SFDAO - Synthetic Finance Data Auditor & Optimizer

Financial Compliance & Synthetic Data Quality Assurance Platform

Japanese version (日本語版)

Overview

SFDAO is an integrated tool for generating synthetic data, applying constraints, and auditing the results, designed specifically for the financial industry. Across Phases 1 to 3 of its roadmap, it covers auditing as well as generation, guardrail checking, scenario injection, and ML utility evaluation.

Key Features

Statistical Quality Evaluation: Distribution comparison using KS test and Jensen-Shannon Divergence.
Finance-Specific Evaluation: Fat Tail detection, Volatility Clustering verification.
Privacy Evaluation: Re-identification risk, Distance to Closest Record (DCR).
Automated Type Detection: Automatic classification of Numeric, Categorical, Datetime, and PII (Personally Identifiable Information).
Generation Workflow: Batch execution of synthetic data generation and auditing via generate/run.
Constraints & Scenarios: Guardrail rule application, scenario injection (scale/shift/clip/outlier, etc.).
ML Utility Evaluation: Model performance assessment using TSTR (AUC/F1) (optional).
Report Generation: Detailed reports in HTML/PDF formats.
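
To make the statistical metrics named above concrete, here is a minimal standard-library sketch of a two-sample KS statistic and the Jensen-Shannon divergence. It illustrates the underlying measures only; the function names are hypothetical and this is not SFDAO's evaluator code or API.

```python
import math
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    # The supremum of |F_a - F_b| is reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) of two discrete distributions.

    p and q are probability vectors over the same bins; the result lies in [0, 1].
    """
    def kl(u, v):
        # Terms with u_i == 0 contribute nothing to the KL sum.
        return sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A KS statistic near 0 means the real and synthetic columns have similar distributions, while a value near 1 means they barely overlap. For the divergence, a numeric column would first be binned into a shared histogram and normalized so that each vector sums to 1.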

Installation

Quick Install (PyPI)

# Install via pip
pip install sfdao

# Or use pipx for isolated installation (recommended)
pipx install sfdao

# With optional deep learning support (CTGAN)
pip install "sfdao[deep]"

Prerequisites

Python 3.10 - 3.12

macOS: WeasyPrint dependencies for PDF generation

brew install cairo pango gdk-pixbuf libffi

Development Setup

# Clone the repository
git clone https://github.com/takurot/sfdao.git
cd sfdao

# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install

# Activate the virtual environment (on Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin)
poetry shell

Quick Start

# Run a basic audit
sfdao audit --real data/real.csv --synthetic data/synthetic.csv --output report.html

# Output format is automatically detected by extension (.txt/.html/.pdf)
sfdao audit --real data/real.csv --synthetic data/synthetic.csv --output report.txt
sfdao audit --real data/real.csv --synthetic data/synthetic.csv --output report.pdf

# Generate simple synthetic data for testing
poetry run python -m sfdao.scripts.generate_test_synthetic_data \
  example/data/creditcard_real_sample.csv \
  example/output/creditcard_synthetic.csv \
  --n-samples 500 \
  --random-state 42

# Audit the generated synthetic data
poetry run sfdao audit \
  --real example/data/creditcard_real_sample.csv \
  --synthetic example/output/creditcard_synthetic.csv \
  --output example/output/report.html

# Phase 2: Batch execution of Generation -> Guardrails -> Audit
poetry run sfdao run --config example/config/phase2.yaml --outdir example/output

Development

TDD (Test-Driven Development)

This project is developed using TDD. Follow this cycle when adding new features:

Red: Write a failing test.
Green: Write the minimum code to pass the test.
Refactor: Clean up and optimize the code.

Testing

# Run all tests
pytest

# Run with coverage report
pytest --cov=sfdao --cov-report=html

# Run specific test file
pytest tests/unit/ingestion/test_loader.py

Code Quality

# Check formatting
black --check .

# Apply formatting
black .

# Lint check
flake8 .

# Type check
mypy sfdao

# Security check
bandit -r sfdao

Project Structure

sfdao/
├── sfdao/                  # Main package
│   ├── ingestion/          # Data ingestion and type detection
│   ├── config/             # Configuration schema/loader
│   ├── generator/          # Synthetic data generation
│   ├── guard/              # Rule-based constraint checking
│   ├── scenario/           # Scenario injection
│   ├── evaluator/          # Metric calculation
│   ├── reporter/           # Report generation
│   └── cli/                # CLI interface
├── tests/                  # Test code
│   ├── unit/               # Unit tests
│   ├── integration/        # Integration tests
│   └── e2e/                # End-to-End tests
├── docs/                   # Documentation
└── prompt/                 # Specifications

Documentation

Roadmap

Phase 1: "The Auditor" (MVP)

Project structure and CI/CD setup
Basic Data Ingestion features
Auto-Type Detection
Finance Domain definitions
Basic Evaluator (statistical tests)
Financial Stylized Facts evaluation
Privacy evaluation
Evaluation scoring integration
CLI interface
Report generation feature
Integration testing and documentation
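
The "Financial Stylized Facts" item refers to properties such as fat tails and volatility clustering. As a rough sketch of how these are commonly measured (excess kurtosis and the autocorrelation of squared returns), the following uses only the standard library. The function names are hypothetical, and the metrics SFDAO actually computes may differ.

```python
from statistics import fmean


def excess_kurtosis(returns):
    """Population excess kurtosis; values well above 0 suggest fatter tails than a normal."""
    mu = fmean(returns)
    m2 = fmean([(r - mu) ** 2 for r in returns])
    m4 = fmean([(r - mu) ** 4 for r in returns])
    return m4 / (m2 ** 2) - 3.0


def autocorrelation(series, lag=1):
    """Lag-k sample autocorrelation (undefined for a constant series)."""
    mu = fmean(series)
    var = sum((x - mu) ** 2 for x in series)
    cov = sum(
        (series[i] - mu) * (series[i + lag] - mu)
        for i in range(len(series) - lag)
    )
    return cov / var


def volatility_clustering(returns, lag=1):
    """Autocorrelation of squared returns; clearly positive values indicate clustering."""
    return autocorrelation([r * r for r in returns], lag)
```

Comparing these values between the real and synthetic return series gives a quick indication of whether the synthetic data preserves tail risk and calm/turbulent regimes.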

Phase 2: "The Generator & Logic"

Config schema/loader and CLI integration (generate/run)
Baseline Generator (statistical sampling)
Constraint & Logic Guard (rule detection/exclusion/correction)
Scenario Injection (scale/shift/clip/outlier, etc.)
E2E workflow (generate -> guard -> audit)
Benchmark and Privacy sampling
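
The scenario operations listed above (scale/shift/clip/outlier) can be pictured as simple column transforms. The sketch below is illustrative only: the function names and parameters are assumptions made for this example and do not reflect SFDAO's configuration schema or internal implementation.

```python
import random


def scale(values, factor):
    """Multiply every value by a factor (e.g. a stress multiplier)."""
    return [v * factor for v in values]


def shift(values, offset):
    """Add a constant offset to every value."""
    return [v + offset for v in values]


def clip(values, lower, upper):
    """Bound every value to the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def inject_outliers(values, fraction, magnitude, seed=0):
    """Multiply a randomly chosen fraction of values by `magnitude`.

    The input list is left unchanged; a seeded RNG keeps the result reproducible.
    """
    rng = random.Random(seed)
    out = list(values)
    count = round(len(out) * fraction)
    for i in rng.sample(range(len(out)), count):
        out[i] *= magnitude
    return out
```

For example, `inject_outliers(amounts, fraction=0.01, magnitude=50)` would turn roughly 1% of a transaction-amount column into extreme values, which can then be passed through the audit to see how the metrics react.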

Phase 3: "The Professional"

CI/CD optimization and Release workflow
Advanced Generator (CTGAN, optional)
ML Utility evaluation (TSTR: AUC/F1)
PyPI metadata/CHANGELOG/README maintenance
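
TSTR (Train on Synthetic, Test on Real) measures ML utility by fitting a model on synthetic data and scoring it on real data. The sketch below shows the idea with a deliberately simple nearest-centroid scorer and a pairwise AUC, using only the standard library. The model choice and all function names are assumptions made for illustration; SFDAO's own evaluator may use different models and metrics.

```python
from statistics import fmean


def fit_centroids(rows, labels):
    """Return the per-class feature means for binary labels 0 and 1."""
    centroids = {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [fmean(col) for col in zip(*members)]
    return centroids


def score(centroids, row):
    """Higher score means the row is closer to the class-1 centroid than to class 0."""
    d0 = sum((x - c) ** 2 for x, c in zip(row, centroids[0]))
    d1 = sum((x - c) ** 2 for x, c in zip(row, centroids[1]))
    return d0 - d1


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half).

    This pairwise form is O(n^2) and only meant for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tstr_auc(synth_X, synth_y, real_X, real_y):
    """Train on synthetic rows, then evaluate AUC on real rows."""
    model = fit_centroids(synth_X, synth_y)
    return roc_auc(real_y, [score(model, row) for row in real_X])
```

A TSTR score close to the score obtained when training on real data suggests the synthetic data preserves the signal a downstream model needs.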

Future Ideas

Rule Learning Engine (Reinforcement Learning based)
Auto-Tuning Mode (Autonomous quality improvement)

Contributing

Contributions are welcome! Please follow these steps:

Fork this repository.
Create a feature branch (git checkout -b feature/amazing-feature).
Write tests before implementing (TDD).
Commit your changes (git commit -m 'Add amazing feature').
Push to the branch (git push origin feature/amazing-feature).
Create a Pull Request.

License

MIT License - see the LICENSE file for details.

Contact

For questions or suggestions regarding the project, please create an Issue.

Acknowledgments

SDV (Synthetic Data Vault)
CTGAN
Kaggle Credit Card Fraud Detection Dataset

Release history

0.1.1 (this version): Mar 19, 2026
0.1.0: Jan 6, 2026

Download files

Source Distribution

sfdao-0.1.1.tar.gz (37.8 kB)

Uploaded Mar 19, 2026 Source

Built Distribution

sfdao-0.1.1-py3-none-any.whl (50.5 kB)

Uploaded Mar 19, 2026 Python 3

File details

Details for the file sfdao-0.1.1.tar.gz.

File metadata

Download URL: sfdao-0.1.1.tar.gz
Upload date: Mar 19, 2026
Size: 37.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sfdao-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`9c90fdf9a1152d951750491d9170850b9d25996019a2061481fa30b660d2a92d`
MD5	`0e882023a65866c98842db375e75f696`
BLAKE2b-256	`e991f64360b050c3ff364abf0bb32d498b0c482b2fe0480fdd16ca4547895467`

See more details on using hashes here.
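
As a sketch of how the published digests can be used, the following computes the SHA256 of a downloaded file with Python's standard `hashlib` module and compares it with an expected value. The helper names and the local file path in the usage note are assumptions for this example.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For instance, `verify("sfdao-0.1.1.tar.gz", "9c90fdf9a1152d951750491d9170850b9d25996019a2061481fa30b660d2a92d")` should return `True` for an intact copy of the source distribution, assuming the file was saved under that name in the current directory.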

Provenance

The following attestation bundles were made for sfdao-0.1.1.tar.gz:

Publisher: release.yml on takurot/sfdao

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sfdao-0.1.1.tar.gz
- Subject digest: 9c90fdf9a1152d951750491d9170850b9d25996019a2061481fa30b660d2a92d
- Sigstore transparency entry: 1131364390
- Sigstore integration time: Mar 19, 2026
Source repository:
- Permalink: takurot/sfdao@7f8e0e12efb21198078168ac4ff57f8cffd400a3
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/takurot
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7f8e0e12efb21198078168ac4ff57f8cffd400a3
- Trigger Event: push

File details

Details for the file sfdao-0.1.1-py3-none-any.whl.

File metadata

Download URL: sfdao-0.1.1-py3-none-any.whl
Upload date: Mar 19, 2026
Size: 50.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sfdao-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3fef32b5f7b904bafa40a87ba766c38233275803ed6f15c515c462e15bb6d1ff`
MD5	`e8852892334389e4a1b0fb83e885ec71`
BLAKE2b-256	`ed3aa74ddb95bd3606421b3bf48a8ffeb641fdcf5d570affc42b91bd50904b76`

See more details on using hashes here.

Provenance

The following attestation bundles were made for sfdao-0.1.1-py3-none-any.whl:

Publisher: release.yml on takurot/sfdao

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sfdao-0.1.1-py3-none-any.whl
- Subject digest: 3fef32b5f7b904bafa40a87ba766c38233275803ed6f15c515c462e15bb6d1ff
- Sigstore transparency entry: 1131364418
- Sigstore integration time: Mar 19, 2026
Source repository:
- Permalink: takurot/sfdao@7f8e0e12efb21198078168ac4ff57f8cffd400a3
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/takurot
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7f8e0e12efb21198078168ac4ff57f8cffd400a3
- Trigger Event: push
