Synthetic financial transaction data generation with persona-driven behavior simulation.

FinForge v3.0.0

FinForge is a Python library for generating realistic synthetic financial transaction datasets with persistent personas, temporal balance consistency, business cashflow simulation, and reproducible fraud and anomaly scenarios.

FinForge v3.0.0 — Fraud, Anomaly & Risk Simulation

FinForge v3 adds a post-generation risk layer on top of the normal v1/v2 behavioral engine:

  • fraud injection engine
  • anomaly simulation engine
  • rule-based risk scoring
  • fraud scenario IDs
  • persona-aware fraud patterns
  • fraud summary utilities
  • lightweight fraud feature extraction for ML workflows

Fraud is injected after normal behavior generation, so suspicious activity appears as a deviation from a realistic baseline rather than replacing normal spending.

Core capabilities

  • Student, salaried, freelancer, business owner, household, retired, and mixed persona simulation
  • Persistent behavioral identity metadata
  • Irregular income and business cashflow
  • Business vs personal account flags
  • Seasonal business income and quarterly tax payments
  • Recurring bills and subscriptions
  • Balance tracking and overdraft metadata
  • Session-based spending and low-balance suppression
  • Fraud, anomaly, and risk metadata
  • Seed reproducibility
  • CSV export and pandas-native workflows

Installation

pip install finforge

For local development:

pip install -e ".[dev]"

Quickstart

Baseline dataset:

from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=101)
    .with_users(3)
    .with_persona("student")
    .for_months(2)
    .generate()
)

Mixed population:

from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(50)
    .with_persona("mixed")
    .for_months(12)
    .generate()
)

Fraud dataset:

from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(100)
    .with_persona("mixed")
    .for_months(6)
    .with_fraud(rate=0.03)
    .generate()
)

Fraud + anomaly + risk scoring:

from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(500)
    .with_persona("mixed")
    .for_months(12)
    .with_fraud(rate=0.03)
    .with_anomalies(rate=0.05)
    .with_risk_scoring()
    .generate()
)

Personas

Supported personas:

  • student
  • salaried
  • freelancer
  • business_owner
  • household
  • retired
  • mixed

Mixed mode supports all v2 personas. When user_count is at least the number of supported personas, FinForge guarantees at least one user per persona. The remaining users are assigned using a deterministic weighted distribution, so the same seed and configuration produce the same persona mix.
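
The assignment rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not FinForge's internal code: the function name assign_personas and the weights in PERSONA_WEIGHTS are hypothetical, since the real weights are not documented here.

```python
import random

# Hypothetical weights for illustration only; FinForge's real weights may differ.
PERSONA_WEIGHTS = {
    "student": 0.20,
    "salaried": 0.30,
    "freelancer": 0.15,
    "business_owner": 0.10,
    "household": 0.15,
    "retired": 0.10,
}


def assign_personas(user_count, seed):
    """Assign a persona to each user, covering every persona when possible."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    personas = list(PERSONA_WEIGHTS)
    weights = list(PERSONA_WEIGHTS.values())

    assigned = []
    if user_count >= len(personas):
        # Guarantee: one user per persona before any weighted sampling.
        assigned.extend(personas)

    remaining = user_count - len(assigned)
    # Weighted sampling for the rest; the same seed gives the same draw.
    assigned.extend(rng.choices(personas, weights=weights, k=remaining))

    rng.shuffle(assigned)  # avoid the guaranteed personas always coming first
    return assigned


mix_a = assign_personas(50, seed=42)
mix_b = assign_personas(50, seed=42)
```

Using a seeded local random.Random instance rather than the module-level functions is what makes the result repeatable regardless of other random calls in the program.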

Fraud simulation

Supported fraud types:

  • card_fraud
  • account_takeover
  • mule_account
  • refund_abuse
  • business_invoice_fraud

Example:

from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(500)
    .with_persona("mixed")
    .for_months(12)
    .with_fraud(
        rate=0.03,
        types=[
            "card_fraud",
            "account_takeover",
            "mule_account",
            "refund_abuse",
            "business_invoice_fraud",
        ],
        severity="medium",
    )
    .generate()
)

Persona-aware behavior includes:

  • Student: smaller late-night wallet drain, gaming, gift-card, and account-takeover patterns
  • Salaried: salary-account drain, electronics fraud, and high-value transfer abuse
  • Freelancer: suspicious payouts, platform-style anomalies, and fake vendor/service expenses
  • Business owner: invoice abuse, fake supplier payments, round-number vendor anomalies
  • Household: unusual shopping, insurance, or family-account payment anomalies
  • Retired: phishing-style transfers and healthcare scam deviations

Anomaly simulation

Anomalies are suspicious but not confirmed fraud.

Supported anomaly types:

  • unusual_amount
  • unusual_time
  • unusual_merchant
  • unusual_category
  • velocity_spike
  • balance_drain
  • income_spike

Example:

from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(100)
    .with_persona("mixed")
    .for_months(6)
    .with_anomalies(rate=0.05)
    .generate()
)
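
To make an anomaly type such as unusual_amount concrete, the sketch below flags an amount that sits far from a user's prior spending. It is an assumption-laden illustration: the helper is_unusual_amount, the z-score rule, and the threshold of 3.0 are all hypothetical and may not match how FinForge generates or scores anomalies.

```python
from statistics import mean, pstdev


def is_unusual_amount(history, amount, z_threshold=3.0):
    """Return True when amount is far from the user's prior amounts.

    Hypothetical rule: compare the z-score of the new amount against
    the mean and standard deviation of earlier transactions.
    """
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return amount != mu  # any change from a constant history stands out
    return abs(amount - mu) / sigma > z_threshold


history = [20, 22, 19, 21, 20, 23, 18, 21, 20, 22]
typical = is_unusual_amount(history, 22)
outlier = is_unusual_amount(history, 400)
```

A row flagged this way is only suspicious relative to the baseline; it says nothing about whether the transaction is fraudulent.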

Risk scoring

FinForge includes deterministic rule-based transaction risk scoring.

from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(100)
    .with_persona("mixed")
    .for_months(6)
    .with_fraud(rate=0.03)
    .with_risk_scoring()
    .generate()
)

Risk output includes:

  • risk_score from 0.0 to 1.0
  • risk_level: one of low, medium, high, or critical
  • risk_reasons such as:
    • amount_spike
    • odd_hour
    • new_merchant
    • new_category
    • velocity_spike
    • balance_drain
    • rapid_in_out_transfer
    • refund_pattern
    • suspicious_vendor
    • business_invoice_anomaly
    • healthcare_scam_pattern
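
A rule-based scorer of this shape can be sketched as a weighted sum of triggered reasons, clamped to the 0.0 to 1.0 range and then mapped to a level. The weights, thresholds, and the function name score_transaction below are hypothetical; they illustrate the idea and are not FinForge's actual rules.

```python
# Hypothetical weights and thresholds, chosen only to illustrate the idea.
REASON_WEIGHTS = {
    "amount_spike": 0.35,
    "odd_hour": 0.15,
    "new_merchant": 0.15,
    "velocity_spike": 0.25,
    "balance_drain": 0.40,
}
LEVEL_THRESHOLDS = [(0.75, "critical"), (0.50, "high"), (0.25, "medium")]


def score_transaction(reasons):
    """Combine triggered reasons into a bounded score and a level."""
    raw = sum(REASON_WEIGHTS.get(reason, 0.0) for reason in reasons)
    risk_score = round(min(raw, 1.0), 2)  # clamp into [0.0, 1.0]
    risk_level = "low"
    for threshold, level in LEVEL_THRESHOLDS:  # highest threshold first
        if risk_score >= threshold:
            risk_level = level
            break
    return risk_score, risk_level


clean = score_transaction([])
mild = score_transaction(["odd_hour", "new_merchant"])
severe = score_transaction(["amount_spike", "balance_drain", "velocity_spike"])
```

Because the score depends only on the triggered reasons, the same transaction always receives the same score, which is what makes this style of scoring deterministic.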

Fraud/anomaly metadata

v3 adds the following columns:

  • is_fraud
  • fraud_type
  • fraud_scenario_id
  • fraud_stage
  • fraud_severity
  • fraud_pattern
  • fraud_start_time
  • risk_score
  • risk_level
  • risk_reasons
  • is_anomaly
  • anomaly_type
  • anomaly_score

These columns always exist, even when fraud and anomalies are disabled.
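
Because the columns are always present, downstream code can validate the schema once instead of branching on configuration. The helper below is a minimal sketch; missing_v3_columns is a hypothetical name and is not part of FinForge.

```python
# Column names as listed above.
V3_COLUMNS = [
    "is_fraud",
    "fraud_type",
    "fraud_scenario_id",
    "fraud_stage",
    "fraud_severity",
    "fraud_pattern",
    "fraud_start_time",
    "risk_score",
    "risk_level",
    "risk_reasons",
    "is_anomaly",
    "anomaly_type",
    "anomaly_score",
]


def missing_v3_columns(columns):
    """Return the v3 columns absent from an iterable of column names."""
    present = set(columns)
    return [name for name in V3_COLUMNS if name not in present]


complete = missing_v3_columns(V3_COLUMNS + ["amount"])
partial = missing_v3_columns(["is_fraud", "risk_score"])
```

With a generated dataset, the same check would be called as missing_v3_columns(df.columns).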

Summary utilities

from finforge import DatasetGenerator
from finforge.analysis import fraud_summary

df = (
    DatasetGenerator(seed=42)
    .with_users(500)
    .with_persona("mixed")
    .for_months(12)
    .with_fraud(rate=0.03)
    .with_anomalies(rate=0.05)
    .with_risk_scoring()
    .generate()
)

print(fraud_summary(df))

The summary utility reports:

  • total transactions
  • fraud transactions and fraud rate
  • fraud by type
  • fraud by persona
  • anomaly count and anomaly rate
  • risk level distribution
  • average risk score by fraud/non-fraud
  • top risk reasons
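
Several of these aggregates are simple counts over the metadata columns. The sketch below computes a few of them over plain dictionaries to show what the numbers mean. The function summarize is hypothetical, and representing fraud_type as None for non-fraud rows is an assumption; fraud_summary may compute and format its report differently.

```python
from collections import Counter


def summarize(rows):
    """Compute a handful of fraud and risk aggregates from transaction rows."""
    total = len(rows)
    fraud_rows = [row for row in rows if row["is_fraud"]]
    return {
        "total_transactions": total,
        "fraud_transactions": len(fraud_rows),
        "fraud_rate": len(fraud_rows) / total if total else 0.0,
        "fraud_by_type": dict(Counter(row["fraud_type"] for row in fraud_rows)),
        "anomaly_count": sum(1 for row in rows if row["is_anomaly"]),
        "risk_level_distribution": dict(Counter(row["risk_level"] for row in rows)),
    }


# Assumption: fraud_type is None on rows that are not fraud.
rows = [
    {"is_fraud": False, "fraud_type": None, "is_anomaly": False, "risk_level": "low"},
    {"is_fraud": True, "fraud_type": "card_fraud", "is_anomaly": False, "risk_level": "high"},
    {"is_fraud": False, "fraud_type": None, "is_anomaly": True, "risk_level": "medium"},
    {"is_fraud": True, "fraud_type": "card_fraud", "is_anomaly": False, "risk_level": "critical"},
]
summary = summarize(rows)
```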

ML-ready feature extraction

from finforge.features import build_fraud_features

# df is a dataset returned by DatasetGenerator(...).generate()
X, y = build_fraud_features(df)

The helper returns:

  • X: pandas DataFrame
  • y: pandas Series

Feature columns include amount, hour, balance, recurring/discretionary flags, business/tax flags, anomaly and risk scores, and encoded categorical fields such as persona, category, account type, and transaction type.
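
The "encoded categorical fields" step can be illustrated with one-hot encoding in plain Python. This is a sketch of the general technique only: build_features and one_hot are hypothetical helpers, the persona field name is an assumption, and build_fraud_features may use a different encoding.

```python
def one_hot(rows, field):
    """One-hot encode a categorical field; columns are sorted for stable order."""
    values = sorted({row[field] for row in rows})
    return [
        {f"{field}_{value}": int(row[field] == value) for value in values}
        for row in rows
    ]


def build_features(rows, numeric_fields, categorical_fields, label="is_fraud"):
    """Split rows into a feature table and a 0/1 label list."""
    features = [{name: row[name] for name in numeric_fields} for row in rows]
    for field in categorical_fields:
        for feature_row, encoded in zip(features, one_hot(rows, field)):
            feature_row.update(encoded)
    labels = [int(row[label]) for row in rows]
    return features, labels


rows = [
    {"amount": 12.5, "hour": 14, "persona": "student", "is_fraud": False},
    {"amount": 900.0, "hour": 2, "persona": "retired", "is_fraud": True},
]
X_demo, y_demo = build_features(rows, ["amount", "hour"], ["persona"])
```

The label column is kept out of the feature table so that a model trained on the features cannot read the answer directly.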

Architecture

Core simulation:

  • finforge.core
  • finforge.personas
  • finforge.generators
  • finforge.merchants
  • finforge.behavior
  • finforge.dataset

v3 extensions:

  • finforge.fraud
  • finforge.anomaly
  • finforge.risk
  • finforge.analysis
  • finforge.features

Fraud and anomaly injection happen after the baseline transaction dataset is generated. Balances are recomputed after injection so chronological integrity is preserved.
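
The recomputation step can be pictured as follows: merge the injected rows into the baseline, sort by time, and rebuild the running balance. This is a minimal sketch for a single account. The function recompute_balances and the field names timestamp, amount, and balance are assumptions for illustration, not FinForge's internal schema.

```python
from datetime import datetime


def recompute_balances(transactions, opening_balance):
    """Sort by timestamp and rebuild the running balance for one account."""
    ordered = sorted(transactions, key=lambda txn: txn["timestamp"])
    balance = opening_balance
    result = []
    for txn in ordered:
        balance = round(balance + txn["amount"], 2)  # negative amount = debit
        result.append({**txn, "balance": balance})
    return result


baseline = [
    {"timestamp": datetime(2024, 1, 1, 9, 0), "amount": 1000.00, "is_fraud": False},
    {"timestamp": datetime(2024, 1, 3, 18, 30), "amount": -40.00, "is_fraud": False},
]
# The injected fraud row falls between the two baseline rows in time.
injected = {"timestamp": datetime(2024, 1, 2, 2, 15), "amount": -300.00, "is_fraud": True}

ledger = recompute_balances(baseline + [injected], opening_balance=50.00)
balances = [row["balance"] for row in ledger]
```

Without the recomputation, the last baseline row would still carry the balance it had before the fraudulent debit, and the ledger would no longer add up.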

Examples

See the examples in the examples directory.

Testing guarantees

The test suite covers:

  • v1/v2 backward compatibility
  • fraud injection and scenario grouping
  • anomaly generation
  • risk score bounds and relative ordering
  • balance integrity after fraud injection
  • chronological ordering after fraud injection
  • simulation timestamp range safety
  • mixed persona guarantees
  • seed reproducibility
  • feature helper outputs

Run tests with:

pytest

Why FinForge is different

FinForge focuses on persistent financial behavior over time:

  • behavioral continuity instead of isolated fake rows
  • temporal balance realism
  • persona-aware cashflow and business behavior
  • configurable fraud deviations on top of realistic normal activity
  • deterministic reproducibility for QA, analytics, and ML experimentation

Changelog

See CHANGELOG.md.

License

MIT

Download files

Source distribution: finforge-3.0.0.tar.gz (57.2 kB)

  • SHA256: 81fe0abe25a7c577664fb7d05ae1cba0f18ad8d6756033285863c48f253862ee
  • MD5: 846b114b339510254e46829ad34dd432
  • BLAKE2b-256: d0879d53571472a564cb902cdc2c8cd12f269dfb1a2ca5832cd95c25dcc111c0

Built distribution: finforge-3.0.0-py3-none-any.whl (69.7 kB, Python 3)

  • SHA256: 477f4bd247d55646875754636c498f8cb8104cd953e0306af8aaedaf1c07b2f6
  • MD5: 4f1d0803707a480a804329fac03406f0
  • BLAKE2b-256: 2107e4d93ec39946a5f88b1de14364ed2f07d1510fa22c409d8b4aba0431baa1

Both files were uploaded via twine/6.1.0 CPython/3.13.12 using Trusted Publishing. Attestation publisher: publish.yml on shivangis22/finforge.
