Synthetic financial transaction data generation with persona-driven behavior simulation.
FinForge v3.0.0
FinForge is a Python library for generating realistic synthetic financial transaction datasets with persistent personas, temporal balance consistency, business cashflow simulation, and reproducible fraud and anomaly scenarios.
FinForge v3.0.0 — Fraud, Anomaly & Risk Simulation
FinForge v3 adds a post-generation risk layer on top of the normal v1/v2 behavioral engine:
- fraud injection engine
- anomaly simulation engine
- rule-based risk scoring
- fraud scenario IDs
- persona-aware fraud patterns
- fraud summary utilities
- lightweight fraud feature extraction for ML workflows
Fraud is injected after normal behavior generation, so suspicious activity appears as a deviation from a realistic baseline rather than replacing normal spending.
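The "deviation from a baseline" idea can be sketched without any dependencies. This is not FinForge's implementation: the helper name `inject_deviation`, the `multiplier` parameter, and the `(amount, is_fraud)` row layout are all assumptions made for illustration.

```python
import random
import statistics


def inject_deviation(baseline_amounts, rate, multiplier=8.0, seed=0):
    """Return (amount, is_fraud) pairs built on top of an existing baseline.

    Hypothetical helper. The baseline is generated first and left untouched;
    fraud rows are then added with amounts scaled from the typical spend.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    typical = statistics.median(baseline_amounts)
    rows = [(amount, False) for amount in baseline_amounts]
    n_fraud = round(len(baseline_amounts) * rate)
    for _ in range(n_fraud):
        # A fraud amount is a multiple of normal spend, with some jitter.
        amount = round(typical * multiplier * rng.uniform(0.8, 1.2), 2)
        rows.append((amount, True))
    return rows


baseline = [12.5, 8.0, 15.0, 9.75, 11.0, 14.2, 10.0, 13.3, 7.9, 12.0]
rows = inject_deviation(baseline, rate=0.1, seed=42)
```

Because the fraud amount is derived from the baseline median, the injected row stands out relative to that user's normal spending rather than against a fixed global threshold.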
Core capabilities
- Student, salaried, freelancer, business owner, household, retired, and mixed persona simulation
- Persistent behavioral identity metadata
- Irregular income and business cashflow
- Business vs personal account flags
- Seasonal business income and quarterly tax payments
- Recurring bills and subscriptions
- Balance tracking and overdraft metadata
- Session-based spending and low-balance suppression
- Fraud, anomaly, and risk metadata
- Seed reproducibility
- CSV export and pandas-native workflows
Installation
pip install finforge
For local development:
pip install -e .[dev]
Quickstart
Baseline dataset:
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=101)
    .with_users(3)
    .with_persona("student")
    .for_months(2)
    .generate()
)
Mixed population:
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(50)
    .with_persona("mixed")
    .for_months(12)
    .generate()
)
Fraud dataset:
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(100)
    .with_persona("mixed")
    .for_months(6)
    .with_fraud(rate=0.03)
    .generate()
)
Fraud + anomaly + risk scoring:
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(500)
    .with_persona("mixed")
    .for_months(12)
    .with_fraud(rate=0.03)
    .with_anomalies(rate=0.05)
    .with_risk_scoring()
    .generate()
)
Personas
Supported personas:
- student
- salaried
- freelancer
- business_owner
- household
- retired
- mixed
Mixed mode supports all v2 personas. When user_count is at least the number of supported personas, FinForge guarantees at least one user per persona. Remaining users are assigned using deterministic weighted distribution, so the same seed and config produce the same persona mix.
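The guarantee-then-weight assignment described above can be sketched as follows. This is an illustration only, not FinForge's code: the function `assign_personas` and the `WEIGHTS` values are hypothetical, and the sketch assumes the six non-mixed personas are the ones being distributed.

```python
import random

PERSONAS = ["student", "salaried", "freelancer",
            "business_owner", "household", "retired"]
# Hypothetical weights; the values FinForge actually uses are not documented here.
WEIGHTS = [0.15, 0.35, 0.15, 0.10, 0.15, 0.10]


def assign_personas(user_count, seed):
    """Assign one persona per user, deterministically for a given seed."""
    rng = random.Random(seed)
    assigned = []
    if user_count >= len(PERSONAS):
        # Guarantee: at least one user for every supported persona.
        assigned.extend(PERSONAS)
    remaining = user_count - len(assigned)
    # Remaining users are drawn from a seeded weighted distribution.
    assigned.extend(rng.choices(PERSONAS, weights=WEIGHTS, k=remaining))
    return assigned


mix = assign_personas(user_count=50, seed=42)
```

Calling `assign_personas(50, 42)` twice returns the same list, which mirrors the "same seed and config produce the same persona mix" property.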
Fraud simulation
Supported fraud types:
- card_fraud
- account_takeover
- mule_account
- refund_abuse
- business_invoice_fraud
Examples:
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(500)
    .with_persona("mixed")
    .for_months(12)
    .with_fraud(
        rate=0.03,
        types=[
            "card_fraud",
            "account_takeover",
            "mule_account",
            "refund_abuse",
            "business_invoice_fraud",
        ],
        severity="medium",
    )
    .generate()
)
Persona-aware behavior includes:
- Student: smaller late-night wallet drain, gaming, gift-card, and account-takeover patterns
- Salaried: salary-account drain, electronics fraud, and high-value transfer abuse
- Freelancer: suspicious payouts, platform-style anomalies, and fake vendor/service expenses
- Business owner: invoice abuse, fake supplier payments, round-number vendor anomalies
- Household: unusual shopping, insurance, or family-account payment anomalies
- Retired: phishing-style transfers and healthcare scam deviations
Anomaly simulation
Anomalies are suspicious but not confirmed fraud.
Supported anomaly types:
- unusual_amount
- unusual_time
- unusual_merchant
- unusual_category
- velocity_spike
- balance_drain
- income_spike

from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(100)
    .with_persona("mixed")
    .for_months(6)
    .with_anomalies(rate=0.05)
    .generate()
)
Risk scoring
FinForge includes deterministic rule-based transaction risk scoring.
from finforge import DatasetGenerator

df = (
    DatasetGenerator(seed=42)
    .with_users(100)
    .with_persona("mixed")
    .for_months(6)
    .with_fraud(rate=0.03)
    .with_risk_scoring()
    .generate()
)
Risk output includes:
- risk_score from 0.0 to 1.0
- risk_level in low, medium, high, critical
- risk_reasons such as: amount_spike, odd_hour, new_merchant, new_category, velocity_spike, balance_drain, rapid_in_out_transfer, refund_pattern, suspicious_vendor, business_invoice_anomaly, healthcare_scam_pattern
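To show how a deterministic rule-based scorer can produce the risk_score, risk_level, and risk_reasons fields, here is a minimal sketch. The per-reason weights, the level thresholds, and the function `score_transaction` are assumptions for illustration; FinForge's actual rules are not documented in this README.

```python
# Hypothetical per-reason weights and level thresholds, for illustration only.
REASON_WEIGHTS = {
    "amount_spike": 0.35,
    "odd_hour": 0.15,
    "new_merchant": 0.15,
    "velocity_spike": 0.25,
    "balance_drain": 0.40,
}
LEVEL_THRESHOLDS = [(0.75, "critical"), (0.5, "high"), (0.25, "medium")]


def score_transaction(reasons):
    """Map triggered risk reasons to a (risk_score, risk_level) pair."""
    raw = sum(REASON_WEIGHTS.get(reason, 0.0) for reason in reasons)
    # Clamp to the documented 0.0 to 1.0 range; round to avoid float noise.
    risk_score = round(min(1.0, raw), 2)
    for threshold, level in LEVEL_THRESHOLDS:
        if risk_score >= threshold:
            return risk_score, level
    return risk_score, "low"


result = score_transaction(["amount_spike", "odd_hour"])
```

No randomness is involved, so the same set of reasons always yields the same score and level.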
Fraud/anomaly metadata
v3 adds the following columns:
- is_fraud
- fraud_type
- fraud_scenario_id
- fraud_stage
- fraud_severity
- fraud_pattern
- fraud_start_time
- risk_score
- risk_level
- risk_reasons
- is_anomaly
- anomaly_type
- anomaly_score
These columns always exist, even when fraud and anomalies are disabled.
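Since the schema is stable, downstream code can check for it up front. The sketch below uses the column names listed in this section; the helper `missing_v3_columns` is hypothetical and not part of FinForge.

```python
V3_COLUMNS = [
    "is_fraud", "fraud_type", "fraud_scenario_id", "fraud_stage",
    "fraud_severity", "fraud_pattern", "fraud_start_time",
    "risk_score", "risk_level", "risk_reasons",
    "is_anomaly", "anomaly_type", "anomaly_score",
]


def missing_v3_columns(columns):
    """Return the v3 columns absent from `columns`, in documented order."""
    present = set(columns)
    return [name for name in V3_COLUMNS if name not in present]


# Works with any iterable of column names, e.g. a DataFrame's `df.columns`.
gaps = missing_v3_columns(["amount", "is_fraud", "risk_score"])
```

For a dataset generated by v3, `missing_v3_columns(df.columns)` would be expected to return an empty list.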
Summary utilities
from finforge import DatasetGenerator
from finforge.analysis import fraud_summary

df = (
    DatasetGenerator(seed=42)
    .with_users(500)
    .with_persona("mixed")
    .for_months(12)
    .with_fraud(rate=0.03)
    .with_anomalies(rate=0.05)
    .with_risk_scoring()
    .generate()
)

print(fraud_summary(df))
The summary utility reports:
- total transactions
- fraud transactions and fraud rate
- fraud by type
- fraud by persona
- anomaly count and anomaly rate
- risk level distribution
- average risk score by fraud/non-fraud
- top risk reasons
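A few of the metrics above can be reproduced by hand, which is useful for sanity-checking a dataset. The sketch below is an assumption-laden illustration: it works on plain row dicts rather than a DataFrame, the function `summarize` and its output keys are hypothetical, and representing a non-fraud fraud_type as `None` is a guess. The real output format of `fraud_summary` is not specified here.

```python
from collections import Counter


def summarize(rows):
    """Compute a few fraud_summary-style metrics from a list of row dicts."""
    total = len(rows)
    fraud_rows = [row for row in rows if row["is_fraud"]]
    return {
        "total_transactions": total,
        "fraud_transactions": len(fraud_rows),
        "fraud_rate": len(fraud_rows) / total if total else 0.0,
        "fraud_by_type": dict(Counter(r["fraud_type"] for r in fraud_rows)),
        "risk_level_distribution": dict(Counter(r["risk_level"] for r in rows)),
    }


rows = [
    {"is_fraud": False, "fraud_type": None, "risk_level": "low"},
    {"is_fraud": False, "fraud_type": None, "risk_level": "low"},
    {"is_fraud": True, "fraud_type": "card_fraud", "risk_level": "high"},
    {"is_fraud": True, "fraud_type": "card_fraud", "risk_level": "critical"},
]
summary = summarize(rows)
```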
ML-ready feature extraction
from finforge.features import build_fraud_features

# df is a dataset produced by DatasetGenerator(...).generate()
X, y = build_fraud_features(df)
The helper returns:
- X: pandas DataFrame
- y: pandas Series
Feature columns include amount, hour, balance, recurring/discretionary flags, business/tax flags, anomaly and risk scores, and encoded categorical fields such as persona, category, account type, and transaction type.
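As a rough picture of what this kind of feature extraction involves, the following dependency-free sketch derives an hour feature and one-hot encodes the persona. It is not `build_fraud_features`: the function `build_features`, the `timestamp` column name, and the `persona_<name>` feature naming are assumptions for illustration.

```python
from datetime import datetime


def build_features(rows, personas):
    """Turn row dicts into (X, y): numeric feature rows and fraud labels."""
    X, y = [], []
    for row in rows:
        features = {
            "amount": float(row["amount"]),
            # Hour of day, taken from an ISO 8601 timestamp string.
            "hour": datetime.fromisoformat(row["timestamp"]).hour,
        }
        # One-hot encode the persona so models receive numeric inputs only.
        for persona in personas:
            features[f"persona_{persona}"] = int(row["persona"] == persona)
        X.append(features)
        y.append(int(row["is_fraud"]))
    return X, y


rows = [
    {"amount": 12.5, "timestamp": "2024-01-05T14:10:00",
     "persona": "student", "is_fraud": False},
    {"amount": 480.0, "timestamp": "2024-01-06T02:45:00",
     "persona": "retired", "is_fraud": True},
]
X, y = build_features(rows, personas=["student", "retired"])
```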
Architecture
Core simulation:
- finforge.core
- finforge.personas
- finforge.generators
- finforge.merchants
- finforge.behavior
- finforge.dataset
v3 extensions:
- finforge.fraud
- finforge.anomaly
- finforge.risk
- finforge.analysis
- finforge.features
Fraud and anomaly injection happen after the baseline transaction dataset is generated. Balances are recomputed after injection so chronological integrity is preserved.
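The recompute step can be pictured with a small ledger. This sketch is illustrative rather than FinForge's implementation: the function `recompute_balances`, the signed-amount convention (negative means debit), and the row layout are assumptions.

```python
def recompute_balances(transactions, opening_balance):
    """Sort by timestamp and rebuild the running balance for every row."""
    # ISO 8601 strings in the same format sort chronologically.
    ordered = sorted(transactions, key=lambda txn: txn["timestamp"])
    balance = opening_balance
    result = []
    for txn in ordered:
        balance = round(balance + txn["amount"], 2)  # negative amount = debit
        result.append({**txn, "balance": balance})
    return result


baseline = [
    {"timestamp": "2024-01-01T09:00:00", "amount": 1000.00, "is_fraud": False},
    {"timestamp": "2024-01-03T12:00:00", "amount": -40.00, "is_fraud": False},
    {"timestamp": "2024-01-05T18:30:00", "amount": -25.50, "is_fraud": False},
]
# A fraud row injected between two existing baseline transactions.
injected = {"timestamp": "2024-01-04T02:15:00", "amount": -600.00, "is_fraud": True}
ledger = recompute_balances(baseline + [injected], opening_balance=200.00)
```

Every row after the injected transaction gets a new balance, so the ledger stays consistent when read in time order.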
Examples
See the scripts in the examples directory:
- fraud_card_fraud.py
- fraud_account_takeover.py
- fraud_mule_account.py
- fraud_business_invoice.py
- anomaly_generation.py
- fraud_dataset_for_ml.py
- fraud_summary_demo.py
- persona_comparison_v2.py
Testing guarantees
The test suite covers:
- v1/v2 backward compatibility
- fraud injection and scenario grouping
- anomaly generation
- risk score bounds and relative ordering
- balance integrity after fraud injection
- chronological ordering after fraud injection
- simulation timestamp range safety
- mixed persona guarantees
- seed reproducibility
- feature helper outputs
Run tests with:
pytest
Why FinForge is different
FinForge focuses on persistent financial behavior over time:
- behavioral continuity instead of isolated fake rows
- temporal balance realism
- persona-aware cashflow and business behavior
- configurable fraud deviations on top of realistic normal activity
- deterministic reproducibility for QA, analytics, and ML experimentation
Changelog
See CHANGELOG.md.
License
MIT