
A high-performance synthetic financial data generator that uses Heston Stochastic Volatility and Jump Diffusion models.


fsynth: High-Fidelity Synthetic Financial Data Generator

License: MIT | Python 3.9+

fsynth is a high-performance Python library for generating realistic multi-asset financial time series and the matching fundamental reports. Simple geometric Brownian motion (GBM) generators assume constant volatility. fsynth instead models the statistical properties of real markets, including volatility clustering, fat tails, and regime-dependent correlations. It does this with Heston stochastic volatility and Merton jump-diffusion processes.

It is designed for quantitative researchers, AI/ML engineers, and financial educators who need large, clean, statistically rigorous datasets for backtesting and model training.


🚀 Features

  • Stochastic Volatility: Implements the Heston model to simulate time-varying volatility. This captures the volatility clustering and the fat-tailed, skewed returns that underlie the "volatility smile" observed in real markets.
  • Regime Switching: Simulates macro-economic states (Bull, Bear, Crisis) that dynamically alter correlation matrices and volatility baselines.
  • Jump Diffusion: Adds Merton-style price jumps, with arrivals following a Poisson process, to model sudden market shocks.
  • Linked Fundamentals: Generates coherent 10-Q/10-K style fundamental data (Revenue, EBITDA, EPS, Debt) that correlates with the stock's price performance and sector genes.
  • High Performance: Core simulation kernels are JIT-compiled to native machine code with Numba, so millions of rows can be generated in seconds.
  • Parquet Native: Outputs optimized Parquet files ready for ingestion into Pandas, Polars, or PySpark.
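The Regime Switching feature says that regimes alter correlation matrices. A standard way to impose a target correlation matrix on Gaussian shocks is a Cholesky factorization.

The sketch below is not fsynth's internal code. It rests on these assumptions:

- NumPy is available.
- Each sector uses an equicorrelation matrix.
- The per-regime correlations in `REGIME_RHO` are hypothetical, made-up values.

```python
import numpy as np


def correlated_shocks(corr: np.ndarray, n_steps: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n_steps rows of standard normal shocks whose cross-sectional
    correlation matrix is `corr`, via its Cholesky factor L (corr = L @ L.T)."""
    chol = np.linalg.cholesky(corr)
    z = rng.standard_normal((n_steps, corr.shape[0]))  # independent N(0, 1) draws
    return z @ chol.T                                   # each row is ~ N(0, corr)


def sector_corr(n_stocks: int, rho: float) -> np.ndarray:
    """Equicorrelation matrix: every pair of stocks in the sector has correlation rho."""
    corr = np.full((n_stocks, n_stocks), rho)
    np.fill_diagonal(corr, 1.0)
    return corr


# Hypothetical intra-sector correlations per regime (illustrative values only).
REGIME_RHO = {"Bull": 0.3, "Bear": 0.5, "Crisis": 0.8}

rng = np.random.default_rng(42)
shocks = {name: correlated_shocks(sector_corr(4, rho), 20_000, rng)
          for name, rho in REGIME_RHO.items()}

for name, eps in shocks.items():
    emp = np.corrcoef(eps, rowvar=False)
    mean_pairwise = emp[~np.eye(4, dtype=bool)].mean()
    print(f"{name}: mean pairwise correlation ~ {mean_pairwise:.2f}")
```

Swapping the correlation matrix when the regime changes is enough to make co-movement tighten in a crisis.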

📦 Installation

pip install fsynth

Or build from source:

git clone https://github.com/welcra/fsynth.git
cd fsynth
pip install -e .

⚡ Quick Start

1. The Command Line Interface (CLI)

The easiest way to generate a dataset is using the bundled CLI tool. This command generates 500 stocks over 10 years and saves the data to the data/ folder.

fsynth-gen --stocks 500 --years 10 --out data

Output:

  • data/market_index.parquet: The macro-economic backbone (regimes, risk-free rates).
  • data/stock_prices.parquet: OHLCV data for all 500 tickers (~1.2M rows).
  • data/fundamentals.parquet: Quarterly financial reports for all tickers.
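The ~1.2M row figure follows from simple arithmetic. The calculation assumes 252 trading days per year and one row per ticker per day. Both are assumptions: the CLI's exact calendar may differ slightly, for example by also including the start date. `expected_price_rows` is a hypothetical helper, not part of fsynth.

```python
def expected_price_rows(n_stocks: int, years: float, steps_per_year: int = 252) -> int:
    """Rough size of stock_prices.parquet, assuming one OHLCV row per ticker per daily step."""
    n_steps = round(years * steps_per_year)  # e.g. 10 years * 252 = 2,520 trading days
    return n_stocks * n_steps


print(expected_price_rows(500, 10))  # 1260000
```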

2. Python API

For integration into your own scripts or data pipelines:

from fsynth import MarketConfig, MarketSimulator, FundamentalGenerator
import pandas as pd

# 1. Configure the Simulation
config = MarketConfig(
    T=5,                # Years
    dt=1/252,           # Daily time steps
    n_stocks=100,       # Number of tickers
    n_sectors=5,        # Distinct sectors with unique correlations
    seed=42             # Reproducibility
)

# 2. Run the Engine
sim = MarketSimulator(config)
print("Generating Market Backbone...")
market_df = sim.generate_market()

print("Generating Asset Paths...")
stock_dfs = sim.generate_stocks()

# 3. Aggregate Data
all_prices = pd.concat(stock_dfs.values(), ignore_index=True)
metadata = pd.DataFrame([
    {'Ticker': k, 'Sector': v['Sector'].iloc[0]} 
    for k, v in stock_dfs.items()
])

# 4. Generate Fundamentals
print("Generating 10-Q Reports...")
fund_gen = FundamentalGenerator(market_df, metadata, seed=config.seed)
fundamentals_df = fund_gen.generate_reports()

print(f"Generated {len(all_prices)} price rows and {len(fundamentals_df)} reports.")
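Fat tails are one of the headline properties, so a natural check on `all_prices` is the excess kurtosis of daily log returns per ticker. Excess kurtosis is 0 for a normal distribution, and fat-tailed returns give positive values.

The sketch assumes the long format shown under Data Structure, with `Date`, `Ticker`, and `Close` columns. The helper names are hypothetical, not fsynth API.

```python
import numpy as np
import pandas as pd


def log_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Add a per-ticker daily log-return column to a long-format price table."""
    out = prices.sort_values(["Ticker", "Date"]).copy()
    # diff() inside each ticker group, so returns never span two tickers.
    out["LogRet"] = np.log(out["Close"]).groupby(out["Ticker"]).diff()
    return out


def excess_kurtosis(prices: pd.DataFrame) -> pd.Series:
    """Sample excess kurtosis of daily log returns per ticker (0 for a normal distribution)."""
    rets = log_returns(prices).dropna(subset=["LogRet"])
    return rets.groupby("Ticker")["LogRet"].apply(lambda s: s.kurt())


# Tiny hand-made example in the documented column layout.
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-04"] * 2),
    "Ticker": ["STK_001"] * 3 + ["STK_002"] * 3,
    "Close": [100.0, 101.0, 99.0, 50.0, 50.5, 51.0],
})
print(log_returns(demo)[["Ticker", "Date", "LogRet"]])
```

On the real output you would call `excess_kurtosis(all_prices)`. The statistic needs at least four returns per ticker.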

🧮 Mathematical Methodology

fsynth goes beyond a plain random walk (constant-volatility GBM) to capture risks that real markets exhibit: time-varying volatility, sudden jumps, and shifts between market regimes.

The Heston Model

We model the spot price $S_t$ and its variance $v_t$ using the following system of stochastic differential equations (SDEs):

$$dS_t = \mu S_t dt + \sqrt{v_t} S_t dW_t^S$$

$$dv_t = \kappa(\theta - v_t)dt + \xi \sqrt{v_t} dW_t^v$$

Where:

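  • $\mu$ is the drift (expected rate of return).
  • $v_t$ is the instantaneous variance, so $\sqrt{v_t}$ is the instantaneous volatility.
  • $\theta$ is the long-run average variance.
  • $\kappa$ is the rate of mean reversion.
  • $\xi$ is the volatility of volatility (vol-of-vol).
  • $W_t^S$ and $W_t^v$ are Brownian motions with correlation $\rho$, i.e. $dW_t^S \, dW_t^v = \rho \, dt$.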
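The two SDEs above can be discretized in a few lines of NumPy. This is a sketch, not fsynth's Numba kernel, and the parameter values are illustrative. It assumes two standard discretization choices:

- The price $S_t$ uses a log-Euler step, which keeps prices positive.
- The variance $v_t$ uses a "full truncation" Euler step, which floors negative discretized variance at zero inside the drift and diffusion terms so the square root stays defined.

```python
import numpy as np


def simulate_heston(s0, v0, mu, kappa, theta, xi, rho,
                    T=1.0, n_steps=252, n_paths=1, seed=None):
    """Simulate Heston paths: log-Euler for S, full-truncation Euler for v.
    Returns (prices, variances), each of shape (n_paths, n_steps + 1)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.empty((n_paths, n_steps + 1))
    v = np.empty((n_paths, n_steps + 1))
    S[:, 0], v[:, 0] = s0, v0
    for t in range(n_steps):
        # Correlated standard normals: corr(z1, z2) = rho.
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        v_pos = np.maximum(v[:, t], 0.0)  # full truncation
        # d ln S = (mu - v/2) dt + sqrt(v) dW^S
        S[:, t + 1] = S[:, t] * np.exp((mu - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1)
        # dv = kappa (theta - v) dt + xi sqrt(v) dW^v
        v[:, t + 1] = v[:, t] + kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z2
    return S, v


S, v = simulate_heston(s0=100.0, v0=0.04, mu=0.05, kappa=2.0, theta=0.04,
                       xi=0.5, rho=-0.7, n_paths=1000, seed=42)
```

A negative $\rho$, as in this example, makes volatility rise when prices fall. That produces the negatively skewed returns associated with the equity volatility smile.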

Regime Switching & Jump Diffusion

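To model crashes and other sudden shocks (a source of fat tails), we add a compound Poisson jump process $J_t$:

$$ \frac{dS_t}{S_t} = (\mu - \lambda k)\,dt + \sigma\,dW_t + dJ_t $$

Where:

  • $\lambda$ is the jump intensity (expected number of jumps per unit time).
  • $J_t = \sum_{i=1}^{N_t} (Y_i - 1)$, where $N_t$ is a Poisson process with intensity $\lambda$ and $Y_i$ is the multiplicative size of the $i$-th jump.
  • $k = \mathbb{E}[Y_i - 1]$ is the mean relative jump size. Subtracting $\lambda k$ from the drift compensates for the jumps, so the expected return remains $\mu$.
  • $\sigma$ is the volatility of the diffusive component.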
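Merton's original formulation uses lognormal jump sizes $Y_i$, which makes each step exact in log space. The sketch below assumes that choice; fsynth's actual jump-size distribution is not documented here. `simulate_merton` and its parameters are hypothetical and only illustrate the compensated drift $\mu - \lambda k$.

```python
import numpy as np


def simulate_merton(s0, mu, sigma, lam, jump_mu, jump_sigma,
                    T=1.0, n_steps=252, n_paths=1, seed=None):
    """Merton jump diffusion with ln Y ~ N(jump_mu, jump_sigma^2).
    Per step: ln S += (mu - lam*k - sigma^2/2) dt + sigma dW + sum of ln Y_i,
    with k = E[Y - 1], so that E[S_T] = s0 * exp(mu * T).
    Returns prices of shape (n_paths, n_steps + 1)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    k = np.exp(jump_mu + 0.5 * jump_sigma**2) - 1.0  # mean relative jump size
    drift = (mu - lam * k - 0.5 * sigma**2) * dt
    diffusion = sigma * np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    n_jumps = rng.poisson(lam * dt, size=(n_paths, n_steps))  # jump count per step
    # The sum of n i.i.d. N(jump_mu, jump_sigma^2) log-jumps is N(n*jump_mu, n*jump_sigma^2).
    jumps = n_jumps * jump_mu + np.sqrt(n_jumps) * jump_sigma * rng.standard_normal((n_paths, n_steps))
    log_paths = np.log(s0) + np.cumsum(drift + diffusion + jumps, axis=1)
    return np.concatenate([np.full((n_paths, 1), float(s0)), np.exp(log_paths)], axis=1)


paths = simulate_merton(100.0, mu=0.05, sigma=0.2, lam=1.0, jump_mu=-0.1,
                        jump_sigma=0.15, n_steps=52, n_paths=20_000, seed=1)
```

Because of the $\lambda k$ term, the sample mean of `paths[:, -1]` should sit close to $100 e^{0.05}$ even though the jumps are negative on average.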

Transitions between Bull, Bear, and Crisis regimes are governed by a Hidden Markov Model (HMM), i.e. a Markov chain over latent regimes. The current regime sets the drift $\mu$, the volatility $\sigma$, and the jump intensity $\lambda$ at each simulated time step.
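Here is a sketch of the regime layer, assuming a discrete-time Markov chain with one transition per daily step. The transition matrix `P` and the per-regime parameters in `PARAMS` are invented for illustration; they are not fsynth's values. At each step the chain's current state selects the $\mu$, $\sigma$, and $\lambda$ that a price update would use.

```python
import numpy as np

REGIMES = ["Bull", "Bear", "Crisis"]

# Hypothetical daily transition matrix: row i is P(next regime | current regime i).
P = np.array([
    [0.990, 0.009, 0.001],
    [0.020, 0.970, 0.010],
    [0.020, 0.080, 0.900],
])

# Hypothetical per-regime parameters (annualized drift, volatility, jump intensity).
PARAMS = {
    "Bull":   {"mu": 0.10,  "sigma": 0.15, "lam": 0.5},
    "Bear":   {"mu": -0.05, "sigma": 0.25, "lam": 1.0},
    "Crisis": {"mu": -0.30, "sigma": 0.50, "lam": 5.0},
}


def simulate_regimes(P, n_steps, start=0, seed=None):
    """Sample a path of regime indices from a discrete-time Markov chain."""
    rng = np.random.default_rng(seed)
    states = np.empty(n_steps, dtype=int)
    states[0] = start
    for t in range(1, n_steps):
        states[t] = rng.choice(len(P), p=P[states[t - 1]])
    return states


def stationary_distribution(P):
    """Long-run regime frequencies: left eigenvector of P for eigenvalue 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return pi / pi.sum()


path = simulate_regimes(P, n_steps=2520, seed=7)
# Parameters in force at each step, ready to feed a price update.
mu_t = np.array([PARAMS[REGIMES[s]]["mu"] for s in path])
sigma_t = np.array([PARAMS[REGIMES[s]]["sigma"] for s in path])
lam_t = np.array([PARAMS[REGIMES[s]]["lam"] for s in path])
```

The diagonal of `P` controls how persistent each regime is. With a stay probability $p$, the expected duration is $1/(1-p)$ steps, which gives 100 days for Bull and 10 days for Crisis in this example.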


📂 Data Structure

Stock Prices (OHLCV)

| Date | Ticker | Open | High | Low | Close | Volume | Regime |
|---|---|---|---|---|---|---|---|
| 2023-01-01 | STK_001 | 100.0 | 101.2 | 99.5 | 100.8 | 150240 | Bull |

Fundamentals (Quarterly)

| Date | Ticker | Revenue | EBITDA | EPS | Debt | RegimeEnv |
|---|---|---|---|---|---|---|
| 2023-03-31 | STK_001 | 450.20 | 112.50 | 2.10 | 300.00 | 0.12 |
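Prices are daily and fundamentals are quarterly. A common way to combine the two tables is an as-of join, where each price row picks up the latest report dated on or before it. The sketch below uses pandas `merge_asof` with made-up rows in the column layout shown above.

```python
import pandas as pd

# Made-up rows in the documented column layout (values illustrative).
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2023-03-30", "2023-03-31", "2023-04-03"]),
    "Ticker": "STK_001",
    "Close": [100.8, 101.5, 102.0],
})
fundamentals = pd.DataFrame({
    "Date": pd.to_datetime(["2023-03-31"]),
    "Ticker": "STK_001",
    "Revenue": [450.20], "EBITDA": [112.50], "EPS": [2.10], "Debt": [300.00],
})

# For each price row, attach the most recent report dated on or before that day.
# merge_asof requires both frames to be sorted by the "on" key.
merged = pd.merge_asof(
    prices.sort_values("Date"),
    fundamentals.sort_values("Date"),
    on="Date",
    by="Ticker",
    direction="backward",
)
print(merged)
```

The 2023-03-30 row has no earlier report, so its fundamental columns are NaN. In a backtest you would usually also shift each report date by a filing lag to avoid look-ahead bias. Whether fsynth's `Date` column is a period end or a filing date is not stated.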

🤝 Contributing

Contributions are welcome! Please open an issue to discuss proposed changes or submit a Pull Request.

  1. Fork the repository.
  2. Create your feature branch (git checkout -b feature/AmazingFeature).
  3. Commit your changes (git commit -m 'Add some AmazingFeature').
  4. Push to the branch (git push origin feature/AmazingFeature).
  5. Open a Pull Request.

📄 License

Distributed under the MIT License. See LICENSE for more information.


Built by Arnav Malhotra.


