A high-performance synthetic financial data generator that uses Heston Stochastic Volatility and Jump Diffusion models.
fsynth: High-Fidelity Synthetic Financial Data Generator
fsynth is a high-performance Python library for generating realistic, multi-asset financial time series and corresponding fundamental reports. Unlike simple Geometric Brownian Motion (GBM) generators, fsynth models the statistical properties of real markets, including volatility clustering, fat tails, and regime-dependent correlations, by combining Heston Stochastic Volatility and Merton Jump Diffusion processes with macroeconomic regime switching.
Designed for quantitative researchers, AI/ML engineers, and financial educators who need massive, clean, and statistically rigorous datasets for backtesting and model training.
🚀 Features
- Stochastic Volatility: Implements the Heston model to simulate time-varying, mean-reverting volatility, capturing the volatility clustering observed in real markets (the same dynamics that produce the implied-volatility smile when options are priced under Heston).
- Regime Switching: Simulates macroeconomic states (Bull, Bear, Crisis) that dynamically alter correlation matrices and volatility baselines.
- Jump Diffusion: Adds price jumps whose arrivals follow a Poisson process (Merton model) to model market shocks.
- Linked Fundamentals: Generates coherent 10-Q/10-K style fundamental data (Revenue, EBITDA, EPS, Debt) that is correlated with each stock's price performance and its sector characteristics ("sector genes").
- High Performance: Core simulation kernels are JIT-compiled with `numba` for C-level speeds, allowing millions of rows to be generated in seconds.
- Parquet Native: Outputs optimized Parquet files ready for ingestion into Pandas, Polars, or PySpark.
📦 Installation
```bash
pip install fsynth
```

Or build from source:

```bash
git clone https://github.com/welcra/fsynth.git
cd fsynth
pip install -e .
```
⚡ Quick Start
1. The Command Line Interface (CLI)
The easiest way to generate a dataset is using the bundled CLI tool. This command generates 500 stocks over 10 years and saves the data to the data/ folder.
```bash
fsynth-gen --stocks 500 --years 10 --out data
```
Output:
- data/market_index.parquet: The macroeconomic backbone (regimes, risk-free rates).
- data/stock_prices.parquet: OHLCV data for all 500 tickers (~1.2M rows).
- data/fundamentals.parquet: Quarterly financial reports for all tickers.
2. Python API
For integration into your own scripts or data pipelines:
```python
from fsynth import MarketConfig, MarketSimulator, FundamentalGenerator
import pandas as pd

# 1. Configure the Simulation
config = MarketConfig(
    T=5,             # Years
    dt=1/252,        # Daily time steps
    n_stocks=100,    # Number of tickers
    n_sectors=5,     # Distinct sectors with unique correlations
    seed=42          # Reproducibility
)

# 2. Run the Engine
sim = MarketSimulator(config)
print("Generating Market Backbone...")
market_df = sim.generate_market()
print("Generating Asset Paths...")
stock_dfs = sim.generate_stocks()

# 3. Aggregate Data
all_prices = pd.concat(stock_dfs.values(), ignore_index=True)
metadata = pd.DataFrame([
    {'Ticker': k, 'Sector': v['Sector'].iloc[0]}
    for k, v in stock_dfs.items()
])

# 4. Generate Fundamentals
print("Generating 10-Q Reports...")
fund_gen = FundamentalGenerator(market_df, metadata, seed=config.seed)
fundamentals_df = fund_gen.generate_reports()
print(f"Generated {len(all_prices)} price rows and {len(fundamentals_df)} reports.")
```
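Once `all_prices` is assembled, one quick check that regimes really change volatility is to compare annualised realised volatility across regime labels. This sketch assumes the `Date`, `Ticker`, `Close`, and `Regime` columns shown in the Stock Prices table under Data Structure; `realized_vol_by_regime` is a hypothetical helper, not part of fsynth.

```python
import numpy as np
import pandas as pd


def realized_vol_by_regime(prices: pd.DataFrame, periods_per_year: int = 252) -> pd.Series:
    """Annualised standard deviation of daily log returns, grouped by regime label.

    Hypothetical helper; assumes columns Date, Ticker, Close, Regime.
    """
    df = prices.sort_values(["Ticker", "Date"])
    log_close = np.log(df["Close"])
    # diff() within each ticker, so a ticker's first row gets NaN instead of
    # a spurious return computed against the previous ticker's last close
    df = df.assign(LogRet=log_close.groupby(df["Ticker"]).diff())
    # std() skips the NaN first-day returns; scale daily vol to annual vol
    return df.groupby("Regime")["LogRet"].std() * np.sqrt(periods_per_year)
```

For example, `print(realized_vol_by_regime(all_prices))`; with regime switching working as described, the Crisis figure would be expected to be the highest.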
🧮 Mathematical Methodology
fsynth goes beyond the constant-volatility random walk of standard GBM models to capture the stochastic volatility, sudden jumps, and regime shifts seen in real financial markets.
The Heston Model
We model the spot price $S_t$ and its variance $v_t$ using the following system of stochastic differential equations (SDEs):
$$dS_t = \mu S_t dt + \sqrt{v_t} S_t dW_t^S$$
$$dv_t = \kappa(\theta - v_t)dt + \xi \sqrt{v_t} dW_t^v$$
Where:
- $\mu$ is the drift (expected rate of return).
- $\theta$ is the long-run average variance.
- $\kappa$ is the rate of mean reversion.
- $\xi$ is the volatility of volatility (vol-of-vol).
- $W_t^S$ and $W_t^v$ are Brownian motions with correlation $\rho$, i.e. $dW_t^S\,dW_t^v = \rho\,dt$.
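The Heston SDEs above can be simulated with a simple Euler-type scheme. The sketch below is a pure-Python illustration, not fsynth's numba kernel: it takes a log-Euler step for the price (which keeps $S_t$ positive) and uses the "full truncation" scheme for the variance, in which $v_t$ may dip below zero but only $v_t^+ = \max(v_t, 0)$ enters the drift and diffusion terms. Correlated shocks are built from two independent normals as $Z^S = \rho Z^v + \sqrt{1-\rho^2}\,Z'$. `simulate_heston` and its argument names are hypothetical.

```python
import math
import random


def simulate_heston(s0, v0, mu, kappa, theta, xi, rho, dt, n_steps, seed=None):
    """Euler discretisation of the Heston SDEs with full truncation.

    Hypothetical helper. Returns (prices, variances), each of length n_steps + 1.
    """
    rng = random.Random(seed)
    prices, variances = [s0], [v0]
    s, v = s0, v0
    for _ in range(n_steps):
        # Correlated standard normals with corr(z_s, z_v) = rho
        z_v = rng.gauss(0.0, 1.0)
        z_s = rho * z_v + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Full truncation: use the positive part of v in drift and diffusion
        v_pos = max(v, 0.0)
        # Log-Euler step for the price keeps S_t strictly positive
        s = s * math.exp((mu - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z_s)
        # Euler step for the variance; the raw value is carried forward
        v = v + kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z_v
        prices.append(s)
        variances.append(max(v, 0.0))  # report the usable (non-negative) variance
    return prices, variances
```

With `xi=0` and `v0=theta` the variance stays at $\theta$ and the model collapses to GBM with volatility $\sqrt{\theta}$, a handy sanity check; a negative `rho` (e.g. `-0.7`) makes volatility tend to rise as prices fall.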
Regime Switching & Jump Diffusion
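To model market crashes and shocks (fat tails), we introduce a compound Poisson jump process $J_t$:
$$\frac{dS_t}{S_t} = (\mu - \lambda k)\,dt + \sigma\,dW_t + dJ_t$$
Where:
- $\sigma$ is the volatility of the diffusive component.
- $\lambda$ is the jump intensity (expected number of jumps per unit time).
- $J_t = \sum_{i=1}^{N_t} (Y_i - 1)$, where $N_t$ is a Poisson process with intensity $\lambda$ and $Y_i$ is the multiplicative size of the $i$-th jump ($S \to Y_i S$).
- $k = \mathbb{E}[Y_i - 1]$ is the mean relative jump size; subtracting $\lambda k$ from the drift compensates for the jumps so the expected return remains $\mu$.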
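Over a step of length $\Delta t$, the jump-diffusion equation has an exact log-space update: $\ln S_{t+\Delta t} = \ln S_t + (\mu - \lambda k - \tfrac12\sigma^2)\Delta t + \sigma\sqrt{\Delta t}\,Z + \sum_{i=1}^{N}\ln Y_i$ with $N \sim \text{Poisson}(\lambda\Delta t)$. The sketch below implements that update in pure Python, assuming lognormal jump sizes ($\ln Y_i \sim \mathcal{N}(\mu_J, \sigma_J^2)$, as in Merton's original model), so that $k = e^{\mu_J + \sigma_J^2/2} - 1$. The function and parameter names are illustrative, not fsynth's API.

```python
import math
import random


def _poisson(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method; adequate for small means such as lambda * dt."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_merton(s0, mu, sigma, lam, mu_j, sigma_j, dt, n_steps, seed=None):
    """Merton jump diffusion with lognormal jumps (hypothetical helper)."""
    rng = random.Random(seed)
    k = math.exp(mu_j + 0.5 * sigma_j ** 2) - 1.0  # k = E[Y - 1]
    drift = (mu - lam * k - 0.5 * sigma ** 2) * dt  # compensated log drift
    prices = [s0]
    log_s = math.log(s0)
    for _ in range(n_steps):
        n_jumps = _poisson(rng, lam * dt)
        # Sum of log jump sizes ln Y_i ~ N(mu_j, sigma_j^2)
        jump = sum(rng.gauss(mu_j, sigma_j) for _ in range(n_jumps))
        log_s += drift + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) + jump
        prices.append(math.exp(log_s))
    return prices
```

Setting `lam=0` removes the jumps and the update reduces to exact GBM simulation, which makes it easy to see how much of the tail mass comes from the jump term.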
A Hidden Markov Model (HMM) governs the transitions between Bull, Bear, and Crisis regimes; whenever the regime changes, the drift $\mu$, volatility $\sigma$, and jump intensity $\lambda$ are switched to that regime's values.
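Generatively, the regime layer of an HMM is a Markov chain: each day the next regime is drawn from the row of a transition matrix indexed by the current regime, and the path simulator then uses that regime's parameters. The sketch below assumes daily steps; the transition probabilities and the per-regime $\mu$, $\sigma$, $\lambda$ values are invented for illustration and are not fsynth's defaults, and `simulate_regimes`, `TRANSITIONS`, and `REGIME_PARAMS` are hypothetical names.

```python
import random

# Hypothetical per-regime parameters (illustrative values, not fsynth defaults)
REGIME_PARAMS = {
    "Bull": {"mu": 0.10, "sigma": 0.15, "lam": 0.1},
    "Bear": {"mu": -0.05, "sigma": 0.25, "lam": 0.5},
    "Crisis": {"mu": -0.30, "sigma": 0.50, "lam": 3.0},
}

# Hypothetical daily transition probabilities; each row sums to 1
TRANSITIONS = {
    "Bull": {"Bull": 0.995, "Bear": 0.004, "Crisis": 0.001},
    "Bear": {"Bull": 0.010, "Bear": 0.985, "Crisis": 0.005},
    "Crisis": {"Bull": 0.005, "Bear": 0.045, "Crisis": 0.950},
}


def simulate_regimes(start, transitions, n_steps, seed=None):
    """Sample a regime path of length n_steps + 1 from a Markov chain."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        row = transitions[path[-1]]  # probabilities conditional on today's regime
        states = list(row)
        path.append(rng.choices(states, weights=[row[s] for s in states])[0])
    return path
```

Per-step parameters then follow from the path, e.g. `sigmas = [REGIME_PARAMS[r]["sigma"] for r in simulate_regimes("Bull", TRANSITIONS, 2520, seed=42)]`. A large diagonal entry makes regimes persistent: the expected stay in a state with self-transition probability $p$ is $1/(1-p)$ steps, so about 200 trading days for Bull with $p = 0.995$.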
📂 Data Structure
Stock Prices (OHLCV)
| Date | Ticker | Open | High | Low | Close | Volume | Regime |
|---|---|---|---|---|---|---|---|
| 2023-01-01 | STK_001 | 100.0 | 101.2 | 99.5 | 100.8 | 150240 | Bull |
Fundamentals (Quarterly)
| Date | Ticker | Revenue | EBITDA | EPS | Debt | RegimeEnv |
|---|---|---|---|---|---|---|
| 2023-03-31 | STK_001 | 450.20 | 112.50 | 2.10 | 300.00 | 0.12 |
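The fundamentals table is quarterly while prices are daily, so joining them for backtesting needs an "as-of" join that only uses reports already available on each trading day. A minimal sketch with `pandas.merge_asof`, assuming the column names in the two tables above; `attach_latest_fundamentals` and its `publication_lag_days` parameter are hypothetical, and since it is not stated here whether a report's Date marks the period end or the release date, the lag is left to the caller.

```python
import pandas as pd


def attach_latest_fundamentals(
    prices: pd.DataFrame,
    fundamentals: pd.DataFrame,
    publication_lag_days: int = 0,
) -> pd.DataFrame:
    """Attach to each daily price row the latest report available on that date.

    Hypothetical helper; assumes Date and Ticker columns in both frames.
    """
    # merge_asof needs both frames sorted on their join keys
    left = prices.assign(
        Date=pd.to_datetime(prices["Date"]).astype("datetime64[ns]")
    ).sort_values("Date")
    right = fundamentals.rename(columns={"Date": "ReportDate"})
    right = right.assign(ReportDate=pd.to_datetime(right["ReportDate"]))
    # A report becomes usable only after the (assumed) publication lag
    right = right.assign(
        AvailableDate=(
            right["ReportDate"] + pd.Timedelta(days=publication_lag_days)
        ).astype("datetime64[ns]")
    ).sort_values("AvailableDate")
    return pd.merge_asof(
        left,
        right,
        left_on="Date",
        right_on="AvailableDate",
        by="Ticker",  # match reports within the same ticker only
        direction="backward",  # never look ahead to a future report
    )
```

Calling it with, e.g., `publication_lag_days=45` delays each report by 45 calendar days, which avoids look-ahead bias if `Date` is the fiscal period end.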
🤝 Contributing
Contributions are welcome! Please open an issue to discuss proposed changes or submit a Pull Request.
- Fork the repository.
- Create your feature branch (`git checkout -b feature/AmazingFeature`).
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
📄 License
Distributed under the MIT License. See LICENSE for more information.
Built by Arnav Malhotra.
Download files
File details
Details for the file fsynth-0.1.1.tar.gz.
File metadata
- Download URL: fsynth-0.1.1.tar.gz
- Upload date:
- Size: 11.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 602df6577ed28a3a7467fdf3c3b92b92098995d0a0fef0286eac7073e900e76c |
| MD5 | 069314ad22f2e6e9438882711e98cea5 |
| BLAKE2b-256 | 3291e3d2a1a86ac340c7cd87abb71b18ff9bdc2134e0769120b9775ab6823f92 |
File details
Details for the file fsynth-0.1.1-py3-none-any.whl.
File metadata
- Download URL: fsynth-0.1.1-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf6b06f4439b47665c06e411566285f250fdf669e0f38411ca34bf7d7089a6ba |
| MD5 | ada95e7357a06491a432ba003d327bc9 |
| BLAKE2b-256 | d4358977dfb9b0ff407a49bba9b3bd892a138cd21d44a3939f6313b2845c36b5 |