Simulation of Correlated Asset Prices
SimCAP: Simulation of Correlated Asset Prices
SimCAP was created with the intent to make it simple to generate useful simulations of correlated multivariate financial time series. SimCAP is written in Python3 and leans heavily on the excellent hmmlearn package to train the Hidden Markov Models at the core of SimCAP simulations.
With SimCAP, just provide a pandas DataFrame of historical prices for stocks (or any other financial instrument) and you can generate thousands of simulations in minutes. Simulations resemble the original time series in both the correlations between assets and the statistical properties of the return distributions.
Possible uses for SimCAP include trading strategy development, portfolio optimization, portfolio risk management, financial planning, and augmentation of data sets for machine learning.
Installation
SimCAP requires:
- Python >= 3.7
- NumPy >= 1.17
- SciPy >= 1.3
- scikit-learn >= 0.18
- statsmodels >= 0.13
- hmmlearn >= 0.2.7
- pandas >= 1.0
You can install SimCAP directly with pip:
pip install simcap
Alternatively, you can install SimCAP directly from source:
pip install git+https://github.com/jeremy-doyle/SimCAP.git
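Before installing, it can help to see whether the environment already meets the minimum versions listed above. The following is a minimal sketch using only the standard library; `MIN_VERSIONS`, `version_tuple`, and `check_requirements` are illustrative names that are not part of SimCAP, and `importlib.metadata` needs Python 3.8 or later.

```python
from importlib import metadata

# Minimum versions copied from the requirements list above.
MIN_VERSIONS = {
    "numpy": "1.17",
    "scipy": "1.3",
    "scikit-learn": "0.18",
    "statsmodels": "0.13",
    "hmmlearn": "0.2.7",
    "pandas": "1.0",
}


def version_tuple(text):
    """Turn '1.17.3' into (1, 17, 3); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements(minimums=MIN_VERSIONS):
    """Return {package: status}; status is 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if ok else "too old"
    return report


for package, status in check_requirements().items():
    print(f"{package:>12}: {status}")
```

The version comparison is deliberately crude (it ignores pre-release suffixes), so treat a "too old" result as a prompt to look closer rather than a verdict.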
Quick Start
Daily closing prices (adjusted for dividends and splits) for the following Exchange Traded Funds (ETFs) over a 10-year period are included with the package:
Ticker | Description |
---|---|
SPY | SPDR S&P 500 ETF Trust |
VO | Vanguard Mid-Cap Index Fund ETF |
IWM | iShares Russell 2000 ETF |
EFA | iShares MSCI EAFE ETF |
VWO | Vanguard Emerging Markets Stock Index Fund ETF |
AGG | iShares Core US Aggregate Bond ETF |
IBND | SPDR Bloomberg International Corporate Bond ETF |
VNQ | Vanguard Real Estate Index Fund ETF |
Importing SimCAP and Example Data
>>> from simcap import SimCAP
>>> from simcap.datasets import load_example_asset_price_history
>>> obs = load_example_asset_price_history()
>>> obs.head().round(2)
SPY VO IWM EFA VWO AGG IBND VNQ
0 106.80 64.47 66.44 37.22 30.88 86.69 29.32 39.60
1 107.98 65.52 67.62 37.85 31.64 86.58 29.56 39.88
2 108.55 66.00 68.04 38.31 31.94 86.44 29.68 40.04
3 108.95 65.80 68.08 38.51 31.97 86.22 29.68 40.28
4 108.67 65.91 68.00 38.66 32.14 86.15 29.95 40.48
Generating Simulations
- Initialize a SimCAP instance with the observed data
- Call the generate_simulations method
For this example, parameters were chosen to:
- generate five simulations
- set the beginning values of each asset in each simulation to 1 (begin_values="norm")
- output the simulations as a list of pandas DataFrames
>>> model = SimCAP(obs)
>>> sims = model.generate_simulations(
...     n_sims=5,
...     begin_values="norm",
...     output_as_dfs=True,
... )
markov model search: 100%|██████████| 60/60 [03:45<00:00, 3.77s/it]
generating simulations: 100%|██████████| 5/5 [00:02<00:00, 1.83it/s]
>>> sims[0].head().round(4)
SPY VO IWM EFA VWO AGG IBND VNQ
0 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
1 1.0109 1.0112 1.0106 1.0118 1.0055 0.9974 0.9991 1.0085
2 1.0026 1.0029 1.0007 1.0127 1.0081 0.9989 1.0019 1.0109
3 1.0040 1.0037 1.0051 1.0171 1.0056 0.9955 0.9975 1.0193
4 0.9972 0.9952 0.9962 1.0080 1.0025 0.9980 0.9958 1.0135
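Because begin_values="norm" starts every asset at 1, each column can be read as cumulative growth. The sketch below shows one way to turn such a path into price levels by multiplying by a chosen starting price. The small array stands in for `sims[0].to_numpy()` and `last_prices` is made up; this is an illustration of the arithmetic, not a description of how SimCAP implements its other begin_values options.

```python
import numpy as np

# Stand-in for sims[0].to_numpy(): 4 periods x 2 assets, first row all ones.
norm_path = np.array([
    [1.0000, 1.0000],
    [1.0109, 0.9974],
    [1.0026, 0.9989],
    [1.0040, 0.9955],
])

# Hypothetical starting prices for the two assets.
last_prices = np.array([450.0, 105.0])

# Scaling each column by a constant leaves period-over-period returns unchanged.
price_path = norm_path * last_prices

norm_returns = norm_path[1:] / norm_path[:-1] - 1
price_returns = price_path[1:] / price_path[:-1] - 1

print(price_path.round(2))
print(np.allclose(norm_returns, price_returns))
```

Since returns are unaffected by the scale factor, any analysis based on returns gives the same answer on the normalized path and on the rescaled one.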
Comparing Observation to Simulations
>>> import matplotlib.pyplot as plt
>>> import seaborn as sns
>>> sns.set_theme(style="whitegrid", palette="bright")
>>> fig, axes = plt.subplots(
...     ncols=2,
...     nrows=3,
...     sharex=True,
...     sharey=True,
...     figsize=(8, 12),
... )
>>> norm_obs = obs / obs.iloc[0, :]
>>> sns.lineplot(data=norm_obs, dashes=False, ax=axes[0][0])
>>> axes[0][0].set_title("Normalized Observation", fontsize=14)
>>> for i, ax in enumerate(axes.reshape(-1)[1:]):
...     sns.lineplot(data=sims[i], dashes=False, ax=ax, legend=False)
...     ax.set_title(f"Simulation {i+1}", fontsize=14)
...
>>> plt.tight_layout()
>>> plt.show()
>>> import numpy as np
>>> sim = sims[0]
>>> obs_log_return = norm_obs.pct_change().dropna().apply(np.log1p)
>>> sim_log_return = sim.pct_change().dropna().apply(np.log1p)
>>> obs_corr = obs_log_return.corr()
>>> sim_corr = sim_log_return.corr()
>>> corr_ax_labels = [["obs"],[ "sim"]]
>>> fig, axes = plt.subplot_mosaic(corr_ax_labels, figsize=(8, 6))
>>> sns.heatmap(obs_corr, annot=True, ax=axes["obs"], cbar=False)
>>> sns.heatmap(sim_corr, annot=True, ax=axes["sim"], cbar=False)
>>> axes["obs"].set_title("Observation", fontsize=14)
>>> axes["sim"].set_title("Simulation 1", fontsize=14)
>>> plt.tight_layout(rect=[0, 0, 1, 0.95])
>>> fig.suptitle("Correlation Coefficients of 1-Day Returns", fontsize=16)
>>> plt.show();
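The heatmaps give a visual comparison; a single number can be handy when checking many simulations. Here is a minimal sketch that computes log returns and reports the largest absolute gap between two correlation matrices. The price arrays are synthetic stand-ins generated from a made-up correlation matrix (`true_corr`), not SimCAP output, and `log_returns` and `simulate_prices` are illustrative helpers.

```python
import numpy as np

rng = np.random.default_rng(0)


def log_returns(prices):
    """Log returns along the time axis (rows)."""
    return np.diff(np.log(prices), axis=0)


def simulate_prices(n_periods, corr, rng):
    """Stand-in price paths with Gaussian log returns and a target correlation."""
    chol = np.linalg.cholesky(corr)
    shocks = 0.01 * rng.standard_normal((n_periods, corr.shape[0])) @ chol.T
    return np.exp(np.cumsum(shocks, axis=0))


# Made-up correlation structure for three assets.
true_corr = np.array([
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.0],
    [-0.2, 0.0, 1.0],
])

obs_prices = simulate_prices(2500, true_corr, rng)  # plays the role of obs
sim_prices = simulate_prices(2500, true_corr, rng)  # plays the role of sims[0]

obs_corr = np.corrcoef(log_returns(obs_prices), rowvar=False)
sim_corr = np.corrcoef(log_returns(sim_prices), rowvar=False)

# Largest absolute difference between corresponding correlation coefficients.
max_gap = np.abs(obs_corr - sim_corr).max()
print(f"largest correlation gap: {max_gap:.3f}")
```

With real data, `obs.to_numpy()` and `sims[i].to_numpy()` could be passed to `log_returns` in place of the synthetic arrays.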
Additional Examples
For additional examples, please see the Jupyter notebooks located in the examples directory of this repository.
Stylized Facts of Financial Time Series
The mechanisms by which financial time series are generated are not fully understood. There are some statistical properties, however, that are common to financial time series. These properties are called stylized facts.
Stylized facts are observations that have been made in so many studies on the statistical properties of financial markets that they are widely understood to be empirical truths, to which theories and models must fit.
We'll walk through some of the most commonly documented stylized facts and see how a SimCAP simulation conforms.
Returns do not follow a normal distribution and are leptokurtic.
Asset returns are not normally distributed and are leptokurtic (fat tails). The degree of leptokurtosis tends to increase as the frequency of price measurement increases. In other words, there is more leptokurtosis in the distribution of 5 minute returns than there is in the distribution of weekly returns.
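To make "leptokurtic" concrete, the sketch below computes excess kurtosis (the fourth standardized moment minus 3, which is 0 for a normal distribution) for a normal sample and for a fat-tailed Student's t sample. Both samples are synthetic and `excess_kurtosis` is an illustrative helper, not part of SimCAP.

```python
import numpy as np


def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m4 = np.mean(centered ** 4)
    return m4 / m2 ** 2 - 3.0


rng = np.random.default_rng(42)
normal_sample = rng.standard_normal(200_000)
fat_tailed_sample = rng.standard_t(df=5, size=200_000)

k_normal = excess_kurtosis(normal_sample)
k_fat = excess_kurtosis(fat_tailed_sample)

print(f"normal:        {k_normal:+.2f}")
print(f"Student's t-5: {k_fat:+.2f}")
```

The same function can be applied to a column of observed or simulated log returns to compare tail heaviness.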
There is no consensus for the right distribution to model financial time series, though the following distributions are commonly used in practice:
- Normal
- Laplace
- Student's T
- Cauchy
- Lévy
None of these distributions, however, is a good representation of every asset being modeled, so SimCAP simulations aren't modeled on theoretical distributions at all; rather, they are modeled using kernel density estimates of each modeled instrument's empirical distribution. As a result, simulated distributions of returns closely resemble distributions seen in the "wild".
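As a rough illustration of what sampling from a kernel density estimate means, the sketch below draws from a one-dimensional Gaussian KDE: pick an observed return at random, then add Gaussian noise scaled by the bandwidth. This is a generic technique shown under stated assumptions (Silverman's rule-of-thumb bandwidth, synthetic returns); it is not taken from SimCAP's source, and `sample_from_gaussian_kde` is a hypothetical helper.

```python
import numpy as np


def sample_from_gaussian_kde(data, n_samples, rng):
    """Draw from a 1-D Gaussian KDE fitted to `data` (smoothed bootstrap)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Silverman's rule-of-thumb bandwidth.
    std = data.std(ddof=1)
    q75, q25 = np.percentile(data, [75, 25])
    bandwidth = 0.9 * min(std, (q75 - q25) / 1.34) * n ** (-1 / 5)
    # A KDE is an equal-weight mixture of kernels centered on the data points:
    # choose a center at random, then add kernel noise.
    centers = rng.choice(data, size=n_samples, replace=True)
    return centers + bandwidth * rng.standard_normal(n_samples)


rng = np.random.default_rng(3)

# Synthetic fat-tailed daily returns standing in for one asset's history.
returns = 0.01 * rng.standard_t(df=4, size=2500)

draws = sample_from_gaussian_kde(returns, 50_000, rng)

print(f"observed mean/std: {returns.mean():.5f} / {returns.std():.5f}")
print(f"sampled  mean/std: {draws.mean():.5f} / {draws.std():.5f}")
```

Because the draws are built from the observed values themselves, the sampled distribution inherits the skew and fat tails of the data without committing to a parametric family.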
>>> import pandas as pd
>>> cols = sim.columns
>>> ax_labels = np.reshape(cols, (4, 2)).tolist()
>>> fig, axes = plt.subplot_mosaic(ax_labels, figsize=(8, 12))
>>> for col in cols:
...     data = pd.DataFrame({"obs": obs_log_return[col], "sim": sim_log_return[col]})
...     sns.kdeplot(data=data+1e-8, ax=axes[col], fill=True, alpha=0.25, log_scale=True)
...     axes[col].set_title(col, fontsize=12)
...     axes[col].set(xticklabels=[])
...     sns.move_legend(axes[col], "upper left")
...
>>> plt.tight_layout(rect=[0, 0, 1, 0.95])
>>> fig.suptitle("Distribution of 1-Day Returns (x-axis log scale)", fontsize=16)
>>> plt.show()
The autocorrelation of returns is insignificant.
The autocorrelation (also known as serial correlation) of log returns in financial time series is insignificant. Considering the efficient markets hypothesis — the theory that asset prices reflect all information currently available thereby making future returns unpredictable — this lack of autocorrelation is not surprising.
Autocorrelation of returns is not present in SimCAP simulations.
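For readers who want the number behind an ACF plot, here is a minimal sketch of the lag-k sample autocorrelation. It contrasts independent synthetic returns (autocorrelation near zero) with the log price path built from them (autocorrelation near one). The data are made up and `autocorr` is an illustrative helper.

```python
import numpy as np


def autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered)


rng = np.random.default_rng(11)

# Independent daily log returns and the log price path they imply.
returns = rng.normal(0.0, 0.01, size=5000)
log_prices = np.cumsum(returns)

# Approximate 95% band for the autocorrelation of an independent series.
band = 1.96 / np.sqrt(returns.size)

acf_returns = autocorr(returns, lag=1)
acf_prices = autocorr(log_prices, lag=1)

print(f"lag-1 autocorrelation of returns:    {acf_returns:+.3f} (band +/-{band:.3f})")
print(f"lag-1 autocorrelation of log prices: {acf_prices:+.3f}")
```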
>>> from statsmodels.graphics.tsaplots import plot_acf
>>> def acf_of_returns(squared=False):
...     fig, axes = plt.subplot_mosaic(ax_labels, figsize=(8, 12))
...     power = 2 if squared else 1
...     for col in cols:
...         data = pd.DataFrame({"obs": obs_log_return[col], "sim": sim_log_return[col]})
...         plot_acf(x=data["obs"] ** power, ax=axes[col], lags=50, zero=False, label="obs")
...         plot_acf(x=data["sim"] ** power, ax=axes[col], lags=50, zero=False, label="sim")
...         axes[col].set_title(col, fontsize=12)
...         axes[col].set_xlabel("Days Lag")
...         handles, labels = axes[col].get_legend_handles_labels()
...         handles = [handles[1], handles[3]]
...         labels = [labels[1], labels[3]]
...         axes[col].legend(handles=handles, labels=labels, loc="best", numpoints=1)
...     plt.tight_layout(rect=[0, 0, 1, 0.95])
...     title_mod = "squared " if squared else ""
...     fig.suptitle(f"Autocorrelation of {title_mod.title()}Returns", fontsize=16)
...     plt.savefig(f"../img/stylized-facts-autocorrelation{title_mod.strip()}.png")
...
>>> acf_of_returns()
Returns are subject to volatility clustering.
In contrast to the lack of serial dependence in returns, the autocorrelation of squared returns is typically positive, significant, and slow to decay. This is explained by the phenomenon known as volatility clustering.
The volatility of returns changes over time but its magnitude — measured by squared returns — tends to persist or "cluster". In other words, there are periods where volatility is low — and stays low — while there are other periods where volatility is high — and stays high.
Volatility clustering is difficult to model. SimCAP simulations can exhibit volatility clustering though usually not to the degree observed in the market.
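A toy example can show how persistent regimes produce this pattern. The sketch below simulates returns from a two-state Markov chain with a calm and a turbulent volatility level, then compares the lag-1 autocorrelation of returns with that of squared returns. The parameters (`stay_prob`, `vols`) are invented for illustration, and this toy is far simpler than the Hidden Markov Models SimCAP trains.

```python
import numpy as np


def autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered)


def simulate_two_regime_returns(n, stay_prob, vols, rng):
    """Zero-mean returns whose volatility follows a sticky two-state chain."""
    # Each period the chain switches state with probability 1 - stay_prob;
    # the parity of the running switch count gives the current state (0 or 1).
    switches = rng.random(n) > stay_prob
    states = np.cumsum(switches) % 2
    return np.asarray(vols)[states] * rng.standard_normal(n)


rng = np.random.default_rng(5)
returns = simulate_two_regime_returns(
    n=20_000, stay_prob=0.98, vols=(0.005, 0.02), rng=rng
)

acf_raw = autocorr(returns, lag=1)
acf_squared = autocorr(returns ** 2, lag=1)

print(f"lag-1 autocorrelation of returns:         {acf_raw:+.3f}")
print(f"lag-1 autocorrelation of squared returns: {acf_squared:+.3f}")
```

Returns themselves stay serially uncorrelated because the sign of each shock is independent, while squared returns inherit the persistence of the volatility state.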
>>> acf_of_returns(squared=True)
Returns are negatively correlated with volatility.
Most measures of volatility of an asset are negatively correlated with the returns of that asset. Therefore, falling asset prices are typically accompanied by increased volatility and vice versa.
Here, as we did when examining the phenomenon of volatility clustering, we are measuring volatility by squaring returns. We can see that returns and volatility are negatively correlated in the SimCAP simulation to a degree similar to what we observed in the market.
>>> from sklearn.preprocessing import minmax_scale
>>> obs_coefs = list()
>>> sim_coefs = list()
>>> cols = obs.columns
>>> for col in cols:
...     obs_coef = np.corrcoef(obs_log_return[col], obs_log_return[col]**2, rowvar=False)
...     sim_coef = np.corrcoef(sim_log_return[col], sim_log_return[col]**2, rowvar=False)
...     obs_coefs.append(obs_coef[0, 1])
...     sim_coefs.append(sim_coef[0, 1])
...
>>> data = pd.DataFrame({"obs": obs_coefs, "sim": sim_coefs}, index=cols)
>>> fig, ax = plt.subplots(figsize=(8, 2))
>>> sns.heatmap(
...     minmax_scale(data).T,
...     annot=data.T,
...     xticklabels=data.index,
...     yticklabels=data.columns,
...     ax=ax,
...     linewidths=0.5,
...     cbar=False,
... )
>>> plt.tight_layout(rect=[0, 0, 1, 0.925])
>>> fig.suptitle("Correlation Between Returns and Volatility", fontsize=16)
>>> plt.show()
API
simcap.SimCAP
class simcap.SimCAP(asset_price_history, prototypes=None, asset_proto_rel=None, external_factors=None, suppress_warnings=False)
Simulation of Correlated Asset Prices
Parameters
- asset_price_history : pandas DataFrame of shape (n_periods, n_assets)
- DataFrame of asset price history. Assets should be in columns with periods in rows. A ticker symbol or some other unique label for each asset should be in the header row (the column labels). It is assumed the passed DataFrame is already sorted chronologically with the most recent price observation in the last row. Each of the assets in this DataFrame will be modeled by the simulation.
- prototypes : list of ndarrays or pandas Series of shape (n_periods, ), optional
- A list of univariate time series of shape (n_periods, ). It is assumed each of the prototypes are already sorted chronologically with the most recent observation in the last element of the array/Series. Prototypical time series can be used to model distributions of assets. For example, if attempting to model the price of a mutual fund with only two years of bull market price history, one might decide to use a stock index like the S&P 500 as a prototype for the mutual fund since the stock index has a longer history with examples across all market regimes.
- asset_proto_rel : dict, optional
- Required if prototypes are used. A dictionary mapping assets to prototypes. For example, if the asset with ticker of "SPD" maps to the first prototype, "GSUS" does not have a prototype, and "XVV" maps to the second prototype, pass the dictionary: {"SPD": 0, "GSUS": None, "XVV": 1}
- external_factors : pandas DataFrame of shape (n_periods, n_assets), optional
- DataFrame of factors that are used to influence the simulation of asset prices but are not simulated themselves. The DataFrame should have the same number of rows as the asset_price_history DataFrame. external_factor observations for a particular period should be in the same row as the asset_price_history observations for the same period. For example, if row 4 of the asset_price_history DataFrame are prices recorded on 2021-11-15, row 4 of the external_factors DataFrame should also be values that were recorded on 2021-11-15.
- suppress_warnings : bool, default=False
- If True, StationaryWarning and CorrelatedExogWarning warnings are suppressed.
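The optional inputs are easiest to get wrong in their shapes and alignment, so here is a hedged sketch of preparing them. All data are synthetic; `random_walk` and `check_proto_mapping` are hypothetical helpers, the factor name `vol_index` is made up, and the assumptions that every asset needs an entry in asset_proto_rel and that prototypes may be longer than the asset history are inferred from the examples in the parameter descriptions rather than confirmed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.bdate_range("2021-01-04", periods=250)


def random_walk(n):
    """Synthetic price path used only as placeholder data."""
    return 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n)))


asset_price_history = pd.DataFrame(
    {ticker: random_walk(len(dates)) for ticker in ["SPD", "GSUS", "XVV"]},
    index=dates,
)

# Two prototype series, oldest observation first (lengths are an assumption).
prototypes = [random_walk(1000), random_walk(1000)]
asset_proto_rel = {"SPD": 0, "GSUS": None, "XVV": 1}


def check_proto_mapping(assets, prototypes, mapping):
    """Hypothetical helper: raise ValueError if the mapping is inconsistent."""
    missing = [a for a in assets if a not in mapping]
    if missing:
        raise ValueError(f"assets without an entry: {missing}")
    for asset, idx in mapping.items():
        if idx is not None and not 0 <= idx < len(prototypes):
            raise ValueError(f"{asset!r} points at a prototype that does not exist")


check_proto_mapping(asset_price_history.columns, prototypes, asset_proto_rel)

# Raw factor recorded every calendar day; keep only rows matching price dates
# so that row i of both frames refers to the same period.
calendar = pd.date_range(dates[0], dates[-1], freq="D")
raw_factors = pd.DataFrame({"vol_index": rng.uniform(10, 30, len(calendar))}, index=calendar)
external_factors = raw_factors.reindex(asset_price_history.index)

print(asset_price_history.shape, external_factors.shape)
# These objects could then be passed to the documented constructor:
# SimCAP(asset_price_history, prototypes=prototypes,
#        asset_proto_rel=asset_proto_rel, external_factors=external_factors)
```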
Examples
>>> from simcap.datasets import load_example_asset_price_history
>>> from simcap import SimCAP
>>> asset_prices = load_example_asset_price_history()
>>> print(asset_prices.shape)
(2517, 8)
>>> model = SimCAP(asset_prices)
>>> sims = model.generate_simulations(
...     n_sims=10,
...     periods_per_sim=500,
...     output_as_dfs=False,
... )
>>> print(sims.shape)
(10, 500, 8)
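An array of shape (n_sims, periods_per_sim, n_assets) lends itself to vectorized Monte Carlo summaries. The sketch below uses a synthetic stand-in with the same shape as the example output, assuming normalized paths that start at 1, and a hypothetical equal-weight buy-and-hold portfolio; none of the numbers come from SimCAP.

```python
import numpy as np

rng = np.random.default_rng(7)
n_sims, periods_per_sim, n_assets = 10, 500, 8

# Synthetic stand-in for the ndarray returned with output_as_dfs=False:
# every asset path starts at 1 and evolves with random log steps.
log_steps = rng.normal(0.0003, 0.01, size=(n_sims, periods_per_sim - 1, n_assets))
sims = np.ones((n_sims, periods_per_sim, n_assets))
sims[:, 1:, :] = np.exp(np.cumsum(log_steps, axis=1))

# Hypothetical equal-weight buy-and-hold portfolio (no rebalancing).
weights = np.full(n_assets, 1 / n_assets)
portfolio = sims @ weights  # shape (n_sims, periods_per_sim)

# Ending value of each simulated portfolio path.
terminal = portfolio[:, -1]

# Maximum drawdown: worst decline from a running peak, per simulation.
running_peak = np.maximum.accumulate(portfolio, axis=1)
max_drawdown = (portfolio / running_peak - 1).min(axis=1)

print("terminal value percentiles (5/50/95):", np.percentile(terminal, [5, 50, 95]).round(3))
print("worst max drawdown:", max_drawdown.min().round(3))
```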
Methods
generate_simulations(n_sims=100, periods_per_sim=None, begin_values="norm", hmm_search_n_iter=60, hmm_search_n_fits_per_iter=10, hmm_search_params=None, output_as_dfs=True, output_as_float32=False, verbose=True)
Generate simulations using the data set provided at object initialization.
Parameters
- n_sims : int, default=100
- The number of simulations to generate.
- periods_per_sim : int, optional
- The number of periods, or rows, to generate in each simulation. If None (default), the periods per simulation will match the number of rows in the asset_price_history DataFrame.
- begin_values : {"start", "end", "norm"}, default="norm"
- The values to set for the first row of each simulation. "start" will set the first row of each simulation to match the first row of asset prices in the asset_price_history DataFrame. "end" will set the first row of each simulation to match the last row of the asset_price_history DataFrame. "norm" will set the first row of each simulation to 1 for all assets.
- hmm_search_n_iter : int, default=60
- For Hidden Markov Model search, the number of parameter settings that are sampled. hmm_search_n_iter trades off runtime vs quality of the solution. If exhaustive search of the parameter grid would result in fewer iterations, search stops when all parameter combinations have been searched.
- hmm_search_n_fits_per_iter : int, default=10
- The number of Hidden Markov Models to be randomly initialized and evaluated for each combination of parameter settings.
- hmm_search_params : dict, default=None
- For Hidden Markov Model search, dictionary with parameter names (str) as keys and lists of parameters to try as dictionary values. Parameter lists are sampled uniformly. All of the parameters are required:
  - n_states : list of ints
  - cov_window_size : list of ints
  - pca_n_components : list of floats, ints, or None
  - scale_before_pca : list of bool
  - scale_after_pca : list of bool
- If hmm_search_params is None, the default is: dict(n_states=[3, 4, 5, 6, 7, 8, 9], cov_window_size=[13, 21, 34, 55, 89, 144], pca_n_components=[0.8, 0.85, 0.9, 0.95, None], scale_before_pca=[True, False], scale_after_pca=[True, False])
- output_as_dfs : bool, default=True
- If True, simulations are output as an n_sims element list of DataFrames of shape (periods_per_sim, n_assets). If False, simulations are output as an ndarray of shape (n_sims, periods_per_sim, n_assets).
- output_as_float32 : bool, default=False
- If True, convert simulation array to float32. If False, simulation will be output as float64.
- verbose : bool, default=True
- If True, progress bar written to stdout as simulations are generated.
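To get a feel for the search budget, the sketch below counts the combinations in the default parameter grid and compares that with the default hmm_search_n_iter. The random draw at the end only illustrates the idea of sampling settings without replacement; how SimCAP actually samples is not shown here, so treat that part as an assumption.

```python
import itertools
import random

# Default search space as listed in the hmm_search_params description.
default_params = {
    "n_states": [3, 4, 5, 6, 7, 8, 9],
    "cov_window_size": [13, 21, 34, 55, 89, 144],
    "pca_n_components": [0.8, 0.85, 0.9, 0.95, None],
    "scale_before_pca": [True, False],
    "scale_after_pca": [True, False],
}

# Every combination of one value per parameter: 7 * 6 * 5 * 2 * 2.
grid = [
    dict(zip(default_params, combo))
    for combo in itertools.product(*default_params.values())
]
n_combinations = len(grid)

hmm_search_n_iter = 60
hmm_search_n_fits_per_iter = 10

# The search cannot try more settings than the grid contains.
n_sampled = min(hmm_search_n_iter, n_combinations)
total_fits = n_sampled * hmm_search_n_fits_per_iter

# Illustrative draw of settings without replacement (an assumption).
random.seed(0)
sampled = random.sample(grid, n_sampled)

print(f"{n_combinations} combinations, {n_sampled} sampled, {total_fits} model fits")
print("first sampled setting:", sampled[0])
```

With the defaults, only a small fraction of the grid is visited, which is why raising hmm_search_n_iter trades longer runtime for a more thorough search.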
Returns
- simulations : list (of length n_sims) of pandas DataFrames of shape (n_periods, n_variables) or ndarray of shape (n_simulations, n_periods, n_variables)
- If output_as_dfs is True, simulations are output as an n_sims element list of DataFrames of shape (periods_per_sim, n_assets). If False, simulations are output as an ndarray of shape (n_sims, periods_per_sim, n_assets).
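When generating many simulations, the ndarray output and output_as_float32 affect memory use. A small sketch of the arithmetic, using the default n_sims and the dimensions of the example data set; the arrays here are placeholders filled with ones, not simulations.

```python
import numpy as np

n_sims, periods_per_sim, n_assets = 100, 2517, 8

# Placeholder arrays with the same shape as the ndarray output.
sims64 = np.ones((n_sims, periods_per_sim, n_assets), dtype=np.float64)
sims32 = sims64.astype(np.float32)

print(f"float64: {sims64.nbytes / 1e6:.1f} MB")
print(f"float32: {sims32.nbytes / 1e6:.1f} MB")

# Memory scales linearly with the number of simulations.
bytes_per_sim64 = sims64.nbytes // n_sims
print(f"10,000 simulations as float64: {10_000 * bytes_per_sim64 / 1e9:.2f} GB")

# float32 keeps roughly 7 significant decimal digits.
print("float32 machine epsilon:", np.finfo(np.float32).eps)
```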
About the Developer
SimCAP is developed by me, Jeremy Doyle. I have a degree in finance and extensive securities industry experience. My never-ending pursuit for better analysis tools led me to data science and I fell in love with it. My passion therefore lies at the intersection of financial services and data science.
When optimizing the allocation of my own investment portfolio I wanted to stress-test allocations against multiple alternate realities — Monte Carlo simulation — but I struggled to find a package that could generate realistic simulations of correlated stock prices. Try doing a Google search for "simulating correlated stock prices" and you'll find a lot of talk about Geometric Brownian Motion (doesn't easily model multiple correlated assets) or simulations based on the Cholesky decomposition (can easily generate correlated simulations but the resulting return distributions are Gaussian and do not exhibit any of the volatility clustering we see in the "wild"). Since I couldn't find a suitable tool, I decided to build my own — the result was SimCAP.
If you have any questions, requests, or suggestions, please reach out to me at hello@jeremy-doyle.com. I hope you find SimCAP useful.
Source Distribution
File details
Details for the file simcap-0.1.1.tar.gz.
File metadata
- Download URL: simcap-0.1.1.tar.gz
- Upload date:
- Size: 590.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.12
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 2eeba486924d5ddc652e9ed90d7cdf5f6f0b7fb06554412029b169d3092cb7df |
MD5 | 6413f78e425e9f51082dddb6dd77c602 |
BLAKE2b-256 | bb2e09349906111f269b048b312e5ebddb7291325fe23891d6f9f72c59cfc3b8 |