
Stochastic volatility models fit to historical time-series data.


svolfit

This is a package that I cobbled together that fits a selection of stochastic volatility models to historical data, mainly through the use of a trinomial tree to represent the variance process. Despite that, it does quite a good job of fitting parameters. (There is a document out there somewhere that has evidence of this, but it is still a work in progress and I will link/include it when appropriate.)

Model parameters are produced by brute-force optimization of the log-likelihood calculated using the tree, via a standard minimizer from a Python package. Once the parameters are known, the most likely path for the latent (unobserved) variance is generated by working backwards through the tree (as in a Viterbi algorithm). Note that you need at least a few years (4-5) of daily asset observations for the resulting parameters to be reasonably converged.
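To give a sense of what "brute force optimization of the log-likelihood" means here, the sketch below is illustrative only and is not svolfit's internal code: it hands a simple GBM-style Gaussian likelihood (a stand-in for the tree-based stochastic volatility likelihood) to a standard bounded minimizer from scipy. The function and parameter names are hypothetical.

# Illustrative sketch only -- not svolfit's internal implementation.
# The objective below is a GBM-style i.i.d. Gaussian likelihood used as a
# stand-in for the tree-based stochastic volatility likelihood.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, log_returns, dt):
    # Negative log-likelihood of i.i.d. Gaussian log-returns.
    mu, sigma = params
    mean = (mu - 0.5 * sigma ** 2) * dt
    var = sigma ** 2 * dt
    return 0.5 * np.sum((log_returns - mean) ** 2 / var + np.log(2.0 * np.pi * var))

def fit_parameters(log_returns, dt):
    # Hand the objective to a standard bounded minimizer.
    result = minimize(
        neg_log_likelihood,
        x0=np.array([0.0, 0.2]),
        args=(log_returns, dt),
        method="L-BFGS-B",
        bounds=[(-1.0, 1.0), (1e-4, 2.0)],
    )
    return result.x, result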

There are also algorithms (to come at a later date) that estimate correlations between the asset, its latent volatility, and other assets -- both with and without stochastic volatility. The idea is that one then has a complete suite of tools to estimate the parameters needed to (for example) include stochastic volatility models consistently within a derivative counterparty credit risk simulation model.

Usage -- svolfit

The idea is to keep things very simple so that one has access to model parameters quite easily:

(pars, sdict) = svolfit( series, dt, model='Heston', method = 'grid', ... )

where:

  • series: A numpy array holding the time series for the asset that you want to fit the model to, daily observations increasing in time from the start to the end of the array.
  • dt: The year fraction to assign to the time between two observations (dt=1/252).
  • model: The stochastic volatility model (more below).
  • method: The approach to fitting (more below).
  • pars: The estimated model parameters in a dictionary.
  • sdict: A dictionary containing a lot of other stuff, including the most likely variance path.
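For example, a minimal end-to-end call might look like the sketch below. The import path and the simulated input series are assumptions for illustration, and the exact keys available in sdict depend on the model/method, so inspect sdict.keys():

import numpy as np
from svolfit import svolfit   # assumed import path

# Build a placeholder daily price series (in practice, load ~4-5 years of real data).
rng = np.random.default_rng(0)
dt = 1.0 / 252
nobs = 5 * 252
log_returns = (0.05 - 0.5 * 0.2 ** 2) * dt + 0.2 * np.sqrt(dt) * rng.standard_normal(nobs)
series = 100.0 * np.exp(np.concatenate(([0.0], np.cumsum(log_returns))))

(pars, sdict) = svolfit(series, dt, model='Heston', method='grid')

print(pars)          # dictionary of estimated model parameters
print(sdict.keys())  # everything else returned, including the most likely variance path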

Note that when you run this there may well be some noise produced about 'divide by zero in log', and some optimizer messages. These will be cleaned up once I figure out my strategy... Also, this call is much slower than it needs to be since it is running in a single process (the gradient calculation can easily be parallelized). TODO item.

The downside of 'simple' is that this approach does not work very well (i.e., not at all) for extreme parameters. In particular:

  1. Where the mean reversion timescale is of order the grid spacing or smaller, or where it is longer than the observation window supplied for calibration. This is not really a limitation of the model but of the data, since no approach will be able to fit the parameter accurately. Here we simply put bounds on the mean reversion parameter (hidden and undocumented for the moment) and it is up to the user to deal with cases where it is expected to be outside the range.
  2. Where correlations become large (larger than ~80% in magnitude), correlation estimates have been observed to be biased towards zero. This appears to be due to grid effects, resulting from the fact that the time discretization of the 'tree' matches that of the historical asset observations. The 'treeX2' method doubles the frequency of the variance grid, with the result that the bias is materially reduced -- at the expense of significantly increased computational time.
  3. Where the volatility of volatility is very large the variance grid becomes very coarse, and parameter estimates can become biased and noisy; the impact of this is also partly mitigated by use of the 'treeX2' method. Currently the volatility of volatility parameter is limited so that it cannot become 'excessively large' (hidden and undocumented for the moment). Fits to real financial time series data suggest that most series are well handled by the tree approach, although not all -- this will need to be documented a bit more carefully at a later time.

models:

  • 'GBM': Geometric Brownian Motion.
  • 'LognormalJump': Lognormal jump process, no diffusion.
  • 'MertonJD': Merton Jump Diffusion model.
  • 'HestonNandi': Heston model with perfect correlation.
  • 'Heston': The Heston model.
  • 'Bates': The Heston model with lognormal jumps added to the asset process.
  • 'H32': The '3/2' model, H32='Heston 3/2 model'.
  • 'B32': The 3/2 model with lognormal jumps added, 'B32' = 'Bates 3/2 model'.
  • 'GARCHdiff': The GARCH diffusion model.
  • 'GARCHjdiff': GARCH diffusion with lognormal jumps.

Yes, this model naming convention sucks and will likely change at some point.

methods:

  • 'analytic': currently only available for GBM.
  • 'tree': A trinomial tree that explicitly fits the initial value of the variance.
  • 'treeX2': As above, but with the frequency of timesteps for the variance tree doubled. Beware that this is SLOW.
  • 'grid': Does not explicitly fit the initial value of the variance, instead inferring it from the estimate of the most likely variance path. Currently this is the fastest, and the most stable method.
  • 'moments': Used for the LognormalJump and MertonJD models (see the combinations below).
  • 'v': only defined for HestonNandi, likely to vanish over time...

Current combinations (model,method) available with status:

  • (Heston,grid): Reliable.
  • (Heston,tree): Reliable, needs cleanup and optimization.
  • (Heston,treeX2): Reliable, needs cleanup and optimization.
  • (Bates,grid): Reliable, but beware of noisy jump parameters when they are below observability or near degeneracy.
  • (Bates,tree): Reliable, needs cleanup and optimization.
  • (Bates,treeX2): Reliable, needs cleanup and optimization.
  • (H32,grid): Appears reliable; not as extensively tested as Heston.
  • (B32,grid): Appears reliable; not as extensively tested as Heston.
  • (GARCHdiff,grid): Appears reliable; not as extensively tested as Heston.
  • (GARCHjdiff,grid): Appears reliable; not as extensively tested as Heston.
  • (GBM,analytic): ML closed form, used for testing/benchmarking.
  • (LognormalJump,moments): Reliable.
  • (MertonJD,moments): Reliable.
  • (HestonNandi,v): Unreliable.

Usage -- estimationstats

Also included is a utility that simulates paths from a model, estimates parameters using the simulated paths, and provides some statistics:

estimationstats(NAME,Npaths,horizons,NumProcesses, dt, model, method, modeloptions )

where:

  • NAME: Just a string that will be tacked on to results to help identify them.
  • Npaths: Number of paths to be simulated, estimated and used to calculate fit statistics.
  • horizons: A list containing the estimation horizons that the paths will be fit to.
  • NumProcesses: The number of processes to use for estimation; negative/zero for a single thread, will use at most number_cpus()-1.
  • dt: The year fraction to assign to the time between two observations (dt=1/252).
  • model: One of the above models (may not be implemented for all).
  • method: One of the above methods (may not be implemented for all).
  • modeloptions: A dictionary of model options which must contain a dictionary of the model parameters used to simulate paths, e.g.: init = {'mu': 0.05, 'sigma': 0.2}; modeloptions = {'init': init}.
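For example, a small GBM study might look like the following; the import path is an assumption, and the horizons are given here as numbers of daily observations:

from svolfit import estimationstats   # assumed import path

# Simulate paths from a GBM model, re-estimate the parameters on each path,
# and write out fit statistics.
init = {'mu': 0.05, 'sigma': 0.2}
modeloptions = {'init': init}

estimationstats(
    'gbm_check',      # NAME: tag attached to the output files
    100,              # Npaths
    [252, 504],       # horizons (assumed to be numbers of daily observations)
    4,                # NumProcesses
    1.0 / 252,        # dt
    'GBM',            # model
    'analytic',       # method
    modeloptions,
)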

This can run for a long time depending on the model/method, so start with something like a GBM model, with short horizons to get a feel for it before committing.

The output is some csv files with stats (to be described later) as well as gnuplot scripts and LaTeX to help produce a simple but readable pdf of results. Run "gnuplot *.plt" and then "pdflatex" on the generated .tex file; all results appear in the resulting pdf. It's not intended to be pretty, but it should show whether the models do a reasonable job of fitting parameters or not.

Usage -- analysis_timeseries

This is a utility that estimates parameters from a provided timeseries:

analysis_timeseries(NAME,filename,assetname,obs_start,obs_finish,windows,stride,NumProcesses, dt, model, method, modeloptions )

where:

  • NAME: Just a string that will be tacked on to results to help identify them.
  • filename: The name of the file containing the timeseries.
  • assetname: The column name of the timeseries in the file (read using pandas).
  • obs_start: The entry number to start the analysis at (zero is the start).
  • obs_finish: The entry number to finish the analysis at (-1 or a huge number uses all of the history).
  • windows: A list containing the estimation horizons that the paths will be fit to.
  • stride: The offset between successive estimation windows.
  • NumProcesses: The number of processes to use for estimation; negative/zero for a single thread, will use at most number_cpus()-1.
  • dt: The year fraction to assign to the time between two observations (dt=1/252).
  • model: One of the above models (may not be implemented for all).
  • method: One of the above methods (may not be implemented for all).
  • modeloptions: A dictionary of model options which must contain a dictionary of the model parameters, e.g.: init = {'mu': 0.05, 'sigma': 0.2}; modeloptions = {'init': init}.
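For example, a rolling GBM estimation on a stored price history might look like the following; the file name, column name, and import path are hypothetical, and the init keys follow the example given in the parameter description above:

from svolfit import analysis_timeseries   # assumed import path

# Rolling parameter estimation on a historical series stored in a csv file.
init = {'mu': 0.05, 'sigma': 0.2}
modeloptions = {'init': init}

analysis_timeseries(
    'spx_gbm',        # NAME: tag attached to the output files
    'prices.csv',     # filename (hypothetical), read using pandas
    'SPX',            # assetname (hypothetical column name)
    0,                # obs_start: begin at the first observation
    -1,               # obs_finish: use all of the history
    [756, 1260],      # windows: roughly 3 and 5 years of daily data
    21,               # stride: roughly monthly offset between windows
    4,                # NumProcesses
    1.0 / 252,        # dt
    'GBM',            # model
    'analytic',       # method
    modeloptions,
)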

As with estimationstats, this can run for a long time depending on the model/method, so start with something like a GBM model and short windows to get a feel for it before committing. The output format is the same: csv files with stats plus gnuplot scripts and LaTeX that build a simple pdf summary of the results.

Unit Tests

Unit tests are in the tests folder (currently these are slow, ~30 minutes):

pytest

No GitHub repo (or equivalent) at the moment; please email me directly with comments/complaints/requests, etc.

mike.
