Karhunen Loève decomposed Gaussian processes with forward variable selection

FoKL

Karhunen-Loève decomposed Gaussian processes with forward variable selection. Use this package for scalable GP regression and fast inference on static and dynamic datasets.

Setup

To install, use 'pip install FoKL' or clone this repo. Once installed, import into your environment with:

from FoKL import FoKLRoutines

If integrating, then include:

from FoKL import GP_Integrate

Now you are ready to begin creating your model, which can be initialized with:

model = FoKLRoutines.FoKL()

If intending to override the default hyperparameters, then you can include keywords in the model's initialization. For example:

model = FoKLRoutines.FoKL(btau=1000, draws=2000, way3=1)

Alternatively, hyperparameters can be redefined or updated with:

model.btau  = 1000
model.draws = 2000
model.way3  = 1

The above is useful for sweeping through hyperparameters without needing to initialize a new model (i.e., a new instance of the FoKL class) for each combination of hyperparameters.
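As a rough illustration of such a sweep, the sketch below updates attributes in place for every combination in a small grid. The grid values are hypothetical, and a 'SimpleNamespace' stands in for the FoKL model so that the snippet runs without the package installed; it is not part of FoKL.

```python
from itertools import product
from types import SimpleNamespace

# Stand-in for a FoKL model (assumption, so the sketch runs on its own);
# in practice this would be: model = FoKLRoutines.FoKL()
model = SimpleNamespace(btau=None, draws=1000, way3=False)

# Hypothetical grid of hyperparameter values to sweep over.
grid = {"btau": [100, 1000], "draws": [1000, 2000], "way3": [False, True]}

visited = []
for values in product(*grid.values()):
    # Update the hyperparameters in place rather than building a new model.
    for name, value in zip(grid.keys(), values):
        setattr(model, name, value)
    # A real sweep would call model.fit(inputs, data) here, then
    # model.clear() so that earlier results do not carry over.
    visited.append((model.btau, model.draws, model.way3))
```

Each entry of 'visited' records one combination, so the list can be paired with whatever score is collected per fit.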

The default hyperparameters and their keywords are as follows:

phis       = getKernels.sp500()
relats_in  = []
a          = 4
b          = f(a, data)
atau       = 4
btau       = f(atau, data)
tolerance  = 3
draws      = 1000
gimmie     = False
way3       = False
threshav   = 0.05
threshstda = 0.5
threshstdb = 2
aic        = False

A description of each hyperparameter is listed in the function documentation.

Training

Call the 'fit' function to train the FoKL model on all of 'data'.

betas, mtx, evs = model.fit(inputs, data)

Or, set the keyword 'train' to the fraction of 'data' (between 0 and 1) to use for training; the remainder is held out as the test set.

betas, mtx, evs = model.fit(inputs, data, train=0.8)
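For intuition about what a random split with 'train=0.8' amounts to, here is a minimal NumPy sketch. It is not FoKL's internal code; the array sizes, the seed, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the split is reproducible

n = 10                           # hypothetical number of observations
inputs = rng.random((n, 2))      # hypothetical inputs: n rows, 2 variables
data = rng.random(n)             # hypothetical observed outputs

train = 0.8
n_train = int(round(train * n))

# Shuffle the row indices, then cut them into train and test portions.
order = rng.permutation(n)
train_idx = np.sort(order[:n_train])
test_idx = np.sort(order[n_train:])

train_inputs, train_data = inputs[train_idx], data[train_idx]
test_inputs, test_data = inputs[test_idx], data[test_idx]
```

Keeping the index arrays makes it possible to map any train or test row back to its position in the full dataset.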

The console will display the index and BIC of the model being built in real time. Once completed, the model can be validated with the 'coverage3' function:

meens, bounds, rmse = model.coverage3()

By default, 'coverage3' predicts output values for 'model.inputs', which is just the normalized and properly formatted 'inputs' provided in 'fit'. If validating visually, then a sorted plot of the test set (for 'train' < 1) tends to be most insightful:

model.coverage3(inputs=model.testinputs, data=model.testdata, plot='sorted', bounds=1, legend=1)

Note that 'data' must correspond to the same set as 'inputs' for the model's RMSE, the third value returned by 'coverage3', to be calculated correctly.
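The sketch below shows the standard definition of RMSE and why the two arrays must line up element by element. It is a generic NumPy helper written for this illustration; 'rmse' here is a hypothetical function, and FoKL's own calculation inside 'coverage3' may differ in detail.

```python
import numpy as np

def rmse(predicted, observed):
    """Root-mean-square error between two equally shaped arrays."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        # Mismatched sets (e.g. test predictions vs. all data) are an error.
        raise ValueError("predicted and observed must have the same shape")
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

# Two exact predictions and one that is off by 2.
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```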

As a side note, calling 'fit' adds the following attributes to your FoKL model, which may be useful during post-processing:

model.inputs         == all normalized inputs w/o outliers (i.e., model.traininputs plus model.testinputs)
model.data           == all data w/o outliers (i.e., model.traindata plus model.testdata)

model.betas          == betas
model.mtx            == mtx
model.evs            == evs

model.rawinputs      == all normalized inputs w/ outliers == user's 'inputs' but normalized and formatted
model.rawdata        == all data w/ outliers              == user's 'data' but formatted
model.traininputs    == train set of model.inputs
model.traindata      == train set of model.data
model.testinputs     == test set of model.inputs
model.testdata       == test set of model.data
model.normalize      == [min, max] factors used to normalize user's 'inputs' to 0-1 scale of model.rawinputs
model.outliers       == indices removed from model.rawinputs and model.rawdata as outliers
model.trainlog       == indices of model.inputs used for model.traininputs
model.testlog        == indices of model.inputs used for model.testinputs

model.inputs_np      == model.inputs as a numpy array of timestamps x input variables
model.rawinputs_np   == model.rawinputs as a numpy array of timestamps x input variables
model.traininputs_np == model.traininputs as a numpy array of timestamps x input variables
model.testinputs_np  == model.testinputs as a numpy array of timestamps x input variables

To remove all of the above attributes so that only the hyperparameters remain, most importantly so that 'betas' does not influence the training of a new model, use:

model.clear()
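The 'model.normalize' attribute listed above stores the [min, max] factors used to scale the inputs to a 0-1 range. A minimal sketch of that kind of min-max scaling follows; the function names are hypothetical and the code is an illustration under the assumption of a per-column scaling, not FoKL's implementation.

```python
import numpy as np

def minmax_normalize(raw):
    """Scale each column of 'raw' to the 0-1 range; return scaled data and factors."""
    raw = np.asarray(raw, dtype=float)
    lo = raw.min(axis=0)
    hi = raw.max(axis=0)
    scaled = (raw - lo) / (hi - lo)          # assumes hi > lo in every column
    factors = [[float(a), float(b)] for a, b in zip(lo, hi)]
    return scaled, factors

def minmax_apply(new, factors):
    """Scale new rows with previously stored [min, max] factors."""
    factors = np.asarray(factors, dtype=float)
    lo, hi = factors[:, 0], factors[:, 1]
    return (np.asarray(new, dtype=float) - lo) / (hi - lo)

raw = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
scaled, factors = minmax_normalize(raw)
new_scaled = minmax_apply([[2.5, 25.0]], factors)
```

Storing the factors is what allows new inputs to be placed on the same scale as the training inputs.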

Integration

FoKL can be used to model state derivatives, and it includes a fourth-order Runge-Kutta (RK4) method for integrating those states. Because each state is modeled independently, a single call to 'fit' cannot train all of them; each state derivative must be fit separately. For the case of two states, 'State1' and 'State2', with the same inputs:

model = FoKLRoutines.FoKL()

dStates = [dState1, dState2]
betas = []
mtx = []
for i in range(2):
    betas_i, mtx_i, _ = model.fit(inputs, dStates[i])
    betas.append(betas_i)
    mtx.append(mtx_i)
    model.clear()

After fitting the above state derivatives, call the 'GP_Integrate' function to integrate (this requires NumPy imported as 'np'):

T, Y = GP_Integrate([np.mean(betas[0],axis=0),np.mean(betas[1],axis=0)], [mtx[0],mtx[1]], utest, norms, phis, start, stop, ic, stepsize, used_inputs)

Alternatively, separate FoKL instances can be created to achieve the same result:

model1 = FoKLRoutines.FoKL()
model2 = FoKLRoutines.FoKL()

betas1, mtx1, _ = model1.fit(inputs, dState1)
betas2, mtx2, _ = model2.fit(inputs, dState2)

T, Y = GP_Integrate([np.mean(betas1,axis=0),np.mean(betas2,axis=0)], [mtx1,mtx2], utest, norms, phis, start, stop, ic, stepsize, used_inputs)
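The calls above pass 'np.mean(betas, axis=0)' rather than 'betas' itself. The sketch below shows what that reduction does, assuming 'betas' is laid out as draws x coefficients; that layout, the synthetic values, and the name 'betas_demo' are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for sampled coefficients: 1000 draws of 3 coefficients
# scattered around the values 1.0, -2.0 and 0.5 (all made up).
betas_demo = rng.normal(loc=[1.0, -2.0, 0.5], scale=0.1, size=(1000, 3))

# Averaging over axis 0 collapses the draws, leaving one value per coefficient.
betas_avg = np.mean(betas_demo, axis=0)
```

The result has one entry per coefficient, which is the form of a single point estimate rather than a set of draws.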

See 'GP_intergrate_example.py' for an example.
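For readers unfamiliar with RK4, the following is a generic sketch of the classical fourth-order Runge-Kutta scheme applied to a two-state system. It is not the code inside 'GP_Integrate'; the functions 'rk4_step', 'integrate', and 'derivs' are hypothetical, and a toy harmonic oscillator stands in for the derivatives that fitted FoKL models would supply.

```python
import numpy as np

def rk4_step(f, t, y, h):
    """Advance y by one step of size h with the classical RK4 scheme."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(f, y0, start, stop, stepsize):
    """Integrate dy/dt = f(t, y) from start to stop on a fixed grid."""
    n_steps = int(round((stop - start) / stepsize))
    T = start + stepsize * np.arange(n_steps + 1)
    Y = np.empty((n_steps + 1, len(y0)))
    Y[0] = y0
    for i in range(n_steps):
        Y[i + 1] = rk4_step(f, T[i], Y[i], stepsize)
    return T, Y

def derivs(t, y):
    # Toy system: dState1 = State2, dState2 = -State1.
    return np.array([y[1], -y[0]])

T, Y = integrate(derivs, np.array([1.0, 0.0]), 0.0, 1.0, 0.01)
```

With the initial condition [1, 0], the exact solution is State1 = cos(t) and State2 = -sin(t), which gives a convenient check on the integrator.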

Development

More sophisticated outlier removal methods are currently in development, but for demonstration purposes the following will search through 'data' and remove any points with a z-score greater than 4:

model.fit(model.inputs, model.data, CatchOutliers='Data', OutliersMethod='Z-Score', OutliersMethodParams=4)
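To make the z-score criterion concrete, here is a small NumPy sketch that flags points whose absolute z-score exceeds a threshold. It is an illustration only; 'zscore_outliers' is a hypothetical helper, and details such as the use of the population standard deviation are assumptions that may not match FoKL's implementation.

```python
import numpy as np

def zscore_outliers(data, threshold=4.0):
    """Return indices of points whose absolute z-score exceeds 'threshold'."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean()) / data.std()    # population standard deviation
    return np.flatnonzero(np.abs(z) > threshold)

# 99 ordinary points and one extreme value at the end (made-up data).
data = np.concatenate([np.zeros(99), [100.0]])

outliers = zscore_outliers(data, threshold=4.0)
cleaned = np.delete(data, outliers)
```

Note that with few points the largest attainable z-score is limited, so a threshold of 4 can only flag anything once the dataset is reasonably large.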

Also in development are additional methods for splitting 'data' into test/train sets, beyond the current method which is limited to a random split.

Citations

Please cite: K. Hayes, M.W. Fouts, A. Baheri and D.S. Mebane, "Forward variable selection enables fast and accurate dynamic system identification with Karhunen-Loève decomposed Gaussian processes", arXiv:2205.13676

Credits: David Mebane (ideas and original code), Kyle Hayes (integrator), Derek Slack (Python porting)

Funding provided by National Science Foundation, Award No. 2119688
