No project description provided

## Project description

# Introduction

*[to be added]*

# Quickstart

*If you haven't already, install the package via pip install anatomy, preferably in a new Anaconda environment with Python 3.9.*

The anatomy package uses a simple workflow. An `Anatomy`

object is initially estimated on your forecasting setup (using your data and your models), is then stored to disk, and can then be loaded at any future time without requiring re-estimation.

After initial estimation, an `Anatomy`

can anatomize:

- forecasts produced by any combination of your models
- your original forecasts
- any loss or gain function applied to your forecasts
- an arbitrary subset of your forecasts

all of which requires *no additional computational time*.

### General structure

You may already have trained your models before you create the `Anatomy`

, and the aggregate of all your models at all periods may be too large to fit into your RAM. During estimation, the `Anatomy`

will therefore ask you for the specific model and dataset it needs at a given iteration by calling your mapping function:

```
from anatomy import *
def my_map(key: AnatomyModelProvider.PeriodKey) -> \
AnatomyModelProvider.PeriodValue:
train, test, model = ... # load from somewhere or generate here
return AnatomyModelProvider.PeriodValue(train, test, model)
```

You wrap the mapping function in an `AnatomyModelProvider`

alongside information about the forecasting application:

```
my_provider = AnatomyModelProvider(
n_periods=..., n_features=..., model_names=[...],
y_name=..., provider_fn=my_map
)
```

and finally create the `Anatomy`

:

```
my_anatomy = Anatomy(provider=my_provider, n_iterations=...).precompute(
n_jobs=16, save_path="my_anatomy.bin"
)
```

After running the above, the `Anatomy`

is estimated and stored in your working directory as `my_anatomy.bin`

.

## Example:

To get started, we need a forecasting application. We use a linear DGP to generate our dataset consisting of 500 observations of the three predictors `x_{0,1,2}`

and our target `y`

:

```
# set random seed for reproducibility:
np.random.seed(1338)
xy = pd.DataFrame(np.random.normal(0, 1, (500, 3)), columns=["x_0", "x_1", "x_2"])
xy["y"] = xy.sum(axis=1) + np.random.normal(0, 1, 500)
# set a unique and monotonically increasing index (default index would suffice):
xy.index = pd.date_range("2021-04-19", "2022-08-31").map(lambda x: x.date())
```

For convenience, the `AnatomySubsets`

includes a generator that splits your dataset into training and test sets according to your forecasting scheme. Here, we include 100 periods in our first training set, forecast the target of the next period with no gap between the training set and the forecast, extend our training set by one period, and repeat until we reach the end of our data:

```
subsets = AnatomySubsets.generate(
index=xy.index,
initial_window=100,
estimation_type=AnatomySubsets.EstimationType.EXPANDING,
periods=1,
gap=0
)
```

In this example, we have not yet trained our models. We thus do so directly in our mapping function:

```
def mapper(key: AnatomyModelProvider.PeriodKey) -> \
AnatomyModelProvider.PeriodValue:
train = xy.iloc[subsets.get_train_subset(key.period)]
test = xy.iloc[subsets.get_test_subset(key.period)]
if key.model_name == "ols":
model = train_ols(train.drop("y", axis=1), train["y"])
elif key.model_name == "rf":
model = train_rf(train.drop("y", axis=1), train["y"])
return AnatomyModelProvider.PeriodValue(train, test, model)
```

using `train_ols`

and `train_rf`

, which train a model and yield its prediction function wrapped in an `AnatomyModel`

:

```
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
def train_ols(x_train: pd.DataFrame, y_train: pd.Series) -> AnatomyModel:
ols_model = LinearRegression().fit(x_train, y_train)
def pred_fn_ols(xs: np.ndarray) -> np.ndarray:
xs_df = pd.DataFrame(xs, columns=x_train.columns)
return np.array(ols_model.predict(xs_df)).flatten()
return AnatomyModel(pred_fn_ols)
def train_rf(x_train: pd.DataFrame, y_train: pd.Series) -> AnatomyModel:
rf_model = RandomForestRegressor(random_state=1338).fit(x_train, y_train)
def pred_fn_rf(xs: np.ndarray) -> np.ndarray:
xs_df = pd.DataFrame(xs, columns=x_train.columns)
return np.array(rf_model.predict(xs_df)).flatten()
return AnatomyModel(pred_fn_rf)
```

We now have all we need to train the models and estimate the `Anatomy`

:

```
provider = AnatomyModelProvider(
n_periods=subsets.n_periods,
n_features=xy.shape[1]-1,
model_names=["ols", "rf"],
y_name="y",
provider_fn=mapper
)
anatomy = Anatomy(provider=provider, n_iterations=10).precompute(
n_jobs=16, save_path="anatomy.bin"
)
```

At this point, the `Anatomy`

is stored as `anatomy.bin`

in our working directory. We can load it at any later point using `anatomy = Anatomy.load("anatomy.bin")`

.

## Anatomizing

We can now use our estimated `Anatomy`

to dismantle our forecasts. In this example, we are interested in the two models, `rf`

and `ols`

, as well as the equal-weighted combination of the two:

```
groups = {
"rf": ["rf"],
"ols": ["ols"],
"ols+rf": ["ols", "rf"]
}
```

### Anatomize the out-of-sample RÂ² of the forecasts:

To decompose the out-of-sample RÂ² of our forecasts produced by the two models and the combination, we compute the unconditional forecasts and provide a function transforming forecasts into out-of-sample RÂ² to the `Anatomy`

:

```
prevailing_mean = np.array([
xy.iloc[subsets.get_train_subset(period=i)]["y"].mean()
for i in range(subsets.n_periods)
])
def transform(y_hat, y):
return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - prevailing_mean) ** 2)
df = anatomy.explain(
model_sets=AnatomyModelCombination(groups=groups),
transformer=AnatomyModelOutputTransformer(transform=transform)
)
```

This yields the change in out-of-sample RÂ² attributable to each predictor:

```
>>> df
base_contribution x_0 x_1 x_2
rf 2021-07-28 -> 2022-08-31 0.000759 0.257240 0.238458 0.215062
ols 2021-07-28 -> 2022-08-31 0.000000 0.285329 0.267940 0.249870
ols+rf 2021-07-28 -> 2022-08-31 0.000383 0.279073 0.259201 0.240548
```

### ... the RMSE of the forecasts:

```
def transform(y_hat, y):
return np.sqrt(np.mean((y - y_hat) ** 2))
df = anatomy.explain(
model_sets=AnatomyModelCombination(groups=groups),
transformer=AnatomyModelOutputTransformer(transform=transform)
)
```

which yields the change in root mean squared error attributable to each predictor:

```
>>> df
base_contribution x_0 x_1 x_2
rf 2021-07-28 -> 2022-08-31 2.105513 -0.351430 -0.326065 -0.296708
ols 2021-07-28 -> 2022-08-31 2.106313 -0.414488 -0.386125 -0.371149
ols+rf 2021-07-28 -> 2022-08-31 2.105910 -0.397903 -0.368193 -0.350080
```

### ... the MAE:

```
def transform(y_hat, y):
return np.mean(np.abs(y - y_hat))
df = anatomy.explain(
model_sets=AnatomyModelCombination(groups=groups),
transformer=AnatomyModelOutputTransformer(transform=transform)
)
```

which yields the change in mean absolute error attributable to each predictor:

```
>>> df
base_contribution x_0 x_1 x_2
rf 2021-07-28 -> 2022-08-31 1.679359 -0.299382 -0.249591 -0.221651
ols 2021-07-28 -> 2022-08-31 1.679946 -0.345583 -0.300613 -0.288960
ols+rf 2021-07-28 -> 2022-08-31 1.679652 -0.330303 -0.283129 -0.262270
```

### ... the SE:

*Note that the transform function in this case does not aggregate and returns a vector instead of a scalar. The Anatomy thus yields one decomposition per forecast, which is also known as a local (as opposed to global) decomposition.*

```
def transform(y_hat, y):
return (y - y_hat) ** 2
df = anatomy.explain(
model_sets=AnatomyModelCombination(groups=groups),
transformer=AnatomyModelOutputTransformer(transform=transform)
)
```

which yields the change in squared error attributable to each predictor for each forecast:

```
>>> df
base_contribution x_0 x_1 x_2
rf 2021-07-28 0.026485 -0.063311 -0.115985 0.163743
2021-07-29 0.021582 0.238569 -0.370448 0.694773
2021-07-30 2.451742 -2.702365 1.660915 -1.407192
... ... ... ... ...
```

### ... or just the raw forecasts:

```
def transform(y_hat):
return y_hat
df = anatomy.explain(
model_sets=AnatomyModelCombination(groups=groups),
transformer=AnatomyModelOutputTransformer(transform=transform)
)
```

which yields the change in the forecast attributable to each predictor:

```
>>> df
base_contribution x_0 x_1 x_2
rf 2021-07-28 0.070861 0.222932 0.360182 -0.315820
2021-07-29 0.070354 0.250389 -0.617779 0.984992
2021-07-30 0.071163 -1.514547 0.839562 -0.835136
... ... ... ... ...
```

### The Efficiency property

*During estimation the Anatomy checks that the individual attributions of the predictors to the forecasts sum up exactly to the forecasts produced by the models. The estimation would be aborted if efficiency does not hold.*

Due to the efficiency property of Shapley values, summing the individual attributions yields the decomposed value(s) exactly. We can check that the `Anatomy`

is consistet, for instance, by computing the RMSE from the decomposed raw forecasts, and by comparing that to the RMSE that we can compute from the decomposed RMSE.

Let's begin by decomposing the RMSE directly:

```
def transform(y_hat, y):
return np.sqrt(np.mean((y - y_hat) ** 2))
df = anatomy.explain(
model_sets=AnatomyModelCombination(groups=groups),
transformer=AnatomyModelOutputTransformer(transform=transform)
)
```

The RMSE is the sum of the individual attributions, `rmse_a = df.sum(axis=1)`

:

```
>>> rmse_a
rf 2021-07-28 -> 2022-08-31 1.131310
ols 2021-07-28 -> 2022-08-31 0.934551
ols+rf 2021-07-28 -> 2022-08-31 0.989733
```

Let's next decompose the raw forecasts and compute the RMSE for these:

```
def transform(y_hat):
return y_hat
df = anatomy.explain(
model_sets=AnatomyModelCombination(groups=groups),
transformer=AnatomyModelOutputTransformer(transform=transform)
)
```

The forecasts are recovered as the sum of the individual attributions, `y_hat = df.sum(axis=1)`

:

```
>>> y_hat
rf 2021-07-28 0.338154
2021-07-29 0.687956
2021-07-30 -1.438958
...
```

From the forecasts, we can compute the RMSE:

```
y_true = np.hstack([
xy.iloc[subsets.get_test_subset(period=i)]["y"]
for i in range(subsets.n_periods)
])
rmse_b = pd.Series({
key: np.sqrt(np.mean((y_true - y_hat.xs(key)) ** 2))
for key in groups.keys()
})
```

This yields the same RMSE as the sum of the attributions from the RMSE decomposition:

```
>>> rmse_b
rf 1.131310
ols 0.934551
ols+rf 0.989733
```

## Project details

## Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.