Brownian Bridge interpolation of timeseries, built to use with Pandas.
simple_interpolation
A Pandas implementation of the Brownian Bridge interpolation algorithm. The series is assumed to behave as a Wiener process when building std().
Interpolation rocks, but doing it poorly can alter the original features of your data. Done well, a Brownian bridge preserves the volatility of the original data. Combining that with a bit of stock-market theory (Wiener processes), we built a simple interpolation library.
Read about the algorithm in the "Brownian bridge algo" section below.
Install
pip install simple_interpolation
How to use
# Example input dataframe, containing gaps
# (e.g. the X column skips the values 3-5)
df
| | X | Y |
---|---|---|
0 | 0 | 8.089846 |
1 | 1 | 11.793489 |
2 | 2 | 9.026726 |
3 | 6 | 8.996177 |
4 | 7 | 11.221730 |
5 | 8 | 8.398122 |
6 | 9 | 8.845667 |
7 | 10 | 11.454700 |
8 | 11 | 11.431745 |
9 | 12 | 7.050733 |
10 | 13 | 10.009420 |
11 | 14 | 6.964674 |
12 | 15 | 9.541557 |
13 | 16 | 11.656722 |
14 | 19 | 11.062303 |
15 | 20 | 11.302763 |
16 | 21 | 13.042057 |
17 | 22 | 7.405670 |
18 | 23 | 8.986057 |
19 | 24 | 7.554964 |
20 | 25 | 10.467688 |
21 | 26 | 9.416683 |
22 | 27 | 10.038665 |
23 | 28 | 5.519665 |
24 | 45 | 10.184922 |
25 | 46 | 11.661662 |
26 | 47 | 9.748401 |
27 | 48 | 11.023116 |
28 | 49 | 9.298167 |
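To see where the gaps in this input sit before interpolating, one option is to look at the step between consecutive X values. The snippet below is only a sketch with plain pandas: the threshold 1.7675 is the value the library reports in the run that follows, but how the library derives it is not shown here, so it is hard-coded as an assumption, and the `gaps` frame and its column names are made up for illustration.

```python
import pandas as pd

# X values copied from the example input above (Y omitted for brevity).
x = list(range(0, 3)) + list(range(6, 17)) + list(range(19, 29)) + list(range(45, 50))
df = pd.DataFrame({"X": x})

# Assumed threshold: any step in X larger than this counts as a gap.
threshold = 1.7675

step = df["X"].diff()        # distance to the previous row (NaN on the first row)
is_gap = step > threshold    # True on the first row after each gap

# Hypothetical summary frame: one row per gap.
gaps = pd.DataFrame({
    "gap_start": df["X"].shift(1)[is_gap],
    "gap_end": df["X"][is_gap],
})
gaps["length"] = gaps["gap_end"] - gaps["gap_start"]
print(gaps)
```

For this input the sketch flags three gaps: 2 to 6, 16 to 19, and 28 to 45.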
# Interpolation, plot is optional (default False)
patched_df = interpolate_gaps(df, plot=True)
patched_df
No datetime column: assuming first column 'X' as X-axis
std() built with Wiener method
Will interpolate if X-column interval is more than 1.7675
Processed 0.00% of gaps
Ended interpolation, starting plotting the results..
Ended execution
| | X | Y | interpolated |
---|---|---|---|
0 | 0.0000 | 8.089846 | 0 |
1 | 1.0000 | 11.793489 | 0 |
2 | 2.0000 | 9.026726 | 0 |
3 | 3.0000 | 8.291588 | 1 |
4 | 4.0000 | 8.486541 | 1 |
5 | 5.0000 | 8.736440 | 1 |
6 | 6.0000 | 8.996177 | 0 |
7 | 7.0000 | 11.221730 | 0 |
8 | 8.0000 | 8.398122 | 0 |
9 | 9.0000 | 8.845667 | 0 |
10 | 10.0000 | 11.454700 | 0 |
11 | 11.0000 | 11.431745 | 0 |
12 | 12.0000 | 7.050733 | 0 |
13 | 13.0000 | 10.009420 | 0 |
14 | 14.0000 | 6.964674 | 0 |
15 | 15.0000 | 9.541557 | 0 |
16 | 16.0000 | 11.656722 | 0 |
17 | 17.5000 | 11.359512 | 1 |
18 | 19.0000 | 11.062303 | 0 |
19 | 20.0000 | 11.302763 | 0 |
20 | 21.0000 | 13.042057 | 0 |
21 | 22.0000 | 7.405670 | 0 |
22 | 23.0000 | 8.986057 | 0 |
23 | 24.0000 | 7.554964 | 0 |
24 | 25.0000 | 10.467688 | 0 |
25 | 26.0000 | 9.416683 | 0 |
26 | 27.0000 | 10.038665 | 0 |
27 | 28.0000 | 5.519665 | 0 |
28 | 29.0625 | 6.584443 | 1 |
29 | 30.1250 | 5.504773 | 1 |
30 | 31.1875 | 5.623875 | 1 |
31 | 32.2500 | 6.275126 | 1 |
32 | 33.3125 | 6.639139 | 1 |
33 | 34.3750 | 6.394277 | 1 |
34 | 35.4375 | 6.797008 | 1 |
35 | 36.5000 | 7.885828 | 1 |
36 | 37.5625 | 8.530594 | 1 |
37 | 38.6250 | 8.921191 | 1 |
38 | 39.6875 | 8.941382 | 1 |
39 | 40.7500 | 8.900565 | 1 |
40 | 41.8125 | 9.037251 | 1 |
41 | 42.8750 | 9.360730 | 1 |
42 | 43.9375 | 9.914641 | 1 |
43 | 45.0000 | 10.184922 | 0 |
44 | 46.0000 | 11.661662 | 0 |
45 | 47.0000 | 9.748401 | 0 |
46 | 48.0000 | 11.023116 | 0 |
47 | 49.0000 | 9.298167 | 0 |
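The `interpolated` column in the output marks which rows were filled in (1) and which come from the original data (0). A small sketch of how that flag could be used to separate the two, rebuilding a few rows of the table above by hand so it runs without the library:

```python
import pandas as pd

# A few rows copied from the output table above (first gap, X = 3..5).
patched_df = pd.DataFrame({
    "X": [2.0, 3.0, 4.0, 5.0, 6.0],
    "Y": [9.026726, 8.291588, 8.486541, 8.736440, 8.996177],
    "interpolated": [0, 1, 1, 1, 0],
})

original = patched_df[patched_df["interpolated"] == 0]  # rows from the input
filled = patched_df[patched_df["interpolated"] == 1]    # rows added by interpolation

print(f"{len(filled)} of {len(patched_df)} rows were interpolated")
```

Keeping the flag around makes it easy to drop the synthetic rows again, or to plot them in a different colour.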
Brownian bridge algo: the theory
The Brownian bridge lets us interpolate large gaps while preserving the volatility of the series (which is taken as an input!). Read more about it under "Brownian bridge".
Wiener method to obtain the relevant std()
In a Wiener process the variance grows linearly with the elapsed time, $$var = \Delta_t$$ so the volatility (standard deviation) is $$std = \sqrt{var} = \sqrt{\Delta_t}$$ This sets how the local volatility should be analyzed.
So, if we have $std_{year}$ (or $std_{\text{whole series}}$), we can get the daily value from: $$std_{year} = std_{day} \cdot \sqrt{365} \Rightarrow std_{day} = \frac{std_{year}}{\sqrt{365}}$$
So we can get the "basic building block" of the volatility by getting $std_{minute}$ in our case.
Having $std_{minute}$, we then do a "bottom-up" process building the gap:
$$ std_{gap} = std_{minute} \cdot \sqrt{\text{number of minutes in gap}} $$
(Advice from Miguel, my colleague at ING)
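The two scaling steps above (top-down to the basic building block, then bottom-up to the gap) can be sketched in a few lines. The function names and the numbers are illustrative assumptions, not part of the library:

```python
import math

def std_per_step(std_series: float, n_steps: float) -> float:
    """Top-down: std of one base step, given the std over n_steps steps."""
    return std_series / math.sqrt(n_steps)

def std_for_gap(std_step: float, steps_in_gap: float) -> float:
    """Bottom-up: std accumulated over a gap spanning steps_in_gap base steps."""
    return std_step * math.sqrt(steps_in_gap)

# Illustrative numbers only: a yearly std of 0.20 and a 16-day gap.
std_year = 0.20
std_day = std_per_step(std_year, 365)   # std_year / sqrt(365)
std_gap = std_for_gap(std_day, 16)      # std_day * sqrt(16) = 4 * std_day

print(std_day, std_gap)
```

The same two functions apply with minutes as the base step: divide by the square root of the number of minutes in the series, then multiply by the square root of the number of minutes in the gap.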
Fixed timesteps
You can use the fixed_freq argument to round the interpolated X points to a fixed timestep. fixed_freq defaults to 'min'. Any Pandas offset alias is a valid option, see: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
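To get a feel for what an offset alias does, here is a short pandas-only sketch that snaps timestamps down to a grid. It does not call the library; it only shows how aliases such as 'min' or '5min' behave with the standard `Series.dt.floor` accessor, and the timestamps are made up.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime([
    "2021-01-01 10:01:30",
    "2021-01-01 10:04:45",
]))

per_minute = ts.dt.floor("min")     # snapped down to whole minutes
per_5_minutes = ts.dt.floor("5min") # snapped down to a 5-minute grid

print(per_minute)
print(per_5_minutes)
```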
Implementation of the rounding (you probably don't need to read this)
This constraint takes us outside the pure Brownian bridge, because the bridge only interpolates midpoints, through:
$$\begin{cases} x_m = \frac{x_0 + x_1}{2} \\ y_m = \frac{y_0 + y_1}{2} + std \end{cases}$$
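A minimal sketch of this midpoint rule applied recursively, to make the idea concrete. It is not the library's code and rests on several assumptions: the "+ std" term is taken to be a zero-mean Gaussian draw, scaled with the textbook Brownian-bridge midpoint deviation `std_step * sqrt(length) / 2`; bisection stops once a sub-interval is no longer than `max_step`; and `bridge_fill` and its parameters are names invented for the example.

```python
import math
import random

def bridge_fill(x0, y0, x1, y1, std_step, max_step, rng):
    """Return the interior points (x, y) between two known points.

    The interval [x0, x1] is bisected recursively until every
    sub-interval is no longer than max_step.
    """
    length = x1 - x0
    if length <= max_step:
        return []
    x_m = (x0 + x1) / 2
    # Midpoint of the straight line plus a random displacement
    # (assumed Gaussian, with the Brownian-bridge midpoint std).
    y_m = (y0 + y1) / 2 + rng.gauss(0.0, std_step * math.sqrt(length) / 2)
    left = bridge_fill(x0, y0, x_m, y_m, std_step, max_step, rng)
    right = bridge_fill(x_m, y_m, x1, y1, std_step, max_step, rng)
    return left + [(x_m, y_m)] + right

# Largest gap of the example: X from 28 to 45. std_step is a made-up value.
rng = random.Random(0)
points = bridge_fill(28, 5.519665, 45, 10.184922,
                     std_step=0.5, max_step=1.7675, rng=rng)
print(len(points), points[0][0], points[-1][0])
```

With these inputs the X positions come out evenly spaced at 17/16 = 1.0625, the same spacing that appears in the example output for this gap, which suggests (but does not prove) that the library bisects in a similar way.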
But if we round to whole minutes, the midpoint $x_m$ may not fall on a minute-exact timestamp (imagine the first interpolated point in a gap of 3 minutes: it would sit at 1.5 minutes). So we round $x_m$ and look for its associated Y displacement $\Delta y$:
$$\begin{cases} x'_m = x_m + \Delta x_{toroundtomin} \\ y'_m = y_m + \Delta y \end{cases}$$
To get the associated $\Delta y$ we use the slope (derivative) of the straight line between the points $(x_0, y_0)$ and $(x_1, y_1)$.
So:
1- Round $x_m$ down to the nearest minute (floor()-like), so we obtain $x'_m$ and $\Delta x_{toroundtomin}$
2- The deltas on X and Y are related by the derivative, which the Brownian bridge implicitly treats as constant (a straight line) between the two points, so $\Delta y$ is quite straightforward to calculate:
$$ \Delta y := \frac{dy}{dx} \Delta x \Rightarrow \Delta y \approx \frac{y_1 - y_0}{x_1 - x_0} \Delta x_{toroundtomin} $$
So we would have everything for the Y correction.
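The two steps can be checked with a tiny worked example. This is a sketch, not the library's implementation: the function name and the `step` parameter are assumptions, X is a plain number of minutes rather than a timestamp, and the random displacement is left out so that only the rounding correction is visible.

```python
import math

def snapped_midpoint(x0, y0, x1, y1, step=1.0):
    """Midpoint of a segment, moved down onto a grid of width `step`."""
    x_m = (x0 + x1) / 2
    y_m = (y0 + y1) / 2                        # random displacement omitted
    x_snapped = math.floor(x_m / step) * step  # step 1: floor() onto the grid
    dx = x_snapped - x_m                       # Delta x needed to round (<= 0)
    slope = (y1 - y0) / (x1 - x0)              # dy/dx of the straight line
    return x_snapped, y_m + slope * dx         # step 2: Delta y = slope * Delta x

# Gap of 3 minutes: the raw midpoint is (1.5, 13.0) and the slope is 2.
x, y = snapped_midpoint(0.0, 10.0, 3.0, 16.0)
print(x, y)  # 1.0 12.0
```

Because the correction follows the straight line between the endpoints, the snapped point stays on that line; only the random displacement (omitted here) would move it off.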