Brownian bridge interpolation of time series, built for use with Pandas.
simple_interpolation
A Pandas implementation of the Brownian bridge interpolation algorithm. The series is assumed to follow a Wiener process when building std().
Interpolation rocks, but doing it poorly can alter the original features of your data. A Brownian bridge, done well, preserves the volatility of the original data. Mixing that with a bit of stock market theory (Wiener processes), we built a simple interpolation library.
Read about the algorithm in the "Brownian bridge algo" section below.
Install
pip install simple_interpolation
How to use
# Example input dataframe, containing gaps
# (e.g. the X column skips values 3-5)
df
index | X | Y |
---|---|---|
0 | 0 | 8.089846 |
1 | 1 | 11.793489 |
2 | 2 | 9.026726 |
3 | 6 | 8.996177 |
4 | 7 | 11.221730 |
5 | 8 | 8.398122 |
6 | 9 | 8.845667 |
7 | 10 | 11.454700 |
8 | 11 | 11.431745 |
9 | 12 | 7.050733 |
10 | 13 | 10.009420 |
11 | 14 | 6.964674 |
12 | 15 | 9.541557 |
13 | 16 | 11.656722 |
14 | 19 | 11.062303 |
15 | 20 | 11.302763 |
16 | 21 | 13.042057 |
17 | 22 | 7.405670 |
18 | 23 | 8.986057 |
19 | 24 | 7.554964 |
20 | 25 | 10.467688 |
21 | 26 | 9.416683 |
22 | 27 | 10.038665 |
23 | 28 | 5.519665 |
24 | 45 | 10.184922 |
25 | 46 | 11.661662 |
26 | 47 | 9.748401 |
27 | 48 | 11.023116 |
28 | 49 | 9.298167 |
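If you don't have gapped data at hand, a frame shaped like the one above can be built as follows. This is only a sketch: the Y values are random (seeded), not the ones printed above, and the column names X and Y simply mirror the example.

```python
import numpy as np
import pandas as pd

# X positions with three deliberate gaps (2 -> 6, 16 -> 19, 28 -> 45),
# mirroring the X column of the example table.
x = np.concatenate([
    np.arange(0, 3),    # 0..2
    np.arange(6, 17),   # 6..16
    np.arange(19, 29),  # 19..28
    np.arange(45, 50),  # 45..49
])

# Random Y values around 10; illustrative only, not the values shown above.
rng = np.random.default_rng(seed=0)
y = rng.normal(loc=10.0, scale=2.0, size=len(x))

df = pd.DataFrame({"X": x, "Y": y})
```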
from simple_interpolation import core as si
# Interpolation; plot is optional (default False)
patched_df = si.interpolate_gaps(df, plot=True)
patched_df
No datetime column: assuming first column 'X' as X-axis
std() built with Wiener method
Will interpolate if X-column interval is more than 1.7675
Processed 0.00% of gaps
Ended interpolation, starting plotting the results..
Ended execution
index | X | Y | interpolated |
---|---|---|---|
0 | 0.0000 | 8.089846 | 0 |
1 | 1.0000 | 11.793489 | 0 |
2 | 2.0000 | 9.026726 | 0 |
3 | 3.0000 | 8.291588 | 1 |
4 | 4.0000 | 8.486541 | 1 |
5 | 5.0000 | 8.736440 | 1 |
6 | 6.0000 | 8.996177 | 0 |
7 | 7.0000 | 11.221730 | 0 |
8 | 8.0000 | 8.398122 | 0 |
9 | 9.0000 | 8.845667 | 0 |
10 | 10.0000 | 11.454700 | 0 |
11 | 11.0000 | 11.431745 | 0 |
12 | 12.0000 | 7.050733 | 0 |
13 | 13.0000 | 10.009420 | 0 |
14 | 14.0000 | 6.964674 | 0 |
15 | 15.0000 | 9.541557 | 0 |
16 | 16.0000 | 11.656722 | 0 |
17 | 17.5000 | 11.359512 | 1 |
18 | 19.0000 | 11.062303 | 0 |
19 | 20.0000 | 11.302763 | 0 |
20 | 21.0000 | 13.042057 | 0 |
21 | 22.0000 | 7.405670 | 0 |
22 | 23.0000 | 8.986057 | 0 |
23 | 24.0000 | 7.554964 | 0 |
24 | 25.0000 | 10.467688 | 0 |
25 | 26.0000 | 9.416683 | 0 |
26 | 27.0000 | 10.038665 | 0 |
27 | 28.0000 | 5.519665 | 0 |
28 | 29.0625 | 6.584443 | 1 |
29 | 30.1250 | 5.504773 | 1 |
30 | 31.1875 | 5.623875 | 1 |
31 | 32.2500 | 6.275126 | 1 |
32 | 33.3125 | 6.639139 | 1 |
33 | 34.3750 | 6.394277 | 1 |
34 | 35.4375 | 6.797008 | 1 |
35 | 36.5000 | 7.885828 | 1 |
36 | 37.5625 | 8.530594 | 1 |
37 | 38.6250 | 8.921191 | 1 |
38 | 39.6875 | 8.941382 | 1 |
39 | 40.7500 | 8.900565 | 1 |
40 | 41.8125 | 9.037251 | 1 |
41 | 42.8750 | 9.360730 | 1 |
42 | 43.9375 | 9.914641 | 1 |
43 | 45.0000 | 10.184922 | 0 |
44 | 46.0000 | 11.661662 | 0 |
45 | 47.0000 | 9.748401 | 0 |
46 | 48.0000 | 11.023116 | 0 |
47 | 49.0000 | 9.298167 | 0 |
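The log line "Will interpolate if X-column interval is more than 1.7675" suggests that gaps are found by comparing consecutive X intervals against a threshold. The sketch below reproduces that check with plain Pandas. The helper name find_gaps is hypothetical, and the threshold is copied from the log rather than computed, since how the library derives it is not described here.

```python
import pandas as pd

def find_gaps(x: pd.Series, threshold: float) -> list:
    """Return (start, end) pairs where consecutive X values are more than `threshold` apart."""
    step = x.diff()                 # interval between each point and the previous one
    is_gap = step > threshold       # True on the right edge of every gap
    starts = x.shift(1)[is_gap]     # left edge of each gap
    ends = x[is_gap]                # right edge of each gap
    return list(zip(starts.tolist(), ends.tolist()))

# Same X values as the example input above.
x = pd.Series(
    list(range(0, 3)) + list(range(6, 17)) + list(range(19, 29)) + list(range(45, 50)),
    dtype=float,
)

gaps = find_gaps(x, threshold=1.7675)
print(gaps)  # [(2.0, 6.0), (16.0, 19.0), (28.0, 45.0)]
```

These are the three intervals where rows flagged interpolated = 1 appear in the output table.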
Brownian bridge algo: the theory
To render the equations in the browser, install a LaTeX rendering extension. Otherwise, download the file and open it in Jupyter.
The Brownian bridge makes it possible to interpolate large gaps while preserving the volatility of the series (which is supplied as an input!). Read about it here: "Brownian bridge".
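To make the idea concrete, here is a minimal sketch of the textbook midpoint construction of a Brownian bridge between two known points. It is not the library's code: the function name brownian_bridge, its parameters, and the rule "bisect until the interval is no larger than max_step" are assumptions for illustration. sigma is the standard deviation per unit of X, supplied by the caller.

```python
import numpy as np

def brownian_bridge(x0, y0, x1, y1, sigma, max_step, rng):
    """Recursively bisect [x0, x1]; return the interior (x, y) points in X order."""
    if x1 - x0 <= max_step:
        return []  # interval is already fine enough, nothing to add
    x_m = (x0 + x1) / 2
    # Conditional on both endpoints, the midpoint of a Brownian motion with
    # variance sigma**2 per unit of X is normally distributed around the linear
    # midpoint, with standard deviation sigma * sqrt(x1 - x0) / 2.
    y_m = (y0 + y1) / 2 + rng.normal(0.0, sigma * np.sqrt(x1 - x0) / 2)
    left = brownian_bridge(x0, y0, x_m, y_m, sigma, max_step, rng)
    right = brownian_bridge(x_m, y_m, x1, y1, sigma, max_step, rng)
    return left + [(x_m, y_m)] + right

rng = np.random.default_rng(seed=42)
# Endpoints taken from the largest gap in the example (X = 28 and X = 45);
# sigma = 1.0 is an arbitrary choice for the sketch.
points = brownian_bridge(28.0, 5.519665, 45.0, 10.184922,
                         sigma=1.0, max_step=1.7675, rng=rng)
xs = [x for x, _ in points]
```

With these inputs the sketch places 15 points at X = 29.0625, 30.125, ..., 43.9375, which are the same X positions as the interpolated rows in the example output. The Y values differ because they are random.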
Wiener method to obtain the relevant std()
In a Wiener process, the variance over a time interval $\Delta_t$ is $$var = \Delta_t$$ so $$std = \sqrt{var} = \sqrt{\Delta_t}$$ This tells us how the local volatility scales with the length of the interval.
So, if we have $std_{year}$ (or $std_{\text{whole series}}$), we can get the daily value from: $$std_{year} = std_{day} \cdot \sqrt{365} \Rightarrow std_{day} = \frac{std_{year}}{\sqrt{365}}$$
So we can get the "basic building block" of the volatility by getting $std_{minute}$ in our case.
Having $std_{minute}$, we then do a "bottom-up" process building the gap:
$$ std_{gap} = std_{minute} \cdot \sqrt{\text{number of minutes in gap}} $$
(Advice from Miguel, my colleague at ING)
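The square-root scaling above fits in a few lines. This is a sketch of the arithmetic only; the function names and the example numbers are made up for illustration and are not part of the library.

```python
import math

def std_per_unit(std_total: float, n_units: float) -> float:
    """Scale a std measured over n_units time steps down to a single step."""
    return std_total / math.sqrt(n_units)

def std_for_gap(std_unit: float, gap_units: float) -> float:
    """Scale a single-step std up to a gap spanning gap_units steps."""
    return std_unit * math.sqrt(gap_units)

# Hypothetical numbers: a yearly std of 0.20 and a 9-minute gap.
std_day = std_per_unit(0.20, 365)            # top-down: year -> day
std_minute = std_per_unit(std_day, 24 * 60)  # top-down: day -> minute
std_gap = std_for_gap(std_minute, 9)         # bottom-up: 3 * std_minute, since sqrt(9) = 3
```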
Fixed timesteps
You can use the fixed_freq argument to round the interpolated X points to a given timestep. fixed_freq defaults to 'min'. Any Pandas offset alias is valid; see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
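To get a feel for what rounding to an offset alias does, the snippet below uses Pandas' own Timestamp.floor. It does not call simple_interpolation, and the timestamp is an arbitrary example.

```python
import pandas as pd

# Midpoint of a 3-minute gap that starts at 10:00:00.
ts = pd.Timestamp("2021-03-01 10:01:30")

print(ts.floor("min"))   # 2021-03-01 10:01:00
print(ts.floor("5min"))  # 2021-03-01 10:00:00
```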
Implementation of the rounding (you probably don't need to read this)
This constraint takes us outside the pure Brownian bridge, because the bridge only interpolates midpoints:
$$\begin{cases} x_m = \frac{x_0 + x_1}{2} \\ y_m = \frac{y_0 + y_1}{2} + std \end{cases}$$
But if we round to whole minutes, the midpoint $x_m$ may not fall on a minute-exact timestamp (imagine the first interpolated point in a 3-minute gap: it would land at 1.5 minutes). So we round $x_m$ and look for its associated Y displacement $\Delta y$:
$$\begin{cases} x'_m = x_m + \Delta x_{toroundtomin} \\ y'_m = y_m + \Delta y \end{cases}$$
To get the associated $\Delta y$ we use the slope (derivative) of the straight line between the points $(x_0, y_0)$ and $(x_1, y_1)$.
So:
1- Round $x_m$ down to the nearest minute (floor()-like), so we obtain $x'_m$ and $\Delta x_{toroundtomin}$
2- The deltas on X and Y are related by the derivative, which we are implicitly assuming linear on the brownian bridge, so it's quite straightforward to calculate $\Delta y$:
$$ \Delta y := \frac{dy}{dx} \Delta x \Rightarrow \Delta y \approx \frac{y_1 - y_0}{x_1 - x_0} \Delta x_{toroundtomin} $$
So we would have everything for the Y correction.
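Putting the two steps together for the 3-minute gap mentioned above, as a sketch with made-up Y values. X is measured in minutes, and the function name rounded_midpoint is hypothetical.

```python
import math

def rounded_midpoint(x0, y0, x1, y1, y_m):
    """Floor the midpoint X to a whole unit and shift Y along the chord's slope."""
    x_m = (x0 + x1) / 2
    x_m_rounded = math.floor(x_m)      # step 1: floor()-like rounding
    dx = x_m_rounded - x_m             # Delta x_{toroundtomin}, never positive for floor
    slope = (y1 - y0) / (x1 - x0)      # slope of the straight line between the endpoints
    dy = slope * dx                    # step 2: Delta y
    return x_m_rounded, y_m + dy

# Gap from x = 0 to x = 3 minutes; y_m is the already-displaced midpoint value.
x_r, y_r = rounded_midpoint(0.0, 10.0, 3.0, 13.0, y_m=12.0)
print(x_r, y_r)  # 1 11.5
```

Here the midpoint 1.5 is floored to 1, so $\Delta x_{toroundtomin} = -0.5$; the slope is 1, which gives $\Delta y = -0.5$.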
Source Distribution
Hashes for simple_interpolation-0.0.2.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 3b03cbe4c93472ab2dec55256775512a96eea0dea3382ced7cda994e8bbceb00 |
MD5 | ebbc5f24357928ec1e54c30209efc0cb |
BLAKE2b-256 | 4fdb4ae3fa7f09de3970fb1de9da800b757a55ff40a81a06fa79cbd28f518cbc |
Built Distribution
Hashes for simple_interpolation-0.0.2-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 2a75daebec64bf1bab6d4fdcb9841d0bc583c4cff2e84a16204a3caa370bf9d4 |
MD5 | e6bd20c62157f80a4c290576715a3096 |
BLAKE2b-256 | 392c19ca8651719cd6da208516d2249d972913feb8bf45caad84475a821218e0 |