Brownian bridge interpolation of time series, built for use with Pandas.

simple_interpolation

A Pandas implementation of the Brownian bridge interpolation algorithm. The std() of the interpolated points is built by assuming the series behaves like a Wiener process.

Interpolation rocks, but doing it poorly can alter the original features of your data. A Brownian bridge, if done well, preserves the volatility of the original data. Mixing that with a bit of theory from the stock market (Wiener processes), we built a simple interpolation library.

Read about the algorithm in the "Brownian bridge algo" section below.

Install

pip install simple_interpolation

How to use

# Example input DataFrame, containing gaps
# (e.g. X column, values 3-5 are missing)
df
X Y
0 0 8.089846
1 1 11.793489
2 2 9.026726
3 6 8.996177
4 7 11.221730
5 8 8.398122
6 9 8.845667
7 10 11.454700
8 11 11.431745
9 12 7.050733
10 13 10.009420
11 14 6.964674
12 15 9.541557
13 16 11.656722
14 19 11.062303
15 20 11.302763
16 21 13.042057
17 22 7.405670
18 23 8.986057
19 24 7.554964
20 25 10.467688
21 26 9.416683
22 27 10.038665
23 28 5.519665
24 45 10.184922
25 46 11.661662
26 47 9.748401
27 48 11.023116
28 49 9.298167
from simple_interpolation import core as si

# Interpolate the gaps; plotting is optional (plot defaults to False)
patched_df = si.interpolate_gaps(df, plot=True)
patched_df
No datetime column: assuming first column 'X' as X-axis
std() built with Wiener method
Will interpolate if X-column interval is more than 1.7675
Processed 0.00% of gaps
Ended interpolation, starting plotting the results..

[plot of the results]

Ended execution
X Y interpolated
0 0.0000 8.089846 0
1 1.0000 11.793489 0
2 2.0000 9.026726 0
3 3.0000 8.291588 1
4 4.0000 8.486541 1
5 5.0000 8.736440 1
6 6.0000 8.996177 0
7 7.0000 11.221730 0
8 8.0000 8.398122 0
9 9.0000 8.845667 0
10 10.0000 11.454700 0
11 11.0000 11.431745 0
12 12.0000 7.050733 0
13 13.0000 10.009420 0
14 14.0000 6.964674 0
15 15.0000 9.541557 0
16 16.0000 11.656722 0
17 17.5000 11.359512 1
18 19.0000 11.062303 0
19 20.0000 11.302763 0
20 21.0000 13.042057 0
21 22.0000 7.405670 0
22 23.0000 8.986057 0
23 24.0000 7.554964 0
24 25.0000 10.467688 0
25 26.0000 9.416683 0
26 27.0000 10.038665 0
27 28.0000 5.519665 0
28 29.0625 6.584443 1
29 30.1250 5.504773 1
30 31.1875 5.623875 1
31 32.2500 6.275126 1
32 33.3125 6.639139 1
33 34.3750 6.394277 1
34 35.4375 6.797008 1
35 36.5000 7.885828 1
36 37.5625 8.530594 1
37 38.6250 8.921191 1
38 39.6875 8.941382 1
39 40.7500 8.900565 1
40 41.8125 9.037251 1
41 42.8750 9.360730 1
42 43.9375 9.914641 1
43 45.0000 10.184922 0
44 46.0000 11.661662 0
45 47.0000 9.748401 0
46 48.0000 11.023116 0
47 49.0000 9.298167 0

Brownian bridge algo: the theory

To render the equations in a browser, install a LaTeX rendering extension. Otherwise, download it and open it in Jupyter.

The Brownian bridge lets you interpolate large gaps while preserving the volatility of the series (which is taken as an input!). Read about it here: "Brownian bridge".
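To see how plain interpolation can alter the original features of the data, here is a small sketch that is not part of simple_interpolation. It uses pandas' built-in linear interpolation on a simulated random walk (seed and sizes are arbitrary): inside the filled gap every step is identical, so the volatility of the increments collapses to zero.

```python
import numpy as np
import pandas as pd

# Simulated random walk with unit-variance increments on an integer X grid
rng = np.random.default_rng(0)
y = pd.Series(np.cumsum(rng.standard_normal(200)))

# Knock out a large gap, then fill it with plain linear interpolation
gappy = y.copy()
gappy.iloc[80:140] = np.nan
linear = gappy.interpolate(method="linear")

# Std of one-step increments: whole original series vs. the linearly filled gap
orig_vol = y.diff().std()
gap_vol = linear.iloc[79:141].diff().std()
print(f"original increment std: {orig_vol:.3f}")
print(f"linear-filled gap increment std: {gap_vol:.3e}")
```

The filled stretch is a straight line, so its increment std is zero up to floating-point noise, while the original walk has std close to 1.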

Wiener method to obtain the relevant std()

In a Wiener process, the volatility (variance) of an increment over a time interval $\Delta_t$ is $$var = \Delta_t$$ so $$std = \sqrt{var} = \sqrt{\Delta_t}$$ This sets how local volatility scales with the length of the interval.
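A quick Monte Carlo sketch of this scaling (NumPy only, not library code): simulate standard Wiener paths and compare the empirical std of $W(\Delta_t)$ with $\sqrt{\Delta_t}$. The grid spacing, number of paths and seed are arbitrary choices.

```python
import numpy as np

# Simulate standard Wiener paths on a grid: each step has variance equal to its length
rng = np.random.default_rng(42)
step = 0.05        # grid spacing
n_steps = 80       # paths run from t = 0 to t = 4
n_paths = 10_000

increments = rng.normal(0.0, np.sqrt(step), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)   # paths[:, k - 1] is W(k * step), with W(0) = 0

results = {}
for dt in (0.25, 1.0, 4.0):
    k = int(round(dt / step))           # number of grid steps spanning dt
    results[dt] = paths[:, k - 1].std()
    print(f"dt={dt}: empirical std={results[dt]:.3f}, sqrt(dt)={np.sqrt(dt):.3f}")
```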

So, if we have $std_{year}$ (or $std_{\text{whole series}}$), we can get the daily std by: $$std_{year} = std_{day} \cdot \sqrt{365} \Rightarrow std_{day} = \frac{std_{year}}{\sqrt{365}}$$
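The same rule as a tiny Python helper. The name rescale_std is hypothetical (not part of simple_interpolation), and the 0.20 annual std is a made-up figure for illustration.

```python
import math

def rescale_std(std_long: float, n_sub_periods: float) -> float:
    """Std over one sub-period, given the std over n_sub_periods of them.

    Wiener assumption: variances add, so std_long = std_short * sqrt(n).
    """
    return std_long / math.sqrt(n_sub_periods)

std_year = 0.20                      # hypothetical annual std
std_day = rescale_std(std_year, 365)
print(f"daily std: {std_day:.5f}")
```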

In our case, we get the "basic building block" of the volatility, $std_{minute}$, the same way.
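This README does not show exactly how simple_interpolation estimates the per-minute std from data. One Wiener-consistent way, shown here as an assumption rather than the library's code, is to divide each increment by the square root of its time step: that puts every increment on a per-minute scale even when sampling is irregular. The helper name and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

def estimate_std_per_unit(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: per-unit-of-x std under a Wiener assumption.

    x is numeric (e.g. minutes as floats). Each increment dy over a step dx
    has variance sigma**2 * dx, so dy / sqrt(dx) has variance sigma**2.
    """
    dx = x.diff().iloc[1:].to_numpy(dtype=float)
    dy = y.diff().iloc[1:].to_numpy(dtype=float)
    return float(np.std(dy / np.sqrt(dx), ddof=1))

# Synthetic check: irregularly spaced samples of a Wiener process with sigma = 0.5
rng = np.random.default_rng(1)
sigma = 0.5
dx = rng.choice([1.0, 2.0, 5.0], size=5_000)      # irregular steps, in minutes
dy = rng.normal(0.0, sigma * np.sqrt(dx))          # Wiener increments for those steps
x = pd.Series(np.concatenate([[0.0], np.cumsum(dx)]))
y = pd.Series(np.concatenate([[0.0], np.cumsum(dy)]))
print(f"estimated std_minute: {estimate_std_per_unit(x, y):.3f}")
```

With a datetime X column, the step sizes in minutes could be obtained with `x.diff().dt.total_seconds() / 60` before applying the same normalization.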

Having $std_{minute}$, we then do a "bottom-up" process building the gap:

$$ std_{gap} = std_{minute} \cdot \sqrt{\text{number of mins in gap}} $$
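A minimal sketch of this bottom-up step for a datetime gap, counting the minutes with pandas Timedelta arithmetic. The function name and the 0.05 per-minute std are hypothetical.

```python
import math
import pandas as pd

def std_gap(std_minute: float, gap_start: pd.Timestamp, gap_end: pd.Timestamp) -> float:
    """Bottom-up std accumulated across a gap under the Wiener assumption."""
    n_minutes = (gap_end - gap_start) / pd.Timedelta(minutes=1)  # float number of minutes
    return std_minute * math.sqrt(n_minutes)

# Hypothetical numbers: std_minute = 0.05 and a 100-minute gap
s = std_gap(0.05, pd.Timestamp("2024-01-01 09:00"), pd.Timestamp("2024-01-01 10:40"))
print(f"{s:.3f}")   # 0.05 * sqrt(100) = 0.5
```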

(Advice from Miguel, my colleague at ING)

Fixed timesteps

You can use the fixed_freq argument to round the interpolated X points to a given timestep. fixed_freq defaults to 'min'. Valid options are the Pandas offset aliases, see: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
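To get a feel for what a few offset aliases do, here is a sketch using pandas' Series.dt.floor on some made-up timestamps. It only illustrates the aliases; it assumes a floor()-like rounding, which matches the rounding described below, but it is not the library's own code.

```python
import pandas as pd

# Interpolated timestamps that do not fall on round times (made-up values)
points = pd.Series(pd.to_datetime([
    "2024-01-01 10:07:42",
    "2024-01-01 10:22:30",
    "2024-01-01 10:59:59",
]))

# Floor them with a few offset aliases; 'min' is the fixed_freq default
floored = {freq: points.dt.floor(freq) for freq in ("min", "5min", "15min")}
for freq, values in floored.items():
    print(freq, values.dt.strftime("%H:%M:%S").tolist())
```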


Implementation of the rounding (you probably don't need to read this)

This constraint takes us outside the plain Brownian bridge, which only interpolates midpoints through:

$$\begin{cases} x_m = \frac{x_0 + x_1}{2} \\ y_m = \frac{y_0 + y_1}{2} + std \cdot Z, \quad Z \sim \mathcal{N}(0, 1) \end{cases}$$
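Applied recursively, this midpoint step fills a whole gap. The sketch below is a hypothetical re-implementation, not the library's code: it keeps splitting an interval while it is wider than a threshold, and for the random displacement it uses the textbook Brownian-bridge midpoint std, $\sigma\sqrt{x_1 - x_0}/2$, with an arbitrary $\sigma = 1$; the library's exact scaling may differ. With the 1.7675 threshold from the example log, the recursion reproduces the X values of the example output (for instance 29.0625, 30.125, ..., 43.9375 in the 28 to 45 gap).

```python
import numpy as np

def bridge_fill(x0, y0, x1, y1, sigma, threshold, rng):
    """Hypothetical recursive midpoint Brownian bridge between two known points.

    Splits [x0, x1] at its midpoint while the interval is wider than threshold.
    The midpoint value is the mean of the endpoints plus a normal draw with the
    textbook Brownian-bridge std sigma * sqrt(x1 - x0) / 2.
    Returns the new (x, y) points strictly between x0 and x1, sorted by x.
    """
    if x1 - x0 <= threshold:
        return []
    xm = (x0 + x1) / 2
    ym = (y0 + y1) / 2 + rng.normal(0.0, sigma * np.sqrt(x1 - x0) / 2)
    left = bridge_fill(x0, y0, xm, ym, sigma, threshold, rng)
    right = bridge_fill(xm, ym, x1, y1, sigma, threshold, rng)
    return left + [(xm, ym)] + right

rng = np.random.default_rng(7)
# Gap between X = 28 and X = 45 from the example, with the threshold from the log
new_points = bridge_fill(28.0, 5.519665, 45.0, 10.184922, sigma=1.0, threshold=1.7675, rng=rng)
print([x for x, _ in new_points])
```

Because every split halves the interval, a gap only gets points on a dyadic grid: this is exactly why the rounding described next is needed when X must land on whole minutes.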

But if we round to minutes, the midpoint $x_m$ may not fall on a whole-minute timestamp (imagine the first interpolated point in a 3-minute gap: it would be at 1.5 min). So we round $x_m$ and look for its associated Y displacement $\Delta y$:

$$\begin{cases} x'_m = x_m + \Delta x_{toroundtomin} \\ y'_m = y_m + \Delta y \end{cases}$$

To get the associated $\Delta y$ we use the slope (derivative) of the straight line between points $(x_0, y_0)$ and $(x_1, y_1)$.

So:

1- Round $x_m$ down to the minute (floor()-like), which gives $x'_m$ and $\Delta x_{toroundtomin}$.

2- The deltas on X and Y are related by the derivative. The Brownian bridge implicitly assumes a straight line between the two known points, so the derivative is constant and $\Delta y$ is straightforward to calculate:

$$ \Delta y := \frac{dy}{dx} \Delta x \Rightarrow \Delta y \approx \frac{y_1 - y_0}{x_1 - x_0} \Delta x_{toroundtomin} $$

That gives everything needed for the Y correction.
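The two steps can be sketched with pandas timestamps as follows. round_midpoint is a hypothetical helper written for illustration, not the library's implementation; it floors the midpoint time to freq and shifts $y_m$ along the straight line between the two known points.

```python
import pandas as pd

def round_midpoint(x0, y0, x1, y1, y_m, freq="min"):
    """Hypothetical helper: floor the midpoint time to freq and correct y along the chord.

    x0, x1 are pd.Timestamp; y_m is the (possibly noisy) midpoint value.
    Returns (x'_m, y'_m) with y'_m = y_m + slope * dx.
    """
    x_m = x0 + (x1 - x0) / 2
    x_m_rounded = x_m.floor(freq)                                 # step 1: floor()-like rounding
    dx = x_m_rounded - x_m                                        # Delta x_{toroundtomin}, <= 0
    slope = (y1 - y0) / ((x1 - x0) / pd.Timedelta(seconds=1))     # dy/dx, per second
    dy = slope * (dx / pd.Timedelta(seconds=1))                   # step 2: Delta y
    return x_m_rounded, y_m + dy

# 3-minute gap: the raw midpoint falls at 1.5 min and is floored to 1 min
t0 = pd.Timestamp("2024-01-01 12:00")
t1 = pd.Timestamp("2024-01-01 12:03")
x_new, y_new = round_midpoint(t0, 0.0, t1, 3.0, y_m=1.5)
print(x_new, y_new)
```

Here $y_m = 1.5$ is taken noise-free, so the corrected point lands back on the straight line between $(0, 0)$ and $(3, 3)$, at $y = 1$ for the 1-minute mark.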

Download files

Source distribution: simple_interpolation-0.0.9.tar.gz (1.4 MB)

Built distribution: simple_interpolation-0.0.9-py3-none-any.whl (17.8 kB, Python 3)
