Brownian Bridge interpolation of timeseries, built to use with Pandas.

simple_interpolation

A Pandas implementation of the Brownian Bridge interpolation algorithm. A Wiener process is assumed when building the std().

Interpolation rocks, but doing it poorly can alter the original features of your data. A Brownian bridge, done well, preserves the volatility of the original data. Mixing that with a bit of theory from the stock market (Wiener processes), we built a simple interpolation library.

Read about the algorithm in the "Brownian bridge algo" section below.
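As a taste of the idea, the midpoint recursion at the heart of a Brownian bridge can be sketched in a few lines (a toy illustration, not the package's actual implementation; the function and variable names here are made up):

```python
import numpy as np

def bridge_fill(x0, y0, x1, y1, std, n_levels=3, rng=None):
    """Recursively fill (x0, x1) with perturbed midpoints.

    Each midpoint is the linear average of its endpoints plus
    Gaussian noise whose scale shrinks with the sub-interval
    length, mimicking Wiener-process scaling."""
    rng = rng or np.random.default_rng(0)
    pts = {x0: y0, x1: y1}

    def split(a, b, level):
        if level == 0:
            return
        xm = (a + b) / 2
        # noise scale ~ sqrt of the relative sub-interval length
        ym = (pts[a] + pts[b]) / 2 + rng.normal(
            0.0, std * np.sqrt((b - a) / (x1 - x0)))
        pts[xm] = ym
        split(a, xm, level - 1)
        split(xm, b, level - 1)

    split(x0, x1, n_levels)
    xs = sorted(pts)
    return xs, [pts[x] for x in xs]

xs, ys = bridge_fill(0.0, 8.9, 16.0, 11.6, std=1.2)
print(len(xs))  # 2 endpoints + 7 midpoints = 9
```

The endpoints are kept fixed and only the interior is randomised, which is what lets the bridge respect the observed values on both sides of a gap.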

Install

pip install simple_interpolation

How to use

# Example input dataframe, containing gaps
#  (i.e. the X column skips values 3-5)
df
     X          Y
0    0   8.089846
1    1  11.793489
2    2   9.026726
3    6   8.996177
4    7  11.221730
5    8   8.398122
6    9   8.845667
7   10  11.454700
8   11  11.431745
9   12   7.050733
10  13  10.009420
11  14   6.964674
12  15   9.541557
13  16  11.656722
14  19  11.062303
15  20  11.302763
16  21  13.042057
17  22   7.405670
18  23   8.986057
19  24   7.554964
20  25  10.467688
21  26   9.416683
22  27  10.038665
23  28   5.519665
24  45  10.184922
25  46  11.661662
26  47   9.748401
27  48  11.023116
28  49   9.298167
import simple_interpolation.simple_interpolation as si

or

from simple_interpolation import core as si

Interpolation (the plot is optional; plot defaults to False):

patched_df = si.interpolate_gaps( df , plot = True )
patched_df
No datetime column: assuming first column 'X' as X-axis
std() built with Wiener method
Will interpolate if X-column interval is more than 1.7675
Processed 0.00% of gaps
Ended interpolation, starting plotting the results..

(Plot of the original and interpolated points.)

          X          Y  interpolated
0    0.0000   8.089846             0
1    1.0000  11.793489             0
2    2.0000   9.026726             0
3    3.0000   8.291588             1
4    4.0000   8.486541             1
5    5.0000   8.736440             1
6    6.0000   8.996177             0
7    7.0000  11.221730             0
8    8.0000   8.398122             0
9    9.0000   8.845667             0
10  10.0000  11.454700             0
11  11.0000  11.431745             0
12  12.0000   7.050733             0
13  13.0000  10.009420             0
14  14.0000   6.964674             0
15  15.0000   9.541557             0
16  16.0000  11.656722             0
17  17.5000  11.359512             1
18  19.0000  11.062303             0
19  20.0000  11.302763             0
20  21.0000  13.042057             0
21  22.0000   7.405670             0
22  23.0000   8.986057             0
23  24.0000   7.554964             0
24  25.0000  10.467688             0
25  26.0000   9.416683             0
26  27.0000  10.038665             0
27  28.0000   5.519665             0
28  29.0625   6.584443             1
29  30.1250   5.504773             1
30  31.1875   5.623875             1
31  32.2500   6.275126             1
32  33.3125   6.639139             1
33  34.3750   6.394277             1
34  35.4375   6.797008             1
35  36.5000   7.885828             1
36  37.5625   8.530594             1
37  38.6250   8.921191             1
38  39.6875   8.941382             1
39  40.7500   8.900565             1
40  41.8125   9.037251             1
41  42.8750   9.360730             1
42  43.9375   9.914641             1
43  45.0000  10.184922             0
44  46.0000  11.661662             0
45  47.0000   9.748401             0
46  48.0000  11.023116             0
47  49.0000   9.298167             0
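The interpolated column flags which rows were synthesised, so you can separate them from the original observations later. A small illustration, using a toy frame with the same column names as the output above:

```python
import pandas as pd

# toy frame mimicking the shape of the interpolate_gaps() output
patched_df = pd.DataFrame({
    "X": [0.0, 1.0, 2.0, 3.0],
    "Y": [8.09, 11.79, 9.03, 8.29],
    "interpolated": [0, 0, 0, 1],
})

# boolean indexing splits synthetic rows from original ones
synthetic = patched_df[patched_df["interpolated"] == 1]
original = patched_df[patched_df["interpolated"] == 0]
print(len(synthetic), len(original))  # 1 3
```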

Effects of interpolation on seasonality

You can check the effect of the interpolation (or of adding gaps) on the seasonality of the signal. The next method fits Facebook's Prophet models (you can install via pip install fbprophet) to 4 different versions of your data:

  • your original data, df,
  • your data interpolated, df_i,
  • your data with new gaps added, df_emptied,
  • the emptied data then interpolated, df_emptied_i.
import simple_interpolation.prophet_sensibility as ps

# This also plots the thing below
models = ps.prophet_sensibility(df, slicing_threshold=5000)

(Prophet sensibility comparison plot.)

Brownian bridge algo: the theory

To render the equations in the browser, install a LaTeX rendering extension. Otherwise, download this README and open it in Jupyter.

This allows interpolating large gaps while preserving the volatility of the series (given as an input!). Read more in the Wikipedia article on the Brownian bridge.

Wiener method to obtain the relevant std()

In a Wiener process the variance of an increment grows linearly with the time step: $$var = \Delta t$$ so $$std = \sqrt{var} = \sqrt{\Delta t}.$$ This sets how the local volatility scales.
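This scaling is easy to check numerically (a quick sketch, not part of the library):

```python
import numpy as np

rng = np.random.default_rng(42)
dt = 0.25
n = 200_000

# increments of a standard Wiener process over steps of size dt
dW = rng.normal(0.0, np.sqrt(dt), size=n)

# the empirical std of the increments matches sqrt(dt)
print(dW.std(), np.sqrt(dt))  # both ≈ 0.5
```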

So, if we have $std_{year}$ (or the std of the whole series), we can get the daily one by: $$std_{year} = std_{day} \cdot \sqrt{365} \Rightarrow std_{day} = \frac{std_{year}}{\sqrt{365}}$$

So we can get the "basic building block" of the volatility by getting $std_{minute}$ in our case.

Having $std_{minute}$, we then do a "bottom-up" process building the gap:

$$ std_{gap} = std_{minute} \cdot \sqrt{n_{\text{minutes in gap}}} $$
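Putting the two scalings together (the whole-series std scaled down to the base unit, then back up to the gap), assuming a 1-minute base resolution and a hypothetical annual std:

```python
import math

std_year = 12.0                      # std of the whole series (hypothetical)
minutes_per_year = 365 * 24 * 60

# top-down: the "basic building block" of the volatility
std_minute = std_year / math.sqrt(minutes_per_year)

# bottom-up: the std to use across a 30-minute gap
std_gap = std_minute * math.sqrt(30)

print(std_gap)
```

Note that scaling down and back up only depends on the ratio of time spans: std_gap equals std_year * sqrt(30 / minutes_per_year).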

(Advice from Miguel, my colleague at ING)

Fixed timesteps

You can use the fixed_freq argument to round the interpolated X points to a certain timestep. fixed_freq defaults to None (no fixed frequency). Valid options are the Pandas offset aliases, see: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
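The kind of rounding this implies can be mimicked with pandas' own frequency handling (a sketch of the idea; interpolate_gaps does this internally):

```python
import pandas as pd

# a midpoint that falls between minute boundaries
xm = pd.Timestamp("2021-01-01 00:01:30")

# floor to the nearest minute, as with a minute-level fixed_freq
xm_rounded = xm.floor("min")
print(xm_rounded)  # 2021-01-01 00:01:00
```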


Implementation of the rounding (you probably don't need to read this)

This constraint takes us outside the plain Brownian bridge, which only interpolates midpoints through:

$$\begin{cases} x_m = \frac{x_0 + x_1}{2} \\ y_m = \frac{y_0 + y_1}{2} + \varepsilon \end{cases}$$

where $\varepsilon$ is a Gaussian draw with the std computed above.

But if we round to minutes, this midpoint $x_m$ may not fall on a minute-exact timestamp (imagine the first interpolated point in a 3-minute gap: it would land at 1.5 min). So we round $x_m$ and search for its associated Y displacement $\Delta y$:

$$\begin{cases} x'_m = x_m + \Delta x_{\text{round}} \\ y'_m = y_m + \Delta y \end{cases}$$

To get the associated $\Delta y$ we use the slope (derivative) of the straight line between the points $(x_0, y_0)$ and $(x_1, y_1)$.

So:

1- Round $x_m$ down to the nearest minute (floor()-like), obtaining $x'_m$ and $\Delta x_{\text{round}}$.

2- The deltas in X and Y are related by the derivative, which the Brownian bridge implicitly assumes linear, so $\Delta y$ is straightforward to calculate:

$$ \Delta y := \frac{dy}{dx}\,\Delta x \Rightarrow \Delta y \approx \frac{y_1 - y_0}{x_1 - x_0}\,\Delta x_{\text{round}} $$

So we would have everything for the Y correction.
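The two steps above amount to the following (a sketch in the notation of this section, not the library's own code):

```python
import math

def round_midpoint(x0, y0, x1, y1, y_m, step=1.0):
    """Floor the midpoint x to a multiple of `step` and shift y
    along the segment's slope accordingly."""
    x_m = (x0 + x1) / 2
    x_rounded = math.floor(x_m / step) * step
    dx = x_rounded - x_m               # Delta x (zero or negative)
    slope = (y1 - y0) / (x1 - x0)      # dy/dx of the straight line
    return x_rounded, y_m + slope * dx # Delta y = slope * Delta x

# a 3-minute gap: the midpoint at 1.5 floors to 1.0,
# and y_m is shifted down along the slope (here 2.0 per unit x)
x, y = round_midpoint(0.0, 10.0, 3.0, 16.0, y_m=13.4)
print(x, y)  # 1.0 12.4
```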
