a bayesian data augmentation algorithm
Project description
BayesDawn
BayesDawn stands for Bayesian Data Augmentation for Waves and Noise. It implements an iterative Bayesian augmentation method to handle data gaps in gravitational-wave data analysis, as described in this paper: https://arxiv.org/abs/1907.04747.
Installation
BayesDawn can be installed by unzipping the source code in one directory and using this command: ::
sudo python setup.py install
Quick start
Using BayesDawn for your own analysis will essentially involve the datamodel.py module, which allows you to compute the conditional distribution of missing values given the observed values of a time series. Here is a working example that can be used.
- Generation of test data
To begin with, we generate some simple time series which contains noise and signal. To generate the noise, we start with a white, zero-mean Gaussian noise that we then filter to obtain a stationary colored noise:
# Import bayesdawn and other useful packages
from bayesdawn import datamodel, psdmodel
import numpy as np
import random
from scipy import signal
# Choose size of data
n_data = 2**14
# Set sampling frequency
fs = 1.0
# Generate Gaussian white noise
noise = np.random.normal(loc=0.0, scale=1.0, size = n_data)
# Apply filtering to turn it into colored noise
r = 0.01
b, a = signal.butter(3, 0.1/0.5, btype='high', analog=False)
n = signal.lfilter(b,a, noise, axis=-1, zi=None) + noise*r
Then we need a deterministic signal to add. We choose a sinusoid with some frequency f0 and amplitude a0:
t = np.arange(0, n_data) / fs
f0 = 1e-2
a0 = 5e-3
s = a0 * np.sin(2 * np.pi * f0 * t)
The noisy data is then
y = s + n
- Introduction of data gaps
Now assume that some data are missing, i.e. the time series is cut by random gaps. The pattern is represented by a mask vector with entries equal to 1 when data is observed, and 0 otherwise:
mask = np.ones(n_data)
n_gaps = 30
gapstarts = (n_data * np.random.random(n_gaps)).astype(int)
gaplength = 10
gapends = (gapstarts+gaplength).astype(int)
for k in range(n_gaps): mask[gapstarts[k]:gapends[k]]= 0
Therefore, we do not observe the data y but its masked version, mask*y:
y_masked = mask * y
- Missing data imputation
Assuming that we know exactly the deterministic signal, we can do a crude estimation of the PSD from masked data:
# Fit PSD with a spline of degree 2 and 10 knots
psd_cls = psdmodel.PSDSpline(n_data, fs,
n_knots=10,
d=2,
fmin=fs/n_data,
fmax=fs/2)
psd_cls.estimate(mask * (y - s))
Then, from the observed data and their model, we can reconstruct the missing data using the imputation package:
# instantiate imputation class
imp_cls = datamodel.GaussianStationaryProcess(s, mask, psd_cls,
na=50, nb=50)
# perform offline computations
imp_cls.compute_offline()
# Imputation of missing data
y_rec = imp_cls.draw_missing_data(y_masked)
Documentation
Please refer to the documentation.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file bayesdawn-0.1.0.tar.gz
.
File metadata
- Download URL: bayesdawn-0.1.0.tar.gz
- Upload date:
- Size: 503.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.2 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.7.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8d1ab111addf5426a291ba10f38080ea9f9bd2af8de701ef0a522d3643b2845b |
|
MD5 | d4951b7968884c271d03ae4652a96f78 |
|
BLAKE2b-256 | 1dca14ce97e656978c345277ce5dbf957c878f138d12a011041790d844011102 |
File details
Details for the file bayesdawn-0.1.0-py3-none-any.whl
.
File metadata
- Download URL: bayesdawn-0.1.0-py3-none-any.whl
- Upload date:
- Size: 97.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.2 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.7.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4af9c6feb421fe90f5102ca87eefa9c8d0838a956707139f4f5a0a904bf5d8a1 |
|
MD5 | 4c82543a35b222cd2283dd92e4b1c87e |
|
BLAKE2b-256 | 46f2768a67dfba50449b1f3edfa48242ca2b0417d6fcd741f0f205b120f8bcfc |