A python package for conducting Lomb-Scargle based prewhitening of stellar time series.
A flexible python package for conducting Lomb-Scargle-based pre-whitening of time series data. It is distinguished from other, similar tools by a focus on automated pipeline-style analysis and a high degree of flexibility through its extensive configuration system.
Written by Erik Stacey
Last updated 12 Feb 2025. Please note that this package is not generally actively developed or maintained at this time. You can still send me an email, but I might not have time to address your issue(s).
Installation
pywhiten is available through pip:
pip install pywhiten
Or, alternatively, download and install directly from this repo:
git clone https://github.com/erikstacey/pywhiten.git ./pywhiten
cd pywhiten
pip install .
Documentation
The general documentation is available here, and a getting started guide is available here or in the next section.
Getting Started
Pywhiten was designed to be easy to get up and running quickly out of the box, so let us walk through a tutorial example.
Setting up a tutorial directory
First, let's create a tutorial directory somewhere:
mkdir pywhiten_tutorial
cd pywhiten_tutorial
Then, grab an example set of time series data here and place it in the tutorial directory. Alternatively, you can provide your own time series data, noting that the import examples may differ depending on your data format.
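If you would rather experiment before downloading anything, a synthetic light curve works too. The sketch below is not part of pywhiten: the function name, the file name synthetic_lc.txt, and the injected signal parameters are all made up for illustration. It writes three whitespace-separated columns (time, data, uncertainty), which is the layout the loading example in this tutorial assumes.

```python
import numpy as np


def make_synthetic_lc(n_points=2000, baseline=50.0, seed=0):
    """Return (time, data, err) for a two-sinusoid light curve with white noise."""
    rng = np.random.default_rng(seed)
    # Uneven sampling: sorted random times spread across the baseline.
    time = np.sort(rng.uniform(0.0, baseline, n_points))
    # Arbitrary illustrative signals: (frequency, amplitude, phase in cycles).
    signals = [(0.5, 12.5, 0.2), (1.3, 3.0, 0.7)]
    data = np.zeros(n_points)
    for f, a, p in signals:
        data += a * np.sin(2.0 * np.pi * (f * time + p))
    # Add Gaussian white noise and report the same sigma for every point.
    sigma = 1.0
    data += rng.normal(0.0, sigma, n_points)
    err = np.full(n_points, sigma)
    return time, data, err


if __name__ == "__main__":
    time, data, err = make_synthetic_lc()
    # Hypothetical output file name; columns are time, data, uncertainty.
    np.savetxt("synthetic_lc.txt", np.column_stack([time, data, err]))
```

The resulting file can be read back with np.loadtxt("synthetic_lc.txt", unpack=True).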
Importing data and setting up a Pywhitener
To start pre-whitening, we need to load our time series data. But first, we need to create a Python script. Here's an example with Vim, though you can use whatever editor you usually use to write Python code:
vim example.py
Now, writing in our example script, we'll load our time series data into three arrays from the example file:
import numpy as np
import pywhiten
# Load the three columns (time, data, uncertainty) into separate arrays
time, data, err = np.loadtxt("HD47129_thresh9_lc.txt", unpack=True)
The Pywhitener class provides an easy-to-use interface to the rest of the package and powerful functionality for automated or semi-automated frequency analysis. Let's make one here by passing in the time series data we just loaded:
pywhitener = pywhiten.PyWhitener(time=time, data=data, err=err)
Note that if you don't provide an error array, all data points are assumed to be equally weighted.
Automatic Pre-Whitening
Now, running an automatic frequency analysis is as simple as calling the auto method:
pywhitener.auto()
This will automatically proceed through several iterations and, if using the sample data, will terminate after the 16th iteration, once the termination criterion is met. This will also create a pw_out directory in the working directory and populate it with results. If you want to have a quick peek at the results, have a look at pw_out/frequencies.csv in your shell:
more ./pw_out/frequencies.csv
Semi-automatic Pre-Whitening
If you want to proceed in single iterations only, you can do that with the it_pw() method of the pywhitener. Here's an example:
import numpy as np
import pywhiten
time, data, err = np.loadtxt("HD47129_thresh9_lc.txt", unpack=True)
pywhitener = pywhiten.PyWhitener(time=time, data=data, err=err)
# Run a single pre-whitening iteration
pywhitener.it_pw()
This will identify a single frequency. You can run this ten times to get ten frequencies:
for i in range(10):
    pywhitener.it_pw()
Or, if you'd like, you can manually specify the frequency/amplitude/phase hints. This skips the peak selection phase:
pywhitener.it_pw_manual(frequency = 0.5, amplitude = 12.5, phase = 0.2)
You can output this data to the pw_out directory using:
pywhitener.post_pw()
Now, this example has so far used the default configuration for the program. Pywhiten derives its flexibility from its configuration files, which make it possible to automate batch frequency analyses on data from different instruments with different configuration requirements. More details on the configuration can be found here.
Directly Accessing Data and Results
If you'd like to directly access the results, that's easily accomplished through the attributes of the pywhitener.
Light Curves
Light curves are stored in a list in the lcs attribute. Here's an example of accessing the residual lightcurve of the last iteration and plotting the results:
import matplotlib.pyplot as pl
residual_lc = pywhitener.lcs[-1]
pl.plot(residual_lc.time, residual_lc.data)
pl.show()
Refer to the documentation for what can be achieved using the Lightcurve objects.
Periodograms
As periodograms are a description of the frequency spectrum of a time series, they are stored within the Lightcurve objects. Here's an example of accessing and plotting the residual periodogram:
import matplotlib.pyplot as pl
residual_lc = pywhitener.lcs[-1]
residual_periodogram = residual_lc.periodogram
pl.plot(residual_periodogram.lsfreq, residual_periodogram.lsamp)
pl.show()
Frequencies
Unlike light curves, Frequency objects are stored in their own special container. A list of frequency objects can be acquired through a method of the FrequencyContainer object:
frequencies_list = pywhitener.freqs.get_flist()
print(f"The last identified frequency is at {frequencies_list[-1].f:.5f} with an amplitude of {frequencies_list[-1].a:.5f}!")
All of the Frequency, FrequencyContainer, Lightcurve, and Periodogram objects have useful functionality which is documented in the Data module of the docs!
Methodology and Implementation
This package is principally designed for automated or semi-automated frequency analysis. More specifically, a type of pre-whitening analysis is conducted iteratively to quantitatively identify sinusoidal signals present in time series data (for time series with non-sinusoidal signals, a technique like Phase Dispersion Minimization may be more suitable; see phmin).
Basics of Pre-Whitening
Each iteration of this type of frequency analysis broadly consists of the following steps:
1. Compute an amplitude spectrum for the time series under examination
2. Identify a frequency/amplitude of interest
3. Perform a least squares optimization of a sinusoidal model to the time series, using the frequency/amplitude from step 2 as initial parameters
   - This will typically be a model of the form A*sin(2*pi*(f*x + p)), where f, A, and p represent a frequency, amplitude, and phase. This fully specifies a single-frequency sinusoidal model.
4. Subtract the optimized model from the time series to generate a residual time series
5. This process can be repeated on the residual time series to identify another frequency
Following these steps will result in an optimized sinusoidal model for a single frequency, and a time series with that model removed. By conducting this process many times, all the measurable periodic signals (including non-sinusoidal signals) can be extracted leaving, ideally, a time series consisting only of the underlying noise.
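As a rough illustration of a single basic iteration, here is a minimal sketch using NumPy and SciPy. It is not pywhiten's implementation: the function names are hypothetical, a plain discrete Fourier amplitude spectrum stands in for the Lomb-Scargle periodogram to keep dependencies light, and the starting amplitude and phase are taken from a linear least-squares fit at the peak frequency (an assumption made here for robustness).

```python
import numpy as np
from scipy.optimize import curve_fit


def sinusoid(x, f, a, p):
    """Single-frequency model a*sin(2*pi*(f*x + p)); the phase p is in cycles."""
    return a * np.sin(2.0 * np.pi * (f * x + p))


def amplitude_spectrum(time, data, freqs):
    """Discrete Fourier amplitude spectrum on an arbitrary frequency grid."""
    kernel = np.exp(-2j * np.pi * np.outer(freqs, time))
    return 2.0 * np.abs(kernel @ data) / len(time)


def prewhiten_once(time, data, freqs):
    """Run one basic iteration and return ((f, a, p), residual)."""
    # Compute the spectrum and take the frequency of its highest peak.
    amps = amplitude_spectrum(time, data, freqs)
    f0 = freqs[np.argmax(amps)]
    # Linear least squares at fixed f0 gives starting values for a and p.
    basis = np.column_stack([np.sin(2.0 * np.pi * f0 * time),
                             np.cos(2.0 * np.pi * f0 * time)])
    (c, s), *_ = np.linalg.lstsq(basis, data, rcond=None)
    a0 = np.hypot(c, s)
    p0 = np.arctan2(s, c) / (2.0 * np.pi)
    # Nonlinear least squares with frequency, amplitude and phase all free.
    popt, _ = curve_fit(sinusoid, time, data, p0=[f0, a0, p0])
    # Subtract the optimized model to obtain the residual time series.
    residual = data - sinusoid(time, *popt)
    return tuple(popt), residual
```

Calling prewhiten_once again on the returned residual extracts the next signal, which is the basic procedure described above.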
Motivation and Methodology of Pywhiten
When identifying a frequency model using a least-squares fit, the presence of other frequencies can make it more difficult to identify the model parameters (similar to how statistical fluctuations in data introduce uncertainty in the parameters for models fit to that data). In a basic pre-whitening analysis, the residuals from one iteration are directly used in the next iteration to identify a new frequency and then generate new residuals. Therefore, small fluctuations in model parameters will be propagated through each iteration and can significantly affect measurements only a handful of iterations deep. This effect is particularly pronounced when there is a combination of very high-amplitude and comparatively low-amplitude signals present, and can completely obscure the detection of the low-amplitude signals.
Pywhiten was developed for a significant research project which focused on a comprehensive characterization of the photometric variability of a magnetic massive stellar binary system (HD 47129, Plaskett's Star). This system demonstrates significant photometric variability due to the presence of a co-rotating centrifugal magnetosphere (CM) around the magnetic star, which manifests as a high-amplitude harmonic structure in its power spectrum. Simultaneously, it has several comparatively low-amplitude signals which are of scientific interest, made more challenging to detect by their proximity (in frequency) to the CM variability. Therefore, the problem we sought to address was the detection of low-amplitude signals in time series with complex, dense power spectra dominated by a small number of high-amplitude frequencies.
An effective approach to the aforementioned problem was found in performing optimizations of a composite model containing all the identified single-frequency sinusoidal models. Performing this at the end of a basic pre-whitening procedure is prone to falling into a local chi^2 minimum, so we elected to include an aggressive refinement step at each iteration. This step occurs after each single-frequency model fit and permits all frequencies/amplitudes/phases to vary in a fit to the original light curve. Therefore, each Pywhiten iteration does the following:
1. Computes a Lomb-Scargle periodogram using the astropy.timeseries package.
2. Identifies a candidate frequency/amplitude.
3. Performs a single-frequency model optimization to the light curve with the candidate frequency/amplitude used as initial values (and a provisional phase of 0.5).
4. Adds the optimized single-frequency model to the complete variability model, consisting of all identified frequencies. Allows all parameters of this model to vary and performs an optimization against the original (non-residual) light curve.
5. Subtracts the complete variability model from the original light curve to generate a residual light curve. This is passed to the next iteration and used to determine a new single-frequency model.
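The refinement of the complete variability model can be pictured with the sketch below. It assumes a flat parameter vector (f1, a1, p1, f2, a2, p2, ...) and uses scipy.optimize.curve_fit with no bounds or weights; the function names are hypothetical and this is an illustration of the idea, not the package's internal code.

```python
import numpy as np
from scipy.optimize import curve_fit


def multi_sinusoid(x, *params):
    """Sum of sinusoids; params is a flat sequence (f1, a1, p1, f2, a2, p2, ...)."""
    model = np.zeros_like(x, dtype=float)
    for f, a, p in np.reshape(params, (-1, 3)):
        model += a * np.sin(2.0 * np.pi * (f * x + p))
    return model


def refine_all(time, data, freq_params):
    """Refit every (f, a, p) triple at once against the original light curve."""
    # Current best values for all frequencies serve as the starting point.
    p0 = np.ravel(freq_params)
    popt, _ = curve_fit(multi_sinusoid, time, data, p0=p0)
    # The residual is always the ORIGINAL data minus the complete model.
    residual = data - multi_sinusoid(time, *popt)
    return popt.reshape(-1, 3), residual
```

Because the residual is recomputed from the original light curve each time, errors in early single-frequency fits are not locked in for later iterations.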
Step (2) also has some special behaviour worth noting:
- For the first n iterations, where n was typically taken to be 10, the candidate frequencies/amplitudes were measured as simply the highest peak on the periodogram.
- For iterations after the nth, the candidate frequencies/amplitudes were measured as the highest peak in the periodogram that also exceeded a provisional significance criterion (3 sigma), measured by performing a fit of a red+white noise model to the periodogram.
- If no values in the periodogram exceed the provisional significance criterion, the analysis is concluded.
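The selection logic in these three rules might look roughly like the following. This is a sketch under assumptions: the function name is hypothetical, iterations are counted from zero, the noise array is taken to be the fitted red+white noise level evaluated at each periodogram frequency, and "3 sigma" is interpreted here as an amplitude exceeding three times that level. The package's actual criterion may be defined differently.

```python
import numpy as np


def select_candidate(freqs, amps, noise, iteration, n_unconditional=10, n_sigma=3.0):
    """Return (frequency, amplitude) of the next candidate, or None to stop."""
    freqs, amps, noise = map(np.asarray, (freqs, amps, noise))
    # Early iterations: take the highest peak without any significance test.
    if iteration < n_unconditional:
        i = int(np.argmax(amps))
        return freqs[i], amps[i]
    # Later iterations: only points above the provisional threshold qualify.
    significant = amps > n_sigma * noise
    if not np.any(significant):
        # Nothing exceeds the criterion, so the analysis concludes.
        return None
    # Highest amplitude among the significant points only.
    i = int(np.argmax(np.where(significant, amps, -np.inf)))
    return freqs[i], amps[i]
```

Note that with a frequency-dependent noise level, the selected peak is not necessarily the highest one in the periodogram: a tall peak sitting on strong red noise can be rejected in favour of a smaller peak in a quieter region.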
However, this is just the default behaviour based on the method presented in this thesis, and is the recommended method for conducting pre-whitening on stellar time series data. This package has been made to be reasonably flexible, and both the basic pre-whitening described above and this process are possible.
Drawbacks
The advantages of this method have been discussed in the motivation section above, but it's important to note that the additional complexity of this process has two significant drawbacks which may preclude its suitability for some applications:
- With the addition of each single-frequency model, the multi-frequency model gains 3 parameters. Therefore, a 50-frequency model has a minimum of 150 free parameters, some of which may be partially correlated.
- This approach works best when working with 30 or fewer frequencies. Computation time increases significantly beyond that.
- Allowing the frequencies to vary in the multi-frequency fit can cause the optimization to fail. Pywhiten implements two safeguards against this, which are generally effective at preventing anomalous results or runtime failures:
- New frequencies are not selected within 1.5/T of existing frequencies, where T is the total time baseline of the dataset. This is the (empirical) minimum separation between two frequencies necessary to make reliable independent measurements of each of them.
- As the multi-frequency fit is intended as a refinement step, where parameters shouldn't change significantly, frequencies and amplitudes are bounded by default to a small region around their initial values.
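To make the two safeguards concrete, here is a small sketch. The 1.5/T separation comes from the text above; the function names and the sizes of the bounding windows (f_window and a_fraction) are placeholders chosen for illustration, since the default window sizes are not stated here. Phases are left unbounded, as only frequencies and amplitudes are described as bounded.

```python
import numpy as np


def is_resolved(candidate_f, existing_fs, baseline, factor=1.5):
    """True if candidate_f is at least factor/T away from every existing frequency."""
    existing_fs = np.asarray(existing_fs, dtype=float)
    if existing_fs.size == 0:
        return True
    return bool(np.all(np.abs(existing_fs - candidate_f) >= factor / baseline))


def refinement_bounds(freq_params, baseline, f_window=1.0, a_fraction=0.5):
    """Return (lower, upper) bounds for a flat (f1, a1, p1, f2, ...) vector."""
    params = np.asarray(freq_params, dtype=float).reshape(-1, 3)
    lower = np.full_like(params, -np.inf)
    upper = np.full_like(params, np.inf)
    # Frequencies: stay within f_window/T of the starting value (illustrative).
    lower[:, 0] = params[:, 0] - f_window / baseline
    upper[:, 0] = params[:, 0] + f_window / baseline
    # Amplitudes: stay within a fractional window of the starting value.
    a_lo = params[:, 1] * (1.0 - a_fraction)
    a_hi = params[:, 1] * (1.0 + a_fraction)
    lower[:, 1] = np.minimum(a_lo, a_hi)
    upper[:, 1] = np.maximum(a_lo, a_hi)
    # Phases (column 2) keep their infinite bounds.
    return lower.ravel(), upper.ravel()
```

The returned pair has the shape expected by the bounds argument of scipy.optimize.curve_fit, so it could be passed directly to a multi-frequency fit.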
Issues
While the core functionality of this software was tested through application to my own scientific research, the conversion to a Python package occurred after this was complete and shortly after I left the field of astronomy. If you run into any bugs with the software, particularly any related to the installation of the package, please open an issue or email me directly at erik@erikstacey.com. Your diligence is much appreciated!
Change Log:
- 1.1.6 (Apr 9, 2025)
  - Bounded x0 for SLF fits to periodograms to be positive
- 1.1.5 (Feb 12, 2025)
  - Fixed a bug with setting the periodogram upper limit to "Nyquist".
- 1.1.4 (Feb 10, 2025)
  - Added a method for skipping peak selection using a manually specified frequency/amplitude/phase hint.
- 1.1.3 (Jan 8, 2024)
  - Changed name of auxiliary folder in default configuration
- 1.1.2 (Nov 1, 2023)
  - Fixed a bug relating to the importing of the default configuration file when instantiating the Pywhitener class.
  - Added a warning if post_pw is called with an empty frequency list
- 1.1.1 (April 21, 2023)
  - Made t0 of Frequency object an optional parameter
- 1.1.0 (April 21, 2023)
  - Fixed major bug with installation scripts while using pip where submodules would not be properly installed.
  - Moved the default configuration loading and access to new module cfg. Default configuration can now be accessed through pywhiten.cfg.default_cfg.
- 1.0.3 (April 5, 2023)
  - First non-internal release