Python module for robust estimation of cross-correlation functions of sparse and unevenly sampled astronomical time series. It emulates the Fortran program ZDCF, which is based on the Z-transformed Discrete Correlation Function algorithm (ZDCF, Alexander 1997).

pyZDCF

pyZDCF is a Python module that emulates a widely used Fortran program called ZDCF (Z-transformed Discrete Correlation Function, Alexander 1997). It is used for robust estimation of the cross-correlation function of sparse and unevenly sampled astronomical time series. This Python implementation also introduces sparse matrices to significantly reduce RAM usage when running the code on large time series (> 3000 points).

pyZDCF is based on the original Fortran code fully developed by Prof. Tal Alexander from the Weizmann Institute of Science, Israel (see Acknowledgments and References for details and further reading).

Full docs with examples: pyzdcf.readthedocs.io

Motivation

Development of the pyZDCF module was motivated by the long and successful use of the original ZDCF Fortran code in the analysis of light curves of active galactic nuclei by our research group (see Kovacevic et al. 2014, Shapovalova et al. 2019, and references therein). One of the science cases we investigate is photometric reverberation mapping in the context of Legacy Survey of Space and Time (LSST) survey strategies (see Jankov et al. 2022). However, this module is general and is meant to be used for cross-correlation of spectroscopic or photometric light curves, just like the original Fortran version.

Installation

pyZDCF can be installed using pip:

pip install pyzdcf

Dependencies

python = ">=3.9,<3.14"
numpy = "^1.21.6"
pandas = "^2.2.3"
scipy = "^1.11"

Older versions of pyzdcf support older versions of Python; see the table for details:

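| pyzdcf version | Supported Python versions | Notes                                            |
|----------------+---------------------------+--------------------------------------------------|
| >= 1.0.3       | 3.9 – 3.13                | Requires newer dependencies (e.g. pandas ≥2.2.3) |
| <= 1.0.2       | 3.8 – 3.11                | Compatible with older environments               |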
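
The compatibility table can be turned into a quick pre-install check. The sketch below is only an illustration: the helper name compatible_releases and the SUPPORT mapping are hypothetical and not part of pyZDCF. The version ranges are copied from the table.

```python
import sys

# Supported Python ranges copied from the compatibility table
# (inclusive (major, minor) bounds).
SUPPORT = {
    ">= 1.0.3": ((3, 9), (3, 13)),
    "<= 1.0.2": ((3, 8), (3, 11)),
}


def compatible_releases(python_version=None):
    """Return the pyzdcf release ranges that support the given Python version."""
    if python_version is None:
        python_version = sys.version_info[:2]
    major_minor = tuple(python_version[:2])
    return [release for release, (low, high) in SUPPORT.items()
            if low <= major_minor <= high]


# Check the interpreter that is currently running.
print(compatible_releases())
```

An empty list means that neither release range in the table lists the interpreter as supported.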

How to use

Input files

This code requires user-provided plain text files as input. CSV files are accepted by default, but you can use any other delimited file, as long as you provide the sep keyword argument when calling the pyzdcf function. The input light curve file should have 3 columns: time (ordered), flux/magnitude, and absolute error on flux/magnitude. Make sure to exclude the header (column names) from the input files.

First few lines of the example input file accepted by default (CSV):

0.0,0.9594479339474323,0.0019188958678948648
1.0,0.9588196871078336,0.0019176393742156672
2.0,0.9637198686651904,0.0019274397373303808
3.0,0.9622807967282166,0.0019245615934564328
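
A file in this layout can be produced with the standard library alone. The following is a minimal sketch, not part of pyZDCF: the helper write_light_curve is hypothetical, the sample values are made up, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def write_light_curve(rows, handle):
    """Write (time, flux, error) rows as header-less CSV, ordered by time."""
    writer = csv.writer(handle, lineterminator="\n")
    for time, flux, err in sorted(rows, key=lambda row: row[0]):
        writer.writerow([time, flux, err])


# Made-up sample values, deliberately given out of time order.
rows = [(2.0, 0.9637, 0.0019), (0.0, 0.9594, 0.0019), (1.0, 0.9588, 0.0019)]

# The buffer stands in for a real file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_light_curve(rows, buffer)
print(buffer.getvalue())
```

Sorting before writing keeps the time column ordered, and no header row is ever emitted.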

NOTE: pyZDCF is tested only with input files that have whole numbers (integers) in the time column. If your times are decimal numbers (e.g., a light curve with several measurements in the same night, expressed as fractions of a day), convert them to a unit that gives integer values (e.g., minutes instead of days). Alternatively, you can round the values from the same day down to the whole day (e.g., 5.6 --> 5, 5.8 --> 5), in which case the algorithm averages the flux for that day.
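
The two options in the note can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the sample times are chosen so that the conversions are exact.

```python
import math

MINUTES_PER_DAY = 24 * 60


def days_to_integer_minutes(times_in_days):
    """Option 1: express fractional days as whole minutes."""
    return [round(t * MINUTES_PER_DAY) for t in times_in_days]


def floor_to_whole_days(times_in_days):
    """Option 2: drop the fractional part so same-night points share one time."""
    return [math.floor(t) for t in times_in_days]


times = [5.25, 5.5, 6.75]              # fractional days
print(days_to_integer_minutes(times))  # [7560, 7920, 9720]
print(floor_to_whole_days(times))      # [5, 5, 6]
```

With option 1, lags in the output are expressed in minutes rather than days. With option 2, points that end up sharing a time are the ones whose fluxes get averaged.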

Input parameters

If you use interactive mode (intr = True), pyZDCF will ask you to enter all input parameters interactively, similarly to the original ZDCF interface. There is also a manual mode (intr = False), where you provide the input parameters as a dictionary passed to the parameters keyword argument.

Available input parameters (keys in the parameters dictionary) are:

  • autocf - if True, perform auto-correlation, otherwise do the cross-correlation.
  • prefix - provide a name for the output file.
  • uniform_sampling - if True, set flag to perform uniform sampling of the light curve.
  • omit_zero_lags - if True, omit zero lag points.
  • minpts - minimal number of points per bin.
  • num_MC - number of Monte Carlo simulations for error estimation.
  • lc1_name - name of the first light curve file.
  • lc2_name - name of the second light curve file (required only for cross-correlation).

For more information on the correct syntax, see "Running the code" subsection.
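
As an illustration of how the keys fit together, the sketch below builds a parameters dictionary for either case. The helper make_params and its default values are hypothetical, not part of pyZDCF. It assumes that lc2_name can simply be left out for auto-correlation, in line with the key description above.

```python
def make_params(lc1_name, lc2_name=None, prefix="zdcf", num_MC=100):
    """Build a parameters dictionary using the keys listed above."""
    autocf = lc2_name is None       # no second light curve -> auto-correlation
    params = dict(
        autocf=autocf,
        prefix=prefix,
        uniform_sampling=False,
        omit_zero_lags=True,
        minpts=0,                   # 0 is the flag for the default value
        num_MC=num_MC,
        lc1_name=lc1_name,
    )
    if not autocf:
        params["lc2_name"] = lc2_name
    return params


acf_params = make_params("lc_name1", prefix="acf")
ccf_params = make_params("lc_name1", "lc_name2", prefix="ccf")
print(acf_params)
print(ccf_params)
```

Either dictionary would then be passed as the parameters keyword argument in manual mode (intr = False).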

Output

The return value of the pyzdcf function is a pandas.DataFrame object displaying the results in 7 columns:

+---+-------+-------------+-------------+--------------+-------------+-------------+------+
|   |   tau |   -sig(tau) |   +sig(tau) |          dcf |   -err(dcf) |   +err(dcf) | #bin |
|---+-------+-------------+-------------+--------------+-------------+-------------+------|
| 0 |  -991 |           4 |           0 |  0.13598     |  0.361559   |  0.342224   |   10 |
| 1 |  -988 |           2 |           0 | -0.217733    |  0.279988   |  0.301034   |   13 |
| 2 |  -985 |           2 |           0 | -0.0614938   |  0.266546   |  0.27135    |   16 |
| 3 |  -982 |           2 |           0 |  0.239601    |  0.237615   |  0.223317   |   19 |
| 4 |  -979 |           2 |           0 |  0.331415    |  0.208171   |  0.192523   |   22 |

The columns are: time-lag, negative time-lag std, positive time-lag std, zdcf, negative zdcf sampling error, positive zdcf sampling error, number of points per bin. For more information on how these values are calculated see Alexander 1997.
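
Because the result is a regular DataFrame, standard pandas operations apply to it. The sketch below rebuilds the five rows shown above and picks the lag with the highest correlation. It assumes that the column labels match the printed header exactly. Since the five rows are only the head of a longer table, the "peak" here is purely illustrative.

```python
import pandas as pd

# Column labels assumed to match the printed header above.
columns = ["tau", "-sig(tau)", "+sig(tau)", "dcf", "-err(dcf)", "+err(dcf)", "#bin"]
rows = [
    [-991, 4, 0, 0.13598, 0.361559, 0.342224, 10],
    [-988, 2, 0, -0.217733, 0.279988, 0.301034, 13],
    [-985, 2, 0, -0.0614938, 0.266546, 0.27135, 16],
    [-982, 2, 0, 0.239601, 0.237615, 0.223317, 19],
    [-979, 2, 0, 0.331415, 0.208171, 0.192523, 22],
]
dcf_df = pd.DataFrame(rows, columns=columns)

# Row with the largest correlation coefficient.
peak = dcf_df.loc[dcf_df["dcf"].idxmax()]
print(f"peak at tau = {peak['tau']:.0f}, dcf = {peak['dcf']:.3f} "
      f"(-{peak['-err(dcf)']:.3f} / +{peak['+err(dcf)']:.3f})")
```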

The code will also generate an output .dcf file in a specified folder on your computer, containing the results in the same 7 columns. You can name these files however you want using the prefix parameter (see the example in the next subsection).

Optionally, by adding the keyword argument savelc = True, pyzdcf can create and save the light curve files used as input after averaging points with identical times.

Running the code

An example for calculating cross-correlation between two light curves:

from pyzdcf import pyzdcf

input_dir = './input/'       # Path to the input data (avoids shadowing the built-in input())
output_dir = './output/'     # Path to the directory for saving the results

# Light curve names
lc1 = 'lc_name1'
lc2 = 'lc_name2'

# Parameters are passed to pyZDCF as a dictionary

params = dict(autocf            =  False, # Autocorrelation (T) or cross-correlation (F)
              prefix            = 'ccf',  # Output files prefix
              uniform_sampling  =  False, # Uniform sampling?
              omit_zero_lags    =  True,  # Omit zero lag points?
              minpts            =  0,     # Min. num. of points per bin (0 is a flag for default value of 11)
              num_MC            =  100,   # Num. of Monte Carlo simulations for error estimation
              lc1_name          =  lc1,   # Name of the first light curve file
              lc2_name          =  lc2    # Name of the second light curve file (required only if we do CCF)
             )

# Here we use non-interactive mode (intr=False)
dcf_df = pyzdcf(input_dir  = input_dir,
                output_dir = output_dir,
                intr       = False,
                parameters = params,
                sep        = ',',
                sparse     = 'auto',
                verbose    = True)

# To run the program in interactive mode (like the original Fortran code):
dcf_df = pyzdcf(input_dir  = input_dir,
                output_dir = output_dir,
                intr       = True,
                sep        = ',',
                sparse     = 'auto',
                verbose    = True
                )

  • For more examples, see the example notebook.

  • You can also check out the code description of the original Fortran version, because the majority of input parameters and all output files are the same as in pyZDCF. You can download the Fortran source code here.

Features

  • Sparse matrix implementation for reduced RAM usage when working with long light curves (>3000 points);

The main benefit is that we can now run these demanding calculations on our own personal computers (8 GB of RAM is enough for light curves containing up to 15000 points), making this algorithm more convenient to use than ever.

You can turn this on or off by setting the sparse keyword argument to True or False. The default value is 'auto', where sparse matrices are used when there are more than 3000 points per light curve. Note that the reduced RAM usage comes at the cost of a longer running time.
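
The documented rule for sparse = 'auto' can be written out as a small sketch. This only illustrates the behaviour described above; it is not the library's internal code, and both helper names are hypothetical. The memory estimate further assumes that a dense approach would hold one n × n array of 8-byte values for all point pairs, which the source does not state explicitly.

```python
SPARSE_THRESHOLD = 3000   # documented switch point for sparse='auto'


def resolve_sparse(sparse, n_points, threshold=SPARSE_THRESHOLD):
    """Return True if sparse matrices would be used under the documented rule."""
    if sparse == "auto":
        return n_points > threshold
    return bool(sparse)


def dense_pair_matrix_gib(n_points, bytes_per_value=8):
    """Size in GiB of one dense n x n array (assumed layout, see lead-in)."""
    return n_points ** 2 * bytes_per_value / 2 ** 30


for n in (1000, 3000, 3001, 15000):
    print(n, resolve_sparse("auto", n), round(dense_pair_matrix_gib(n), 2))
```

Under these assumptions, a single dense array for 15000 points already takes about 1.68 GiB, and an algorithm that needs several such arrays at once would quickly exceed 8 GB.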

  • Interactive mode: program specifically asks the user to provide necessary parameters (similar to original Fortran version);
  • Manual mode: user can provide all parameters in one dictionary.
  • Fixed bugs from original ZDCF (v2.3) written in Fortran 95.

The module was tested (i.e., compared with original ZDCF v2.3 output, with fixed bugs) for various parameter combinations on a set of 100 AGN light curve candidates (g and r bands). The list of object ids and coordinates was taken from a combined catalogue of known AGNs (Sánchez-Sáez et al. 2021).

License

Distributed under the MIT License.

Contact

Isidora Jankov (main) - ijankov@proton.me
Andjelka Kovačević - andjelka@matf.bg.ac.rs
Dragana Ilić - dilic@matf.bg.ac.rs

You are welcome to write to us if you:

  • have any problems running the code on your system;
  • have suggestions for code improvements.

If you want to report a bug, please open an Issue on GitHub: https://github.com/LSST-sersag/pyzdcf.

Citation

If you use pyZDCF for scientific work leading to a publication, please consider acknowledging it using the following citation (BibTeX):

@software{jankov_isidora_2022_7253034,
  author       = {Jankov, Isidora and
                  Kovačević, Andjelka B. and
                  Ilić, Dragana and
                  Sánchez-Sáez, Paula and
                  Nikutta, Robert},
  title        = {pyZDCF: Initial Release},
  month        = oct,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {v1.0.0},
  doi          = {10.5281/zenodo.7253034},
  url          = {https://doi.org/10.5281/zenodo.7253034}
}

For other citation formats see: https://doi.org/10.5281/zenodo.7253034

Acknowledgments

References
