
Feature extractor from noisy time series


light-curve processing toolbox for Python

A Python wrapper for the light-curve-feature and light-curve-dmdt Rust crates, providing high-performance time-series feature extraction for astrophysics.


Quick start

pip install 'light-curve[full]'
import light_curve as lc
import numpy as np

rng = np.random.default_rng(0)
n = 100  # observations per light curve

# Observation times in days (unevenly sampled, must be sorted)
t = np.sort(rng.uniform(0, 100, n))
# Magnitudes with a slight linear fade and measurement noise
m = 15.0 + 0.01 * t + rng.normal(0, 0.1, n)
err = np.full(n, 0.1)

# Combine features into a single extractor evaluated in one pass
extractor = lc.Extractor(lc.Amplitude(), lc.BeyondNStd(nstd=1), lc.LinearFit())

result = extractor(t, m, err)
print('\n'.join(f"{name} = {value:.4f}" for name, value in zip(extractor.names, result)))

# Extract a feature from 1000 light curves in parallel
light_curves = [
    (np.sort(rng.uniform(0, 100, n)), 15.0 + rng.normal(0, 0.2, n), np.full(n, 0.1))
    for _ in range(1000)
]
amplitudes = lc.Amplitude().many(light_curves)
print(f"Amplitude: mean = {np.mean(amplitudes):.3f} mag, std = {np.std(amplitudes):.3f} mag")

For conda, alternative package names, and platform-specific installation notes, see the Installation section.

Feature extractors

Most classes implement feature extractors useful for astrophysical source classification and characterization based on light curves.

To list all available feature extractors:

import light_curve as lc

print([x for x in dir(lc) if hasattr(getattr(lc, x), "names")])

To read the documentation for a specific extractor:

import light_curve as lc

help(lc.BazinFit)

Available features

See the complete list of available feature extractors and their documentation in the light-curve-feature Rust crate docs. Extractors marked (experimental) below are experimental. While we usually say "magnitude" and use "m" as the time-series value, some features are designed for flux light curves.

The Flux/magnitude column indicates whether a feature should be used with flux light curves only, magnitude light curves only, or either.

The transform=True column shows the default transformation applied to the output when transform=True is passed to the extractor constructor (the default is transform=None, which applies no transformation). Transformations are monotonic functions intended to reduce the dynamic range of raw feature values and make them better suited for machine learning. You can also select a transformation by name string (e.g. transform="lg") for supported features; the available names are:

  • "identity": no-op
  • "lg": base-10 logarithm
  • "arcsinh": inverse hyperbolic sine
  • "clipped_lg": base-10 logarithm with values below the minimum positive float clipped to a floor
  • "ln1p": natural logarithm of 1 + x
  • "sqrt": square root

For parametric fit features (BazinFit, LinexpFit, VillarFit), transform=True applies a feature-specific transformation described under Parametric fit features. Features marked "N/A" do not implement transformation.
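
For example, a transformation can be requested when constructing an extractor. A minimal sketch (per the table below, EtaE's default transform is "lg" and MaximumSlope supports "clipped_lg"):

import light_curve as lc

# Apply this feature's default transformation ("lg" for EtaE)
eta_e = lc.EtaE(transform=True)
# Or select a transformation by name
max_slope = lc.MaximumSlope(transform="clipped_lg")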

Feature name Description Min data points Features number Flux/magnitude `transform=True`
Amplitude Half amplitude of magnitude:

$\displaystyle \frac{\max (m)-\min (m)}{2}$

1 1 Flux or magn identity
AndersonDarlingNormal Unbiased Anderson–Darling normality test statistic:

$\displaystyle \left( 1+\frac{4}{N} -\frac{25}{N^{2}}\right) \times$

$\times \left( -N-\frac{1}{N}\sum\limits_{i=0}^{N-1} (2i+1)\ln \Phi _{i} +(2(N-i)-1)\ln (1-\Phi _{i} )\right) ,$

where $\Phi _{i} \equiv \Phi (( m_{i} -\langle m\rangle ) /\sigma _{m})$ is the cumulative distribution function of the standard normal distribution, $N-$ the number of observations, $\langle m\rangle -$ mean magnitude and $\sigma _{m} =\sqrt{\sum\limits_{i=0}^{N-1}( m_{i} -\langle m\rangle )^{2} /( N-1)}$ is the magnitude standard deviation

4 1 Flux or magn lg
BazinFit Five fit parameters and goodness of fit (reduced $\chi ^{2}$) of the Bazin function developed for core-collapsed supernovae:

$\displaystyle f(t)=A\frac{\mathrm{e}^{-(t-t_{0} )/\tau _{fall}}}{1+\mathrm{e}^{-(t-t_{0} )/\tau _{rise}}} +B,$

where $f(t)-$ flux observation
6 6 Flux only $A$ → mag, $B/A$, $t_0$ dropped, $\chi^2$ → $\ln(1+\chi^2)$
BeyondNStd Fraction of observations beyond $n\sigma _{m}$ from the mean magnitude $\langle m\rangle $:

$\displaystyle \frac{\sum _{i} I_{|m-\langle m\rangle | >n\sigma _{m}} (m_{i} )}{N},$

where $I-$ an indicator function
2 1 Flux or magn identity
ColorOfMedian
(experimental)
Magnitude difference between medians of two bands 2 1 Magn only identity
Cusum A range of cumulative sums:

$\displaystyle \max(S) - \min(S),$

where

$\displaystyle S_{j} \equiv \frac{1}{N\sigma_{m}} \sum_{i=0}^{j} (m_{i} - \langle m\rangle), \quad j \in \lbrace 1..N-1 \rbrace\;$

2 1 Flux or magn identity
Eta Von Neumann $\eta $:

$\displaystyle \eta \equiv \frac{1}{(N-1)\sigma _{m}^{2}}\sum\limits _{i=0}^{N-2} (m_{i+1} -m_{i} )^{2}$

2 1 Flux or magn identity
EtaE Modernisation of Eta for unevenly spaced time series:

$\displaystyle \eta ^{e} \equiv \frac{(t_{N-1} -t_{0} )^{2}}{(N-1)^{3}}\frac{\sum\limits_{i=0}^{N-2}\left(\frac{m_{i+1} -m_{i}}{t_{i+1} -t_{i}}\right)^{2}}{\sigma _{m}^{2}}$

2 1 Flux or magn lg
ExcessVariance Measure of the variability amplitude:

$\displaystyle \frac{\sigma _{m}^{2} -\langle \delta ^{2} \rangle }{\langle m\rangle ^{2}},$

where $\langle \delta ^{2} \rangle -$ mean squared error
2 1 Flux only identity
FluxNNotDetBeforeFd
(experimental)
Number of non-detections before the first detection 2 1 Flux only identity
InterPercentileRange

$\displaystyle Q(1-p)-Q(p),$

where $Q(p)-$ the $p$-th quantile of the magnitude sample
1 1 Flux or magn identity
Kurtosis Excess kurtosis of magnitude:

$\displaystyle \frac{N(N+1)}{(N-1)(N-2)(N-3)}\frac{\sum _{i} (m_{i} -\langle m\rangle )^{4}}{\sigma _{m}^{4}} -3\frac{(N-1)^{2}}{(N-2)(N-3)}$

4 1 Flux or magn arcsinh
LinearFit The slope, its error and reduced $\chi ^{2}$ of the light curve in the linear fit of a magnitude light curve with respect to the observation error $\{\delta _{i}\}$:

$\displaystyle m_{i} \ =\ c\ +\ \text{slope} \ t_{i} \ +\ \delta _{i} \varepsilon _{i} ,$

where $c$ is a constant, $\{\varepsilon _{i}\}$ are standard normal random variables
3 3 Magn only identity
LinearTrend The slope and its error of the light curve in the linear fit of a magnitude light curve without respect to the observation error $\{\delta _{i}\}$:

$\displaystyle m_{i} \ =\ c\ +\ \text{slope} \ t_{i} \ +\ \Sigma \varepsilon _{i} ,$

where $c$ and $\Sigma$ are constants, $\{\varepsilon _{i}\}$ are standard normal random variables.
2 2 Magn only identity
MagnitudeNNotDetBeforeFd
(experimental)
Number of non-detections before the first detection 2 1 Magn only identity
MagnitudePercentageRatio Magnitude percentage ratio:

$\displaystyle \frac{Q(1-n)-Q(n)}{Q(1-d)-Q(d)}$

1 1 Flux or magn identity
MaximumSlope Maximum slope between two consecutive observations:

$\displaystyle \max_{i=0\dotsc N-2}\left| \frac{m_{i+1} -m_{i}}{t_{i+1} -t_{i}}\right|$

2 1 Flux or magn clipped_lg
Mean Mean magnitude:

$\displaystyle \langle m\rangle =\frac{1}{N}\sum\limits _{i} m_{i}$

1 1 Flux or magn identity
MeanVariance Standard deviation to mean ratio:

$\displaystyle \frac{\sigma _{m}}{\langle m\rangle }$

2 1 Flux only identity
Median Median magnitude 1 1 Flux or magn identity
MedianAbsoluteDeviation Median of the absolute value of the difference between magnitude and its median:

$\displaystyle \mathrm{Median} (|m_{i} -\mathrm{Median} (m)|)$

1 1 Flux or magn identity
MedianBufferRangePercentage Fraction of points within

$\displaystyle \mathrm{Median} (m)\pm q\times (\max (m)-\min (m))/2$

1 1 Flux or magn identity
OtsuSplit Difference of subset means, standard deviation of the lower subset, standard deviation of the upper subset and lower-to-all observation count ratio for two subsets of magnitudes obtained by Otsu's method split.

Otsu's method is used to perform automatic thresholding. The algorithm returns a single threshold that separates values into two classes. This threshold is determined by minimizing intra-class intensity variance $\sigma^2_{W}=w_0\sigma^2_0+w_1\sigma^2_1$, or equivalently, by maximizing inter-class variance $\sigma^2_{B}=w_0 w_1 (\mu_1-\mu_0)^2$. When multiple extrema exist, the algorithm returns the minimum threshold.
2 4 Flux or magn N/A
PercentAmplitude Maximum deviation of magnitude from its median:

$\displaystyle \max_{i} |m_{i} \ -\ \text{Median}( m) |$

1 1 Flux or magn identity
PercentDifferenceMagnitudePercentile Ratio of $p$-th inter-percentile range to the median:

$\displaystyle \frac{Q( 1-p) -Q( p)}{\text{Median}( m)}$

1 1 Flux only clipped_lg
LinexpFit Four fit parameters and goodness of fit (reduced $\chi^{2}$) of the Linexp function developed for core-collapsed supernovae:

$\displaystyle f(t)=A\frac{t-t_{0}}{\tau}\exp\!\left(-\frac{t-t_{0}}{\tau}\right)+B$

5 5 Flux only $A$ → mag, $B/A$, $t_0$ dropped, $\chi^2$ → $\ln(1+\chi^2)$
RainbowFit
(experimental)
Seven fit parameters and goodness of fit (reduced $\chi ^{2}$). The Rainbow method is developed and detailed in Russeil+23. This implementation is suited for transient objects. Bolometric flux and temperature functions are customizable; by default, Bazin and logistic functions are used:

$\displaystyle F_{\nu}(t, \nu) = \frac{\pi\,B\left(T(t),\nu\right)}{\sigma_\mathrm{SB}\,T(t)^{4}} \times F_\mathrm{bol}(t),$

where $F_{\nu}(t, \nu)-$ flux observation at a given wavelength
6 8 Flux only identity
ReducedChi2 Reduced $\chi ^{2}$ of magnitude measurements:

$\displaystyle \frac{1}{N-1}\sum _{i}\left(\frac{m_{i} -\overline{m}}{\delta _{i}}\right)^{2} ,$

where $\overline{m} -$ weighted mean magnitude
2 1 Flux or magn ln1p
Roms
(experimental)
Robust median statistic:

$\displaystyle \frac1{N-1} \sum_{i=0}^{N-1} \frac{|m_i - \mathrm{median}(m)|}{\sigma_i}$

2 1 Flux or magn identity
Skew Skewness of magnitude:

$\displaystyle \frac{N}{(N-1)(N-2)}\frac{\sum _{i} (m_{i} -\langle m\rangle )^{3}}{\sigma _{m}^{3}}$

3 1 Flux or magn arcsinh
StandardDeviation Standard deviation of magnitude:

$\displaystyle \sigma _{m} \equiv \sqrt{\sum _{i} (m_{i} -\langle m\rangle )^{2} /(N-1)}$

2 1 Flux or magn identity
StetsonK Stetson K coefficient describing light curve shape:

$\displaystyle \frac{\sum _{i}\left| \frac{m_{i} -\overline{m} }{\delta _{i}}\right| }{\sqrt{N\ \chi ^{2}}}$

2 1 Flux or magn identity
VillarFit Seven fit parameters and goodness of fit (reduced $\chi ^{2}$) of the Villar function developed for supernovae classification:

$f(t)=c+\frac{A}{1+\exp\frac{-(t-t_{0} )}{\tau _{rise}}} \times f_{fall}(t),$

$f_{fall}(t) = 1-\frac{\nu (t-t_{0} )}{\gamma }, ~~~ t< t_{0} +\gamma,$

$f_{fall}(t) = (1-\nu )\exp\frac{-(t-t_{0} -\gamma )}{\tau _{fall}}, ~~~ t \geq t_{0} + \gamma.$

where $f(t) -$ flux observation, $A, \gamma , \tau _{rise} , \tau _{fall} >0$, $\nu \in [0;1)$

Here we introduce a new dimensionless parameter $\nu$ instead of the plateau slope $\beta$ from the original paper: $\nu \equiv -\beta \gamma /A$
8 8 Flux only $A$ → mag, $c/A$, $t_0$ dropped, $\chi^2$ → $\ln(1+\chi^2)$
WeightedMean Weighted mean magnitude:

$\displaystyle \overline{m} \equiv \frac{\sum _{i} m_{i} /\delta _{i}^{2}}{\sum _{i} 1/\delta _{i}^{2}}$

1 1 Flux or magn identity

Meta-features

Meta-features accept other feature extractors and apply them to pre-processed data.

Periodogram

This feature transforms time-series data into the Lomb-Scargle periodogram, providing an estimate of the power spectrum. The peaks argument specifies how many of the most significant spectral density peaks to return. For each peak, its period and signal-to-noise ratio are returned:

$$ \text{signal to noise of peak} \equiv \frac{P(\omega_\mathrm{peak}) - \langle P(\omega) \rangle}{\sigma_{P(\omega)}} $$

The optional features argument accepts a list of additional feature extractors, which are applied to the power spectrum: frequency is passed as "time", power spectrum is passed as "magnitude", and no uncertainties are set.
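
For example, a minimal sketch of constructing this meta-feature with the arguments described above (other constructor parameters keep their defaults):

import light_curve as lc

# Two strongest peaks give four values (period and S/N per peak); Amplitude and
# StandardDeviation are then evaluated on the power spectrum itself
periodogram = lc.Periodogram(
    peaks=2,
    features=[lc.Amplitude(), lc.StandardDeviation()],
)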

Bins

Bins the time series into windows of width $\mathrm{window}$ with respect to some $\mathrm{offset}$. The $j$-th bin spans the half-open interval $[j \cdot \mathrm{window} + \mathrm{offset},\ (j + 1) \cdot \mathrm{window} + \mathrm{offset})$.

The binned time series is defined by $$t_j^* = (j + \frac12) \cdot \mathrm{window} + \mathrm{offset},$$ $$m_j^* = \frac{\sum{m_i / \delta_i^2}}{\sum{\delta_i^{-2}}},$$ $$\delta_j^* = \frac{N_j}{\sum{\delta_i^{-2}}},$$ where $N_j$ is the number of observations in the bin and all sums are over observations within that bin.
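
For example, a minimal sketch of wrapping an extractor in Bins with the window and offset arguments described above:

import light_curve as lc

# Bin into 1-day windows starting at offset 0, then evaluate StetsonK
# on the binned (inverse-variance weighted) light curve
bins = lc.Bins(features=[lc.StetsonK()], window=1.0, offset=0.0)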

Parametric fit features

BazinFit, LinexpFit, and VillarFit fit parametric functions to transient flux light curves (see Available features for the function definitions). Unlike other extractors, they require an explicit algorithm argument selecting the optimization method.

The available algorithms depend on the compile-time Cargo features (see Build from source):

| Algorithm | Requires | Description |
|---|---|---|
| "mcmc" | always available | MCMC ensemble sampler; robust but slow |
| "ceres" | ceres-source or ceres-system | Ceres trust-region solver; fast, gradient-based |
| "mcmc-ceres" | ceres-source or ceres-system | MCMC exploration followed by Ceres refinement |
| "lmsder" | gsl | Levenberg-Marquardt via GSL |
| "mcmc-lmsder" | gsl | MCMC exploration followed by LMSDER refinement |

The hybrid "mcmc-ceres" and "mcmc-lmsder" algorithms run MCMC first to broadly explore the parameter space, then hand off to the gradient-based solver for precise convergence. This typically gives the best balance of robustness and final accuracy.

By default, initial parameter values and bounds are estimated from the data. Override them with the init and bounds arguments (supported by MCMC-based algorithms only), each a list of values or Nones to keep data-derived defaults for individual parameters. BazinFit has 5 parameters, LinexpFit has 4, VillarFit has 7; run help() on the respective class for the parameter order.
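
For example, a sketch of choosing an algorithm and overriding a single initial guess; the parameter order is given by help(lc.BazinFit), and "mcmc-ceres" is only available if a Ceres feature was enabled at build time:

import light_curve as lc

# Robust but slow: pure MCMC with data-derived initial guesses and bounds
fit = lc.BazinFit("mcmc")

# Hybrid: MCMC exploration refined by Ceres; override only the first of the
# five initial values, None keeps the data-derived default for the rest
fit_hybrid = lc.BazinFit("mcmc-ceres", init=[2.0, None, None, None, None])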

The ln_prior argument sets the MCMC prior. It accepts None / "no" (flat prior, the default), a named string literal, or a list of light_curve.ln_prior.LnPrior1D objects for per-parameter distributions (see help(light_curve.ln_prior) for the available distribution types). VillarFit additionally supports "hosseinzadeh2020", a prior from Hosseinzadeh et al. 2020 that encodes supernova physics constraints; it assumes time values are in days.
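
For example, a sketch of opting into the named prior mentioned above (remember that it assumes time values in days):

import light_curve as lc

# The default is a flat prior; "hosseinzadeh2020" encodes supernova physics constraints
villar = lc.VillarFit("mcmc", ln_prior="hosseinzadeh2020")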

Pass transform=True to convert the raw fit parameters to a magnitude-like representation: the amplitude becomes a magnitude (assuming input flux in janskys), the baseline is normalized by the amplitude, the reference time is dropped, and the reduced chi^2 is log-scaled. Run help(lc.BazinFit) for the exact definition.

For the experimental multi-band analogue, see Rainbow Fit below.

Multi-band features

As of v0.8, experimental extractors support multi-band light-curve inputs.

import numpy as np
from light_curve.light_curve_py import LinearFit

t = np.arange(20, dtype=float)
m = np.arange(20, dtype=float)
sigma = np.full_like(t, 0.1)
bands = np.array(["g"] * 10 + ["r"] * 10)
feature = LinearFit(bands=["g", "r"])
values = feature(t, m, sigma, bands)
print(values)

Rainbow Fit

Rainbow (Russeil+23) is a black-body parametric model for transient light curves. By default, it uses the Bazin function as a model for bolometric flux evolution and a logistic function for temperature evolution. The user may customize the model by providing their own functions for bolometric flux and temperature evolution. This example demonstrates the reconstruction of a synthetic light curve with this model. RainbowFit requires the iminuit package.

import numpy as np
from light_curve.light_curve_py import RainbowFit


def bb_nu(wave_aa, T):
    """Black-body spectral model"""
    nu = 3e10 / (wave_aa * 1e-8)
    return 2 * 6.626e-27 * nu ** 3 / 3e10 ** 2 / np.expm1(6.626e-27 * nu / (1.38e-16 * T))


# Effective wavelengths in Angstrom
band_wave_aa = {"g": 4770.0, "r": 6231.0, "i": 7625.0, "z": 9134.0}

# Parameter values
reference_time = 60000.0  # time close to the peak time
# Bolometric flux model parameters
amplitude = 1.0  # bolometric flux semiamplitude, arbitrary (non-spectral) flux/luminosity units
rise_time = 5.0  # exponential growth timescale, days
fall_time = 30.0  # exponential decay timescale, days
# Temperature model parameters
Tmin = 5e3  # temperature as t -> +inf, kelvins
delta_T = 10e3  # (Tmin + delta_T) is the temperature as t -> -inf, kelvins
k_sig = 4.0  # temperature evolution timescale, days

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(reference_time - 3 * rise_time, reference_time + 3 * fall_time, 1000))
band = rng.choice(list(band_wave_aa), size=len(t))
waves = np.array([band_wave_aa[b] for b in band])

# Temperature evolution is a sigmoid function
temp = Tmin + delta_T / (1.0 + np.exp((t - reference_time) / k_sig))
# Bolometric flux evolution is the Bazin function
lum = amplitude * np.exp(-(t - reference_time) / fall_time) / (
        1.0 + np.exp(-(t - reference_time) / rise_time))

# Spectral flux density for each given pair of time and passband
flux = np.pi * bb_nu(waves, temp) / (5.67e-5 * temp ** 4) * lum
# S/N = 5 for minimum flux, scale for Poisson noise
flux_err = np.sqrt(flux * np.min(flux) / 5.0)
flux += rng.normal(0.0, flux_err)

feature = RainbowFit.from_angstrom(band_wave_aa, with_baseline=False)
values = feature(t, flux, sigma=flux_err, band=band)
print(dict(zip(feature.names, values)))
print(f"Goodness of fit: {values[-1]}")

Note that while the data generation above uses approximate physical constant values, RainbowFit uses CODATA 2018 values internally.

Experimental extractors

The package consists of two parts: a wrapper for the light-curve-feature Rust crate (the light_curve_ext sub-package) and a pure-Python sub-package, light_curve_py. We use the Python implementation to test the Rust implementation and to develop new experimental extractors. The Python implementation is significantly slower for most extractors and does not cover the full feature set of the Rust implementation, but it includes some extractors you may find useful.

You can use extractors from either implementation directly:

import numpy as np
from numpy.testing import assert_allclose
from light_curve.light_curve_ext import LinearTrend as RustLinearTrend
from light_curve.light_curve_py import LinearTrend as PythonLinearTrend

rust_fe = RustLinearTrend()
py_fe = PythonLinearTrend()

n = 100
t = np.sort(np.random.normal(size=n))
m = 3.14 * t - 2.16 + np.random.normal(size=n)

assert_allclose(rust_fe(t, m), py_fe(t, m),
                err_msg="Python and Rust implementations must provide the same result")

This should print a warning about the experimental status of the Python class.

Performance

The package is designed for high throughput. The following techniques help extract features with minimal overhead.

sorted and check parameters

The sorted=True argument tells the extractor that t is already sorted in ascending order, and check=False disables validation of NaN/inf values. The defaults are sorted=None (the array will be validated and an error raised if unsorted) and check=True. Passing invalid inputs without validation can cause incorrect results or crashes.
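
A minimal sketch of skipping both checks for input that is known to be clean:

import light_curve as lc
import numpy as np

t = np.linspace(0.0, 1.0, 101)
m = np.sin(2 * np.pi * t)

# t is sorted by construction and free of NaN/inf, so validation can be skipped
amplitude = lc.Amplitude()(t, m, sorted=True, check=False)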

Batch processing with .many()

Each extractor has a .many() method that processes a collection of light curves in parallel across all available CPU cores by default. It accepts either a list of (t, m[, sigma]) tuples or an Arrow array (see below).

import light_curve as lc
import numpy as np

rng = np.random.default_rng(0)
# A list of (t, m, sigma) tuples, one per light curve
light_curves = [
    (np.sort(rng.random(50)), rng.random(50), rng.random(50) * 0.1)
    for _ in range(1000)
]

results = lc.Amplitude().many(light_curves)

Arrow input

The .many() method also accepts Arrow arrays instead of a list of tuples, enabling zero-copy data access from Arrow-compatible libraries. The input must be a List<Struct<t, m[, sigma]>> array where all fields share the same float dtype. Any library implementing the Arrow C Data Interface is supported. All light curves are processed in parallel using Rust threads, bypassing the GIL entirely.

nested-pandas

nested-pandas extends pandas with nested column support, useful for working with catalog data like ZTF or Rubin LSST. Install with pip install nested-pandas s3fs universal-pathlib.

import light_curve as lc
import nested_pandas as npd
import numpy as np
import pyarrow as pa
from upath import UPath

# Read a ZTF DR23 HiPSCat partition with nested light curves
s3_path = UPath(
    "s3://ipac-irsa-ztf/contributed/dr23/lc/hats/ztf_dr23_lc-hats/dataset/Norder=6/Dir=30000/Npix=34623/part0.snappy.parquet",
    anon=True,
)
ndf = npd.read_parquet(
    s3_path,
    columns=["objectid", "lightcurve.hmjd", "lightcurve.mag", "lightcurve.magerr"],
)

# Select rows with more than 10 observations
ndf = ndf.loc[ndf["lightcurve"].list_lengths > 10]

# Subcolumns must share a single dtype and follow the (t, m, sigma) order,
# so we build a float32 time column and keep only the subcolumns we need
ndf["lightcurve.t"] = np.asarray(ndf["lightcurve.hmjd"] - 58000, dtype=np.float32)
ndf["lightcurve"] = ndf[["lightcurve.t", "lightcurve.mag", "lightcurve.magerr"]]

# Extract features directly from the Arrow-backed nested column
feature = lc.Extractor(lc.Amplitude(), lc.EtaE(), lc.InterPercentileRange(quantile=0.25))
result = feature.many(pa.array(ndf["lightcurve"]), n_jobs=-1)

# Assign features back to the dataframe
ndf = ndf.assign(**dict(zip(feature.names, result.T)))
print(ndf.head())

PyArrow

PyArrow is the reference Python implementation of Apache Arrow. Install with pip install pyarrow.

import light_curve as lc
import numpy as np
import pyarrow as pa

rng = np.random.default_rng(42)
n_lc = 10

# Build a PyArrow List<Struct<t, m, sigma>> array
struct_type = pa.struct([("t", pa.float64()), ("m", pa.float64()), ("sigma", pa.float64())])
lcs_arrow = pa.array(
    [
        [{"t": t, "m": m, "sigma": s} for t, m, s in zip(*[np.sort(rng.random(50)) for _ in range(3)])]
        for _ in range(n_lc)
    ],
    type=pa.list_(struct_type),
)

feature = lc.Extractor(lc.Kurtosis(), lc.Skew(), lc.ReducedChi2())
result = feature.many(lcs_arrow, sorted=True, check=False, n_jobs=4)
print(f"Features: {feature.names}")
print(f"Results shape: {result.shape}")

Polars

Polars is a fast DataFrame library built on Arrow. Install with pip install polars.

import light_curve as lc
import numpy as np
import polars as pl

rng = np.random.default_rng(42)

# Start with a flat DataFrame, as you might get from a database or CSV
object_id = np.repeat(np.arange(10), 50)
t = np.sort(rng.random(500))
m = rng.random(500)
sigma = rng.random(500)

df = pl.DataFrame({"object_id": object_id, "t": t, "m": m, "sigma": sigma})

# Group by object_id into List<Struct<t, m, sigma>>
nested = df.group_by("object_id").agg(pl.struct("t", "m", "sigma").alias("lc"))

feature = lc.Extractor(lc.Amplitude(), lc.BeyondNStd(nstd=2), lc.LinearFit(), lc.StetsonK())
result = feature.many(nested["lc"], sorted=True, check=False, n_jobs=1)

# Join feature columns back to the nested DataFrame
nested = nested.with_columns(
    [pl.Series(name, result[:, i]) for i, name in enumerate(feature.names)]
)
print(nested)

Benchmarks

Run all benchmarks from the Python project folder with python3 -mpytest --benchmark-enable tests/test_w_bench.py, or with slow benchmarks disabled: python3 -mpytest -m "not (nobs or multi)" --benchmark-enable tests/test_w_bench.py.

Below we benchmark the Rust implementation (rust) against the feets package and our own Python implementation (lc_py) for a light curve with n=1000 observations.

[Benchmark plot: extraction time for the Rust implementation, feets, and the pure-Python implementation]

The Rust implementation outperforms the alternatives by a factor of 1.5–50. This allows extracting a large set of "cheap" features in well under one millisecond for n=1000. The performance of parametric fits (BazinFit and VillarFit) and Periodogram depends on their parameters, but the typical extraction time including these features is 20–50 ms for a few hundred observations.

Benchmark for different number of observations

Benchmark results for both the pure-Python and Rust implementations as a function of the number of observations. Both axes are on a logarithmic scale.

Benchmark for multithreading and multiprocessing

Processing time per light curve for the feature subset from the first benchmark, as a function of the number of CPU cores used. The dataset consists of 10,000 light curves with 1,000 observations each.

See the benchmarks described in more detail in "Performant feature extraction for photometric time series".

dm-dt map

In addition to the feature extractors above, the package provides a separate dm–dt mapping tool. The dm–dt map is a 2-D histogram of magnitude differences (dm) versus time differences (dt) for all pairs of observations in a light curve. It is commonly used as an input representation for machine learning classifiers rather than as a scalar feature.

The DmDt class (a Python wrapper for the light-curve-dmdt Rust crate) implements this mapper, following Mahabal et al. 2011 and Soraisam et al. 2020.

import numpy as np
from light_curve import DmDt
from numpy.testing import assert_array_equal

dmdt = DmDt.from_borders(min_lgdt=0, max_lgdt=np.log10(3), max_abs_dm=3, lgdt_size=2, dm_size=4, norm=[])

t = np.array([0, 1, 2], dtype=np.float32)
m = np.array([0, 1, 2], dtype=np.float32)

desired = np.array(
    [
        [0, 0, 2, 0],
        [0, 0, 0, 1],
    ]
)
actual = dmdt.points(t, m)

assert_array_equal(actual, desired)

Installation

pip install 'light-curve[full]'
# or
conda install -c conda-forge light-curve-python

The full extra installs all optional Python dependencies required by experimental features. We also provide a light-curve-python package as an alias for light-curve[full].

The minimum supported Python version is 3.10. Binary CPython distributions are available via PyPI and Anaconda for various platforms and architectures. On PyPI, we publish wheels for the stable CPython ABI, ensuring compatibility with future CPython 3 versions.

Support matrix

| Arch \ OS | Linux glibc 2.17+ | Linux musl 1.2+ | macOS | Windows (see https://github.com/light-curve/light-curve-python/issues/186) |
|---|---|---|---|---|
| x86-64 | PyPI (MKL), conda | PyPI (MKL) | PyPI (macOS 15+), conda | PyPI, conda (both no Ceres, no GSL) |
| i686 | src | src | n/a | not tested |
| aarch64 | PyPI | PyPI | PyPI (macOS 14+), conda | not tested |
| ppc64le | src | not tested (no Rust toolchain) | n/a | n/a |
  • PyPI / conda: A binary wheel or package is available on pypi.org or anaconda.org. Local building is not required; the only prerequisite is a recent version of pip or conda. For Linux x86-64, PyPI wheels are built with Intel MKL for improved periodogram performance, which is not the default build option. For Windows x86-64, all distributions exclude Ceres and GSL support.
  • src: The package has been confirmed to build and pass unit tests locally, but CI does not test or publish packages for this platform. See the Build from source section for details. Please open an issue or pull request if you encounter build problems or would like us to distribute packages for these platforms.
  • not tested: Building from source has not been tested. Please report build status via issue, PR, or email.

macOS wheels require relatively recent OS versions; please open an issue if you need support for older macOS versions (see https://github.com/light-curve/light-curve-python/issues/376 for details).

We no longer publish PyPy wheels (#345) or the PPC64le CPython glibc wheel (#479); please open an issue if you need either.

Free-threaded Python (experimental in Python 3.13, officially supported in Python 3.14+) is supported when built from source. No pre-built distributions are currently provided; please comment on the relevant issues if you need them: PyPI binary wheel issue, conda-forge package issue. Note that for expensive features, the GIL-enabled interpreter with .many() achieves performance on par with the free-threaded interpreter using Python threads. For inexpensive extractors, .many() still reduces Rust–Python interaction overhead significantly.

Developer guide

Prepare environment

Install a recent Rust toolchain and Python 3.10 or higher. It is recommended to use rustup to install the Rust toolchain and keep it updated with rustup update.

Clone the repository, then create and activate a virtual environment:

git clone https://github.com/light-curve/light-curve-python.git
cd light-curve-python/light-curve
python3 -m venv venv
source venv/bin/activate

Install the package in editable mode (see Build from source for more details):

python -mpip install maturin
# --release takes longer to build but produces a faster package
# Add other Cargo flags if needed, e.g. --no-default-features --features=ceres-source
maturin develop --extras=dev

On subsequent runs, activate the environment with source venv/bin/activate and rebuild Rust code with maturin develop. Python-only changes require no rebuild. The --extras=dev flag is only needed once to install development dependencies.

It is also recommended to install pre-commit hooks to keep the codebase clean. Install via pip (see the documentation for other options), then run:

pre-commit install

Run tests and benchmarks

All test dependencies are installed with --extras=dev. Run the tests with:

python -mpytest

Benchmarks are disabled by default; enable them with --benchmark-enable:

python -mpytest --benchmark-enable

See the Benchmarks section in Performance for more details.

Build from source

Dependencies and Cargo features

The package has a number of compile-time Cargo features, mostly controlling which C/C++ dependencies are used. Pass the desired features to maturin with --features; it is also recommended to use --no-default-features to avoid building unnecessary dependencies.

Available features:

  • abi3 (default) — enables CPython ABI3 compatibility. Disable for other interpreters or if you have a specific reason (our benchmarks show no performance difference). ABI3 is not supported by free-threaded CPython or PyPy.
  • ceres-source (default) — builds Ceres solver from source. Requires a C++ compiler and CMake. Known to not work on Windows. Used as an optional optimization backend for BazinFit and VillarFit.
  • ceres-system — links against a system-installed Ceres dynamic library instead of building from source.
  • mkl — enables the FFTW interface with the Intel MKL backend for the fast periodogram. Intel MKL is downloaded automatically during the build. Highly recommended for Intel CPUs. Without this feature, the pure-Rust RustFFT backend is used.
  • gsl (default) — enables GNU Scientific Library support. Requires a compatible GSL version installed on your system. Used as an optional optimization backend for BazinFit and VillarFit.
  • mimalloc (default) — enables the mimalloc memory allocator. Benchmarks show up to a 2× speedup for some cheap features, though it may increase memory usage.

Build with maturin

maturin is the recommended tool for building Rust-backed Python packages. This example builds with minimal system dependencies:

python -mpip install maturin
maturin build --release --locked --no-default-features --features=abi3,mimalloc

The --release flag enables release mode (slower build, faster execution), --locked ensures reproducible builds, and --no-default-features --features=abi3,mimalloc selects only ABI3 compatibility and the mimalloc allocator.

Build with build

You can also build the package with build:

python -mpip install build
MATURIN_PEP517_ARGS="--locked --no-default-features --features=abi3,mimalloc" python -m build

Build with cibuildwheel

cibuildwheel builds wheels on CI servers; we use it with GitHub Actions. You can also use it locally to build wheels for your platform (replace the identifier with one from the list of supported platforms):

python -mpip install cibuildwheel
python -m cibuildwheel --only=cp310-manylinux_x86_64

Note that different Cargo feature sets are used for different platforms, as defined in pyproject.toml. Windows wheels must be built on Windows; Linux wheels can be built on any platform with Docker (QEMU may be needed for cross-architecture builds); macOS wheels must be built on macOS. macOS builds also require MACOSX_DEPLOYMENT_TARGET to match the current macOS version, since Homebrew dependencies are built against it:

export MACOSX_DEPLOYMENT_TARGET=$(sw_vers -productVersion | awk -F '.' '{print $1"."0}')
python -m cibuildwheel --only=cp310-macosx_arm64
unset MACOSX_DEPLOYMENT_TARGET

Since we use ABI3 compatibility, a wheel built for CPython 3.10 works with any later CPython 3 version.

Citation

If you found this project useful for your research, please cite Malanchev et al., 2021:

@ARTICLE{2021MNRAS.502.5147M,
       author = {{Malanchev}, K.~L. and {Pruzhinskaya}, M.~V. and {Korolev}, V.~S. and {Aleo}, P.~D. and {Kornilov}, M.~V. and {Ishida}, E.~E.~O. and {Krushinsky}, V.~V. and {Mondon}, F. and {Sreejith}, S. and {Volnova}, A.~A. and {Belinski}, A.~A. and {Dodin}, A.~V. and {Tatarnikov}, A.~M. and {Zheltoukhov}, S.~G. and {(The SNAD Team)}},
        title = "{Anomaly detection in the Zwicky Transient Facility DR3}",
      journal = {\mnras},
     keywords = {methods: data analysis, astronomical data bases: miscellaneous, stars: variables: general, Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Solar and Stellar Astrophysics},
         year = 2021,
        month = apr,
       volume = {502},
       number = {4},
        pages = {5147-5175},
          doi = {10.1093/mnras/stab316},
archivePrefix = {arXiv},
       eprint = {2012.01419},
 primaryClass = {astro-ph.IM},
       adsurl = {https://ui.adsabs.harvard.edu/abs/2021MNRAS.502.5147M},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
