CPU backends for periodfind period-finding algorithms
PeriodFind
A collection of CUDA-accelerated periodicity detection algorithms, with both C++ and Python APIs. Includes a Rust-based CPU backend for environments without GPU hardware.
Algorithms
Period-Finding
| Algorithm | Unified API | GPU (CUDA) | CPU (Rust) |
|---|---|---|---|
| Conditional Entropy | periodfind.ConditionalEntropy | periodfind.gpu.ConditionalEntropy | periodfind.cpu.ConditionalEntropy |
| Analysis of Variance | periodfind.AOV | periodfind.gpu.AOV | periodfind.cpu.AOV |
| Lomb-Scargle | periodfind.LombScargle | periodfind.gpu.LombScargle | periodfind.cpu.LombScargle |
| Fast Phase-folding Weighted | periodfind.FPW | periodfind.gpu.FPW | periodfind.cpu.FPW |
| Box Least Squares | periodfind.BoxLeastSquares | periodfind.gpu.BoxLeastSquares | periodfind.cpu.BoxLeastSquares |
Feature Extraction
| Algorithm | Unified API | CPU (Rust) |
|---|---|---|
| Fourier Decomposition | periodfind.FourierDecomposition | periodfind.cpu.FourierDecomposition |
Fourier decomposition computes weighted linear least-squares Fourier fits with BIC model selection (0-5 harmonics) for a batch of light curves given pre-determined periods. Returns 14 features per curve: [power, BIC, offset, slope, A1, B1, A2, B2, A3, B3, A4, B4, A5, B5]. This replaces the per-source scipy.optimize.curve_fit approach with a direct Cholesky solve, giving identical results orders of magnitude faster.
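To make the fitting procedure described above concrete, here is a minimal NumPy sketch of a weighted least-squares Fourier fit with BIC-based selection of the harmonic count. It is not the Rust implementation: the helper name `fourier_fit_bic`, the BIC form (chi-square plus k ln n, which assumes Gaussian errors), and the convention that A_k multiplies the cosine and B_k the sine are all assumptions made for illustration.

```python
import numpy as np

def fourier_fit_bic(t, y, err, period, max_harmonics=5):
    """Weighted least-squares Fourier fit; harmonic count chosen by lowest BIC.

    Hypothetical helper for illustration only, not periodfind's implementation.
    Returns (n_harmonics, bic, coeffs) with coeffs = [offset, slope, A1, B1, ...].
    """
    t = np.asarray(t, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    w = 1.0 / np.asarray(err, dtype=np.float64) ** 2  # inverse-variance weights
    omega = 2.0 * np.pi / period
    best = None
    for k in range(max_harmonics + 1):
        # Design matrix columns: constant, linear trend, then cos/sin pairs.
        cols = [np.ones_like(t), t]
        for h in range(1, k + 1):
            cols += [np.cos(h * omega * t), np.sin(h * omega * t)]
        X = np.column_stack(cols)
        # Normal equations (X^T W X) beta = X^T W y, solved via a Cholesky factor.
        XtW = X.T * w
        L = np.linalg.cholesky(XtW @ X)
        z = np.linalg.solve(L, XtW @ y)
        beta = np.linalg.solve(L.T, z)
        chi2 = np.sum(w * (y - X @ beta) ** 2)
        bic = chi2 + X.shape[1] * np.log(t.size)  # assumed Gaussian-error BIC
        if best is None or bic < best[1]:
            best = (k, bic, beta)
    return best

# Synthetic curve with two harmonics (values are illustrative).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 30.0, 200))
period = 1.7
err = np.full(t.size, 0.01)
y = (15.0 + 0.3 * np.cos(2 * np.pi * t / period)
     + 0.1 * np.sin(4 * np.pi * t / period)
     + rng.normal(0.0, 0.01, t.size))

n_harm, bic, coeffs = fourier_fit_bic(t, y, err, period)
print(n_harm, np.round(coeffs[:6], 3))
```

Because the model is linear in its coefficients once the period is fixed, each candidate harmonic count needs only one small linear solve. That is why a direct solve can stand in for an iterative optimizer here.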
Device API
Periodfind provides a PyTorch-style device abstraction so you can write device-agnostic code. When no device is set, it auto-detects GPU availability (tries to import the CUDA extensions and runs nvidia-smi).
import periodfind
# Set the global default device
periodfind.set_device('cpu') # or 'gpu'
print(periodfind.get_device()) # 'cpu'
# Factory functions dispatch to the right backend
ce = periodfind.ConditionalEntropy(n_phase=10, n_mag=10)
aov = periodfind.AOV(n_phase=15)
ls = periodfind.LombScargle()
fpw = periodfind.FPW(n_bins=10)
bls = periodfind.BoxLeastSquares(n_bins=50, qmin=0.01, qmax=0.5)
fd = periodfind.FourierDecomposition() # CPU-only for now
# Per-call override (ignores the global default)
ce_gpu = periodfind.ConditionalEntropy(n_phase=10, n_mag=10, device='gpu')
You can still import backends directly:
from periodfind.gpu import ConditionalEntropy # CUDA backend
from periodfind.cpu import ConditionalEntropy # Rust CPU backend
from periodfind.cpu import FourierDecomposition # Rust CPU only
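The auto-detection described at the start of this section might look roughly like the sketch below. The function name `detect_device` and the use of `nvidia-smi -L` are assumptions for illustration; the package's actual detection logic may differ.

```python
import importlib
import shutil
import subprocess

def detect_device(gpu_module="periodfind.gpu"):
    """Return 'gpu' if the CUDA extension imports and nvidia-smi succeeds, else 'cpu'.

    Hypothetical helper; not part of the periodfind API.
    """
    try:
        importlib.import_module(gpu_module)
    except ImportError:
        return "cpu"  # CUDA extension not built or not installed
    if shutil.which("nvidia-smi") is None:
        return "cpu"  # no NVIDIA driver tools on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return "cpu"
    return "gpu" if result.returncode == 0 else "cpu"

print(detect_device())
```

Falling back to 'cpu' on any failure keeps device-agnostic code working on machines without a GPU.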
Box Least Squares Usage
BLS searches for periodic box-shaped (flat-bottom) transit dips in time-series data (Kovacs, Zucker & Mazeh 2002). It is particularly well-suited for detecting eclipsing binaries and transiting exoplanets.
import numpy as np
import periodfind
bls = periodfind.BoxLeastSquares(
n_bins=50, # number of phase bins
qmin=0.01, # minimum transit duration (fraction of period)
qmax=0.5, # maximum transit duration (fraction of period)
)
# times, mags: lists of float32 arrays (one per light curve)
# errs: optional list of float32 uncertainty arrays
periods = np.linspace(0.5, 10.0, 5000, dtype=np.float32)
period_dts = np.array([0.0], dtype=np.float32)
# Get best-period statistics
stats = bls.calc(times, mags, periods, period_dts, errs=errs, output="stats")
print(stats[0].params[0]) # detected period
# Get full periodogram
pgrams = bls.calc(times, mags, periods, period_dts, output="periodogram")
# Get top-N peaks (memory-efficient for large grids)
peaks = bls.calc(times, mags, periods, period_dts, output="peaks", n_peaks=32)
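The usage above assumes `times`, `mags`, and `errs` already exist. The sketch below builds synthetic inputs in that layout (lists of float32 arrays) and runs a deliberately naive NumPy box search over them as a sanity check. The helper names `make_curve` and `naive_bls` are hypothetical, and the brute-force search only illustrates the signal-residue statistic from Kovacs et al.; it is not how the CUDA or Rust backends are implemented.

```python
import numpy as np

rng = np.random.default_rng(42)
true_period, duration_frac, depth = 2.5, 0.05, 0.02  # illustrative values

def make_curve(n_points):
    """One synthetic light curve with a box-shaped dip."""
    t = np.sort(rng.uniform(0.0, 30.0, n_points))
    in_transit = (t / true_period) % 1.0 < duration_frac
    m = 12.0 + depth * in_transit + rng.normal(0.0, 0.002, n_points)
    e = np.full(n_points, 0.002)
    return t.astype(np.float32), m.astype(np.float32), e.astype(np.float32)

curves = [make_curve(n) for n in (300, 400)]
times = [c[0] for c in curves]
mags = [c[1] for c in curves]
errs = [c[2] for c in curves]

def naive_bls(t, m, e, periods, n_bins=50, qmin=0.01, qmax=0.5):
    """Brute-force box search; returns the signal-residue power per trial period."""
    t, m, e = (np.asarray(a, dtype=np.float64) for a in (t, m, e))
    w = 1.0 / e**2
    w /= w.sum()                       # normalized weights
    resid = m - np.sum(w * m)          # subtract the weighted mean
    w_min = max(1, int(qmin * n_bins))
    w_max = max(w_min, int(qmax * n_bins))
    trial = np.asarray(periods, dtype=np.float64)
    power = np.zeros(trial.size)
    for i, p in enumerate(trial):
        bins = np.minimum((((t / p) % 1.0) * n_bins).astype(int), n_bins - 1)
        r_bin = np.bincount(bins, weights=w, minlength=n_bins)
        s_bin = np.bincount(bins, weights=w * resid, minlength=n_bins)
        # Doubled cumulative sums let a box wrap around phase 1 -> 0.
        r_cum = np.concatenate(([0.0], np.cumsum(np.tile(r_bin, 2))))
        s_cum = np.concatenate(([0.0], np.cumsum(np.tile(s_bin, 2))))
        for width in range(w_min, w_max + 1):
            r = r_cum[width:width + n_bins] - r_cum[:n_bins]
            s = s_cum[width:width + n_bins] - s_cum[:n_bins]
            ok = (r > 1e-9) & (r < 1.0 - 1e-9)
            if ok.any():
                sr = s[ok] ** 2 / (r[ok] * (1.0 - r[ok]))
                power[i] = max(power[i], sr.max())
    return power

periods = np.linspace(1.0, 5.0, 2000, dtype=np.float32)
best_period = periods[np.argmax(naive_bls(times[0], mags[0], errs[0], periods))]
print(best_period)
```

The same `times`, `mags`, `errs`, and `periods` objects have the shapes and dtypes that the `bls.calc(...)` calls above expect, so they can be passed in directly once periodfind is installed.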
Fourier Decomposition Usage
import numpy as np
import periodfind
fd = periodfind.FourierDecomposition()
# times, mags, errs: lists of float32 arrays (one per light curve)
# periods: float32 array with one period per curve
features = fd.calc(times, mags, errs, periods)
# features.shape == (n_curves, 14)
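The 14 columns can be easier to work with once labeled. The sketch below names the columns in the order given in the feature description and derives a per-harmonic amplitude from each (A_k, B_k) pair. `FEATURE_NAMES` and `harmonic_amplitudes` are hypothetical helpers, not part of periodfind, and the `demo` array merely stands in for real `fd.calc(...)` output.

```python
import numpy as np

# Column order as listed in the Fourier decomposition feature description.
FEATURE_NAMES = ["power", "BIC", "offset", "slope"] + [
    f"{c}{k}" for k in range(1, 6) for c in ("A", "B")
]

def harmonic_amplitudes(features):
    """Return an (n_curves, 5) array of sqrt(A_k^2 + B_k^2) per harmonic."""
    features = np.asarray(features, dtype=np.float64)
    if features.ndim != 2 or features.shape[1] != len(FEATURE_NAMES):
        raise ValueError("expected an array of shape (n_curves, 14)")
    a = features[:, 4::2]   # columns A1..A5
    b = features[:, 5::2]   # columns B1..B5
    return np.hypot(a, b)

# Stand-in for the output of fd.calc(...): one curve with a single harmonic.
demo = np.zeros((1, 14), dtype=np.float32)
demo[0, 4:6] = [0.3, 0.4]
amps = harmonic_amplitudes(demo)
print(dict(zip(FEATURE_NAMES, demo[0])))
print(amps)
```

The amplitude does not depend on which of A_k and B_k multiplies the cosine term, so this helper is safe to use without knowing that convention.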
Throughput Benchmarks
Measured on a batch of 100 light curves over 1000 trial periods (single period_dt). CPU = Rust/Rayon (28 cores, Skylake Xeon); GPU = NVIDIA Tesla P100 (12 GB). Times are median of 3 runs after warmup.
Throughput table (points/sec)
| pts/curve | Backend | CE | AOV | LS | FPW | BLS |
|---|---|---|---|---|---|---|
| 256 | CPU | 140K | 184K | 146K | 245K | 121K |
| 256 | 1x P100 | 1.1M | 1.1M | 1.2M | 1.1M | 1.0M |
| 256 | 2x P100 | 1.1M | 1.2M | 1.4M | 1.2M | 1.2M |
| 1024 | CPU | 176K | 211K | 181K | 290K | 228K |
| 1024 | 1x P100 | 3.8M | 3.1M | 4.5M | 2.7M | 3.2M |
| 1024 | 2x P100 | 4.1M | 3.9M | 5.1M | 3.6M | 4.1M |
| 4096 | CPU | 185K | 217K | 194K | 307K | 293K |
| 4096 | 1x P100 | 9.8M | 3.2M | 13.2M | 3.5M | 6.2M |
| 4096 | 2x P100 | 12.7M | 5.6M | 16.5M | 6.1M | 9.6M |
| 8192 | CPU | 186K | 219K | 199K | 309K | 307K |
| 8192 | 1x P100 | 13.7M | 3.7M | 19.8M | 5.6M | 5.5M |
| 8192 | 2x P100 | 19.6M | 6.8M | 27.6M | 9.9M | 9.8M |
GPU kernels use a hybrid atomic/privatization strategy — shared-memory atomics for small point counts (low overhead, no register pressure) and per-thread register privatization with warp-shuffle reduction for large point counts (no atomic contention). This eliminates the throughput dip that pure privatization caused at small N, while preserving scalability at large N.
Throughput plot (log-log scale)
Solid lines = 1x P100, dash-dot = 2x P100, dashed lines = CPU (Rust). All algorithms benefit from the GPU across the full range of point counts. At 8K points per curve on 1x P100, LS reaches about 20M pts/sec (roughly 100x over CPU) and BLS reaches 5.5M pts/sec (roughly 18x over CPU).
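The speedup figures quoted here follow directly from the 8192-point rows of the throughput table. A short script, using only the numbers from that table, reproduces them and also shows how much the second GPU adds:

```python
# Throughput in points/sec at 8192 points per curve, copied from the table.
cpu = {"CE": 186e3, "AOV": 219e3, "LS": 199e3, "FPW": 309e3, "BLS": 307e3}
gpu_1x = {"CE": 13.7e6, "AOV": 3.7e6, "LS": 19.8e6, "FPW": 5.6e6, "BLS": 5.5e6}
gpu_2x = {"CE": 19.6e6, "AOV": 6.8e6, "LS": 27.6e6, "FPW": 9.9e6, "BLS": 9.8e6}

# Ratio of single-GPU throughput to CPU throughput.
speedup_1x = {name: gpu_1x[name] / cpu[name] for name in cpu}
# Ratio of dual-GPU throughput to single-GPU throughput.
scaling_2x = {name: gpu_2x[name] / gpu_1x[name] for name in cpu}

for name in cpu:
    print(f"{name}: {speedup_1x[name]:.1f}x over CPU, "
          f"{scaling_2x[name]:.2f}x from the second GPU")
```

For every algorithm at this size, the second GPU gives less than a 2x gain.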
See the benchmarks page for the complete table, 2x P100 data, and methodology.
To reproduce, run python benchmarks/throughput_bench.py followed by python benchmarks/plot_throughput.py. Use sbatch benchmarks/run_bench.sh for multi-GPU benchmarks on a SLURM cluster.
Installing
GPU backend (CUDA)
Requires CUDA installed with nvcc on your PATH (or set $CUDA_HOME).
pip install cython numpy
pip install -e .
CPU backend (Rust)
Requires a Rust toolchain and maturin:
pip install maturin
cd rust && maturin develop --release
This builds the periodfind.cpu module using Rayon for multithreaded parallelism. No GPU needed.
Python API
Ensure that Cython and numpy are both installed. Then, simply run:
python setup.py install
And periodfind should be installed!
Testing
Run the full test suite with pytest:
pytest tests/ -v
Tests are organized into four categories:
- Unit tests (test_periodfind.py): Statistics, Periodogram, and utility tests (no GPU or Rust needed)
- CPU standalone tests (test_cpu_standalone.py): Tests for the Rust CPU backend (period-finding algorithms)
- Fourier tests (test_fourier.py): Tests for Fourier decomposition (output shape, known signal recovery, edge cases, input validation)
- GPU integration tests (test_cpu_vs_cuda.py): CUDA algorithm tests (auto-skipped if no GPU is available)
To run only CPU tests (no GPU required):
pytest tests/test_periodfind.py tests/test_cpu_standalone.py tests/test_fourier.py -v
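For readers writing their own GPU-dependent tests, one common way to get the auto-skip behavior mentioned above is a `skipif` marker combined with `pytest.importorskip`. This is a sketch of that general pattern, not a copy of the project's test code; `gpu_available`, `requires_gpu`, and the test name are hypothetical.

```python
import shutil

import pytest

def gpu_available():
    """Crude check (an assumption): treat nvidia-smi on PATH as 'GPU present'."""
    return shutil.which("nvidia-smi") is not None

# Reusable marker: tests decorated with it are skipped on machines without a GPU.
requires_gpu = pytest.mark.skipif(not gpu_available(), reason="no GPU detected")

@requires_gpu
def test_gpu_backend_exposes_conditional_entropy():
    # Skip, rather than fail, if the CUDA extension was not built.
    gpu = pytest.importorskip("periodfind.gpu")
    assert hasattr(gpu, "ConditionalEntropy")
```

With this pattern, `pytest tests/ -v` reports GPU tests as skipped on CPU-only machines instead of failing.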
CI
GitHub Actions runs CPU tests automatically on every push and PR. See .github/workflows/tests.yml. GPU tests run on self-hosted runners when available.
Compatibility
This package has been tested only on Linux hosts running CUDA 10.2 and CUDA 11. Other operating systems and CUDA versions may work, but this is not guaranteed.
Acknowledgements
Funding for this project was provided by the Larson Scholar Fellowship as part of the SURF program.
License
This package is licensed under the BSD 3-clause license. The copyright holder is the California Institute of Technology (Caltech).
setup.py and MANIFEST.in are based on an example project at https://github.com/rmcgibbo/npcuda-example/, licensed under the BSD 2-clause license.