Skip to main content

GPU-accelerated period finding utilities

Project description

PeriodFind

    A collection of CUDA-accelerated periodicity detection algorithms, with both C++ and Python APIs. Includes a Rust-based CPU backend for environments without GPU hardware.
    
    ## Algorithms
    
    ### Period-Finding
    
    | Algorithm | Unified API | GPU (CUDA) | CPU (Rust) |
    |-----------|-------------|-----------|------------|
    | Conditional Entropy | `periodfind.ConditionalEntropy` | `periodfind.gpu.ConditionalEntropy` | `periodfind.cpu.ConditionalEntropy` |
    | Analysis of Variance | `periodfind.AOV` | `periodfind.gpu.AOV` | `periodfind.cpu.AOV` |
    | Lomb-Scargle | `periodfind.LombScargle` | `periodfind.gpu.LombScargle` | `periodfind.cpu.LombScargle` |
    | Fast Phase-folding Weighted | `periodfind.FPW` | `periodfind.gpu.FPW` | `periodfind.cpu.FPW` |
    | Box Least Squares | `periodfind.BoxLeastSquares` | `periodfind.gpu.BoxLeastSquares` | `periodfind.cpu.BoxLeastSquares` |
    
    ### Feature Extraction
    
    | Algorithm | Unified API | CPU (Rust) |
    |-----------|-------------|------------|
    | Fourier Decomposition | `periodfind.FourierDecomposition` | `periodfind.cpu.FourierDecomposition` |
    
    Fourier decomposition computes weighted linear least-squares Fourier fits with BIC model selection (0-5 harmonics) for a batch of light curves given pre-determined periods. Returns 14 features per curve: `[power, BIC, offset, slope, A1, B1, A2, B2, A3, B3, A4, B4, A5, B5]`. This replaces the per-source `scipy.optimize.curve_fit` approach with a direct Cholesky solve, giving identical results orders of magnitude faster.
    
    ## Device API
    
    Periodfind provides a PyTorch-style device abstraction so you can write device-agnostic code. When no device is set, it auto-detects GPU availability (tries to import the CUDA extensions and runs `nvidia-smi`).
    
    ```python
    import periodfind
    
    # Set the global default device
    periodfind.set_device('cpu')   # or 'gpu'
    print(periodfind.get_device()) # 'cpu'
    
    # Factory functions dispatch to the right backend
    ce  = periodfind.ConditionalEntropy(n_phase=10, n_mag=10)
    aov = periodfind.AOV(n_phase=15)
    ls  = periodfind.LombScargle()
    fpw = periodfind.FPW(n_bins=10)
    bls = periodfind.BoxLeastSquares(n_bins=50, qmin=0.01, qmax=0.5)
    fd  = periodfind.FourierDecomposition()  # CPU-only for now
    
    # Per-call override (ignores the global default)
    ce_gpu = periodfind.ConditionalEntropy(n_phase=10, n_mag=10, device='gpu')
    ```
    
    You can still import backends directly:
    
    ```python
    from periodfind.gpu import ConditionalEntropy  # CUDA backend
    from periodfind.cpu import ConditionalEntropy  # Rust CPU backend
    from periodfind.cpu import FourierDecomposition  # Rust CPU only
    ```
    
    ### Box Least Squares Usage
    
    BLS searches for periodic box-shaped (flat-bottom) transit dips in time-series data ([Kovacs, Zucker & Mazeh 2002](https://ui.adsabs.harvard.edu/abs/2002A%26A...391..369K)). It is particularly well-suited for detecting eclipsing binaries and transiting exoplanets.
    
    ```python
    import numpy as np
    import periodfind
    
    bls = periodfind.BoxLeastSquares(
        n_bins=50,     # number of phase bins
        qmin=0.01,     # minimum transit duration (fraction of period)
        qmax=0.5,      # maximum transit duration (fraction of period)
    )
    
    # times, mags: lists of float32 arrays (one per light curve)
    # errs: optional list of float32 uncertainty arrays
    periods = np.linspace(0.5, 10.0, 5000, dtype=np.float32)
    period_dts = np.array([0.0], dtype=np.float32)
    
    # Get best-period statistics
    stats = bls.calc(times, mags, periods, period_dts, errs=errs, output="stats")
    print(stats[0].params[0])  # detected period
    
    # Get full periodogram
    pgrams = bls.calc(times, mags, periods, period_dts, output="periodogram")
    
    # Get top-N peaks (memory-efficient for large grids)
    peaks = bls.calc(times, mags, periods, period_dts, output="peaks", n_peaks=32)
    ```
    
    ### Fourier Decomposition Usage
    
    ```python
    import numpy as np
    import periodfind
    
    fd = periodfind.FourierDecomposition()
    
    # times, mags, errs: lists of float32 arrays (one per light curve)
    # periods: float32 array with one period per curve
    features = fd.calc(times, mags, errs, periods)
    # features.shape == (n_curves, 14)
    ```
    
    ## Throughput Benchmarks
    
    Measured on a batch of **100 light curves** over **1000 trial periods** (single `period_dt`). CPU = Rust/Rayon (28 cores, Skylake Xeon); GPU = NVIDIA Tesla P100 (12 GB). Times are median of 3 runs after warmup.
    
    ### Throughput table (points/sec)
    
    | pts/curve | Backend | CE | AOV | LS | FPW | BLS |
    |----------:|---------|---:|----:|---:|----:|----:|
    | 256 | CPU | 140K | 184K | 146K | 245K | 121K |
    | 256 | 1x P100 | 1.1M | 1.1M | 1.2M | 1.1M | 1.0M |
    | 256 | 2x P100 | 1.1M | 1.2M | 1.4M | 1.2M | 1.2M |
    | 1024 | CPU | 176K | 211K | 181K | 290K | 228K |
    | 1024 | 1x P100 | 3.8M | 3.1M | 4.5M | 2.7M | 3.2M |
    | 1024 | 2x P100 | 4.1M | 3.9M | 5.1M | 3.6M | 4.1M |
    | 4096 | CPU | 185K | 217K | 194K | 307K | 293K |
    | 4096 | 1x P100 | 9.8M | 3.2M | 13.2M | 3.5M | 6.2M |
    | 4096 | 2x P100 | 12.7M | 5.6M | 16.5M | 6.1M | 9.6M |
    | 8192 | CPU | 186K | 219K | 199K | 309K | 307K |
    | 8192 | 1x P100 | 13.7M | 3.7M | 19.8M | 5.6M | 5.5M |
    | 8192 | 2x P100 | 19.6M | 6.8M | 27.6M | 9.9M | 9.8M |
    
    GPU kernels use a **hybrid atomic/privatization strategy** — shared-memory atomics for small point counts (low overhead, no register pressure) and per-thread register privatization with warp-shuffle reduction for large point counts (no atomic contention). This eliminates the throughput dip that pure privatization caused at small N, while preserving scalability at large N.
    
    ### Throughput plot (log-log scale)
    
    ![Throughput benchmark](docs/throughput_points.png)
    
    Solid lines = 1x P100, dash-dot = 2x P100, dashed lines = CPU (Rust). All algorithms benefit from the GPU across the full range of point counts. LS reaches 20M pts/sec on 1x P100 at 8K points (100x over CPU). BLS reaches 5.5M pts/sec on 1x P100 (18x over CPU).
    
    See the [full benchmarks page](https://zwickytransientfacility.github.io/periodfind/benchmarks/) for the full table, 2x P100 data, and methodology.
    
    To reproduce, run `python benchmarks/throughput_bench.py` followed by `python benchmarks/plot_throughput.py`. Use `sbatch benchmarks/run_bench.sh` for multi-GPU benchmarks on a SLURM cluster.
    
    ## Installing
    
    ### GPU backend (CUDA)
    
    Requires CUDA installed with `nvcc` on your `PATH` (or set `$CUDA_HOME`).
    
    ```bash
    pip install cython numpy
    pip install -e .
    ```
    
    ### CPU backend (Rust)
    
    Requires a Rust toolchain and [maturin](https://github.com/PyO3/maturin):
    
    ```bash
    pip install maturin
    cd rust && maturin develop --release
    ```
    
    This builds the `periodfind.cpu` module using Rayon for multithreaded parallelism. No GPU needed.
    
    ### Python API
    
    Ensure that `Cython` and `numpy` are both installed. Then, simply run:
    
    ```bash
    python setup.py install
    ```
    
    And periodfind should be installed!
    
    ## Testing
    
    Run the full test suite with pytest:
    
    ```bash
    pytest tests/ -v
    ```
    
    Tests are organized into four categories:
    
    - **Unit tests** (`test_periodfind.py`): Statistics, Periodogram, and utility tests (no GPU or Rust needed)
    - **CPU standalone tests** (`test_cpu_standalone.py`): Tests for the Rust CPU backend (period-finding algorithms)
    - **Fourier tests** (`test_fourier.py`): Tests for Fourier decomposition (output shape, known signal recovery, edge cases, input validation)
    - **GPU integration tests** (`test_cpu_vs_cuda.py`): CUDA algorithm tests (auto-skipped if no GPU is available)
    
    To run only CPU tests (no GPU required):
    
    ```bash
    pytest tests/test_periodfind.py tests/test_cpu_standalone.py tests/test_fourier.py -v
    ```
    
    ## CI
    
    GitHub Actions runs CPU tests automatically on every push and PR. See `.github/workflows/tests.yml`. GPU tests run on self-hosted runners when available.
    
    ## Compatibility
    
    This package has been tested only on Linux hosts running CUDA 10.2 and CUDA 11. Other operating systems and versions of CUDA may work, but it is not guaranteed.
    
    ## Acknowledgements
    
    Funding for this project was provided by the Larson Scholar Fellowship as part of the SURF program.
    
    ## License
    
    This package is licensed under the BSD 3-clause license. The copyright holder is the California Institute of Technology (Caltech).
    
    `setup.py` and `MANIFEST.in` are based off of an example project at <https://github.com/rmcgibbo/npcuda-example/>, licensed under the BSD 2-clause license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

periodfind-0.1.1.tar.gz (917.5 kB view details)

Uploaded Source

File details

Details for the file periodfind-0.1.1.tar.gz.

File metadata

  • Download URL: periodfind-0.1.1.tar.gz
  • Upload date:
  • Size: 917.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for periodfind-0.1.1.tar.gz
Algorithm Hash digest
SHA256 90e9e5894ced57ca64994b62734d8551e2bdf4d7fb49fd66135fab0feb5d8e85
MD5 dfee153a892725bac6c3ce8d3389f4c5
BLAKE2b-256 572d6388584ad206320f4aa58bff980b82861bd92fd443d1c78fafef7d9a8397

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page