Package for computing 2-point statistics (correlation functions and structure functions)

Project description

Documentation | Installation | Contributing | Getting Help

pairstat is a Python package that provides accelerated, parallelized routines for computing 2-point statistics from spatial data (e.g. 2-point correlation functions and structure functions).

The pairstat package was formerly known as pyvsf.

Motivation

2-point statistics are important for characterizing the properties of turbulence (they also come up in other contexts, such as cosmology). Until now, there hasn’t been an easy-to-use package for computing these quantities.

The pairstat package is most useful for datasets where Fourier methods are problematic (e.g. you don’t have a regularly spaced periodic grid). Before developing pairstat, I performed similar calculations by processing the outputs of the scipy.spatial.distance.pdist and scipy.spatial.distance.cdist functions. This package implements equivalent functionality in more specialized C++ code, which performs the calculation faster and with far less memory. [1] It also supports parallelization (more on that below).
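For context, the scipy-based approach mentioned above might look roughly like the sketch below. It is an illustration only and does not use or reproduce pairstat's API: the function name naive_structure_function and its arguments are hypothetical, and it assumes the quantity of interest is a second-order velocity structure function (the mean squared velocity difference in each separation bin).

```python
import numpy as np
from scipy.spatial.distance import pdist


def naive_structure_function(pos, vel, bin_edges):
    """Second-order structure function from all unique pairs (hypothetical helper).

    pos, vel: arrays of shape (n_points, n_dims).
    bin_edges: monotonically increasing separation bin edges.
    Returns (pair counts per bin, mean |dv|^2 per bin).
    """
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)

    # pdist returns one entry per unique pair, so both arrays below hold
    # n_points * (n_points - 1) / 2 values at the same time.
    dist = pdist(pos)                        # pair separations
    dv_sq = pdist(vel, metric="sqeuclidean")  # squared velocity differences

    counts, _ = np.histogram(dist, bins=bin_edges)
    sums, _ = np.histogram(dist, bins=bin_edges, weights=dv_sq)

    # Empty bins produce NaN rather than raising a warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return counts, means


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = rng.random((500, 3))
    vel = rng.normal(size=(500, 3))
    counts, means = naive_structure_function(pos, vel, np.linspace(0.0, 1.0, 6))
    print(counts, means)
```

The two intermediate arrays are what make this approach expensive: their length grows with the number of pairs, not the number of points.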

Installation

As long as you have a C++ compiler, the easiest way to get the package is by invoking

$ python -m pip install pairstat

The package is automatically compiled with OpenMP support if the compiler supports it. To confirm that pairstat was compiled with OpenMP support, you can check whether the output from the following command mentions OpenMP:

$ python -m pairstat

See our Installation Guide for more details (especially if the package wasn’t compiled with OpenMP support).

Key Features: Parallelism and Scalability

The key feature of this package is the support for parallelism. If a compatible compiler is used to build this package, it will automatically be built with OpenMP support for parallelizing calculations of structure functions and correlation functions.

Undocumented machinery also exists to help use this functionality to parallelize calculations across machines on a computing cluster (e.g. with MPI). We plan to document this machinery in the near future.

The other important feature is memory usage, which is independent of the number of points. A naive implementation of the equivalent calculation using scipy functionality has memory usage that scales with the number of pairs of points (i.e. the number of points squared for auto-correlation). In other words, this package is far more scalable than the alternative.
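The memory argument can be illustrated with a minimal pure-Python sketch that accumulates per-bin sums while looping over pairs, so that nothing of pair-count size is ever stored. This is not pairstat's implementation (which is compiled code and far faster); the function name streaming_structure_function is hypothetical, and the sketch again assumes a second-order velocity structure function.

```python
import bisect
import math


def streaming_structure_function(pos, vel, bin_edges):
    """Mean |dv|^2 per separation bin, accumulated pair by pair (hypothetical helper).

    Working memory is two lists of length len(bin_edges) - 1, regardless of
    how many points are passed in. Bins are half-open: [lo, hi).
    """
    nbins = len(bin_edges) - 1
    counts = [0] * nbins
    sums = [0.0] * nbins

    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(pos[i], pos[j])
            # Index of the bin whose left edge is the last one <= d.
            k = bisect.bisect_right(bin_edges, d) - 1
            if k < 0 or k >= nbins:
                continue  # separation falls outside the binned range
            dv_sq = sum((a - b) ** 2 for a, b in zip(vel[i], vel[j]))
            counts[k] += 1
            sums[k] += dv_sq

    means = [s / c if c else math.nan for s, c in zip(sums, counts)]
    return counts, means


if __name__ == "__main__":
    pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    vel = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(streaming_structure_function(pos, vel, [0.5, 1.5, 2.5]))
```

The run time still scales with the number of pairs; only the working memory is reduced to the size of the output bins.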

Current Status

We are planning to replace the C++ and Cython logic with Rust before the 1.0 release. This rewrite will allow us to significantly improve the code quality.

Contributions and feature requests are welcome!

License

pairstat is dual-licensed under either the MIT License or the Apache License (Version 2.0).

Footnotes


Source Distribution

pairstat-0.3.0.tar.gz (344.2 kB)

File details

Details for the file pairstat-0.3.0.tar.gz.

File metadata

  • Download URL: pairstat-0.3.0.tar.gz
  • Upload date:
  • Size: 344.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for pairstat-0.3.0.tar.gz

Algorithm    Hash digest
SHA256       c6d6be090a7bf3bca72ca3b2db8bf1f15e8789b08d1494051da37b3cc4b4cf78
MD5          29df4511a0d867c408cb0a2b025bfa51
BLAKE2b-256  eecd1230ad8fea94e40262956a30d26f3adb9fd4624ec003c30e51297a530a87

Provenance

The following attestation bundles were made for pairstat-0.3.0.tar.gz:

Publisher: cd.yml on mabruzzo/pairstat

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
