Skip to main content

Two-sample L-test: Shift-Invariant Cramér–von Mises Variant

Project description

ltest — Shift-invariant two-sample CvM (the L-test)

Two-sample L-test (shift-invariant Cramér–von Mises variant).

Performs the two-sample L-test. The L-test is a shift-invariant modification of the Cramér–von Mises (CvM) two-sample test [1] that minimizes the integral squared difference between two empirical CDFs (L_2 squared distance), called U, by optimizing a scalar location shift, s, between samples. For independent samples X = {X_i}{i=1..n} and Y = {Y_j}{j=1..m}, the null hypothesis is:

H0: ∃ s ∈ ℝ such that F_X(t) = F_Y(t + s) for all t

i.e., samples X and Y are draws from the same (unspecified) continuous distribution up to a location difference.

It returns a Monte-Carlo p-value, its uncertainty, a shift estimate, its uncertainty, and the minimized statistic.

Motivation

Originally developed for ultra–high-energy cosmic-ray (UHECR) composition work comparing X_max distributions to model predictions, where model means have larger uncertainty than higher moments (such that there is significant CI overlap) [2]. The L-test is useful whenever relative location may be biased (instrument/location/seasonal effects, unknown inter-experiment offsets), and/or shape differences (variance, skew, tails, multimodality) are of primary interest. See Appendix B (“L-test”) of the author’s Ph.D. thesis for background and derivations [3].

Install

#Prerequisets
pip install numpy scipy

#Install ltest from pip
pip install ltest-shift

# or local install from source
pip install -e .

Usage

import numpy as np
from ltest import ltest

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(loc=0.3, scale=1.2, size=220)

l_p, l_p_err, l_shift, shift_boot, shift_err, l_stat = ltest(
    x, y, B=1000, tol_p=0.05, tol_s=0.05, workers=None, brute=False
)
print(l_p, l_p_err, l_shift, shift_err)

Notes

  • The L statistic equals the minimized version of Eq. (9) in Anderson [6] when including a free location parameter. Minimization is performed numerically (scalar search).
  • The L-shift is generally interpretable as a location offset only when the L-test fails to reject (i.e., shapes appear compatible) or the two parent distributions are symmetrical. Under shape mismatch, it is a nuisance alignment chosen to minimize the ECDF distance and should not be interpreted as a population location difference.
  • Parallel bootstrap with early stopping by relative error on p or on the shift uncertainty.
  • Optional “brute” search of s via rank-change breakpoints (slower).
  • See examples/ for Type I/II power and shift-accuracy experiments.
  • Run tests via tests/test_basic.py. Uses pytest -q.
  • On Windows/macOS, protect the entry point when using multiprocessing:
    if __name__ == "__main__":
        # call ltest(...)
    

This is new code based upon the statistical distribution test proposed in the author's 2017 Ph.D. thesis [3] and was used for the conference paper 'Study of UHECR Composition Using Telescope Array’s Middle Drum Detector and Surface Array in Hybrid Mode', 34th ICRC, 2016. Additionally, the paper The Astrophysical Journal, 858:76 (27pp), 2018 May 10 used a similar principle.

References

[1] https://en.wikipedia.org/wiki/Cramer-von_Mises_criterion
[2] Abbasi, R. U., Thomson, G. B., uncertainty from extrapolation of cosmic ray air shower parameters arXiv:1605.05241
[3] Lundquist, J.P. Energy Anisotropies of Proton-Like Ultra-High Energy Cosmic Rays, Ph.D. Thesis, University of Utah, 2017
[4] Pebay, P.P. (2008) Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments, Sandia National Laboratories Technical Report
[5] Scholz, F. W and Stephens, M. A. (1987), K-Sample Anderson-Darling Tests, Journal of the American Statistical Association, Vol. 82, pp. 918-924
[6] Anderson, T.W., On the distribution of the two-sampleCramer-von-Mises criterion. The Annals of Mathematical Statistics, pp. 1148-1159

Dependencies

  • Python ≥ 3.8
  • NumPy ≥ 1.23
  • SciPy ≥ 1.9

License

This project is licensed under the MIT license.
See the full text in LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ltest_shift-0.1.0.tar.gz (14.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ltest_shift-0.1.0-py3-none-any.whl (13.1 kB view details)

Uploaded Python 3

File details

Details for the file ltest_shift-0.1.0.tar.gz.

File metadata

  • Download URL: ltest_shift-0.1.0.tar.gz
  • Upload date:
  • Size: 14.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for ltest_shift-0.1.0.tar.gz
Algorithm Hash digest
SHA256 28dd8fe3ac460caac60ae82ac0f0bde646112e96b67d158791a435b031e2dcc0
MD5 c5f72fc451ab01645053855fadda022b
BLAKE2b-256 733afdd2f66530b409e529b86330c61aac6e37607916caad6290914407a3b098

See more details on using hashes here.

File details

Details for the file ltest_shift-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ltest_shift-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for ltest_shift-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9c444020ef3af2226cc104b343853dc0c78b6f18a292761f163decdbc8554d8a
MD5 d340c8c4d1168cc1d1d17a7927688b25
BLAKE2b-256 4905161bf1ca25fb06f12ee2419cac374c440277df2202dd9254b0fff92be915

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page