GOFevaluation

Evaluate the Goodness-of-Fit (GOF) for binned or unbinned data.

This GOF suite lets you calculate a range of 1D and nD GOF measures, both binned and two-sample (unbinned), together with the corresponding approximate p-values. The implemented measures are listed below.

Implemented GOF measures

GOF measure                    Class                 Data input       Reference input  Dim
Kolmogorov-Smirnov             KSTestGOF             sample           binned           1D
Two-Sample Kolmogorov-Smirnov  KSTestTwoSampleGOF    sample           sample           1D
Two-Sample Anderson-Darling    ADTestTwoSampleGOF    sample           sample           1D
Poisson Chi2                   BinnedPoissonChi2GOF  binned / sample  binned           nD
Chi2                           BinnedChi2GOF         binned / sample  binned           nD
Point-to-point                 PointToPointGOF       sample           sample           nD
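For orientation, two of the listed statistics can be written in a few lines of NumPy. This is a minimal sketch under our own assumptions, not the package's implementation: the hypothetical `ks_two_sample_statistic` is the textbook maximum distance between two empirical CDFs (cross-checked against SciPy), and the hypothetical `poisson_chi2` is the Baker-Cousins likelihood-ratio form 2 * sum(mu - n + n * ln(n / mu)), which we assume, but have not confirmed, is what BinnedPoissonChi2GOF evaluates.

```python
import numpy as np
import scipy.stats as sps


def ks_two_sample_statistic(data_sample, reference_sample):
    """Maximum absolute distance between the two empirical CDFs."""
    data = np.sort(np.asarray(data_sample, dtype=float))
    ref = np.sort(np.asarray(reference_sample, dtype=float))
    # The supremum of |ECDF_data - ECDF_ref| is attained at an observed point,
    # so it is enough to evaluate both ECDFs on the pooled sample.
    points = np.concatenate([data, ref])
    ecdf_data = np.searchsorted(data, points, side="right") / data.size
    ecdf_ref = np.searchsorted(ref, points, side="right") / ref.size
    return np.max(np.abs(ecdf_data - ecdf_ref))


def poisson_chi2(binned_data, binned_expected):
    """Baker-Cousins Poisson chi2: 2 * sum(mu - n + n * ln(n / mu))."""
    n = np.asarray(binned_data, dtype=float)
    mu = np.asarray(binned_expected, dtype=float)
    # Empty bins contribute only mu, because n * ln(n / mu) -> 0 as n -> 0.
    log_term = np.zeros_like(n)
    filled = n > 0
    log_term[filled] = n[filled] * np.log(n[filled] / mu[filled])
    return 2.0 * np.sum(mu - n + log_term)


data_sample = sps.uniform.rvs(size=100, random_state=200)
reference_sample = sps.uniform.rvs(size=300, random_state=201)
print(ks_two_sample_statistic(data_sample, reference_sample))
# Cross-check against SciPy's two-sample KS statistic:
print(sps.ks_2samp(data_sample, reference_sample).statistic)
```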

Installation and Set-Up

Regular installation:

pip install GOFevaluation

Developer setup:

Clone the repository:

git clone https://github.com/XENONnT/GOFevaluation
cd GOFevaluation

Install the requirements in your environment:

pip install -r requirements.txt

Then install the package:

python setup.py install --user

You are now good to go!

Usage

The best way to get started with the GOFevaluation package is the tutorial notebook. Using the Binder (mybinder) badge in the project README, you can run the notebook interactively in your browser without a local installation.

Individual GOF Measures

Depending on your data and reference input, you can initialise a gof_object in one of the following ways:

import GOFevaluation as ge

# Data Sample + Binned PDF
gof_object = ge.BinnedPoissonChi2GOF(data_sample, pdf, bin_edges, nevents_expected)

# Binned Data + Binned PDF
gof_object = ge.BinnedPoissonChi2GOF.from_binned(binned_data, binned_reference)

# Data Sample + Reference Sample
gof_object = ge.PointToPointGOF(data_sample, reference_sample)
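The variable names above are placeholders for your own inputs. One way to prepare them with NumPy is sketched below for a hypothetical 1D uniform example. We assume that `pdf` holds the probability content of each bin (summing to 1), that `nevents_expected` scales it to expected counts, and that `binned_data` and `binned_reference` are both counts per bin; please check these conventions against the tutorial notebook before relying on them.

```python
import numpy as np
import scipy.stats as sps

# Hypothetical 1D example: uniform data tested against a uniform model.
bin_edges = np.linspace(0.0, 1.0, 11)  # 10 equal-width bins
data_sample = sps.uniform.rvs(size=100, random_state=200)

# Assumption: pdf = probability content per bin, normalised to 1.
pdf = np.diff(sps.uniform.cdf(bin_edges))
nevents_expected = 100.0  # assumed expected total number of events

# Assumption: both binned inputs are counts per bin on the same scale.
binned_data, _ = np.histogram(data_sample, bins=bin_edges)
binned_reference = pdf * nevents_expected

# With GOFevaluation installed, these would feed the constructors shown above:
# gof_object = ge.BinnedPoissonChi2GOF(data_sample, pdf, bin_edges, nevents_expected)
# gof_object = ge.BinnedPoissonChi2GOF.from_binned(binned_data, binned_reference)
```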

With any gof_object you can calculate the GOF and the corresponding p-value as follows:

gof = gof_object.get_gof()
p_value = gof_object.get_pvalue()
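According to the v0.1.0 release notes, p-values are obtained by toy sampling from the reference or by a permutation test. The sketch below illustrates the permutation idea for a two-sample statistic: pool both samples, reshuffle the data/reference labels many times, and report the fraction of reshuffles whose statistic is at least as large as the observed one. The function `permutation_pvalue` and its arguments are hypothetical and are not the package's internals; SciPy's KS distance stands in for any two-sample statistic.

```python
import numpy as np
import scipy.stats as sps


def ks_statistic(a, b):
    # Any two-sample statistic can be plugged in; SciPy's KS distance is used here.
    return sps.ks_2samp(a, b).statistic


def permutation_pvalue(data_sample, reference_sample, statistic,
                       n_perm=1000, seed=None):
    """Fraction of label permutations whose statistic is >= the observed one."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([data_sample, reference_sample])
    n_data = len(data_sample)
    observed = statistic(data_sample, reference_sample)
    count = 0
    for _ in range(n_perm):
        # Reassign labels at random while keeping both sample sizes fixed.
        shuffled = rng.permutation(pooled)
        if statistic(shuffled[:n_data], shuffled[n_data:]) >= observed:
            count += 1
    return count / n_perm


data_sample = sps.uniform.rvs(size=100, random_state=200)
reference_sample = sps.uniform.rvs(size=300, random_state=201)
p = permutation_pvalue(data_sample, reference_sample, ks_statistic,
                       n_perm=200, seed=1)
print(p)
```

Because the permutations are random, the estimate fluctuates between runs unless a seed is fixed, which matches the remark in the example below that p-values vary slightly.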

Multiple GOF Measures at once

You can compute GOF and p-values for multiple measures at once with the GOFTest class.

Example:

import GOFevaluation as ge
import scipy.stats as sps

# random_state makes sure the gof values are reproducible.
# For the p-values, a slight variation is expected due to
# the random re-sampling method that is used.
data_sample = sps.uniform.rvs(size=100, random_state=200)
reference_sample = sps.uniform.rvs(size=300, random_state=201)

# Initialise all two-sample GOF measures:
gof_object = ge.GOFTest(data_sample=data_sample,
                        reference_sample=reference_sample,
                        gof_list=['ADTestTwoSampleGOF',
                                  'KSTestTwoSampleGOF',
                                  'PointToPointGOF'])
# Calculate GOFs and p-values:
d_min = 0.01
gof_object.get_gofs(d_min=d_min)
# OUTPUT:
# OrderedDict([('ADTestTwoSampleGOF', 1.6301454042304904),
#              ('KSTestTwoSampleGOF', 0.14),
#              ('PointToPointGOF', -0.7324060759792504)])

gof_object.get_pvalues(d_min=d_min)
# OUTPUT:
# OrderedDict([('ADTestTwoSampleGOF', 0.08699999999999997),
#              ('KSTestTwoSampleGOF', 0.10699999999999998),
#              ('PointToPointGOF', 0.31200000000000006)])

# Re-calculate p-value only for one measure:
gof_object.get_pvalues(d_min=.001, gof_list=['PointToPointGOF'])
# OUTPUT:
# OrderedDict([('ADTestTwoSampleGOF', 0.08699999999999997),
#              ('KSTestTwoSampleGOF', 0.10699999999999998),
#              ('PointToPointGOF', 0.128)])

print(gof_object)
# OUTPUT:
# GOFevaluation.gof_test
# GOF measures: ADTestTwoSampleGOF, KSTestTwoSampleGOF, PointToPointGOF


# ADTestTwoSampleGOF
# gof = 1.6301454042304904
# p-value = 0.08499999999999996

# KSTestTwoSampleGOF
# gof = 0.13999999999999996
# p-value = 0.09799999999999998

# PointToPointGOF
# gof = -0.7324060759792504
# p-value = 0.128
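The `d_min` argument above is passed for the point-to-point measure. As a rough illustration of the underlying idea, the sketch below implements an energy-test-style statistic with a logarithmic weighting function -ln(d), where distances below `d_min` are clipped to `d_min` so that near-coincident points do not make the sum diverge. The three-term normalisation, the clipping rule, and the function names are our assumptions; PointToPointGOF may differ in detail, so the numbers will not match the output above.

```python
import numpy as np
import scipy.stats as sps
from scipy.spatial.distance import cdist, pdist


def log_weight(distances, d_min):
    # Clip small distances so that -ln(d) stays finite for (near-)coincident points.
    return -np.log(np.maximum(distances, d_min))


def point_to_point_statistic(data_sample, reference_sample, d_min=0.01):
    """Energy-test-style statistic: data-data - data-reference + reference-reference."""
    data = np.asarray(data_sample, dtype=float)
    ref = np.asarray(reference_sample, dtype=float)
    # Treat 1D input as n points in one dimension.
    if data.ndim == 1:
        data = data[:, np.newaxis]
    if ref.ndim == 1:
        ref = ref[:, np.newaxis]
    n_d, n_r = len(data), len(ref)
    # pdist returns each unordered pair once (i < j); cdist returns all cross pairs.
    term_dd = np.sum(log_weight(pdist(data), d_min)) / n_d**2
    term_dr = np.sum(log_weight(cdist(data, ref), d_min)) / (n_d * n_r)
    term_rr = np.sum(log_weight(pdist(ref), d_min)) / n_r**2
    return term_dd - term_dr + term_rr


data_sample = sps.uniform.rvs(size=100, random_state=200)
reference_sample = sps.uniform.rvs(size=300, random_state=201)
print(point_to_point_statistic(data_sample, reference_sample, d_min=0.01))
# A shifted data sample should give a clearly larger value:
print(point_to_point_statistic(data_sample + 2.0, reference_sample, d_min=0.01))
```

A smaller `d_min` clips fewer pairs, so the statistic becomes more sensitive to small-scale clustering but also more affected by individual close pairs.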

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

v0.1.6

v0.1.5

v0.1.4

v0.1.3

  • Raise an error when count_density mode is combined with an x or y limit, by @dachengx in #46
  • Fix location of sklearn's DistanceMetric by @dachengx in #48

v0.1.2

  • Add colorbar switch, set 2D histogram x&y limit by @dachengx in #39
  • Some plotting bug fixes by @hoetzsch in #41
  • Homemade equiprobable_binning, still based on ECDF by @dachengx in #43
  • A few patches by @hammannr in #38
  • Exercise notebook by @hammannr in #44

v0.1.1

  • Add an example notebook that can be used as a guide when using the package for the first time (#29)
  • Improve and extend plotting of equiprobable binnings. This adds the option of plotting the binwise count density (#35)

v0.1.0

  • Multiple GOF tests (binned and unbinned) can be performed (#1, #5, #10, #12, #13)
  • The p-value is calculated based on toy sampling from the reference or a permutation test (#2, #14)
  • A wrapper class makes it convenient to perform multiple GOF tests in parallel (#19)
  • An equiprobable binning algorithm is implemented. The binning can be applied upon initialisation of the GOF object and a few visualization tools are provided. (#25, #26)
  • CI workflow implemented (#7)
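In one dimension, equiprobable bins can be derived from the empirical CDF of the reference sample: the bin edges are its quantiles, so each bin holds about the same number of reference events. The sketch below uses `np.quantile` for the edges and also computes a bin-wise count density (counts divided by bin width), the quantity mentioned for v0.1.1. It is a simplified 1D illustration with hypothetical function names; the package's own equiprobable_binning may support more dimensions and treat the outer edges differently.

```python
import numpy as np
import scipy.stats as sps


def equiprobable_edges_1d(reference_sample, n_bins):
    """Bin edges at evenly spaced quantiles of the reference sample's ECDF."""
    probs = np.linspace(0.0, 1.0, n_bins + 1)
    return np.quantile(reference_sample, probs)


def count_density(sample, bin_edges):
    """Counts per bin divided by the bin width."""
    counts, _ = np.histogram(sample, bins=bin_edges)
    return counts / np.diff(bin_edges)


reference_sample = sps.norm.rvs(size=1000, random_state=0)
edges = equiprobable_edges_1d(reference_sample, n_bins=10)
ref_counts, _ = np.histogram(reference_sample, bins=edges)
print(ref_counts)  # about 100 reference events per bin
print(count_density(reference_sample, edges))  # larger where the bins are narrow
```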

Download files

Source Distribution

GOFevaluation-0.1.6.tar.gz (24.8 kB)

Uploaded Source

Built Distribution

GOFevaluation-0.1.6-py3-none-any.whl (29.2 kB)

Uploaded Python 3

File details

Details for the file GOFevaluation-0.1.6.tar.gz.

File metadata

  • Download URL: GOFevaluation-0.1.6.tar.gz
  • Size: 24.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for GOFevaluation-0.1.6.tar.gz
Algorithm    Hash digest
SHA256       ab207b194d757ab3609938e197804fc4055de6feaee8a006e70c968c4f536949
MD5          3d5c825b61983a8169e8b28a09358e07
BLAKE2b-256  5521a77439103da7004ac1f88ba0fbbdbf7317122b812c14d9ddc1de4839716a
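To check a downloaded file against the SHA256 digest listed above, Python's standard hashlib module is sufficient. The file name in the commented usage line is an assumption about where the archive was saved; the helper name `sha256_of_file` is hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_SHA256 = "ab207b194d757ab3609938e197804fc4055de6feaee8a006e70c968c4f536949"

# Assumed download location of the source distribution:
# assert sha256_of_file("GOFevaluation-0.1.6.tar.gz") == EXPECTED_SHA256
```

Alternatively, `pip hash GOFevaluation-0.1.6.tar.gz` prints the SHA256 digest of a local file.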

File details

Details for the file GOFevaluation-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: GOFevaluation-0.1.6-py3-none-any.whl
  • Size: 29.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for GOFevaluation-0.1.6-py3-none-any.whl
Algorithm    Hash digest
SHA256       9d9f6e4ea894c01a3a7637498bf55db704cfe302369626d9acf0c0fe36cde35f
MD5          982e24decc4d37b564079b35e98149a3
BLAKE2b-256  6e818241d149ecaf77d36b5e3f04fa1d3ff808f03a4b21677aabd636db2c3719
