Sanity checks on sequences/histograms using ML techniques.

Project description

voka

Histogram comparisons using statistical tests as input to an outlier detection algorithm.
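The summary combines two ingredients: statistical tests that compare histograms, and an outlier detector that consumes their results. The package ships a `voka.lof` module (see the dependency tree under Dependencies), which suggests Local Outlier Factor (LOF). The sketch below is a self-contained NumPy implementation of standard LOF applied to hypothetical per-histogram feature vectors (one row per histogram, one column per test statistic). It is an illustration only, not voka's implementation. The names `local_outlier_factor`, `k` and the toy features are assumptions.

```python
import numpy as np


def local_outlier_factor(X, k=5):
    """Standard Local Outlier Factor score for each row of X (illustrative sketch).

    Scores near 1 mean a point is about as dense as its neighbours;
    scores well above 1 mark it as an outlier. This sketch ignores ties
    in the k-distance and assumes X has no duplicate rows.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    rows = np.arange(n)
    # Pairwise Euclidean distances; a point is never its own neighbour.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    # k nearest neighbours of each point, and each point's k-distance.
    knn = np.argsort(d, axis=1)[:, :k]
    k_dist = d[rows, knn[:, -1]]
    # Reachability distance of p from neighbour o: max(k_dist(o), d(p, o)).
    reach = np.maximum(k_dist[knn], d[rows[:, None], knn])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)
    # LOF: average neighbour density relative to the point's own density.
    return lrd[knn].mean(axis=1) / lrd


# Hypothetical features: each row holds two test statistics for one
# histogram. Twenty well-behaved histograms plus one injected outlier.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(1.0, 0.1, size=(20, 2)), [[3.0, 3.0]]])
scores = local_outlier_factor(features, k=5)
print(int(np.argmax(scores)))  # the injected row (index 20) scores highest
```

LOF compares each point's local density with that of its neighbours rather than applying one global cut, so a histogram is flagged for being unusual relative to similar histograms. scikit-learn's `sklearn.neighbors.LocalOutlierFactor` provides a full implementation.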

Problem Statement

Let's say you have a large number of histograms produced by a complex system (e.g. the scientific simulation chain of a large-scale physics experiment) and you want to compare one large set of histograms to another to find differences. When the number of histograms becomes large (>100), it is difficult for human observers to scan them efficiently for subtle differences buried in statistical fluctuations. This project provides a tool that helps detect those differences.

This method can be viewed as empirically determining a p-value threshold from benchmark sets. It is valid for both discrete and continuous distributions, and for both Poissonian and non-Poissonian statistics.
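One way to read "empirically determining a p-value threshold from benchmark sets" is this. You compute a comparison statistic between histograms that are expected to agree, treat those values as the null distribution, and judge new results against its tail instead of a textbook distribution. The sketch assumes a simple two-histogram chi-square statistic, pairwise comparisons among ten benchmark histograms, and a 0.99 quantile. All function names and parameters are hypothetical and not part of voka. Pairs share histograms, so the null values are correlated. The sketch is only a toy.

```python
import numpy as np


def chi2_statistic(h1, h2):
    """Two-histogram chi-square statistic, assuming equal binning and similar totals."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    total = h1 + h2
    mask = total > 0  # bins empty in both histograms carry no information
    return float(np.sum((h1[mask] - h2[mask]) ** 2 / total[mask]))


def benchmark_null(benchmarks):
    """Statistic for every distinct pair of benchmark histograms."""
    return np.array([
        chi2_statistic(benchmarks[i], benchmarks[j])
        for i in range(len(benchmarks))
        for j in range(i + 1, len(benchmarks))
    ])


def empirical_p_value(stat, null):
    """Share of benchmark values at least as large as `stat` (+1 smoothing)."""
    return (1 + np.sum(null >= stat)) / (1 + len(null))


rng = np.random.default_rng(1)
bins = np.linspace(-3.0, 3.0, 21)
# Ten hypothetical benchmark histograms drawn from the same distribution.
benchmarks = [np.histogram(rng.normal(size=5000), bins)[0] for _ in range(10)]
null = benchmark_null(benchmarks)
threshold = np.quantile(null, 0.99)  # empirical cut instead of a chi2 table
# A test histogram drawn from a shifted distribution.
test = np.histogram(rng.normal(0.5, 1.0, size=5000), bins)[0]
stat = chi2_statistic(benchmarks[0], test)
print(stat > threshold, empirical_p_value(stat, null))
```

Because the cut comes from the benchmark values themselves, it does not rely on the statistic following its asymptotic chi-square distribution. That independence is what lets the approach handle non-Poissonian or sparsely filled histograms.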

See the wiki for more details.

Dependencies

  • numpy
  • matplotlib
  • scipy (optional; imported by voka.metrics.llh and several of the examples)

Import map (each module, followed in parentheses by the modules that import it):

    numpy (basic_example, classic_fit_example, standard_distribution_comparisons, stochastic_example, test.test_lof, test.test_metrics, test.test_voka, vanilla_gaussian, voka.lof)
    pylab (classic_fit_example, standard_distribution_comparisons, stochastic_example, vanilla_gaussian)
    scipy
      \-optimize (classic_fit_example, standard_distribution_comparisons, stochastic_example, vanilla_gaussian)
      \-special (voka.metrics.llh)
      \-stats (standard_distribution_comparisons, stochastic_example, vanilla_gaussian)
    voka
      \-compare (test.test_metrics)
      \-lof (test.test_lof)
      \-metrics
      | \-ad (test.test_metrics)
      | \-bdm (test.test_metrics)
      | \-chisq (standard_distribution_comparisons, stochastic_example, test.test_metrics, vanilla_gaussian)
      | \-cvm (test.test_metrics)
      | \-ks (test.test_metrics)
      | \-llh (test.test_metrics)
      \-model (basic_example, test.test_voka)
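The tree shows `voka.metrics` providing several comparison statistics (`ad`, `bdm`, `chisq`, `cvm`, `ks`, `llh`). To illustrate what one such statistic measures, here is a generic binned Kolmogorov-Smirnov-style distance in NumPy. It is a sketch, not the code in `voka.metrics.ks`, and the function name `ks_distance` is hypothetical.

```python
import numpy as np


def ks_distance(h1, h2):
    """Largest gap between the normalised cumulative sums of two histograms.

    Both histograms must share the same binning and have non-zero totals.
    """
    c1 = np.cumsum(np.asarray(h1, dtype=float))
    c2 = np.cumsum(np.asarray(h2, dtype=float))
    # Divide by the last cumulative value so each CDF ends at 1.
    return float(np.max(np.abs(c1 / c1[-1] - c2 / c2[-1])))


print(ks_distance([1, 2, 3, 4], [2, 4, 6, 8]))  # 0.0: same shape, different totals
print(ks_distance([4, 0, 0, 0], [0, 0, 0, 4]))  # 1.0: all mass at opposite ends
```

Because each histogram is normalised, this distance ignores overall counts and responds only to shape. For unbinned samples, `scipy.stats.ks_2samp` gives the standard two-sample test.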

Test Coverage

Measured with coverage.

As of January 14th, 2022:

Name                 Stmts   Miss  Cover   Missing
--------------------------------------------------
voka/__init__.py         0      0   100%
voka/compare.py         12      2    83%   37-38
voka/lof.py             26      0   100%
voka/metrics.py        115     17    85%   39-42, 60, 80, 89, 113, 141, 154, 162-163, 165-166, 168-169, 184
voka/model.py           36      6    83%   78-87
voka/two_sample.py      38     38     0%   2-90
--------------------------------------------------
TOTAL                  227     63    72%

Running Tests

$ python3 -m unittest
$ coverage run --source=voka -m unittest
$ coverage report -m

Download files

Source Distribution

icecube_voka-0.1.2.tar.gz (12.6 kB)

Built Distribution

icecube_voka-0.1.2-py3-none-any.whl (13.1 kB)

File details

Details for the file icecube_voka-0.1.2.tar.gz.

File metadata

  • Download URL: icecube_voka-0.1.2.tar.gz
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.7

File hashes

Hashes for icecube_voka-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       56838f85c57de213b31e8678944e9193c96e6c8ce8a23accc49e59c3509677e5
MD5          7ab46978209f807bcd487cec4ca6cfcd
BLAKE2b-256  f3e0b291e3f4e33fc14eef108ca66b12e6733f62bd5b02884953a785885da84a
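The digests above let you check a downloaded file before installing it. Below is a minimal sketch using only the standard library's `hashlib`. The helper names `sha256_of` and `matches_expected` are hypothetical, and it assumes the source distribution has been saved locally.

```python
import hashlib

# SHA256 digest listed above for icecube_voka-0.1.2.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "56838f85c57de213b31e8678944e9193c96e6c8ce8a23accc49e59c3509677e5"
)


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a b"" sentinel keeps reading until end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file at `path` has the expected SHA-256 digest."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_expected("icecube_voka-0.1.2.tar.gz")` in the download directory should return `True` for an intact file. For installs, pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:<digest>` entries in a requirements file) performs the same check automatically.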

File details

Details for the file icecube_voka-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: icecube_voka-0.1.2-py3-none-any.whl
  • Size: 13.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.7

File hashes

Hashes for icecube_voka-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       31c18c77f55955de18bfddf22908dac65076e955b0624cf972c82c1c17b4fa70
MD5          bc9313c0f0e77c2c4337411b83bc9505
BLAKE2b-256  ea05e891bae2035b257ea8bc1f89be9016d3b5e1866253f9feaf5f6d201a27da
