Sanity checks on sequences/histograms using ML techniques.

voka

Histogram comparisons that use statistical tests as input to an outlier detection algorithm.

Problem Statement

Let's say you have a large number of histograms produced by a complex system (e.g. a scientific simulation chain for a large-scale physics experiment) and you want to compare one large set of histograms to another to find where they differ. Once the number of histograms grows large (>100), it becomes difficult for human observers to scan them efficiently for subtle differences buried in statistical fluctuations. This project is a tool that helps detect those differences.

The method can be viewed as empirically determining a p-value threshold from benchmark sets. It is valid for both discrete and continuous distributions, and for both Poissonian and non-Poissonian statistics.

See the wiki for more details.
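
The idea of deriving a threshold from benchmark sets can be illustrated with a small standard-library sketch. It is not voka's implementation and does not use voka's API: the function names (`make_histogram`, `chisq`, `empirical_threshold`, `mean_score`), the toy Gaussian data, and the 0.99 quantile are all assumptions made for illustration, and a plain quantile cut stands in for the outlier detection step.

```python
import random
from itertools import combinations


def make_histogram(samples, n_bins=20, lo=-4.0, hi=4.0):
    """Bin samples into equal-width bins on [lo, hi); out-of-range samples are dropped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in samples:
        if lo <= x < hi:
            # min() guards against floating-point round-up at the upper edge.
            counts[min(n_bins - 1, int((x - lo) / width))] += 1
    return counts


def chisq(h1, h2):
    """Two-sample chi-squared statistic; bin pairs that are both empty are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def empirical_threshold(benchmarks, quantile=0.99):
    """Take a high quantile of the benchmark-vs-benchmark scores as the cut."""
    scores = sorted(chisq(a, b) for a, b in combinations(benchmarks, 2))
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]


def mean_score(test_hist, benchmarks):
    """Average score of one test histogram against every benchmark histogram."""
    return sum(chisq(test_hist, b) for b in benchmarks) / len(benchmarks)


rng = random.Random(42)  # fixed seed so the toy data is reproducible


def sample(mu, n=5000):
    """Draw n values from a unit-width Gaussian centred on mu."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


# Benchmark sets: repeated runs of the same (trusted) process.
benchmarks = [make_histogram(sample(0.0)) for _ in range(10)]
threshold = empirical_threshold(benchmarks)

# One test histogram from the same process, one from a shifted process.
good_score = mean_score(make_histogram(sample(0.0)), benchmarks)
shifted_score = mean_score(make_histogram(sample(0.5)), benchmarks)

print(f"threshold={threshold:.1f} good={good_score:.1f} shifted={shifted_score:.1f}")
```

Because the threshold comes from the spread observed among the benchmark sets themselves, no assumption about the underlying statistics is needed. The shifted histogram scores far above that spread, while a histogram from the unchanged process is expected to score within it.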

Dependencies

  • numpy
  • matplotlib
  • scipy (optional)
    numpy (basic_example,classic_fit_example,standard_distribution_comparisons,stochastic_example,test.test_lof,test.test_metrics,test.test_voka,vanilla_gaussian,voka.lof)
    pylab (classic_fit_example,standard_distribution_comparisons,stochastic_example,vanilla_gaussian)
    scipy 
      \-optimize (classic_fit_example,standard_distribution_comparisons,stochastic_example,vanilla_gaussian)
      \-special (voka.metrics.llh)
      \-stats (standard_distribution_comparisons,stochastic_example,vanilla_gaussian)
    voka 
      \-compare (test.test_metrics)
      \-lof (test.test_lof)
      \-metrics 
      | \-ad (test.test_metrics)
      | \-bdm (test.test_metrics)
      | \-chisq (standard_distribution_comparisons,stochastic_example,test.test_metrics,vanilla_gaussian)
      | \-cvm (test.test_metrics)
      | \-ks (test.test_metrics)
      | \-llh (test.test_metrics)
      \-model (basic_example,test.test_voka)

Test Coverage

Measured with coverage.

As of January 14th, 2022:

Name                 Stmts   Miss  Cover   Missing
--------------------------------------------------
voka/__init__.py         0      0   100%
voka/compare.py         12      2    83%   37-38
voka/lof.py             26      0   100%
voka/metrics.py        115     17    85%   39-42, 60, 80, 89, 113, 141, 154, 162-163, 165-166, 168-169, 184
voka/model.py           36      6    83%   78-87
voka/two_sample.py      38     38     0%   2-90
--------------------------------------------------
TOTAL                  227     63    72%
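
As a quick cross-check of the table, the percentages can be recomputed from the Stmts and Miss columns. The sketch below simply copies the numbers from the report above; the `cover` helper is a hypothetical name, and rounding to the nearest whole percent is assumed only because it reproduces the figures shown.

```python
# (statements, missed) per file, copied from the coverage report above.
rows = {
    "voka/__init__.py": (0, 0),
    "voka/compare.py": (12, 2),
    "voka/lof.py": (26, 0),
    "voka/metrics.py": (115, 17),
    "voka/model.py": (36, 6),
    "voka/two_sample.py": (38, 38),
}


def cover(stmts, miss):
    """Percentage of statements executed; an empty file counts as fully covered."""
    return 100.0 if stmts == 0 else 100.0 * (stmts - miss) / stmts


total_stmts = sum(stmts for stmts, _ in rows.values())
total_miss = sum(miss for _, miss in rows.values())
total_cover = cover(total_stmts, total_miss)

# Same totals with the untested two_sample module left out.
tested = {name: v for name, v in rows.items() if name != "voka/two_sample.py"}
tested_stmts = sum(stmts for stmts, _ in tested.values())
tested_miss = sum(miss for _, miss in tested.values())
tested_cover = cover(tested_stmts, tested_miss)

for name, (stmts, miss) in rows.items():
    print(f"{name:<20} {round(cover(stmts, miss)):>3}%")
print(f"{'TOTAL':<20} {round(total_cover):>3}%")
```

Most of the missing coverage comes from voka/two_sample.py, which accounts for 38 of the 63 missed statements. Excluding that file, the remaining modules come to roughly 87% coverage.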

Running Tests

$ python3 -m unittest
$ coverage run --source=voka -m unittest
$ coverage report -m
