
Sanity checks on sequences/histograms using ML techniques.

Project description


voka

Histogram comparisons using statistical tests as input to an outlier detection algorithm.

Problem Statement

Let's say you have a large number of histograms produced by a complex system (e.g. a scientific simulation chain for a large-scale physics experiment) and you want to compare one large set of histograms to another to find differences. When the number of histograms becomes large (>100), it is difficult for human observers to scan them efficiently for subtle differences buried in statistical fluctuations. This project is a tool that helps detect those differences.

This method can be viewed as empirically determining a p-value threshold from benchmark sets. It is valid for both discrete and continuous distributions, and for both Poissonian and non-Poissonian statistics.
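
The idea of deriving a threshold from benchmark sets can be sketched as follows. This is a minimal illustration using only numpy, not voka's actual implementation or API: the function names (`chisq_statistic`, `empirical_threshold`, `is_outlier`) are hypothetical, and a simple maximum over benchmark pairs stands in for the outlier detection step.

```python
import numpy as np


def chisq_statistic(h1, h2):
    """Two-sample chi-square-like statistic for two histograms with identical binning.

    Illustrative only; not taken from voka.metrics.
    """
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    total = h1 + h2
    mask = total > 0  # skip bins that are empty in both histograms
    return float(np.sum((h1[mask] - h2[mask]) ** 2 / total[mask]))


def empirical_threshold(benchmarks):
    """Largest statistic observed between any pair of benchmark histograms."""
    values = [
        chisq_statistic(a, b)
        for i, a in enumerate(benchmarks)
        for b in benchmarks[i + 1:]
    ]
    return max(values)


def is_outlier(test_hist, benchmarks, threshold):
    """Flag a histogram whose mean distance to the benchmarks exceeds the threshold."""
    mean_distance = np.mean([chisq_statistic(test_hist, b) for b in benchmarks])
    return bool(mean_distance > threshold)


# Benchmark sets: five histograms drawn from the same (assumed) reference process.
rng = np.random.default_rng(42)
bins = np.linspace(-5, 5, 21)
benchmarks = [np.histogram(rng.normal(0.0, 1.0, 5000), bins=bins)[0] for _ in range(5)]
threshold = empirical_threshold(benchmarks)

# A test histogram whose mean is shifted by one standard deviation.
shifted = np.histogram(rng.normal(1.0, 1.0, 5000), bins=bins)[0]
print("threshold:", threshold)
print("shifted histogram flagged:", is_outlier(shifted, benchmarks, threshold))
```

Because the threshold comes from the spread among the benchmark histograms themselves, no assumption about the underlying distribution of the test statistic is needed.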

See the wiki for more details.

Dependencies

  • numpy
  • matplotlib
  • scipy (optional)

Modules that import each package:

    numpy (basic_example,classic_fit_example,standard_distribution_comparisons,stochastic_example,test.test_lof,test.test_metrics,test.test_voka,vanilla_gaussian,voka.lof)
    pylab (classic_fit_example,standard_distribution_comparisons,stochastic_example,vanilla_gaussian)
    scipy 
      \-optimize (classic_fit_example,standard_distribution_comparisons,stochastic_example,vanilla_gaussian)
      \-special (voka.metrics.llh)
      \-stats (standard_distribution_comparisons,stochastic_example,vanilla_gaussian)
    voka 
      \-compare (test.test_metrics)
      \-lof (test.test_lof)
      \-metrics 
      | \-ad (test.test_metrics)
      | \-bdm (test.test_metrics)
      | \-chisq (standard_distribution_comparisons,stochastic_example,test.test_metrics,vanilla_gaussian)
      | \-cvm (test.test_metrics)
      | \-ks (test.test_metrics)
      | \-llh (test.test_metrics)
      \-model (basic_example,test.test_voka)
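
Since scipy is listed as optional and only `scipy.special` appears under a library module in the tree, one common way to handle such a dependency is a guarded import with a standard-library fallback. The sketch below is an assumption about how this could be done, not a description of how voka does it; `log_factorial` is a hypothetical helper.

```python
import math

try:
    # Optional dependency: used only if it is installed.
    from scipy import special
    HAVE_SCIPY = True
except ImportError:
    special = None
    HAVE_SCIPY = False


def log_factorial(n):
    """Return log(n!), using scipy when available and math.lgamma otherwise."""
    if HAVE_SCIPY:
        return float(special.gammaln(n + 1))
    return math.lgamma(n + 1)


print(log_factorial(5))  # log(120), computed with or without scipy
```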

Test Coverage

Measured with coverage.

As of January 14th, 2022:

Name                 Stmts   Miss  Cover   Missing
--------------------------------------------------
voka/__init__.py         0      0   100%
voka/compare.py         12      2    83%   37-38
voka/lof.py             26      0   100%
voka/metrics.py        115     17    85%   39-42, 60, 80, 89, 113, 141, 154, 162-163, 165-166, 168-169, 184
voka/model.py           36      6    83%   78-87
voka/two_sample.py      38     38     0%   2-90
--------------------------------------------------
TOTAL                  227     63    72%

Running Tests

$ python3 -m unittest
$ coverage run --source=voka -m unittest
$ coverage report -m
