Sanity checks on sequences/histograms using ML techniques.
Project description
voka
Histogram comparisons using statistical tests as input to an outlier detection algorithm.
Problem Statement
Let's say you have a large number of histograms produced by a complex system (e.g. a scientific simulation chain for a large-scale physics experiment) and you want to compare one large set of histograms to another to find the differences. When the number of histograms becomes large (>100), it is difficult for human observers to efficiently scan them for subtle differences buried in statistical fluctuations. This project is a tool that helps detect those differences.
This method can be viewed as empirically determining a p-value threshold from benchmark sets. It is valid for both discrete and continuous distributions, and for both Poissonian and non-Poissonian statistics.
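The idea of deriving a threshold empirically from benchmark sets can be illustrated with a minimal sketch. It uses only numpy and does not call voka's own API: the function names `chisq_distance`, `score`, and `empirical_threshold`, the `tolerance` factor, and the simple per-bin chi-square form (which assumes both histograms share the same binning and expected normalization) are all assumptions made for illustration. It also swaps in a plain threshold on an averaged test statistic in place of a full outlier detection algorithm.

```python
import numpy as np


def chisq_distance(h1, h2):
    """Chi-square-like distance per filled bin between two histograms.

    Assumes identical binning and the same expected normalization
    (an illustrative simplification, not voka's metric).
    """
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    total = h1 + h2
    filled = total > 0  # ignore bins that are empty in both histograms
    if not filled.any():
        return 0.0
    terms = (h1[filled] - h2[filled]) ** 2 / total[filled]
    return float(terms.sum() / filled.sum())


def score(hist, references):
    """Mean distance from one histogram to a set of reference histograms."""
    return float(np.mean([chisq_distance(hist, ref) for ref in references]))


def empirical_threshold(benchmarks, tolerance=2.0):
    """Threshold taken from the spread observed among the benchmarks.

    Each benchmark is scored against the others (leave-one-out); the
    largest score, padded by a hypothetical tolerance factor, is the cut.
    """
    benchmarks = list(benchmarks)
    if len(benchmarks) < 2:
        raise ValueError("need at least two benchmark histograms")
    leave_one_out = [
        score(b, benchmarks[:i] + benchmarks[i + 1:])
        for i, b in enumerate(benchmarks)
    ]
    return tolerance * max(leave_one_out)


# Toy data: five benchmark runs, one compatible run, one shifted run.
rng = np.random.default_rng(42)
edges = np.linspace(-5.0, 5.0, 21)


def make_hist(mean):
    return np.histogram(rng.normal(mean, 1.0, 10_000), bins=edges)[0]


benchmarks = [make_hist(0.0) for _ in range(5)]
compatible = make_hist(0.0)
shifted = make_hist(0.5)

threshold = empirical_threshold(benchmarks)
compatible_flagged = score(compatible, benchmarks) > threshold
shifted_flagged = score(shifted, benchmarks) > threshold
print("compatible flagged:", compatible_flagged)
print("shifted flagged:", shifted_flagged)
```

Because the cut comes from the benchmark-to-benchmark spread and not from an assumed sampling distribution, the same recipe applies when the bin contents are not Poisson distributed. The price is that the benchmark sets must themselves be trusted.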
See the wiki for more details.
Dependencies
- numpy
- matplotlib
- scipy (optional)
numpy (basic_example,classic_fit_example,standard_distribution_comparisons,stochastic_example,test.test_lof,test.test_metrics,test.test_voka,vanilla_gaussian,voka.lof)
pylab (classic_fit_example,standard_distribution_comparisons,stochastic_example,vanilla_gaussian)
scipy
\-optimize (classic_fit_example,standard_distribution_comparisons,stochastic_example,vanilla_gaussian)
\-special (voka.metrics.llh)
\-stats (standard_distribution_comparisons,stochastic_example,vanilla_gaussian)
voka
\-compare (test.test_metrics)
\-lof (test.test_lof)
\-metrics
| \-ad (test.test_metrics)
| \-bdm (test.test_metrics)
| \-chisq (standard_distribution_comparisons,stochastic_example,test.test_metrics,vanilla_gaussian)
| \-cvm (test.test_metrics)
| \-ks (test.test_metrics)
| \-llh (test.test_metrics)
\-model (basic_example,test.test_voka)
Test Coverage
Measured with coverage.
As of January 14th, 2022:
Name                  Stmts   Miss  Cover   Missing
---------------------------------------------------
voka/__init__.py          0      0   100%
voka/compare.py          12      2    83%   37-38
voka/lof.py              26      0   100%
voka/metrics.py         115     17    85%   39-42, 60, 80, 89, 113, 141, 154, 162-163, 165-166, 168-169, 184
voka/model.py            36      6    83%   78-87
voka/two_sample.py       38     38     0%   2-90
---------------------------------------------------
TOTAL                   227     63    72%
Running Tests
$ python3 -m unittest
$ coverage run --source=voka -m unittest
$ coverage report -m
Download files
File details
Details for the file icecube_voka-0.1.2.tar.gz.
File metadata
- Download URL: icecube_voka-0.1.2.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 56838f85c57de213b31e8678944e9193c96e6c8ce8a23accc49e59c3509677e5
MD5 | 7ab46978209f807bcd487cec4ca6cfcd
BLAKE2b-256 | f3e0b291e3f4e33fc14eef108ca66b12e6733f62bd5b02884953a785885da84a
File details
Details for the file icecube_voka-0.1.2-py3-none-any.whl.
File metadata
- Download URL: icecube_voka-0.1.2-py3-none-any.whl
- Upload date:
- Size: 13.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 31c18c77f55955de18bfddf22908dac65076e955b0624cf972c82c1c17b4fa70
MD5 | bc9313c0f0e77c2c4337411b83bc9505
BLAKE2b-256 | ea05e891bae2035b257ea8bc1f89be9016d3b5e1866253f9feaf5f6d201a27da