Use this package to analyse your data with Benford's law
Project description
Benford's law analysis
Benford's law is a digit law stating that the individual digits in numbers follow a specific frequency distribution. This distribution appears in many numerical datasets, as discovered by Simon Newcomb and Frank Benford. More information about this mysterious law can be found on Wikipedia.
Benford's law might be helpful to detect fraud, do science, or just investigate the quality of data.
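For the first digit, the frequency that Benford's law predicts is P(d) = log10(1 + 1/d) for d = 1, ..., 9. The short sketch below tabulates these expected values using only the standard library. It is independent of this package, and the function name is chosen here purely for illustration.

```python
import math


def benford_first_digit_probs():
    """Expected first-digit probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


expected = benford_first_digit_probs()
for digit, prob in expected.items():
    # Digit 1 is the most likely leading digit, digit 9 the least likely.
    print(f"{digit}: {prob:.3f}")
```

The nine probabilities sum to 1, and roughly 30% of the values in a Benford-like dataset are expected to start with the digit 1.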
Installation
Install the package with:
pip install benfordslaw-analysis
Usage
Import the Analysis class with:
from benfordslaw_analysis.analysis import Analysis
Now you can play around with your data and test whether it follows Benford's law by passing a list or a pandas Series to the Analysis class.
For example, make a plot with Benford's law versus random data with:
from benfordslaw_analysis.analysis import Analysis
from random import uniform
random_data = [uniform(-10, 10) for _ in range(1000)]
bl = Analysis(random_data)
bl.plot_first_digit('Random stuff')
Note that we use the Euclidean distance between the digit frequencies predicted by Benford's law and those of your own data as a measure, and that we use Poisson error bars (based on the number of data points).
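To make the Poisson error bars concrete, here is a minimal sketch of how first-digit frequencies and their errors could be computed by hand. It does not use the package and is not necessarily how Analysis works internally. The helper names first_digit and first_digit_frequencies are hypothetical, and the error on each frequency is assumed to be sqrt(count) / total.

```python
import math
from collections import Counter


def first_digit(value):
    """Return the first significant digit (1-9), or None for zero and non-finite values."""
    value = abs(float(value))
    if value == 0 or not math.isfinite(value):
        return None
    # Scientific notation puts the leading significant digit first, e.g. '3.14...e-02'.
    return int(f"{value:.16e}"[0])


def first_digit_frequencies(data):
    """Return {digit: (frequency, poisson_error)} for digits 1..9.

    Assumption: the Poisson error on a count n is sqrt(n), so the error on
    the frequency n / total is sqrt(n) / total.
    """
    digits = [d for d in map(first_digit, data) if d is not None]
    if not digits:
        raise ValueError("data contains no non-zero finite values")
    counts = Counter(digits)
    total = len(digits)
    return {d: (counts[d] / total, math.sqrt(counts[d]) / total) for d in range(1, 10)}


frequencies = first_digit_frequencies([1, 19, 0.0023, -345, 0])
```

In this toy input the zero is skipped, two of the four remaining values start with 1, and the sign and magnitude of each number are ignored.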
Euclidean distance
The normalized Euclidean distance is a convenient way to test how closely your data follow Benford's law. The value lies between 0 and 1; the closer to 0, the better the agreement. However, it is not a formal test statistic because it is independent of sample size. Several other measures are used in the literature (chi-square, Kolmogorov-Smirnov, ...), but I noticed in my own research that their sample-size dependence is a limitation for bigger datasets: they classify all bigger datasets as non-Benford, even when they look Benford by eye. The justification for using the Euclidean distance is explained in more detail in Appendix D of my own paper.
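As an illustration of such a measure, the sketch below computes a Euclidean distance between observed first-digit frequencies and Benford's law, and scales it to the range 0 to 1. The normalization used here is an assumption: it divides by the largest possible distance, which occurs when every value starts with the digit 9. The package may normalize differently, and the names below are hypothetical.

```python
import math

# Expected first-digit frequencies for digits 1..9 under Benford's law.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Largest possible distance: all observations share the least likely first digit (9).
MAX_DISTANCE = math.sqrt(sum(p ** 2 for p in BENFORD[:-1]) + (1 - BENFORD[-1]) ** 2)


def normalized_euclidean_distance(observed):
    """Distance between observed frequencies (digits 1..9, summing to 1) and Benford's law.

    Returns 0 for a perfect match and 1 for the worst possible case
    (assumed normalization, see the note above).
    """
    if len(observed) != 9:
        raise ValueError("expected 9 frequencies, one per digit 1..9")
    distance = math.sqrt(sum((o - b) ** 2 for o, b in zip(observed, BENFORD)))
    return distance / MAX_DISTANCE


uniform_digits = [1 / 9] * 9
print(normalized_euclidean_distance(uniform_digits))
```

Because the inputs are frequencies rather than counts, the result does not change when the same distribution is observed in a larger sample, which is the sample-size independence described above.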
Citing
If you find benfordslaw_analysis a useful tool for your own research, please cite it as follows:
@misc{benford_py,
author = {Jurjen, de Jong},
title = {{benfordslaw_analysis: a Python Implementation of Benford's Law analysis}},
year = {2021},
howpublished = {\url{https://github.com/jurjen93/Benfords_law}},
}
Download files
Source Distribution
Hashes for benfordslaw_analysis-1.0.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f1a878ac67110edec59abb53015c376218398bfd92d666930eb62dfba843929f
MD5 | ee98ffcf8968a96125da59595e72bc8b
BLAKE2b-256 | 167a651a1a19da55e00823cdca5e1e09fb281bfa3fff06b23b25a64f3ead8f8d
Built Distribution
Hashes for benfordslaw_analysis-1.0.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e6d6267a9f7b84a13ea0a8bed6083a1891d4d27ddb18149b3c8cfc1753140684
MD5 | d8c8c81b92888ae68dbc7afcefab3f3b
BLAKE2b-256 | 6ec4eb8b57d499d4c28aaa56932aff6970145513e9a5ca4355abe594c7ef6e1e