
Use this package to analyse your data with Benford's law

Project description

Benford's law analysis

Benford's law is a digit law. It states that the digits of numbers in many numerical datasets are not uniformly distributed but follow a specific frequency. For example, 1 is the leading digit about 30% of the time and 9 less than 5% of the time. This pattern was discovered by Simon Newcomb and, independently, by Frank Benford. More information about this mysterious law can be found on Wikipedia.
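For the first digit, the law is usually written as P(d) = log10(1 + 1/d) for d = 1, ..., 9. The short sketch below is independent of this package, and its function name benford_first_digit_probs is hypothetical. It only tabulates these expected frequencies:

```python
import math


def benford_first_digit_probs() -> dict:
    """Expected first-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


probs = benford_first_digit_probs()
for d, p in probs.items():
    # Prints 1: 0.301, 2: 0.176, ..., 9: 0.046 (the nine values sum to 1)
    print(f"{d}: {p:.3f}")
```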

Benford's law can help to detect fraud, support scientific research, or simply assess the quality of data.

Installation

Install the package with pip install benfordslaw-analysis.

Usage

After installation, import the Analysis class with from benfordslaw_analysis.analysis import Analysis. You can then test whether Benford's law is hidden in your data by passing a list or a pandas Series to Analysis.

For example, to plot the first-digit distribution of uniformly random data against Benford's law:

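from benfordslaw_analysis.analysis import Analysis
from random import uniform

# 1000 values drawn uniformly from [-10, 10]; uniform data is not expected to follow Benford's law
random_data = [uniform(-10, 10) for _ in range(1000)]
bl = Analysis(random_data)  # accepts a list or a pandas Series
bl.plot_first_digit('Random stuff')  # plot the first-digit distribution against Benford's law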

[Figure: first-digit distribution of the random data compared with Benford's law]

As a measure of agreement, the package uses the Euclidean distance between the digit frequencies predicted by Benford's law and those of your data. The error bars are Poisson errors based on the number of data points.
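A minimal sketch of these two quantities in plain Python follows. It is not taken from the package's source. It assumes that zeros and non-finite values are skipped, that the sign of a number is ignored, and that the Poisson error on a digit's frequency is sqrt(count)/N. The names first_digit and first_digit_stats are hypothetical.

```python
import math
from collections import Counter
from random import uniform
from typing import Iterable


def first_digit(x: float) -> int:
    """Leading significant digit of a nonzero finite number (the sign is ignored)."""
    # Scientific notation puts the leading digit first, e.g. 0.0523 -> '5.23...e-02'.
    # 14 decimals avoid rounding artefacts such as 9.9999996 -> '1.000000e+01' (default precision).
    return int(f"{abs(x):.14e}"[0])


def first_digit_stats(data: Iterable[float]):
    """Observed first-digit frequencies, Poisson error bars and Euclidean distance to Benford's law."""
    # Assumption: zeros and inf/nan have no meaningful first digit, so they are dropped.
    digits = [first_digit(x) for x in data if x != 0 and math.isfinite(x)]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero finite value")
    counts = Counter(digits)
    expected = [math.log10(1 + 1 / d) for d in range(1, 10)]
    observed = [counts[d] / n for d in range(1, 10)]
    # Poisson error on a count k is sqrt(k); dividing by n gives the error on the frequency k/n.
    errors = [math.sqrt(counts[d]) / n for d in range(1, 10)]
    distance = math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, expected)))
    return observed, errors, distance


# Usage with the same kind of random data as the package example
random_data = [uniform(-10, 10) for _ in range(1000)]
observed, errors, distance = first_digit_stats(random_data)
print(f"Euclidean distance to Benford's law: {distance:.3f}")
```

Data that does follow the law, such as successive powers of 2, gives a much smaller distance than the uniform random data.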

Euclidean distance

The normalized Euclidean distance is a convenient way to test how closely your data follows Benford's law. Its value lies between 0 and 1; the closer to 0, the better the agreement. However, it is not a formal statistical test, because it does not depend on the sample size. The literature uses several other measures (chi-square, Kolmogorov-Smirnov, ...), but in my own research I found that their dependence on sample size is a limitation for large datasets: they classify all large datasets as non-Benford, even when the data follows Benford's law by eye. The justification for using the Euclidean distance is explained in Appendix D of my own paper.
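To illustrate the sample-size argument, the sketch below compares the normalized Euclidean distance with Pearson's chi-square statistic for one fixed, made-up set of first-digit frequencies. The normalization used here, dividing by the largest possible distance (reached when every first digit is 9), is an assumption and may differ from the package's own. All function names are hypothetical.

```python
import math

# Expected first-digit frequencies under Benford's law
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def euclidean_distance(observed):
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, BENFORD)))


# Assumed normalization: the distance is largest when every first digit is 9
# (the least likely digit), so dividing by that value maps any frequency vector onto [0, 1].
MAX_DISTANCE = euclidean_distance([0.0] * 8 + [1.0])


def normalized_distance(observed):
    return euclidean_distance(observed) / MAX_DISTANCE


def chi_square(observed, n):
    """Pearson chi-square statistic for n data points with first-digit frequencies `observed`."""
    return sum(n * (o - e) ** 2 / e for o, e in zip(observed, BENFORD))


# Hypothetical frequency profile, slightly off Benford's law, at growing sample sizes
observed = [0.29, 0.18, 0.13, 0.10, 0.08, 0.07, 0.06, 0.05, 0.04]
for n in (100, 10_000, 1_000_000):
    print(n, round(normalized_distance(observed), 4), round(chi_square(observed, n), 1))
```

Because the chi-square statistic is N times a weighted sum of squared frequency differences, it grows linearly with the sample size while the distance stays the same. For these frequencies it is roughly 0.2 at N = 100 but roughly 17.6 at N = 10,000, already above the usual 5% critical value of about 15.5 for 8 degrees of freedom.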

Citing

If you find benfordslaw_analysis useful for your own research, please cite it as follows:

@misc{benford_py,
      author = {de Jong, Jurjen},
      title = {{benfordslaw_analysis: a Python Implementation of Benford's Law analysis}},
      year = {2021},
      howpublished = {\url{https://github.com/jurjen93/Benfords_law}},
}

Download files


Source Distribution

benfordslaw_analysis-1.0.3.tar.gz (4.7 kB)

Uploaded Source

Built Distribution


benfordslaw_analysis-1.0.3-py3-none-any.whl (4.9 kB)

Uploaded Python 3

File details

Details for the file benfordslaw_analysis-1.0.3.tar.gz.

File metadata

  • Download URL: benfordslaw_analysis-1.0.3.tar.gz
  • Upload date:
  • Size: 4.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.5

File hashes

Hashes for benfordslaw_analysis-1.0.3.tar.gz
Algorithm Hash digest
SHA256 f1a878ac67110edec59abb53015c376218398bfd92d666930eb62dfba843929f
MD5 ee98ffcf8968a96125da59595e72bc8b
BLAKE2b-256 167a651a1a19da55e00823cdca5e1e09fb281bfa3fff06b23b25a64f3ead8f8d


File details

Details for the file benfordslaw_analysis-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: benfordslaw_analysis-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 4.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.5

File hashes

Hashes for benfordslaw_analysis-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 e6d6267a9f7b84a13ea0a8bed6083a1891d4d27ddb18149b3c8cfc1753140684
MD5 d8c8c81b92888ae68dbc7afcefab3f3b
BLAKE2b-256 6ec4eb8b57d499d4c28aaa56932aff6970145513e9a5ca4355abe594c7ef6e1e

