Use this package to analyse your data with Benford's law

Benford's law analysis

Benford's law is a digit law: it states that the digits of numbers, most notably the leading digit, follow a specific non-uniform frequency distribution. This distribution has been observed in many numerical datasets, first by Simon Newcomb and later by Frank Benford. More information about this mysterious law is available on Wikipedia.
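
For the first digit, the frequencies predicted by Benford's law are P(d) = log10(1 + 1/d) for d = 1, ..., 9. The short sketch below only evaluates this formula; it does not use this package, and the function name is made up for illustration.

```python
import math

def benford_first_digit_probs():
    """Return {digit: probability} for leading digits 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_first_digit_probs()
for digit, p in probs.items():
    # Digit 1 comes out near 0.301 and digit 9 near 0.046.
    print(f"{digit}: {p:.3f}")
```

The nine probabilities sum to 1, and roughly 30% of the values are expected to start with a 1.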

Benford's law is a helpful tool for detecting fraud, doing science, or simply investigating the quality of data. For a summary of Benford's law, you can read my blog on Towards Data Science. See also this paper, where I used Benford's law to study digit patterns in the distances between stars in our Milky Way.

Installation

Install the package with pip install benfordslaw-analysis.

Usage

After installing, import the Analysis class in Python with from benfordslaw_analysis.analysis import Analysis. With this class you can check whether your own data follows Benford's law.

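For example, plot the first-digit distribution of random data against Benford's law with:

from benfordslaw_analysis.analysis import Analysis
from random import uniform

# 1000 uniformly distributed random numbers between -10 and 10
random_data = [uniform(-10, 10) for _ in range(1000)]
bl = Analysis(random_data)
# Plot the first-digit frequencies; the argument is the plot title
bl.plot_first_digit('Random stuff')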

Note that the plot uses the Euclidean distance between the digit frequencies predicted by Benford's law and those found in your own data as a measure of agreement, and that the error bars are Poisson errors (based on the number of data points).
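
As an illustration of how first-digit frequencies and Poisson error bars can be obtained, here is a minimal sketch. It is not taken from the package: the helper names are hypothetical, and it assumes that the error on each digit's frequency is sqrt(count) / N, where count is the number of values with that first digit and N is the total number of usable values.

```python
import math
from collections import Counter

def first_digit(x):
    """Return the first significant digit (1-9) of x, or None for 0, inf and nan."""
    x = abs(x)
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation starts with the first significant digit, e.g. '3.14...e-02'.
    return int(f"{x:.16e}"[0])

def digit_frequencies_with_errors(data):
    """Return (frequencies, errors) as dicts keyed by digit 1-9."""
    digits = [d for d in map(first_digit, data) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no usable (nonzero, finite) values in data")
    counts = Counter(digits)
    freqs = {d: counts[d] / n for d in range(1, 10)}
    # Assumed Poisson error: sqrt(count) scaled by the number of values.
    errors = {d: math.sqrt(counts[d]) / n for d in range(1, 10)}
    return freqs, errors

freqs, errors = digit_frequencies_with_errors([1, 12, 0.15, -190, 2.5, 23, 0])
print(freqs[1], errors[1])
```

In this sketch the sign is ignored and zeros are skipped, so four of the six usable values start with a 1 and two start with a 2.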

Euclidean distance

The normalized Euclidean distance is a quick way to test whether your data follows Benford's law. Its value lies between 0 and 1; the closer to 0, the better the agreement. However, it is not a formal test statistic because it is independent of sample size. The literature uses several other measures (chi-square, Kolmogorov-Smirnov, and others) that do depend on sample size. In my own research, however, I noticed that this dependence is a limitation for bigger datasets: these measures classify all bigger datasets as non-Benford, even when they look Benford by eye.
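
The sketch below shows one way such a normalized distance can be computed. It is an assumption, not necessarily what this package does: the distance is divided by the largest possible Euclidean distance from the Benford distribution, which occurs when every value starts with a 9. The names BENFORD, MAX_DISTANCE and normalized_euclidean_distance are hypothetical.

```python
import math

# Expected first-digit frequencies for digits 1..9 under Benford's law.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Largest possible distance from Benford: all values start with a 9 (about 1.036).
MAX_DISTANCE = math.sqrt(sum(p ** 2 for p in BENFORD[:-1]) + (1 - BENFORD[-1]) ** 2)

def normalized_euclidean_distance(observed):
    """observed: nine first-digit frequencies (digits 1..9) that sum to 1."""
    if len(observed) != 9:
        raise ValueError("expected nine frequencies, one per digit 1-9")
    distance = math.sqrt(sum((o - b) ** 2 for o, b in zip(observed, BENFORD)))
    # Assumed normalization: scale so that the result lies between 0 and 1.
    return distance / MAX_DISTANCE

uniform_digits = [1 / 9] * 9
print(normalized_euclidean_distance(BENFORD))         # perfect agreement
print(normalized_euclidean_distance(uniform_digits))  # uniformly distributed digits
```

Because the function only sees frequencies, 100 values and 100 million values with the same digit frequencies give the same distance, which is the sample size independence described above.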

Citing

If you find benfordslaw_analysis useful for your own research, please cite it as follows:

@misc{benford_py,
      author = {de Jong, Jurjen},
      title = {{benfordslaw_analysis: a Python Implementation of Benford's Law analysis}},
      year = {2021},
      howpublished = {\url{https://github.com/jurjen93/Benfords_law}},
}
