Apply and run tests of Newcomb-Benford's Law on provided data

Benford's Law

A Python package for testing whether a dataset of numbers conforms to Benford's Law, also known as the law of anomalous numbers.

Installation

pip install -U benfords-law

Usage

>>> import numpy as np
>>>
>>> from benfords_law import BenfordsLaw
>>>
>>> # Initialize an array of uniformly random integers, which will fail Benford's Law
>>> data = np.random.randint(low=100, high=1000000, size=1000)
>>> benfords = BenfordsLaw(data)
>>> benfords.apply_benfords_law()
Chi-squared test failed with statistic: 998.0013682427352 and p-value: 4.032015415461028e-210

Figure: Failed Benford's test with random numbers
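
The package's internals are not shown here, so the following is only a minimal sketch of how a chi-squared goodness-of-fit statistic against Benford's expected counts could be computed with the standard library. The names first_digit and benford_chi_squared are hypothetical and are not part of the benfords_law API. Instead of a p-value, the sketch compares the statistic with the 5% critical value for 8 degrees of freedom.

```python
import math
import random

# Critical value of the chi-squared distribution with 8 degrees of freedom
# (9 first-digit classes minus 1) at the 5% significance level.
CRITICAL_VALUE_5PCT_DF8 = 15.507


def first_digit(n):
    """Return the first digit of a non-zero integer (hypothetical helper)."""
    return int(str(abs(n))[0])


def benford_chi_squared(values):
    """Chi-squared statistic of observed first digits vs. Benford's Law."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    n = sum(counts)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford's expected count
        stat += (counts[d - 1] - expected) ** 2 / expected
    return stat


# Uniformly random integers, similar in spirit to the usage example above.
rng = random.Random(0)
uniform = [rng.randint(100, 999999) for _ in range(1000)]
stat = benford_chi_squared(uniform)
print("fails Benford's Law" if stat > CRITICAL_VALUE_5PCT_DF8 else "passes")
```

A statistic above the critical value means the observed first digits differ from Benford's distribution by more than chance would explain at the 5% level.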

Dependencies

  • numpy==1.17.1
  • pandas==0.25.1
  • scipy==1.6.0
  • matplotlib==3.3.3

Introduction and Description

Newcomb-Benford's Law (the law of anomalous numbers) states that in many naturally occurring sets of numbers, the first significant digit is likely to be small. In such a set, e.g. the populations of the world's countries, the most likely first digit is 1. The next most likely is 2, which is less probable than 1 but more probable than every larger digit, and so on down to 9.

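In fact, Benford's Law predicts a very specific distribution: the probability that the first significant digit is d equals log10(1 + 1/d) for d = 1, ..., 9, so roughly 30.1% of values start with 1 and only about 4.6% start with 9.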
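
Under Benford's Law the probability of first digit d is log10(1 + 1/d). A short sketch that tabulates these expected probabilities (the function name benford_expected is hypothetical, not part of this package):

```python
import math


def benford_expected():
    """Return {digit: probability} for first digits 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


for digit, p in benford_expected().items():
    print(f"{digit}: {p:.3f}")
```

The nine probabilities sum to 1, since the product of (1 + 1/d) for d = 1..9 is exactly 10.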

As an example, the figure below compares the distribution of first significant digits of country populations with the distribution expected from Benford's Law.

Figure: 2016 national populations Benford's test

This phenomenon is pervasive in many large sets of numbers. Examples include:

  • Earthquake magnitudes
  • Dow Jones Industrial Average from 1990–1993
  • 3,141 county populations from the 1990 U.S. Census
  • Distances of stars from Earth in light years
  • Most common iPhone passcodes

The explanations for this law are many and debated. However, these sets of numbers share some key (non-exhaustive) characteristics. They generally:

  1. Occur 'naturally', without human manipulation
  2. Span many orders of magnitude
  3. Arise from some form of exponential growth
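
The second and third characteristics can be illustrated with a small experiment. The sketch below builds a geometric sequence (an arbitrary starting value of 100 growing 7% per step, both chosen purely for illustration) and compares its first-digit frequencies with Benford's expected values. The helper first_digit is hypothetical.

```python
import math
from decimal import Decimal


def first_digit(x):
    """First non-zero digit of a non-zero number (hypothetical helper)."""
    # Decimal stores the coefficient without leading zeros, so the first
    # stored digit is the first significant digit.
    return Decimal(str(x)).as_tuple().digits[0]


# Geometric growth spanning many orders of magnitude (illustrative values).
values = [100 * 1.07 ** k for k in range(2000)]

counts = {d: 0 for d in range(1, 10)}
for v in values:
    counts[first_digit(v)] += 1

observed = {d: counts[d] / len(values) for d in counts}
expected = {d: math.log10(1 + 1 / d) for d in counts}

for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}, expected {expected[d]:.3f}")
```

A sequence confined to a single order of magnitude, or one drawn uniformly at random, would not be expected to show this agreement.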

Why should we care? Glad you asked. If a set of naturally occurring numbers does not follow Benford's Law, there is cause to suspect that something unscrupulous is going on. For example, in cases where elections have been rigged, the tallied numbers may not follow Benford's Law.

As such, Benford's Law is a reasonable tool for detecting inconsistencies in sets of numbers. Examples of real-world applications in detecting manipulation, fraud, or misleading data include:

  1. National COVID-19 Tracking Errors/Inconsistencies
  2. Electoral Fraud

Key Terminology

  • First Significant Digit (fsd): The first non-zero digit of a number, e.g.:
    • 52390234 has fsd=5
    • 0.0004562 has fsd=4
    • 2943.6 has fsd=2
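
The definition above can be sketched in a few lines. The function first_significant_digit is a hypothetical helper, not part of this package's API; it goes through the decimal module so that integers and floats are handled the same way.

```python
from decimal import Decimal


def first_significant_digit(x):
    """Return the first non-zero digit of x (hypothetical helper)."""
    # str(x) gives the shortest decimal representation of the number, and
    # Decimal exposes its digits as a tuple of ints without the sign.
    for digit in Decimal(str(x)).as_tuple().digits:
        if digit != 0:
            return digit
    raise ValueError("zero has no first significant digit")


for number in (52390234, 0.0004562, 2943.6):
    print(number, "has fsd =", first_significant_digit(number))
```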

References

Wikipedia: Benford's law https://en.wikipedia.org/wiki/Benford's_law

Netflix, Connected: Season 1, episode "Digits" https://www.netflix.com/watch/81084953

YouTube, Khan Academy: Benford's Law Explained https://youtu.be/SZUDoEdjTzg
