A library for testing data sets with Benford's Law

Benford for Python

Citing

If you find Benford_py useful in your research, please consider adding the following citation:

@misc{benford_py,
      author = {Milcent, Marcel},
      title = {{Benford_py: a Python Implementation of Benford's Law Tests}},
      year = {2017},
      publisher = {GitHub},
      journal = {GitHub repository},
      howpublished = {\url{https://github.com/milcent/benford_py}},
}

Current version: 0.5.0

See release notes for features in this and in older versions

Python versions >= 3.6

Installation

Benford_py is a package on PyPI, so you can install it with pip:

pip install benford_py

or, equivalently, since pip treats "_" and "-" in package names as the same:

pip install benford-py

Alternatively, you can cd into the site-packages subfolder of your Python distribution (or environment) and git clone from there:

git clone https://github.com/milcent/benford_py

For a quick start, please go to the Demo notebook, in which I show examples of how to run the tests on the daily returns of SPY (the S&P 500 ETF).

For more fine-grained details of the functions and classes, see the docs.

Background

The first digit of a number is its leftmost nonzero digit; for example, the first digit of 0.0345 is 3.
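To make the definition concrete, here is a minimal Python sketch, assuming the first digit is the leftmost nonzero digit (leading zeros, as in 0.0345, are skipped) and that the sign is ignored. `first_digit` is a hypothetical helper for illustration, not part of benford_py's API.

```python
import math
from decimal import Decimal


def first_digit(x: float) -> int:
    """Return the leftmost nonzero digit of x, ignoring its sign."""
    x = abs(float(x))
    if x == 0 or not math.isfinite(x):
        raise ValueError("the first digit is undefined for 0, inf and nan")
    # repr() yields the shortest decimal string that maps back to x, so 0.3
    # is read as "0.3" rather than as its binary approximation 0.2999...
    # The coefficient digits of a nonzero Decimal have no leading zeros.
    return Decimal(repr(x)).as_tuple().digits[0]


print([first_digit(v) for v in (0.0345, -721, 1000)])  # [3, 7, 1]
```

Going through the decimal string instead of `x / 10 ** floor(log10(x))` avoids floating-point surprises near powers of ten, at the cost of speed on large arrays.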

Since the first digit of any number can range from "1" to "9" ("0" is not considered), one would intuitively expect each digit to appear as the first digit in the same proportion, 1/9, i.e., approximately 0.1111, or 11.11%, of a set of numerical records.

Benford's Law, also known as the Law of First Digits or the Phenomenon of Significant Digits, is the finding that the first digits of numbers in series of records from widely varied sources are not uniformly distributed. Instead, "1" is the most frequent first digit, followed by "2", "3", and so on in decreasing frequency down to "9", which is the least frequent.

The expected proportions of the first digits in a Benford-compliant data set decrease from about 30.1% for the digit "1" to about 4.6% for the digit "9".

The first record on the subject dates from 1881, in the work of Simon Newcomb, an American-Canadian astronomer and mathematician, who noticed that the first pages of books of logarithm tables, which contained the logarithms of numbers beginning with "1" and "2", were more worn out, that is, more frequently consulted.

In that same article, Newcomb proposed the formula for the probability of a given digit d being the first digit of a number:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, 9$$
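Newcomb's formula, P(d) = log10(1 + 1/d), can be tabulated in a few lines of Python. This is a minimal sketch; `benford_first_digit_prob` is a hypothetical name, not benford_py's API.

```python
import math


def benford_first_digit_prob(d: int) -> float:
    """Expected probability that d (1..9) is the first digit."""
    if not 1 <= d <= 9:
        raise ValueError("the first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


expected = {d: benford_first_digit_prob(d) for d in range(1, 10)}
for d, p in expected.items():
    print(f"{d}: {p:.4f}")
```

The nine probabilities add up to exactly 1, because log10((d + 1)/d) telescopes to log10(10/1) = 1 when summed over d = 1..9.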

In 1938, the American physicist Frank Benford revisited the phenomenon, which he called the "Law of Anomalous Numbers," in a survey of more than 20,000 observations of empirical data compiled from various sources, ranging from areas of rivers to molecular weights of chemical compounds, including cost data, address numbers, population sizes and physical constants. All of them, to a greater or lesser extent, followed this distribution.

The extent of Benford's work seems to be one good reason why the phenomenon became popularly known by his name, even though Newcomb had described it 57 years earlier.

Variants of the original formula give the expected proportions of digits in other positions of a number, such as the second digit (BENFORD, 1938), and of digit combinations, such as the first two digits of a number (NIGRINI, 2012, p.5).
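A sketch of two such variants, using the standard forms: the first two digits n (10..99) occur with probability log10(1 + 1/n), and the second digit d2 (0..9) has probability equal to the sum of log10(1 + 1/(10·d1 + d2)) over every first digit d1 = 1..9. The function names are hypothetical, not benford_py's API.

```python
import math


def first_two_digits_prob(n: int) -> float:
    """Expected probability that the first two digits form n (10..99)."""
    if not 10 <= n <= 99:
        raise ValueError("the first-two-digit value must be between 10 and 99")
    return math.log10(1 + 1 / n)


def second_digit_prob(d2: int) -> float:
    """Expected probability that the second digit is d2 (0..9).

    Obtained by summing the first-two-digit probabilities over every
    possible first digit d1 = 1..9.
    """
    if not 0 <= d2 <= 9:
        raise ValueError("the second digit must be between 0 and 9")
    return sum(first_two_digits_prob(10 * d1 + d2) for d1 in range(1, 10))


for d2 in range(10):
    print(f"{d2}: {second_digit_prob(d2):.4f}")
```

The second-digit distribution is much flatter than the first-digit one: it only falls from about 0.1197 for "0" to about 0.0850 for "9".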

Only in 1995, however, was the phenomenon proven, by Hill. His proof was based on the fact that numbers in data series following Benford's Law are, in effect, "second generation" distributions, i.e., combinations of other distributions: the union of randomly drawn samples from various distributions forms a distribution that respects Benford's Law (HILL, 1995).
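As a rough illustration of this mixture argument (not of Hill's proof), the Python sketch below pools small samples drawn from 2,000 distributions whose family (uniform, exponential or lognormal) and scale are chosen at random. Drawing the scales log-uniformly over whole decades is an assumption of this sketch, and it does much of the work: Hill's result relies on the distributions being selected without a bias toward particular scales. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)


def sample_random_distribution(rng, size):
    """Draw `size` values from a randomly chosen family with a random scale."""
    # Assumption of this sketch: scales are log-uniform over six whole decades.
    scale = 10 ** rng.uniform(0, 6)
    family = rng.integers(3)
    if family == 0:
        return rng.uniform(0, scale, size)
    if family == 1:
        return rng.exponential(scale, size)
    return rng.lognormal(mean=np.log(scale), sigma=0.5, size=size)


def first_digits(x):
    """Vectorized leftmost nonzero digit of the nonzero entries of x."""
    x = np.abs(x[x != 0])
    d = np.floor(x / 10 ** np.floor(np.log10(x))).astype(int)
    # Clipping guards against rare floating-point rounding at powers of ten.
    return np.clip(d, 1, 9)


# Pool small samples drawn from 2,000 randomly chosen distributions.
pooled = np.concatenate([sample_random_distribution(rng, 50) for _ in range(2000)])
digits = first_digits(pooled)
observed = np.bincount(digits, minlength=10)[1:] / digits.size
expected = np.log10(1 + 1 / np.arange(1, 10))

for d, (o, e) in enumerate(zip(observed, expected), start=1):
    print(f"{d}: observed {o:.3f}  expected {e:.3f}")
```

A single uniform sample, by contrast, can be far from Benford: values drawn from uniform(0, 10) start with "1" only about 11% of the time.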

When sorted in ascending order, data that obey Benford's Law should approximate a geometric sequence (NIGRINI, 2012, p.21). It follows that the logarithms of this ordered series should lie approximately on a straight line. In addition, the mantissas (fractional parts) of the logarithms of these numbers should be uniformly distributed in the interval [0, 1) (NIGRINI, 2012, p.10).
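Both properties can be checked numerically. The sketch below uses a hypothetical Benford-like sample whose base-10 logarithms are spread uniformly over five decades; `log_mantissas` is an illustrative helper, not taken from benford_py. A uniform distribution on [0, 1) has mean 1/2 and variance 1/12, and the sorted logarithms of a geometric-like series correlate almost perfectly with their rank.

```python
import numpy as np


def log_mantissas(values):
    """Fractional parts of log10(|x|) for the nonzero values."""
    x = np.abs(np.asarray(values, dtype=float))
    x = x[x > 0]
    # np.mod wraps negative logarithms into the unit interval as well,
    # e.g. log10(0.0345) = -1.462... becomes 0.537...
    return np.mod(np.log10(x), 1.0)


rng = np.random.default_rng(42)
# Hypothetical Benford-like sample: log10 spread uniformly over 5 decades.
sample = 10 ** rng.uniform(0, 5, size=100_000)

m = log_mantissas(sample)
print(f"mantissa mean = {m.mean():.3f} (uniform: 0.500)")
print(f"mantissa var  = {m.var():.4f} (uniform: {1 / 12:.4f})")

# Sorted values of a geometric-like series: their logs grow linearly with rank.
ranks = np.arange(sample.size)
r = np.corrcoef(ranks, np.log10(np.sort(sample)))[0, 1]
print(f"correlation of sorted log10 values with rank: {r:.5f}")
```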

In general, a series of numerical records follows Benford's Law when (NIGRINI, 2012, p.21):

it represents the magnitudes of facts or events, such as populations of cities, flows of water in rivers or sizes of celestial bodies;
it does not have pre-established minimum or maximum limits;
it is not made up of numbers used as identifiers, such as identity or social security numbers, bank account numbers or telephone numbers; and
its mean is greater than its median (i.e., the distribution is positively skewed), and the data are not concentrated around the mean.

It follows from this expected distribution that, if a series of records that usually respects the Law deviates from the expected proportions, there may be distortions, whether intentional or not.
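One common way to quantify such a deviation is to compare the observed first-digit proportions with the expected ones, for instance with a chi-square statistic and the mean absolute deviation (MAD). The sketch below is a minimal, self-contained version of that idea; `first_digit` and `first_digit_test` are hypothetical helpers, not benford_py's API, and 15.507 is the 5% critical value of the chi-square distribution with 8 degrees of freedom (nine digits minus one).

```python
import math
from decimal import Decimal

import numpy as np

# Expected first-digit proportions, P(d) = log10(1 + 1/d) for d = 1..9.
EXPECTED = np.log10(1 + 1 / np.arange(1, 10))
# 95th percentile of the chi-square distribution with 8 degrees of freedom.
CHI2_CRITICAL_5PCT_8DF = 15.507


def first_digit(x: float) -> int:
    """Leftmost nonzero digit of a nonzero, finite x (sign ignored)."""
    return Decimal(repr(abs(float(x)))).as_tuple().digits[0]


def first_digit_test(values):
    """Return (observed proportions, chi-square statistic, MAD).

    Zeros and non-finite values are skipped, since they have no first digit.
    """
    digits = [first_digit(v) for v in values if v != 0 and math.isfinite(v)]
    n = len(digits)
    if n == 0:
        raise ValueError("no usable values")
    counts = np.bincount(digits, minlength=10)[1:]  # counts for digits 1..9
    observed = counts / n
    chi2 = float(np.sum((counts - n * EXPECTED) ** 2 / (n * EXPECTED)))
    mad = float(np.mean(np.abs(observed - EXPECTED)))
    return observed, chi2, mad


rng = np.random.default_rng(0)
benford_like = 10 ** rng.uniform(0, 6, size=20_000)  # log-uniform: Benford-like
uniform_like = rng.uniform(1, 10, size=20_000)       # first digits ~ uniform

for name, data in (("benford_like", benford_like), ("uniform_like", uniform_like)):
    _, chi2, mad = first_digit_test(data)
    verdict = "deviates" if chi2 > CHI2_CRITICAL_5PCT_8DF else "consistent"
    print(f"{name}: chi2={chi2:.1f} MAD={mad:.4f} -> {verdict} at the 5% level")
```

The chi-square statistic grows with the sample size, so on very large data sets even tiny, practically irrelevant deviations become "significant"; the MAD does not depend on n in that way, which is one reason to report both.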

Benford's Law has been used in several fields. After establishing that a given type of data is usually Benford-compliant, one can study samples of that same type of data in search of inconsistencies, errors or even fraud.

This open source module is an attempt to make Benford's Law tests easier to perform for people using Python, whether interactively or in automated scripts.

It uses the versatility of numpy and pandas, along with matplotlib for visualization, to deliver test results, plots and much more.

It has been a long time since I last tested it in Python 2. Python 2 has reached its end of life, so the package now officially supports Python 3 only. It should work on Linux, Windows and Mac, but please file a bug report if you run into any trouble.

Also, if you have a nice data set that we can run these tests on, let's try it.

Thanks!

Milcent

Release history

0.5.0 (Jun 29, 2021)
0.4.3.0 (Jun 6, 2021)
0.4.2 (May 23, 2021)
0.4.1 (Apr 28, 2021)
0.4.0 (Apr 25, 2021)
0.3.3 (Jan 24, 2021)
0.3.2 (Dec 16, 2020)
0.3.1 (Dec 16, 2020)
0.3.0 (Dec 16, 2020)
0.2.7 (Apr 14, 2020)
0.2.6 (Mar 14, 2020)
0.2.5 (Feb 2, 2020)
0.2.1 (Jan 20, 2020)
0.2.0 (Jan 7, 2020)
0.1.0.3 (Dec 20, 2017)
0.1.0.2 (Dec 16, 2017)
0.1.0.1 (Dec 6, 2017)
0.1.0 (Dec 4, 2017)

Download files

Source Distribution

benford_py-0.5.0.tar.gz (32.8 kB)

Uploaded Jun 29, 2021 Source

Built Distribution

benford_py-0.5.0-py3-none-any.whl (32.1 kB)

Uploaded Jun 29, 2021 Python 3

File details

Details for the file benford_py-0.5.0.tar.gz.

File metadata

Download URL: benford_py-0.5.0.tar.gz
Upload date: Jun 29, 2021
Size: 32.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.10

File hashes

Hashes for benford_py-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`5af48d4abd572ffd3f1d85a738ffa5cfa3d459902397b6105db01775515bf190`
MD5	`c2eb55a0535924adf1d2a42081295b91`
BLAKE2b-256	`9a98643d32eac72ace755fd06eee566210c581ce536ed94956dd7b4e71e613ea`

File details

Details for the file benford_py-0.5.0-py3-none-any.whl.

File metadata

Download URL: benford_py-0.5.0-py3-none-any.whl
Upload date: Jun 29, 2021
Size: 32.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.10

File hashes

Hashes for benford_py-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`52e3c44fdce15cd9fc6372a5515654e19e14530384751ad9211efdcddd074622`
MD5	`cf755e6df867af8e56ce05c231278821`
BLAKE2b-256	`c9253855e4960b74f50bc0e7089ac7a8a65d414c565aec3a6f183e8df7692e10`
