pybenford

Professional-grade Benford's Law analysis toolkit for forensic accounting, auditing, and fraud detection.

Why pybenford?

Existing Benford's Law packages on PyPI cover the first-digit chi-square test and not much else, and most are unmaintained. pybenford implements the complete forensic accounting workflow described in Nigrini's book, Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection (Nigrini, 2012):

  • MAD conformity classification with Nigrini's empirical thresholds (close, acceptable, marginally acceptable, nonconformity)
  • Distortion factor model for detecting overstatement vs. understatement
  • Second-order test on differences of sorted values
  • Summation test with uniform 1/90 expectation
  • Mantissa arc test (Alexander, 2009) with L-squared statistic
  • Number duplication analysis
  • All standard digit tests: first, second, third, first-two, first-three, last-two
  • Z-statistic with Fleiss continuity correction, chi-square, Kolmogorov-Smirnov
  • Publication-quality matplotlib visualizations
  • Pure NumPy internals, no pandas dependency
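
The digit tests listed above all compare observed frequencies against an expected distribution. As a rough illustration of where the Benford expectations come from, the standard formula P(d) = log10(1 + 1/d) can be sketched as follows. The function names are made up for this example and are not part of pybenford's API.

```python
import math

def leading_digits_expected(k: int = 1) -> dict[int, float]:
    """Benford probability of each leading k-digit group (k=1: 1-9, k=2: 10-99)."""
    lo, hi = 10 ** (k - 1), 10 ** k
    return {d: math.log10(1 + 1 / d) for d in range(lo, hi)}

def second_digit_expected() -> dict[int, float]:
    """Second-digit probability, obtained by summing over the nine first digits."""
    first_two = leading_digits_expected(2)
    return {s: sum(first_two[10 * f + s] for f in range(1, 10)) for s in range(10)}

first = leading_digits_expected(1)
second = second_digit_expected()
print(f"P(first digit = 1)  = {first[1]:.4f}")
print(f"P(second digit = 0) = {second[0]:.4f}")
```

The last-two-digits test is the exception: its expected distribution is uniform, 1/100 per bin.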

Installation

pip install pybenford

Quick Start

from pybenford import BenfordAnalysis

analysis = BenfordAnalysis(data)  # list, numpy array, or pandas/polars Series
result = analysis.first_digit()
print(result)

Every result object prints as a formatted report, so three lines take you from raw data to a conformity report:

=======================================================
  First Digit Test  (n=3,195  alpha=0.05)
=======================================================
 Digit   Count   Observed   Expected   Z-Score   Sig
     1    956   29.92%    30.10%      0.20
     2    595   18.62%    17.61%      1.48
     3    389   12.18%    12.49%      0.52
     4    299    9.36%     9.69%      0.61
     5    255    7.98%     7.92%      0.10
     6    197    6.17%     6.69%      1.16
     7    180    5.63%     5.80%      0.36
     8    171    5.35%     5.12%      0.57
     9    153    4.79%     4.58%      0.53
-------------------------------------------------------
 MAD:        0.0034 — Close Conformity
 Chi-Square: 4.6922  (critical: 15.5073) — Pass
 KS:         0.0083  (critical: 0.0240)  — Pass
=======================================================

For tests with many digit bins (first-two, first-three), the display shows only flagged digits instead of all 90 or 900 rows:

=======================================================
  First Two Digits Test  (n=3,195  alpha=0.05)
=======================================================
 Flagged Digits (7 of 90):
 Digit   Count   Observed   Expected   Z-Score
    35     24    0.75%     1.22%      2.35  *
    49     16    0.50%     0.88%      2.19  *
    66     33    1.03%     0.65%      2.56  *
    70     29    0.91%     0.62%      1.99  *
    75     28    0.88%     0.58%      2.13  *
    76      9    0.28%     0.57%      2.03  *
    77      9    0.28%     0.56%      1.99  *
-------------------------------------------------------
 MAD:        0.0015 — Acceptable Conformity
 Chi-Square: 104.9157  (critical: 112.0220) — Pass
 KS:         0.0102  (critical: 0.0240)  — Pass
=======================================================

All results are also accessible programmatically:

result = analysis.first_digit()

result.mad                     # 0.0034
result.mad_conformity          # "close"
result.chi_square              # 4.6922
result.chi_square_significant  # False
result.ks_statistic            # 0.0083
result.ks_critical             # 0.0240
result.z_scores                # array of per-digit Z-scores
result.significant_flags       # bool array of flagged digits
result.observed                # array of observed proportions
result.expected                # array of expected Benford proportions
result.digits                  # array of digit labels
result.counts                  # array of raw counts
result.n                       # number of records analyzed
result.alpha                   # significance level used
result.test_name               # e.g. "First Digit Test"

Demo Notebook

A complete walkthrough of every test and visualization is available in examples/demo.ipynb. It runs against US Census county population data and shows the output of all 11 tests, 6 plot functions, and programmatic result access.

Data Preparation

analysis = BenfordAnalysis(
    data,                        # list, array, or Series of numbers
    sign_filter="positive",      # "all", "positive", or "negative"
    min_abs_value=10.0,          # exclude small values (optional)
    drop_zero=True,              # exclude zeros (default: True)
)

print(analysis.profile)          # data profile per Nigrini Ch. 4

sign_filter separates income from expense items for independent analysis. min_abs_value excludes values below a minimum magnitude, since very small numbers distort digit distributions.
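
To make the effect of these options concrete, here is a minimal plain-Python sketch of the kind of filtering they describe. The `prepare` function is hypothetical and only mirrors the parameter names above; how pybenford applies the filters internally (for example, their order, or how zeros interact with `sign_filter`) is an assumption.

```python
def prepare(values, sign_filter="positive", min_abs_value=None, drop_zero=True):
    """Hypothetical stand-in for the filtering step; not pybenford's own code."""
    if sign_filter not in ("all", "positive", "negative"):
        raise ValueError(f"unknown sign_filter: {sign_filter!r}")
    kept = []
    for v in values:
        if drop_zero and v == 0:
            continue  # zeros have no significant digits
        if sign_filter == "positive" and v < 0:
            continue
        if sign_filter == "negative" and v > 0:
            continue
        if min_abs_value is not None and abs(v) < min_abs_value:
            continue  # too small to analyze reliably
        kept.append(v)
    return kept

# Illustrative ledger: positive entries are income, negative are expenses
ledger = [1250.0, -430.5, 0.0, 7.25, 88.0, -12.0, 3.1]
income = prepare(ledger, sign_filter="positive", min_abs_value=10.0)
expenses = prepare(ledger, sign_filter="negative", min_abs_value=10.0)
print(income)
print(expenses)
```

Splitting the ledger this way lets each subset be tested on its own, which matters because the incentives to manipulate income and expenses usually point in opposite directions.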

Visualization

Plot functions return (Figure, Axes) with no side effects.

from pybenford.visualization import plot_digit_test, plot_mantissa_arc, plot_z_scores

# Digit test plot for the first-two-digits result, saved to disk
result = analysis.first_two_digits()
fig, ax = plot_digit_test(result, show_confidence=True)
fig.savefig("first_two_digits.png", dpi=150)

# Mantissa arc plot
arc = analysis.mantissa_arc()
fig, ax = plot_mantissa_arc(arc)

# Z-score plot for the same result, with the 1.96 critical value
fig, ax = plot_z_scores(result, critical_value=1.96)

Available Tests

| Method | Description | Reference |
|---|---|---|
| first_digit() | First significant digit (1-9) | Nigrini Ch. 5 |
| second_digit() | Second significant digit (0-9) | Nigrini Ch. 5 |
| third_digit() | Third significant digit (0-9) | Nigrini Ch. 5 |
| first_two_digits() | First two digits (10-99) | Nigrini Ch. 5 |
| first_three_digits() | First three digits (100-999) | Nigrini Ch. 5 |
| last_two_digits() | Last two digits (00-99), uniform expected | Nigrini Ch. 5 |
| second_order() | Digit test on sorted differences | Nigrini Ch. 6 |
| summation() | Sum proportions vs. uniform 1/90 | Nigrini Ch. 5 |
| distortion_factor() | Overstatement/understatement detection | Nigrini Ch. 6 |
| mantissa_arc() | Uniformity of mantissas on unit circle | Nigrini Ch. 7 |
| number_duplication() | Most frequently duplicated values | Nigrini Ch. 5 |
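
Two of these tests work on derived quantities rather than plain digit counts. The sketch below shows one way the inputs to the summation test and the second-order test could be built in plain Python. All function names are hypothetical, and details such as dropping zero differences are assumptions about the usual procedure, not a description of pybenford's internals.

```python
from collections import defaultdict

def first_two_digits(x) -> int:
    """Leading two significant digits (10-99) of a nonzero number.

    Reads the digits from scientific notation, which sidesteps
    floating-point rounding in log10-based approaches.
    """
    mantissa = f"{abs(x):.14e}"      # e.g. '1.25000000000000e+03'
    return int(mantissa[0] + mantissa[2])

def summation_proportions(values):
    """Share of the total absolute amount carried by each first-two-digit group."""
    sums = defaultdict(float)
    for v in values:
        if v != 0:
            sums[first_two_digits(v)] += abs(v)
    total = sum(sums.values())
    # Under the summation test's null, every group is expected to hold 1/90
    return {d: sums.get(d, 0.0) / total for d in range(10, 100)}

def second_order_differences(values):
    """Gaps between adjacent sorted values, with zero gaps (duplicates) dropped."""
    ordered = sorted(values)
    return [b - a for a, b in zip(ordered, ordered[1:]) if b != a]

amounts = [1200, 1250, 1250, 4700, 98]
shares = summation_proportions(amounts)
gaps = second_order_differences(amounts)
print(f"share held by group 12: {shares[12]:.4f} (expected {1 / 90:.4f})")
print(gaps)
```

The differences returned by `second_order_differences` would then go through an ordinary digit test, and the shares from `summation_proportions` would be compared against 1/90 per group.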

Statistical Measures

Each digit test result includes:

  • Z-statistic per digit bin (Fleiss continuity correction)
  • Chi-square goodness-of-fit with critical value
  • Kolmogorov-Smirnov statistic with critical value
  • MAD (Mean Absolute Deviation) with Nigrini's conformity classification
  • Per-bin significance flags at configurable alpha
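
As an illustration of how these measures relate to the raw counts, the following sketch recomputes them for the first-digit counts shown in the Quick Start report, using only the standard library. It is written from the textbook formulas, not taken from pybenford's source. Two assumptions to flag: the rule of applying the continuity correction only when it is smaller than the absolute difference, and the common approximation 1.36/sqrt(n) for the KS critical value at alpha = 0.05.

```python
import math

# First-digit counts copied from the Quick Start report (digits 1-9)
counts = [956, 595, 389, 299, 255, 197, 180, 171, 153]
n = sum(counts)
expected = [math.log10(1 + 1 / d) for d in range(1, 10)]
observed = [c / n for c in counts]

def z_statistic(p_obs: float, p_exp: float, n: int) -> float:
    """Z-statistic with a 1/(2n) continuity correction."""
    diff = abs(p_obs - p_exp)
    correction = 1 / (2 * n)
    if correction < diff:  # assumed rule: only correct when it cannot flip the sign
        diff -= correction
    return diff / math.sqrt(p_exp * (1 - p_exp) / n)

z_scores = [z_statistic(o, e, n) for o, e in zip(observed, expected)]

# Chi-square goodness of fit on counts
chi_square = sum((c - n * e) ** 2 / (n * e) for c, e in zip(counts, expected))

# Mean absolute deviation of the proportions
mad = sum(abs(o - e) for o, e in zip(observed, expected)) / len(expected)

# Kolmogorov-Smirnov: largest gap between the cumulative distributions
cum_obs = cum_exp = ks = 0.0
for o, e in zip(observed, expected):
    cum_obs += o
    cum_exp += e
    ks = max(ks, abs(cum_obs - cum_exp))
ks_critical = 1.36 / math.sqrt(n)  # approximate critical value at alpha = 0.05

print(f"n = {n}")
print("Z:", [round(z, 2) for z in z_scores])
print(f"chi-square = {chi_square:.4f}, MAD = {mad:.4f}, KS = {ks:.4f}")
```

Run against these counts, the values should agree with the report to within rounding (for example, a MAD of about 0.0034 and a chi-square of about 4.69).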

MAD Conformity Thresholds

MAD is the preferred conformity measure because chi-square and KS become overly sensitive with large datasets (N > 25,000) and reject data that conforms almost perfectly. MAD has no N term in its formula, so its thresholds do not tighten as the sample grows.
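
The sample-size effect can be seen with a small experiment: hold a slightly off-Benford set of proportions fixed and vary only N. The proportions below are invented for illustration (Benford's values with 0.003 moved from digit 9 to digit 1), and the helper names are not pybenford functions.

```python
import math

CHI2_CRITICAL_DF8 = 15.5073  # alpha = 0.05, 8 degrees of freedom

expected = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Invented proportions: Benford with a small shift from digit 9 to digit 1
observed = list(expected)
observed[0] += 0.003
observed[8] -= 0.003

def mad(observed, expected):
    """Mean absolute deviation; note that n does not appear."""
    return sum(abs(o - e) for o, e in zip(observed, expected)) / len(expected)

def chi_square(observed, expected, n):
    """Chi-square statistic for proportions observed on n records."""
    return sum(n * (o - e) ** 2 / e for o, e in zip(observed, expected))

mad_value = mad(observed, expected)
results = {}
for n in (1_000, 10_000, 100_000):
    stat = chi_square(observed, expected, n)
    results[n] = stat
    verdict = "reject" if stat > CHI2_CRITICAL_DF8 else "pass"
    print(f"n={n:>7,}  MAD={mad_value:.5f}  chi-square={stat:8.3f}  {verdict}")
```

The chi-square statistic grows linearly with N for the same proportions, so a deviation that passes at small N is eventually rejected, while MAD stays the same throughout.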

| Test | Close | Acceptable | Marginal | Nonconformity |
|---|---|---|---|---|
| First digit | < 0.006 | < 0.012 | < 0.015 | >= 0.015 |
| Second digit | < 0.008 | < 0.010 | < 0.012 | >= 0.012 |
| First two digits | < 0.0012 | < 0.0018 | < 0.0022 | >= 0.0022 |
| First three digits | < 0.00036 | < 0.00044 | < 0.00050 | >= 0.00050 |
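
Read as a lookup, the table maps a MAD value to the first band whose upper bound it falls below. A minimal sketch of that lookup is shown here. The dictionary keys, the `classify_mad` name, and every label string other than "close" are assumptions made for this example.

```python
# Upper bounds for close / acceptable / marginal, copied from the table above
MAD_THRESHOLDS = {
    "first_digit": (0.006, 0.012, 0.015),
    "second_digit": (0.008, 0.010, 0.012),
    "first_two_digits": (0.0012, 0.0018, 0.0022),
    "first_three_digits": (0.00036, 0.00044, 0.00050),
}
LABELS = ("close", "acceptable", "marginal", "nonconformity")

def classify_mad(mad: float, test: str) -> str:
    """Return the conformity band for a MAD value (hypothetical helper)."""
    for bound, label in zip(MAD_THRESHOLDS[test], LABELS):
        if mad < bound:
            return label
    return LABELS[-1]  # at or above the last bound

print(classify_mad(0.0034, "first_digit"))
print(classify_mad(0.0015, "first_two_digits"))
```

The two calls use the MAD values from the reports earlier in this document, which fall in the close and acceptable bands respectively.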

References

  • Nigrini, M.J. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley.
  • Miller, S.J. (2015). Benford's Law: Theory and Applications. Princeton University Press.
  • Kossovsky, A.E. (2014). Benford's Law: Theory, the General Law of Relative Quantities, and Forensic Fraud Detection Applications. World Scientific.

License

MIT License. See LICENSE for details.
