

pybenford

Professional-grade Benford's Law analysis toolkit for forensic accounting, auditing, and fraud detection.


Why pybenford?

Existing Benford's Law packages on PyPI cover first-digit chi-square and not much else. Most are unmaintained. pybenford implements the complete Nigrini forensic accounting workflow from Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection (Nigrini, 2012):

  • MAD conformity classification with Nigrini's empirical thresholds (close, acceptable, marginally acceptable, nonconformity)
  • Distortion factor model for detecting overstatement vs. understatement
  • Second-order test on differences of sorted values
  • Summation test with uniform 1/90 expectation
  • Mantissa arc test (Alexander, 2009) with L-squared statistic
  • Number duplication analysis
  • All standard digit tests: first, second, third, first-two, first-three, last-two
  • Z-statistic with Fleiss continuity correction, chi-square, Kolmogorov-Smirnov
  • Publication-quality matplotlib visualizations
  • Pure NumPy internals, no pandas dependency
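The digit tests compare observed proportions against Benford's expected proportions. Read the first k digits as a single number d (1-9 for the first-digit test, 10-99 for first-two). Benford's Law gives P(d) = log10(1 + 1/d). The probability for the n-th digit on its own is that formula summed over every possible block of preceding digits. The NumPy sketch below computes both. The function names are hypothetical, and this is not pybenford's internal code.

```python
import numpy as np


def expected_leading(k: int = 1):
    """Benford proportions for the first k digits read as one number (k=1: 1-9, k=2: 10-99)."""
    d = np.arange(10 ** (k - 1), 10 ** k)
    return d, np.log10(1 + 1 / d)


def expected_nth(n: int):
    """Benford proportions for the n-th significant digit on its own (n >= 2, digits 0-9)."""
    prefixes = np.arange(10 ** (n - 2), 10 ** (n - 1))  # every possible block of leading n-1 digits
    digits = np.arange(10)
    # P(D_n = d) = sum over prefixes p of log10(1 + 1 / (10 * p + d))
    probs = np.log10(1 + 1 / (10 * prefixes[:, None] + digits[None, :])).sum(axis=0)
    return digits, probs


digits, p_first = expected_leading(1)
print(np.round(p_first * 100, 2))  # roughly 30.10, 17.61, 12.49, ... percent
```

These values match the Expected column of the first-digit report in the Quick Start.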

Installation

pip install pybenford

Quick Start

from pybenford import BenfordAnalysis

analysis = BenfordAnalysis(data)  # list, numpy array, or pandas/polars Series
result = analysis.first_digit()
print(result)

Every result object prints as a formatted report, so three lines take you from raw data to a conformity summary:

=======================================================
  First Digit Test  (n=3,195  alpha=0.05)
=======================================================
 Digit   Count   Observed   Expected   Z-Score   Sig
     1    956   29.92%    30.10%      0.20
     2    595   18.62%    17.61%      1.48
     3    389   12.18%    12.49%      0.52
     4    299    9.36%     9.69%      0.61
     5    255    7.98%     7.92%      0.10
     6    197    6.17%     6.69%      1.16
     7    180    5.63%     5.80%      0.36
     8    171    5.35%     5.12%      0.57
     9    153    4.79%     4.58%      0.53
-------------------------------------------------------
 MAD:        0.0034 — Close Conformity
 Chi-Square: 4.6922  (critical: 15.5073) — Pass
 KS:         0.0083  (critical: 0.0240)  — Pass
=======================================================
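You can check the summary statistics in this report by hand using only the Count column. The sketch below uses the standard formulas as Nigrini (2012) describes them:

- **Z-statistic:** subtracts a 1/(2N) continuity correction only when the correction is smaller than |p_o - p_e|.
- **Chi-square:** computed on counts.
- **MAD:** the mean absolute deviation of the proportions.
- **KS:** the largest absolute gap between the cumulative distributions, with the asymptotic critical value sqrt(-ln(alpha/2)/2)/sqrt(N).

This is an independent check, not pybenford's source code. For these counts, it reproduces the printed figures to the displayed precision.

```python
import numpy as np

# Counts copied from the First Digit Test report above (n = 3,195).
counts = np.array([956, 595, 389, 299, 255, 197, 180, 171, 153])
n = counts.sum()
digits = np.arange(1, 10)
expected = np.log10(1 + 1 / digits)  # Benford first-digit proportions
observed = counts / n

# Z-statistic: the 1/(2n) continuity correction is subtracted only when it is
# smaller than the absolute deviation.
deviation = np.abs(observed - expected)
correction = 1 / (2 * n)
numerator = np.where(correction < deviation, deviation - correction, deviation)
z = numerator / np.sqrt(expected * (1 - expected) / n)

mad = deviation.mean()                                              # Mean Absolute Deviation
chi_square = ((counts - n * expected) ** 2 / (n * expected)).sum()  # df = 8
ks = np.abs(np.cumsum(observed) - np.cumsum(expected)).max()        # largest cumulative gap
alpha = 0.05
ks_critical = np.sqrt(-0.5 * np.log(alpha / 2)) / np.sqrt(n)        # about 1.358 / sqrt(n)

print(np.round(z, 2), round(mad, 4), round(chi_square, 4), round(ks, 4), round(ks_critical, 4))
```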

For tests with many digit bins (first-two, first-three), the display shows only flagged digits instead of all 90 or 900 rows:

=======================================================
  First Two Digits Test  (n=3,195  alpha=0.05)
=======================================================
 Flagged Digits (7 of 90):
 Digit   Count   Observed   Expected   Z-Score
    35     24    0.75%     1.22%      2.35  *
    49     16    0.50%     0.88%      2.19  *
    66     33    1.03%     0.65%      2.56  *
    70     29    0.91%     0.62%      1.99  *
    75     28    0.88%     0.58%      2.13  *
    76      9    0.28%     0.57%      2.03  *
    77      9    0.28%     0.56%      1.99  *
-------------------------------------------------------
 MAD:        0.0015 — Acceptable Conformity
 Chi-Square: 104.9157  (critical: 112.0220) — Pass
 KS:         0.0102  (critical: 0.0240)  — Pass
=======================================================

All results are also accessible programmatically:

result = analysis.first_digit()

result.mad                     # 0.0034
result.mad_conformity          # "close"
result.chi_square              # 4.6922
result.chi_square_significant  # False
result.ks_statistic            # 0.0083
result.ks_critical             # 0.0240
result.z_scores                # array of per-digit Z-scores
result.significant_flags       # bool array of flagged digits
result.observed                # array of observed proportions
result.expected                # array of expected Benford proportions
result.digits                  # array of digit labels
result.counts                  # array of raw counts
result.n                       # number of records analyzed
result.alpha                   # significance level used
result.test_name               # e.g. "First Digit Test"
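In the First Two Digits report, a Z-score of 1.99 is flagged at alpha = 0.05. That fits the usual rule: flag a bin when its Z-score exceeds the two-sided normal critical value for alpha, which is 1.96 at alpha = 0.05. Below is a minimal sketch of that rule. It assumes pybenford uses this standard cutoff for significant_flags, and flag_bins is a hypothetical helper.

```python
from statistics import NormalDist

import numpy as np


def flag_bins(z_scores, alpha=0.05):
    """Return a boolean mask of bins whose Z exceeds the two-sided critical value, plus that value."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return np.asarray(z_scores, dtype=float) > critical, critical


# Four Z-scores from the First Two Digits report, plus one made-up unflagged value.
flags, critical = flag_bins([2.35, 2.19, 2.56, 1.99, 0.48])
```

With a real result object, assume significant_flags is a NumPy boolean array as listed above. Then result.digits[result.significant_flags] should give the flagged digit labels.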

Data Preparation

analysis = BenfordAnalysis(
    data,                        # list, array, or Series of numbers
    sign_filter="positive",      # "all", "positive", or "negative"
    min_abs_value=10.0,          # exclude small values (optional)
    drop_zero=True,              # exclude zeros (default: True)
)

print(analysis.profile)          # data profile per Nigrini Ch. 4

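sign_filter lets you analyze positive and negative values separately, for example income items apart from expense items. min_abs_value excludes values below a minimum magnitude, because very small numbers distort digit distributions. Zero has no significant digits, so drop_zero excludes zeros by default.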
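To see what these options do to the input, here is a NumPy sketch of equivalent filtering. prepare_values is a hypothetical helper, not pybenford's code. The following details are all assumptions:

- the order of the steps
- dropping NaN and infinite values
- the sign_filter default

pybenford's own cleaned data, used as analysis.clean_data in the visualization example, may differ in detail.

```python
import numpy as np


def prepare_values(values, sign_filter="all", min_abs_value=None, drop_zero=True):
    """Hypothetical stand-in for the filtering options accepted by BenfordAnalysis."""
    x = np.asarray(values, dtype=float)
    x = x[np.isfinite(x)]  # assumption: NaN and infinities are discarded first
    if sign_filter == "positive":
        x = x[x > 0]
    elif sign_filter == "negative":
        x = x[x < 0]
    elif sign_filter != "all":
        raise ValueError("sign_filter must be 'all', 'positive', or 'negative'")
    if drop_zero:
        x = x[x != 0]  # zero has no significant digits
    if min_abs_value is not None:
        x = x[np.abs(x) >= min_abs_value]  # keep values at or above the minimum magnitude
    return x


ledger = [-250.0, -3.0, 0.0, 4.5, 12.0, 980.0, float("nan")]
print(prepare_values(ledger, sign_filter="positive", min_abs_value=10.0))  # keeps 12.0 and 980.0
```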

Visualization

Plot functions return a matplotlib (Figure, Axes) tuple and have no side effects, so you decide when to display or save each figure.

from pybenford.visualization import plot_digit_test, plot_mantissa_arc, plot_z_scores

# `analysis` is a BenfordAnalysis instance, as in the examples above
result = analysis.first_two_digits()
fig, ax = plot_digit_test(result, show_confidence=True)
fig.savefig("first_two_digits.png", dpi=150)

arc = analysis.mantissa_arc()
fig, ax = plot_mantissa_arc(arc, analysis.clean_data)

fig, ax = plot_z_scores(result, critical_value=1.96)
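The mantissa arc test (Alexander, 2009), plotted by plot_mantissa_arc, works as follows:

1. Take the mantissa of each value, the fractional part of log10(x).
2. Place it on the unit circle at angle 2*pi*m.
3. If the data follow Benford's Law, the mantissas are uniform, so the points balance around the origin and their center of gravity sits near (0, 0).
4. L-squared is the squared distance of that center from the origin.

The NumPy sketch below computes L-squared. For the p-value it uses the Rayleigh-test approximation p = exp(-N * L-squared), which is an assumption here. pybenford's mantissa_arc() may compute its statistic or p-value differently, and mantissa_arc_stats is a hypothetical name.

```python
import numpy as np


def mantissa_arc_stats(values):
    """Center of gravity, L-squared, and an approximate p-value for the mantissa arc test."""
    x = np.abs(np.asarray(values, dtype=float))
    x = x[np.isfinite(x) & (x > 0)]
    mantissas = np.log10(x) % 1.0                # fractional part of log10(x), in [0, 1)
    angles = 2 * np.pi * mantissas               # one point per value on the unit circle
    center = (np.cos(angles).mean(), np.sin(angles).mean())
    l_squared = center[0] ** 2 + center[1] ** 2  # squared distance of the center from the origin
    p_value = np.exp(-x.size * l_squared)        # Rayleigh-test approximation (assumption)
    return center, float(l_squared), float(p_value)


# Mantissas spread evenly around the circle: center near (0, 0), L-squared near 0.
_, l2_uniform, p_uniform = mantissa_arc_stats(10 ** (np.arange(1000) / 1000))
# Identical values share one mantissa: center on the circle, L-squared = 1.
_, l2_constant, p_constant = mantissa_arc_stats([500.0] * 200)
```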

Available Tests

Method                Description                                Reference
first_digit()         First significant digit (1-9)              Nigrini Ch. 5
second_digit()        Second significant digit (0-9)             Nigrini Ch. 5
third_digit()         Third significant digit (0-9)              Nigrini Ch. 5
first_two_digits()    First two digits (10-99)                   Nigrini Ch. 5
first_three_digits()  First three digits (100-999)               Nigrini Ch. 5
last_two_digits()     Last two digits (00-99) vs. uniform        Nigrini Ch. 5
second_order()        Digit test on sorted differences           Nigrini Ch. 6
summation()           Sums by first-two digits vs. uniform 1/90  Nigrini Ch. 5
distortion_factor()   Overstatement/understatement detection     Nigrini Ch. 6
mantissa_arc()        Uniformity of mantissas on unit circle     Nigrini Ch. 7
number_duplication()  Most frequently duplicated values          Nigrini Ch. 5
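Most of these tests start by extracting digits from each value. The sketch below shows one way to do that. It also builds the inputs to two other tests:

- **Second-order test:** the positive differences between consecutive sorted values.
- **Summation test:** each first-two-digit group's share of the total amount, compared against 1/90.

The function names are hypothetical. Restricting the summation test to values of at least 10 is an assumption, not documented pybenford behavior.

```python
import numpy as np


def leading_digits(values, k):
    """First k significant digits of every nonzero, finite value, as integers."""
    out = []
    for v in np.abs(np.asarray(values, dtype=float)):
        if v == 0 or not np.isfinite(v):
            continue
        # Formatting to 15 significant digits absorbs binary float noise:
        # 0.3 is stored as 0.29999999999999998..., but formats as 3.00000000000000e-01.
        sci = f"{v:.14e}"
        out.append(int(sci.replace(".", "")[:k]))
    return np.array(out, dtype=int)


def second_order_values(values):
    """Positive differences between consecutive sorted values (input to the second-order test)."""
    diffs = np.diff(np.sort(np.asarray(values, dtype=float)))
    return diffs[diffs > 0]


def summation_shares(values):
    """Share of the total amount in each first-two-digit group 10-99 (expected: 1/90 each)."""
    x = np.abs(np.asarray(values, dtype=float))
    x = x[np.isfinite(x) & (x >= 10)]  # assumption: only values >= 10 enter the test
    groups = leading_digits(x, 2)      # one group per remaining value, same order as x
    sums = np.bincount(groups - 10, weights=x, minlength=90)
    return sums / sums.sum()


print(leading_digits([314.15, 0.3, 1000, 7], 2))  # [31 30 10 70]
```

For the second-order test, take leading_digits(second_order_values(data), 2) and compare it with the ordinary Benford first-two-digit proportions.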

Statistical Measures

Each digit test result includes:

  • Z-statistic per digit bin (Fleiss continuity correction)
  • Chi-square goodness-of-fit with critical value
  • Kolmogorov-Smirnov statistic with critical value
  • MAD (Mean Absolute Deviation) with Nigrini's conformity classification
  • Per-bin significance flags at configurable alpha

MAD Conformity Thresholds

MAD is the preferred conformity measure for large datasets. Chi-square and KS become overly sensitive as N grows (N > 25,000), so they reject data that conform closely in practice. MAD does not include N in its calculation, so the same thresholds apply at any sample size.
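The sample-size effect is easy to show numerically. The sketch below holds the observed first-digit proportions fixed, using a small, made-up distortion of the Benford proportions, while N grows. Chi-square grows in direct proportion to N and eventually exceeds its 5% critical value of 15.5073 (df = 8). MAD stays the same throughout.

```python
import numpy as np

expected = np.log10(1 + 1 / np.arange(1, 10))
# Made-up distortion: +0.4 points on digit 1, -0.2 points on digits 2 and 3 (still sums to 1).
observed = expected + np.array([0.004, -0.002, -0.002, 0, 0, 0, 0, 0, 0])


def chi_square(n):
    """Chi-square statistic if these exact proportions were observed in n records."""
    return float(n * ((observed - expected) ** 2 / expected).sum())


mad = float(np.abs(observed - expected).mean())  # about 0.00089 at every n: close conformity

for n in (1_000, 10_000, 100_000, 1_000_000):
    verdict = "reject" if chi_square(n) > 15.5073 else "pass"
    print(f"n={n:>9,}  chi-square={chi_square(n):8.2f} ({verdict})  MAD={mad:.5f}")
```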

Test                 Close       Acceptable   Marginal    Nonconformity
First digit          < 0.006     < 0.012      < 0.015     >= 0.015
Second digit         < 0.008     < 0.010      < 0.012     >= 0.012
First two digits     < 0.0012    < 0.0018     < 0.0022    >= 0.0022
First three digits   < 0.00036   < 0.00044    < 0.00050   >= 0.00050
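Classifying a MAD value is a lookup against this table. Here is a minimal sketch. The dictionary keys and label strings are hypothetical; the only documented result.mad_conformity value above is "close".

```python
# Upper bounds for close, acceptable, and marginally acceptable conformity (table above).
MAD_UPPER_BOUNDS = {
    "first_digit": (0.006, 0.012, 0.015),
    "second_digit": (0.008, 0.010, 0.012),
    "first_two_digits": (0.0012, 0.0018, 0.0022),
    "first_three_digits": (0.00036, 0.00044, 0.00050),
}
LABELS = ("close", "acceptable", "marginally acceptable", "nonconformity")


def classify_mad(mad, test="first_digit"):
    """Map a MAD value to Nigrini's conformity class for the given digit test."""
    for label, upper in zip(LABELS, MAD_UPPER_BOUNDS[test]):
        if mad < upper:
            return label
    return LABELS[-1]  # at or above the marginal bound
```

Apply this to the two example reports. First digit, MAD 0.0034, classifies as "close". First two digits, MAD 0.0015, classifies as "acceptable". Both match the printed classifications.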

References

  • Nigrini, M.J. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley.
  • Miller, S.J. (Ed.) (2015). Benford's Law: Theory and Applications. Princeton University Press.
  • Kossovsky, A.E. (2014). Benford's Law: Theory, the General Law of Relative Quantities, and Forensic Fraud Detection Applications. World Scientific.

License

MIT License. See LICENSE for details.



Download files


Source Distribution

pybenford-0.1.1.tar.gz (1.1 MB)

Uploaded Source

Built Distribution


pybenford-0.1.1-py3-none-any.whl (32.6 kB)

Uploaded Python 3

File details

Details for the file pybenford-0.1.1.tar.gz.

File metadata

  • Download URL: pybenford-0.1.1.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for pybenford-0.1.1.tar.gz
Algorithm Hash digest
SHA256 804367521f089c1084d22a859a02667a166c453007e729a8759fdf2e0c5ab520
MD5 dc5f2c16e52f69f8386b7d6072ea771d
BLAKE2b-256 284706c60c648465931c93191b8360068dbf12fe85b487f926922cff860bff0a


File details

Details for the file pybenford-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pybenford-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 32.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for pybenford-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1f9ea6f69cc1e9e4b0834317df9d27011b178883f97169314d7553790bf0749b
MD5 d34dfbfc90ce220f70be9e2fdfd1e4b7
BLAKE2b-256 9a5f4d91d28432159746659fe75bde7c022ba2da147c43b89b902e80dc4a9110

