pybenford
Professional-grade Benford's Law analysis toolkit for forensic accounting, auditing, and fraud detection.
Why pybenford?
Existing Benford's Law packages on PyPI cover first-digit chi-square and not much else. Most are unmaintained. pybenford implements the complete Nigrini forensic accounting workflow from Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection (Nigrini, 2012):
- MAD conformity classification with Nigrini's empirical thresholds (close, acceptable, marginally acceptable, nonconformity)
- Distortion factor model for detecting overstatement vs. understatement
- Second-order test on differences of sorted values
- Summation test with uniform 1/90 expectation
- Mantissa arc test (Alexander, 2009) with L-squared statistic
- Number duplication analysis
- All standard digit tests: first, second, third, first-two, first-three, last-two
- Z-statistic with Fleiss continuity correction, chi-square, Kolmogorov-Smirnov
- Publication-quality matplotlib visualizations
- Pure NumPy internals, no pandas dependency
Installation
pip install pybenford
Quick Start
from pybenford import BenfordAnalysis
analysis = BenfordAnalysis(data) # list, numpy array, or pandas/polars Series
result = analysis.first_digit()
print(result)
Every result object prints as a formatted report, so three lines take you from raw data to a conformity report:
=======================================================
First Digit Test (n=3,195 alpha=0.05)
=======================================================
Digit Count Observed Expected Z-Score Sig
1 956 29.92% 30.10% 0.20
2 595 18.62% 17.61% 1.48
3 389 12.18% 12.49% 0.52
4 299 9.36% 9.69% 0.61
5 255 7.98% 7.92% 0.10
6 197 6.17% 6.69% 1.16
7 180 5.63% 5.80% 0.36
8 171 5.35% 5.12% 0.57
9 153 4.79% 4.58% 0.53
-------------------------------------------------------
MAD: 0.0034 — Close Conformity
Chi-Square: 4.6922 (critical: 15.5073) — Pass
KS: 0.0083 (critical: 0.0240) — Pass
=======================================================
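The MAD and chi-square lines of the report can be checked by hand from the Count column. The sketch below is a plain-Python illustration, not pybenford's internal code. It assumes MAD is the mean absolute difference between observed and expected proportions, and that chi-square is the usual Pearson statistic on counts.

```python
import math

# Counts for digits 1-9, copied from the sample report above.
counts = [956, 595, 389, 299, 255, 197, 180, 171, 153]
n = sum(counts)

# Benford's expected first-digit proportions: log10(1 + 1/d).
expected = [math.log10(1 + 1 / d) for d in range(1, 10)]
observed = [c / n for c in counts]

# Mean absolute deviation across the nine digit bins.
mad = sum(abs(o - e) for o, e in zip(observed, expected)) / len(expected)

# Pearson chi-square statistic on counts (9 bins, so 8 degrees of freedom).
chi_square = sum((c - n * e) ** 2 / (n * e) for c, e in zip(counts, expected))

print(f"n={n}  MAD={mad:.4f}  chi-square={chi_square:.2f}")
```

Run against the counts shown above, this should land close to the reported MAD of 0.0034 and chi-square of 4.6922.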
For tests with many digit bins (first-two, first-three), the display shows only flagged digits instead of all 90 or 900 rows:
=======================================================
First Two Digits Test (n=3,195 alpha=0.05)
=======================================================
Flagged Digits (7 of 90):
Digit Count Observed Expected Z-Score
35 24 0.75% 1.22% 2.35 *
49 16 0.50% 0.88% 2.19 *
66 33 1.03% 0.65% 2.56 *
70 29 0.91% 0.62% 1.99 *
75 28 0.88% 0.58% 2.13 *
76 9 0.28% 0.57% 2.03 *
77 9 0.28% 0.56% 1.99 *
-------------------------------------------------------
MAD: 0.0015 — Acceptable Conformity
Chi-Square: 104.9157 (critical: 112.0220) — Pass
KS: 0.0102 (critical: 0.0240) — Pass
=======================================================
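The Z-Score column can be sketched the same way. The function below is an illustrative stand-in (the name `z_statistic` is hypothetical, not part of pybenford). It assumes the continuity correction is 1/(2n), subtracted from the absolute difference only when it is smaller than that difference.

```python
import math

def z_statistic(count, n, expected):
    """Z-statistic for one digit bin with a 1/(2n) continuity correction."""
    observed = count / n
    diff = abs(observed - expected)
    correction = 1 / (2 * n)
    # Apply the correction only when it is smaller than the raw difference.
    if correction < diff:
        diff -= correction
    std_error = math.sqrt(expected * (1 - expected) / n)
    return diff / std_error

# Bin "35" from the sample report: 24 records out of 3,195.
expected_35 = math.log10(1 + 1 / 35)
z_35 = z_statistic(24, 3195, expected_35)
print(f"expected={expected_35:.4f}  z={z_35:.2f}")
```

Under these assumptions the result for bin 35 should agree with the 2.35 shown in the flagged-digits table, which exceeds the 1.96 cutoff for alpha = 0.05.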
All results are also accessible programmatically:
result = analysis.first_digit()
result.mad # 0.0034
result.mad_conformity # "close"
result.chi_square # 4.6922
result.chi_square_significant # False
result.ks_statistic # 0.0083
result.ks_critical # 0.0240
result.z_scores # array of per-digit Z-scores
result.significant_flags # bool array of flagged digits
result.observed # array of observed proportions
result.expected # array of expected Benford proportions
result.digits # array of digit labels
result.counts # array of raw counts
result.n # number of records analyzed
result.alpha # significance level used
result.test_name # e.g. "First Digit Test"
Data Preparation
analysis = BenfordAnalysis(
    data,                     # list, array, or Series of numbers
    sign_filter="positive",   # "all", "positive", or "negative"
    min_abs_value=10.0,       # exclude small values (optional)
    drop_zero=True,           # exclude zeros (default: True)
)
print(analysis.profile)       # data profile per Nigrini Ch. 4
`sign_filter` restricts the analysis to positive or negative values, so that income and expense items can be analyzed independently. `min_abs_value` excludes values below a minimum magnitude, since very small numbers distort digit distributions.
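To make the effect of these options concrete, here is a minimal sketch of equivalent filtering in plain Python. The `prepare` function is hypothetical and only mirrors the parameter names above. Skipping non-finite values and treating `min_abs_value` as an inclusive lower bound are assumptions, not documented pybenford behavior.

```python
import math

def prepare(values, sign_filter="all", min_abs_value=None, drop_zero=True):
    """Hypothetical stand-in for the filtering described above."""
    if sign_filter not in ("all", "positive", "negative"):
        raise ValueError(f"unknown sign_filter: {sign_filter!r}")
    kept = []
    for v in values:
        if not math.isfinite(v):
            continue  # assumption: NaN and infinities are skipped
        if drop_zero and v == 0:
            continue
        if sign_filter == "positive" and v <= 0:
            continue
        if sign_filter == "negative" and v >= 0:
            continue
        # Assumption: values exactly equal to the minimum are kept.
        if min_abs_value is not None and abs(v) < min_abs_value:
            continue
        kept.append(v)
    return kept

cleaned = prepare([120.5, -40.0, 0, 3.2, 15.0],
                  sign_filter="positive", min_abs_value=10.0)
print(cleaned)
```

Here the negative amount, the zero, and the value below 10 are all dropped, leaving only the two positive amounts of at least 10.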
Visualization
Plot functions return (Figure, Axes) with no side effects.
from pybenford.visualization import plot_digit_test, plot_mantissa_arc, plot_z_scores

# Digit test chart for the first-two-digits result
result = analysis.first_two_digits()
fig, ax = plot_digit_test(result, show_confidence=True)
fig.savefig("first_two_digits.png", dpi=150)

# Mantissa arc plot
arc = analysis.mantissa_arc()
fig, ax = plot_mantissa_arc(arc, analysis.clean_data)

# Per-digit Z-scores against a critical value
fig, ax = plot_z_scores(result, critical_value=1.96)
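For intuition about what the mantissa arc plot shows, the sketch below computes an L-squared statistic in plain Python. It is an illustration under stated assumptions, not pybenford's implementation: each nonzero value's mantissa (the fractional part of log10 of its absolute value) is placed on the unit circle, and L-squared is taken as the squared distance of the points' center of gravity from the origin. The function name `mantissa_arc_l2` is hypothetical.

```python
import math

def mantissa_arc_l2(values):
    """Squared distance of the mantissa center of gravity from the origin."""
    xs, ys = [], []
    for v in values:
        mantissa = math.log10(abs(v)) % 1.0  # fractional part of log10|v|
        angle = 2 * math.pi * mantissa
        xs.append(math.cos(angle))
        ys.append(math.sin(angle))
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return x_bar ** 2 + y_bar ** 2

# Geometric sequence: mantissas are spread evenly around the circle.
spread = [10 ** (k / 200) for k in range(200)]
# Values sharing one mantissa: every point lands on the same spot.
clustered = [5.0, 50.0, 500.0, 5000.0]

print(mantissa_arc_l2(spread), mantissa_arc_l2(clustered))
```

Evenly spread mantissas give a value near 0, while fully clustered mantissas give a value near 1, so larger values indicate a stronger departure from uniformity.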
Available Tests
| Method | Description | Reference |
|---|---|---|
| `first_digit()` | First significant digit (1-9) | Nigrini Ch. 5 |
| `second_digit()` | Second significant digit (0-9) | Nigrini Ch. 5 |
| `third_digit()` | Third significant digit (0-9) | Nigrini Ch. 5 |
| `first_two_digits()` | First two digits (10-99) | Nigrini Ch. 5 |
| `first_three_digits()` | First three digits (100-999) | Nigrini Ch. 5 |
| `last_two_digits()` | Last two digits (00-99), uniform expected | Nigrini Ch. 5 |
| `second_order()` | Digit test on sorted differences | Nigrini Ch. 6 |
| `summation()` | Sum proportions vs. uniform 1/90 | Nigrini Ch. 5 |
| `distortion_factor()` | Overstatement/understatement detection | Nigrini Ch. 6 |
| `mantissa_arc()` | Uniformity of mantissas on unit circle | Nigrini Ch. 7 |
| `number_duplication()` | Most frequently duplicated values | Nigrini Ch. 5 |
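Two of the less familiar tests in the table, the second-order test and the summation test, can be sketched in a few lines. The helpers below are hypothetical illustrations in plain Python and do not reflect pybenford's internals. Dropping zero gaps in the second-order test and summing absolute amounts in the summation test are assumptions.

```python
from collections import Counter

def leading_digits(value, k=1):
    """First k significant digits of a nonzero number, as an int."""
    # Scientific notation puts the significant digits first: '3.195...e+03'.
    mantissa = f"{abs(value):.15e}".split("e")[0]
    return int(mantissa.replace(".", "")[:k])

def second_order_digits(values, k=2):
    """Leading digits of the nonzero gaps between consecutive sorted values."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:]) if b > a]
    return Counter(leading_digits(g, k) for g in gaps)

def summation_proportions(values):
    """Share of the total absolute amount held by each first-two-digit bin."""
    sums = {d: 0.0 for d in range(10, 100)}
    for v in values:
        if v != 0:
            sums[leading_digits(v, 2)] += abs(v)
    total = sum(sums.values())
    return {d: s / total for d, s in sums.items()}

amounts = [1200, 1250, 1300, 4700, 4700, 9900]
gap_digits = second_order_digits(amounts)
shares = summation_proportions(amounts)
print(gap_digits.most_common(3))
print(f"bin 47 holds {shares[47]:.1%} of the total (uniform share is {1/90:.1%})")
```

In the summation test each of the 90 bins is compared against an equal 1/90 share, so a bin holding a far larger share, like bin 47 in this toy data, points to a few large amounts or many repeated ones.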
Statistical Measures
Each digit test result includes:
- Z-statistic per digit bin (Fleiss continuity correction)
- Chi-square goodness-of-fit with critical value
- Kolmogorov-Smirnov statistic with critical value
- MAD (Mean Absolute Deviation) with Nigrini's conformity classification
- Per-bin significance flags at configurable alpha
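As an illustration of the Kolmogorov-Smirnov measure in this list, the sketch below recomputes it from the first-digit counts in the Quick Start report. It is not pybenford's code. The statistic is taken as the largest gap between cumulative observed and cumulative expected proportions, and the critical value formula is an assumed large-sample approximation that appears consistent with the sample output.

```python
import math

# Counts for digits 1-9, copied from the Quick Start report.
counts = [956, 595, 389, 299, 255, 197, 180, 171, 153]
n = sum(counts)
expected = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Largest gap between the cumulative observed and expected distributions.
cum_obs = cum_exp = 0.0
ks_statistic = 0.0
for c, e in zip(counts, expected):
    cum_obs += c / n
    cum_exp += e
    ks_statistic = max(ks_statistic, abs(cum_obs - cum_exp))

# Assumed critical value: sqrt(-ln(alpha / 2) / 2) / sqrt(n).
alpha = 0.05
ks_critical = math.sqrt(-math.log(alpha / 2) / 2) / math.sqrt(n)

print(f"KS={ks_statistic:.4f}  critical={ks_critical:.4f}")
```

The statistic falls well below the critical value here, which matches the Pass verdict in the report.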
MAD Conformity Thresholds
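MAD is the preferred conformity measure because chi-square and KS become overly sensitive on large datasets (N > 25,000) and reject even near-perfect conformity. MAD does not depend on sample size: it is the average absolute difference between observed and expected proportions, with no N term.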
| Test | Close | Acceptable | Marginal | Nonconformity |
|---|---|---|---|---|
| First digit | < 0.006 | < 0.012 | < 0.015 | >= 0.015 |
| Second digit | < 0.008 | < 0.010 | < 0.012 | >= 0.012 |
| First two digits | < 0.0012 | < 0.0018 | < 0.0022 | >= 0.0022 |
| First three digits | < 0.00036 | < 0.00044 | < 0.00050 | >= 0.00050 |
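The table translates directly into a lookup. The sketch below is a hypothetical helper, not pybenford's API. The threshold values are copied from the table, while the test keys and all labels other than "close" are illustrative names.

```python
# (close, acceptable, marginal) upper bounds, copied from the table above.
MAD_THRESHOLDS = {
    "first_digit": (0.006, 0.012, 0.015),
    "second_digit": (0.008, 0.010, 0.012),
    "first_two_digits": (0.0012, 0.0018, 0.0022),
    "first_three_digits": (0.00036, 0.00044, 0.00050),
}

def classify_mad(mad, test="first_digit"):
    """Map a MAD value to a conformity label for the given digit test."""
    close, acceptable, marginal = MAD_THRESHOLDS[test]
    if mad < close:
        return "close"
    if mad < acceptable:
        return "acceptable"
    if mad < marginal:
        return "marginal"
    return "nonconformity"

print(classify_mad(0.0034))                      # first-digit report value
print(classify_mad(0.0015, "first_two_digits"))  # first-two-digits report value
```

Applied to the two MAD values from the Quick Start reports, this yields the same classifications those reports show: close conformity for the first-digit test and acceptable conformity for the first-two-digits test.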
References
- Nigrini, M.J. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley.
- Miller, S.J. (2015). Benford's Law: Theory and Applications. Princeton University Press.
- Kossovsky, A.E. (2014). Benford's Law: Theory, the General Law of Relative Quantities, and Forensic Fraud Detection Applications. World Scientific.
License
MIT License. See LICENSE for details.