Python bindings for lawkit - statistical law analysis toolkit for fraud detection, data quality assessment, and audit compliance. Powered by Rust for blazing fast performance.
lawkit-python
Python wrapper for the lawkit CLI tool - Statistical law analysis toolkit for fraud detection and data quality assessment.
Installation
pip install lawkit-python
This includes the lawkit binary embedded in the wheel - no download required.
Quick Start
import lawkit
# Analyze financial data with Benford's Law
result = lawkit.analyze_benford('financial_data.csv')
print(result)
# Get structured JSON output
json_result = lawkit.analyze_benford(
    'accounting.csv',
    lawkit.LawkitOptions(format='json')
)
print(f"Risk level: {json_result.risk_level}")
print(f"P-value: {json_result.p_value}")
# Check if data follows Pareto principle (80/20 rule)
pareto_result = lawkit.analyze_pareto(
    'sales_data.csv',
    lawkit.LawkitOptions(format='json', gini_coefficient=True)
)
print(f"Gini coefficient: {pareto_result.gini_coefficient}")
print(f"80/20 concentration: {pareto_result.concentration_80_20}")
Features
Statistical Laws Supported
- Benford's Law: Detect fraud and anomalies in numerical data
- Pareto Principle: Analyze 80/20 distributions and concentration
- Zipf's Law: Analyze word frequencies and power-law distributions
- Normal Distribution: Test for normality and detect outliers
- Poisson Distribution: Analyze rare events and count data
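To make the first item in the list above concrete, here is a minimal standalone sketch of what a Benford check measures. It does not call lawkit; the helper names (`first_digit`, `benford_expected`, `benford_mad`) are hypothetical and exist only for this illustration, and lawkit's own computation may differ in detail.

```python
import math
from collections import Counter


def first_digit(value: float) -> int:
    """Return the leading non-zero digit of a non-zero number."""
    # Scientific notation always starts with the leading digit, e.g. '1.2345...e+02'.
    return int(f"{abs(value):.15e}"[0])


def benford_expected() -> dict:
    """Expected first-digit proportions under Benford's Law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_mad(values) -> float:
    """Mean absolute deviation (in percentage points) from the expected proportions."""
    digits = [first_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    expected = benford_expected()
    n = len(digits)
    return 100 * sum(abs(counts.get(d, 0) / n - expected[d]) for d in range(1, 10)) / 9


print(round(benford_expected()[1], 3))  # 0.301
```

A larger deviation means the observed first digits sit further from the Benford proportions; how that maps to a risk level is up to the tool.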
Advanced Analysis
- Multi-law Comparison: Compare multiple statistical laws on the same data
- Outlier Detection: Advanced anomaly detection algorithms
- Time Series Analysis: Trend and seasonality detection
- International Numbers: Support for various number formats (Japanese, Chinese, etc.)
- Memory Efficient: Handle large datasets with streaming analysis
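The streaming idea in the last item can be pictured with a short sketch. This is an illustration of the general technique only, not lawkit's implementation, which this README does not show; `iter_numbers` and `streaming_digit_counts` are hypothetical names. The point is that first-digit counts can be accumulated one line at a time without holding the dataset in memory.

```python
import math
from collections import Counter
from typing import Iterable, Iterator


def iter_numbers(lines: Iterable[str]) -> Iterator[float]:
    """Yield one float per parseable line, skipping headers and blank lines."""
    for line in lines:
        try:
            yield float(line.strip())
        except ValueError:
            continue


def streaming_digit_counts(lines: Iterable[str]) -> Counter:
    """Count leading digits while reading the input lazily."""
    counts = Counter()
    for x in iter_numbers(lines):
        # Zero, NaN and infinity have no meaningful leading digit.
        if x != 0 and math.isfinite(x):
            counts[int(f"{abs(x):.15e}"[0])] += 1
    return counts
```

Any iterable of lines works, so an open file handle can be passed directly, for example `with open(path) as f: streaming_digit_counts(f)`.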
File Format Support
- CSV, JSON, YAML, TOML, XML: Standard structured data formats
- Excel Files: .xlsx and .xls support
- PDF Documents: Extract and analyze numerical data from PDFs
- Word Documents: Analyze data from .docx files
- PowerPoint: Extract data from presentations
Usage Examples
Command Line Interface (CLI) via Python Module
# Install and use immediately - binary included automatically
pip install lawkit-python
# Use lawkit CLI directly through Python module
python -m lawkit benf financial_data.csv
python -m lawkit pareto sales_data.csv --gini-coefficient
python -m lawkit analyze --laws all dataset.csv
python -m lawkit validate dataset.csv --consistency-check
python -m lawkit diagnose dataset.csv --report detailed
# Generate sample data for testing
python -m lawkit generate benf --samples 1000 --output-file test_data.csv
python -m lawkit generate pareto --samples 500 --concentration 0.8
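If the same commands need to be driven from a script, one possible approach is to call the module through `subprocess`. The sketch below only assumes the `python -m lawkit <subcommand> ...` form shown above; `build_lawkit_command` and `run_lawkit_module` are hypothetical helpers, not part of the package.

```python
import subprocess
import sys
from typing import List


def build_lawkit_command(subcommand: str, *args: str) -> List[str]:
    """Build the argv list for `python -m lawkit <subcommand> <args...>`."""
    # sys.executable keeps the call inside the current virtual environment.
    return [sys.executable, "-m", "lawkit", subcommand, *args]


def run_lawkit_module(subcommand: str, *args: str) -> subprocess.CompletedProcess:
    """Run the command and capture its output as text without raising on failure."""
    return subprocess.run(
        build_lawkit_command(subcommand, *args),
        capture_output=True,
        text=True,
        check=False,
    )
```

For example, `run_lawkit_module("benf", "financial_data.csv")` would mirror the first command above; the caller then inspects `returncode`, `stdout` and `stderr`.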
Modern API (Recommended)
import lawkit
# Analyze with Benford's Law
result = lawkit.analyze_benford('invoice_data.csv')
print(result)
# Get detailed JSON analysis
json_result = lawkit.analyze_benford(
    'financial_statements.xlsx',
    lawkit.LawkitOptions(
        format='excel',
        output='json',
        confidence=0.95,
        verbose=True
    )
)
if json_result.risk_level == "High":
    print("⚠️ High risk of fraud detected!")
    print(f"Chi-square: {json_result.chi_square}")
    print(f"P-value: {json_result.p_value}")
    print(f"MAD: {json_result.mad}%")
# Pareto analysis for business insights
pareto_result = lawkit.analyze_pareto(
    'customer_revenue.csv',
    lawkit.LawkitOptions(
        output='json',
        gini_coefficient=True,
        business_analysis=True,
        percentiles="70,80,90"
    )
)
print(f"Top 20% customers generate {pareto_result.concentration_80_20:.1f}% of revenue")
print(f"Income inequality (Gini): {pareto_result.gini_coefficient:.3f}")
# Normal distribution analysis with outlier detection
normal_result = lawkit.analyze_normal(
    'quality_measurements.csv',
    lawkit.LawkitOptions(
        output='json',
        outlier_detection=True,
        test_type='shapiro'
    )
)
if normal_result.p_value < 0.05:
    print("Data does not follow normal distribution")
    if normal_result.outliers:
        print(f"Found {len(normal_result.outliers)} outliers")
# Multi-law analysis
analysis = lawkit.analyze_laws(
    'complex_dataset.csv',
    lawkit.LawkitOptions(format='json', laws='benf,pareto,zipf')
)
print(f"Analysis results: {analysis.data}")
print(f"Overall risk level: {analysis.risk_level}")
# Data validation
validation = lawkit.validate_laws(
    'complex_dataset.csv',
    lawkit.LawkitOptions(format='json', consistency_check=True)
)
print(f"Validation status: {validation.data}")
# Conflict diagnosis
diagnosis = lawkit.diagnose_laws(
    'complex_dataset.csv',
    lawkit.LawkitOptions(format='json', report='detailed')
)
print(f"Diagnosis: {diagnosis.data}")
Generate Sample Data
import lawkit
# Generate Benford's Law-compliant data
benford_data = lawkit.generate_data('benf', samples=1000, seed=42)
print(benford_data)
# Generate normal distribution data
normal_data = lawkit.generate_data('normal', samples=500, mean=100, stddev=15)
# Generate Pareto distribution data
pareto_data = lawkit.generate_data('pareto', samples=1000, concentration=0.8)
# Test the pipeline: generate → analyze
data = lawkit.generate_data('benf', samples=10000, seed=42)
result = lawkit.analyze_string(data, 'benf', lawkit.LawkitOptions(output='json'))
print(f"Generated data risk level: {result.risk_level}")
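For intuition about what Benford-compliant sample data looks like, the sketch below generates some with the standard library alone. It relies on the fact that values spread log-uniformly over a whole number of decades follow Benford's first-digit distribution. Whether `generate_data` uses this method is not stated here, so treat this as a swapped-in technique; `generate_benford_like` and `leading_digit_share` are hypothetical names.

```python
import random
from typing import List


def generate_benford_like(samples: int, seed: int = 42, decades: int = 4) -> List[float]:
    """Draw values log-uniformly from [1, 10**decades]; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [10 ** rng.uniform(0, decades) for _ in range(samples)]


def leading_digit_share(values: List[float], digit: int) -> float:
    """Fraction of values whose leading digit equals `digit`."""
    hits = sum(1 for v in values if int(f"{abs(v):.15e}"[0]) == digit)
    return hits / len(values)


data = generate_benford_like(10_000)
# Should land close to log10(2), roughly 0.301, up to sampling noise.
print(f"Share of leading 1s: {leading_digit_share(data, 1):.3f}")
```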
Analyze String Data Directly
import lawkit
# Analyze CSV data from string
csv_data = """amount
123.45
456.78
789.12
234.56
567.89"""
result = lawkit.analyze_string(
    csv_data,
    'benf',
    lawkit.LawkitOptions(format='json')
)
print(f"Risk assessment: {result.risk_level}")
# Analyze JSON data
json_data = '{"values": [12, 23, 34, 45, 56, 67, 78, 89]}'
result = lawkit.analyze_string(
    json_data,
    'normal',
    lawkit.LawkitOptions(format='json')
)
print(f"Is normal: {result.p_value > 0.05}")
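When the string comes from another system, it can help to check which numbers it actually contains before handing it over for analysis. The following sketch uses only the standard library; `numeric_column` is a hypothetical helper and says nothing about how `analyze_string` parses its input.

```python
import csv
import io
from typing import List


def numeric_column(csv_text: str, column: str) -> List[float]:
    """Extract one column from CSV text as floats, skipping unparseable cells."""
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue  # missing column, short row, or non-numeric cell
    return values


csv_data = """amount
123.45
456.78
789.12
234.56
567.89"""
print(numeric_column(csv_data, "amount"))
```

A handful of values like this is fine for a smoke test, but far too few for any first-digit test to be statistically meaningful.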
Advanced Options
import lawkit
# High-performance analysis with optimization
result = lawkit.analyze_benford(
    'large_dataset.csv',
    lawkit.LawkitOptions(
        optimize=True,
        parallel=True,
        memory_efficient=True,
        min_count=50,
        threshold=0.001
    )
)
# International number support
result = lawkit.analyze_benford(
    'japanese_accounting.csv',
    lawkit.LawkitOptions(
        international=True,
        format='csv',
        output='json'
    )
)
# Time series analysis
result = lawkit.analyze_normal(
    'sensor_data.csv',
    lawkit.LawkitOptions(
        time_series=True,
        outlier_detection=True,
        output='json'
    )
)
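As background on the international number case, the sketch below shows one common preprocessing step: Unicode NFKC normalization maps full-width digits and punctuation, as often found in Japanese spreadsheets, to their ASCII forms. This is a general illustration and not a description of what `international=True` does internally; `normalize_digits` and `parse_number` are hypothetical names, and treating the comma as a thousands separator is an assumption.

```python
import unicodedata


def normalize_digits(text: str) -> str:
    """Map full-width digits and punctuation to their ASCII equivalents."""
    return unicodedata.normalize("NFKC", text)


def parse_number(text: str) -> float:
    """Parse a number after normalization, assuming ',' is a thousands separator."""
    return float(normalize_digits(text).replace(",", ""))


print(normalize_digits("１２３４"))   # 1234
print(parse_number("１，２３４．５"))  # 1234.5
```

Kanji numerals such as 一二三 are not covered by this normalization and would need a dedicated conversion.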
Legacy API (Backward Compatibility)
from lawkit import run_lawkit
# Direct command execution
result = run_lawkit(["benf", "data.csv", "--format", "csv", "--output", "json"])
if result.returncode == 0:
    print("Analysis successful")
    print(result.stdout)
else:
    print("Analysis failed")
    print(result.stderr)
# Legacy analysis functions
from lawkit.compat import run_benford_analysis, run_pareto_analysis
benford_result = run_benford_analysis("financial.csv", format="csv", output="json")
pareto_result = run_pareto_analysis("sales.csv", gini_coefficient=True)
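Since the legacy call above exposes `returncode`, `stdout` and `stderr`, a small helper can turn a successful run into a dictionary and a failed run into an exception. This is a sketch under assumptions: `parse_cli_result` is a hypothetical helper, and the JSON payload used in the example is made up for illustration rather than taken from real lawkit output.

```python
import json
import subprocess


def parse_cli_result(result) -> dict:
    """Decode JSON stdout from a finished process, raising if the process failed."""
    if result.returncode != 0:
        raise RuntimeError(
            f"lawkit exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return json.loads(result.stdout)


# Stand-in for a real run; the payload shape here is hypothetical.
fake = subprocess.CompletedProcess(
    args=["lawkit", "benf", "data.csv"],
    returncode=0,
    stdout='{"risk_level": "Low"}',
    stderr="",
)
print(parse_cli_result(fake)["risk_level"])  # Low
```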
Installation and Setup
Automatic Installation (Recommended)
pip install lawkit-python
The binary is pre-embedded in the wheel for your platform.
Manual Binary Installation
If the embedded binary is missing or does not run on your platform, fetch it manually:
lawkit-download-binary
Development Installation
git clone https://github.com/kako-jun/lawkit
cd lawkit/lawkit-python
pip install -e ".[dev]"
Verify Installation
import lawkit
# Check if lawkit is available
if lawkit.is_lawkit_available():
    print("✅ lawkit is installed and working")
    print(f"Version: {lawkit.get_version()}")
else:
    print("❌ lawkit is not available")
# Run self-test
if lawkit.selftest():
    print("✅ All tests passed")
else:
    print("❌ Self-test failed")
Use Cases
Financial Fraud Detection
import lawkit
# Analyze invoice amounts for fraud
result = lawkit.analyze_benford(
    'invoices.csv',
    lawkit.LawkitOptions(output='json')
)
if result.risk_level in ['High', 'Critical']:
    print("🚨 Potential fraud detected in invoice data")
    print(f"Statistical significance: p={result.p_value:.6f}")
    print(f"Deviation from Benford's Law: {result.mad:.2f}%")
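The p-value printed above comes from a goodness-of-fit test. As a rough illustration of the idea, the sketch below computes a chi-square statistic for first digits by hand and compares it with the 5% critical value for 8 degrees of freedom (15.507). `benford_chi_square` is a hypothetical helper, and lawkit's own test may differ, for example in how it treats small samples.

```python
import math
from collections import Counter

# Chi-square critical value at the 5% level with 8 degrees of freedom (9 digits - 1).
CRITICAL_5PCT_DF8 = 15.507


def benford_chi_square(values) -> float:
    """Chi-square statistic of observed first-digit counts against Benford's Law."""
    digits = [int(f"{abs(v):.15e}"[0]) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (counts.get(d, 0) - expected) ** 2 / expected
    return chi2


# Evenly spread first digits are a classic sign of invented numbers.
suspicious = list(range(1, 10)) * 100
print(benford_chi_square(suspicious) > CRITICAL_5PCT_DF8)  # True
```

A statistic above the critical value corresponds to p < 0.05, which is a reason to look closer, not proof of fraud.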
Business Intelligence
import lawkit
# Analyze customer revenue distribution
result = lawkit.analyze_pareto(
    'customer_revenue.csv',
    lawkit.LawkitOptions(
        output='json',
        business_analysis=True,
        gini_coefficient=True
    )
)
print(f"Revenue concentration: {result.concentration_80_20:.1f}%")
print(f"Market inequality: {result.gini_coefficient:.3f}")
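To show what the two printed figures mean, here is a standalone sketch of a Gini coefficient and a top-20% share for non-negative values. The function names are hypothetical and the exact definitions lawkit uses may differ (for instance in how the top 20% is rounded for small lists).

```python
from typing import Sequence


def gini_coefficient(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values: 0 means equal, near 1 means concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with x sorted ascending, i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_share(values: Sequence[float], fraction: float = 0.2) -> float:
    """Percentage of the total held by the top `fraction` of entries."""
    xs = sorted(values, reverse=True)
    k = max(1, round(len(xs) * fraction))
    return 100 * sum(xs[:k]) / sum(xs)


revenue = [50, 30, 10, 5, 5]
print(f"Top 20% share: {top_share(revenue):.1f}%")  # Top 20% share: 50.0%
```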
Quality Control
import lawkit
# Analyze manufacturing measurements
result = lawkit.analyze_normal(
    'measurements.csv',
    lawkit.LawkitOptions(
        output='json',
        outlier_detection=True,
        test_type='shapiro'
    )
)
if result.p_value < 0.05:
    print("⚠️ Process out of control - not following normal distribution")
    if result.outliers:
        print(f"Found {len(result.outliers)} outlying measurements")
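For readers unfamiliar with outlier flagging, the sketch below shows one simple rule: mark measurements more than three sample standard deviations from the mean. This is a stand-in technique for illustration. It is not the Shapiro-Wilk test selected above, and it is not necessarily the rule behind `outlier_detection`; `zscore_outliers` is a hypothetical helper.

```python
import statistics
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[float]:
    """Return values whose z-score magnitude exceeds `threshold`."""
    if len(values) < 2:
        return []
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical, nothing can stand out
    return [v for v in values if abs(v - mean) / sd > threshold]


measurements = [10.0] * 20 + [50.0]
print(zscore_outliers(measurements))  # [50.0]
```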
Text Analysis
import lawkit
# Analyze word frequency in documents
result = lawkit.analyze_zipf(
    'document.txt',
    lawkit.LawkitOptions(output='json')
)
print(f"Text follows Zipf's Law: {result.p_value > 0.05}")
print(f"Power law exponent: {result.exponent:.3f}")
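The exponent reported above can be pictured as the slope of the rank-frequency curve on log-log axes. The sketch below estimates it with an ordinary least-squares fit using only the standard library. This is one common estimator and an assumption on my part, not a statement about lawkit's fitting method; `zipf_exponent` is a hypothetical name and the whitespace tokenization is deliberately naive.

```python
import math
from collections import Counter


def zipf_exponent(text: str) -> float:
    """Estimate s in frequency ~ rank**(-s) via a least-squares fit in log-log space."""
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope


# Frequencies 12, 6, 4, 3 are exactly 12 / rank, so the fitted exponent is about 1.
sample_text = " ".join(["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3)
print(f"{zipf_exponent(sample_text):.3f}")  # 1.000
```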
API Reference
Main Functions
- analyze_benford(input_data, options) - Benford's Law analysis
- analyze_pareto(input_data, options) - Pareto principle analysis
- analyze_zipf(input_data, options) - Zipf's Law analysis
- analyze_normal(input_data, options) - Normal distribution analysis
- analyze_poisson(input_data, options) - Poisson distribution analysis
- analyze_laws(input_data, options) - Multi-law analysis
- validate_laws(input_data, options) - Data validation and consistency check
- diagnose_laws(input_data, options) - Conflict diagnosis and detailed reporting
- generate_data(law_type, samples, **kwargs) - Generate sample data
- analyze_string(content, law_type, options) - Analyze string data directly
Utility Functions
- is_lawkit_available() - Check if lawkit CLI is available
- get_version() - Get lawkit version
- selftest() - Run self-test
Classes
- LawkitOptions - Configuration options for analysis
- LawkitResult - Analysis results with structured access
- LawkitError - Exception class for lawkit errors
Platform Support
- Windows: x86_64
- macOS: x86_64, ARM64 (Apple Silicon)
- Linux: x86_64, ARM64
Requirements
- Python 3.8+
- No additional dependencies required
License
This project is licensed under the MIT License.
Support
- GitHub Issues: https://github.com/kako-jun/lawkit/issues
- Documentation: https://github.com/kako-jun/lawkit/tree/main/docs
- Examples: https://github.com/kako-jun/lawkit/tree/main/docs/user-guide/examples.md
Contributing
Contributions are welcome! Please read the Contributing Guide for details.