Python bindings for lawkit - statistical law analysis toolkit for fraud detection, data quality assessment, and audit compliance. Powered by Rust for blazing fast performance.

lawkit

Python wrapper for the lawkit CLI tool - Statistical law analysis toolkit for fraud detection and data quality assessment.

Installation

pip install lawkit

The wheel ships with the lawkit binary embedded, so no separate download is required.

Quick Start

import lawkit

# Analyze financial data with Benford's Law
result = lawkit.analyze_benford('financial_data.csv')
print(result)

# Get structured JSON output
json_result = lawkit.analyze_benford(
    'accounting.csv',
    lawkit.LawkitOptions(format='json')
)
print(f"Risk level: {json_result.risk_level}")
print(f"P-value: {json_result.p_value}")

# Check if data follows Pareto principle (80/20 rule)
pareto_result = lawkit.analyze_pareto(
    'sales_data.csv',
    lawkit.LawkitOptions(format='json', gini_coefficient=True)
)
print(f"Gini coefficient: {pareto_result.gini_coefficient}")
print(f"80/20 concentration: {pareto_result.concentration_80_20}")

Features

Statistical Laws Supported

  • Benford's Law: Detect fraud and anomalies in numerical data
  • Pareto Principle: Analyze 80/20 distributions and concentration
  • Zipf's Law: Analyze word frequencies and power-law distributions
  • Normal Distribution: Test for normality and detect outliers
  • Poisson Distribution: Analyze rare events and count data
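
To make the first item concrete: Benford's Law predicts that a leading digit d appears with probability log10(1 + 1/d). The following is a minimal standalone sketch of that formula. It does not use lawkit, and the helper names `benford_expected` and `leading_digit` are illustrative, not part of the package.

```python
import math


def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


def leading_digit(value: float) -> int:
    """Return the first significant digit of a non-zero number."""
    # Scientific notation always starts with the first significant digit,
    # e.g. 0.0456 -> '4.560000000000000e-02'.
    return int(f"{abs(value):.15e}"[0])


# Print the expected distribution for digits 1 through 9.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.3f}")
```

The nine probabilities sum to 1, and roughly 30% of values are expected to start with the digit 1.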

Advanced Analysis

  • Multi-law Comparison: Compare multiple statistical laws on the same data
  • Outlier Detection: Advanced anomaly detection algorithms
  • Time Series Analysis: Trend and seasonality detection
  • International Numbers: Support for various number formats (Japanese, Chinese, etc.)
  • Memory Efficient: Handle large datasets with streaming analysis
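
As an illustration of what international number handling can involve, the sketch below converts full-width digits (common in Japanese and Chinese text) to ASCII with Unicode NFKC normalization before parsing. This is an assumption about the general technique, not a description of lawkit's own parser. The helper `normalize_number` is hypothetical, and kanji numerals such as 一二三 are not handled here.

```python
import unicodedata


def normalize_number(text: str) -> float:
    """Parse a number that may use full-width digits and separators."""
    # NFKC maps full-width forms (１, ，, ．) to their ASCII equivalents.
    ascii_text = unicodedata.normalize("NFKC", text)
    # Drop thousands separators before converting.
    return float(ascii_text.replace(",", ""))


print(normalize_number("１，２３４．５６"))
```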

File Format Support

  • CSV, JSON, YAML, TOML, XML: Standard structured data formats
  • Excel Files: .xlsx and .xls support
  • PDF Documents: Extract and analyze numerical data from PDFs
  • Word Documents: Analyze data from .docx files
  • PowerPoint: Extract data from presentations

Usage Examples

Command Line Interface (CLI) via Python Module

# Install and use immediately - binary included automatically
pip install lawkit

# Use lawkit CLI directly through Python module
python -m lawkit benf financial_data.csv
python -m lawkit pareto sales_data.csv --gini-coefficient
python -m lawkit analyze --laws all dataset.csv
python -m lawkit validate dataset.csv --consistency-check
python -m lawkit diagnose dataset.csv --report detailed

# Generate sample data for testing
python -m lawkit generate benf --samples 1000 --output-file test_data.csv
python -m lawkit generate pareto --samples 500 --concentration 0.8
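
The same commands can be assembled from a script. The sketch below only builds the argument list for `python -m lawkit`, using the subcommands and flags shown above. The helper name `build_lawkit_command` is hypothetical and not part of the package.

```python
import sys
from typing import List, Sequence


def build_lawkit_command(
    subcommand: str, path: str, extra_args: Sequence[str] = ()
) -> List[str]:
    """Build an argument list equivalent to `python -m lawkit <subcommand> <path>`."""
    # sys.executable keeps the call inside the current virtual environment.
    return [sys.executable, "-m", "lawkit", subcommand, path, *extra_args]


cmd = build_lawkit_command("pareto", "sales_data.csv", ["--gini-coefficient"])
print(cmd[1:])
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` once lawkit is installed.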

Modern API (Recommended)

import lawkit

# Analyze with Benford's Law
result = lawkit.analyze_benford('invoice_data.csv')
print(result)

# Get detailed JSON analysis
json_result = lawkit.analyze_benford(
    'financial_statements.xlsx',
    lawkit.LawkitOptions(
        format='excel',
        output='json',
        confidence=0.95,
        verbose=True
    )
)

if json_result.risk_level == "High":
    print("⚠️  High risk of fraud detected!")
    print(f"Chi-square: {json_result.chi_square}")
    print(f"P-value: {json_result.p_value}")
    print(f"MAD: {json_result.mad}%")

# Pareto analysis for business insights
pareto_result = lawkit.analyze_pareto(
    'customer_revenue.csv',
    lawkit.LawkitOptions(
        output='json',
        gini_coefficient=True,
        business_analysis=True,
        percentiles="70,80,90"
    )
)

print(f"Top 20% customers generate {pareto_result.concentration_80_20:.1f}% of revenue")
print(f"Revenue inequality (Gini): {pareto_result.gini_coefficient:.3f}")

# Normal distribution analysis with outlier detection
normal_result = lawkit.analyze_normal(
    'quality_measurements.csv',
    lawkit.LawkitOptions(
        output='json',
        outlier_detection=True,
        test_type='shapiro'
    )
)

if normal_result.p_value < 0.05:
    print("Data does not follow a normal distribution")
    if normal_result.outliers:
        print(f"Found {len(normal_result.outliers)} outliers")

# Multi-law analysis
analysis = lawkit.analyze_laws(
    'complex_dataset.csv',
    lawkit.LawkitOptions(format='json', laws='benf,pareto,zipf')
)
print(f"Analysis results: {analysis.data}")
print(f"Overall risk level: {analysis.risk_level}")

# Data validation
validation = lawkit.validate_laws(
    'complex_dataset.csv',
    lawkit.LawkitOptions(format='json', consistency_check=True)
)
print(f"Validation status: {validation.data}")

# Conflict diagnosis
diagnosis = lawkit.diagnose_laws(
    'complex_dataset.csv',
    lawkit.LawkitOptions(format='json', report='detailed')
)
print(f"Diagnosis: {diagnosis.data}")

Generate Sample Data

import lawkit

# Generate data that conforms to Benford's Law
benford_data = lawkit.generate_data('benf', samples=1000, seed=42)
print(benford_data)

# Generate normal distribution data
normal_data = lawkit.generate_data('normal', samples=500, mean=100, stddev=15)

# Generate Pareto distribution data
pareto_data = lawkit.generate_data('pareto', samples=1000, concentration=0.8)

# Test the pipeline: generate → analyze
data = lawkit.generate_data('benf', samples=10000, seed=42)
result = lawkit.analyze_string(data, 'benf', lawkit.LawkitOptions(output='json'))
print(f"Generated data risk level: {result.risk_level}")
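
For intuition about what Benford-conforming sample data looks like, here is a standalone sketch that draws values log-uniformly over several orders of magnitude, a standard way to obtain leading digits that follow Benford's Law. It is not lawkit's generator. The function names and the `decades` parameter are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def generate_benford_like(samples: int, seed: int, decades: int = 4) -> list:
    """Draw values log-uniformly from [1, 10**decades)."""
    rng = random.Random(seed)  # a seeded generator keeps runs reproducible
    return [10 ** rng.uniform(0, decades) for _ in range(samples)]


def first_digit_shares(values) -> dict:
    """Observed share of each leading digit 1-9."""
    counts = Counter(int(f"{v:.15e}"[0]) for v in values)
    total = len(values)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


data = generate_benford_like(10_000, seed=42)
shares = first_digit_shares(data)
print(f"Share of leading 1s: {shares[1]:.3f} (Benford expects {math.log10(2):.3f})")
```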

Analyze String Data Directly

import lawkit

# Analyze CSV data from string
csv_data = """amount
123.45
456.78
789.12
234.56
567.89"""

result = lawkit.analyze_string(
    csv_data,
    'benf',
    lawkit.LawkitOptions(format='json')
)
print(f"Risk assessment: {result.risk_level}")

# Analyze JSON data
json_data = '{"values": [12, 23, 34, 45, 56, 67, 78, 89]}'
result = lawkit.analyze_string(
    json_data,
    'normal',
    lawkit.LawkitOptions(format='json')
)
print(f"Is normal: {result.p_value > 0.05}")
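
Before string data can be analyzed, the numeric column has to be extracted from it. The sketch below shows one way to do that with the standard library for the CSV sample above. The helper `read_numeric_column` is hypothetical and says nothing about how lawkit parses its input. Note that a five-value sample like this one is far too small to support a meaningful statistical test.

```python
import csv
import io


def read_numeric_column(csv_text: str, column: str) -> list:
    """Return the float values of one named column, skipping unparseable rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            # Missing column, short row, or non-numeric cell: skip it.
            continue
    return values


csv_data = """amount
123.45
456.78
789.12
234.56
567.89"""

print(read_numeric_column(csv_data, "amount"))
```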

Advanced Options

import lawkit

# High-performance analysis with optimization
result = lawkit.analyze_benford(
    'large_dataset.csv',
    lawkit.LawkitOptions(
        optimize=True,
        parallel=True,
        memory_efficient=True,
        min_count=50,
        threshold=0.001
    )
)

# International number support
result = lawkit.analyze_benford(
    'japanese_accounting.csv',
    lawkit.LawkitOptions(
        international=True,
        format='csv',
        output='json'
    )
)

# Time series analysis
result = lawkit.analyze_normal(
    'sensor_data.csv',
    lawkit.LawkitOptions(
        time_series=True,
        outlier_detection=True,
        output='json'
    )
)
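
The idea behind memory-efficient analysis can be sketched without lawkit: for a first-digit test, only nine counters need to stay in memory, so the input can be consumed one line at a time. The function `stream_first_digits` below is a hypothetical illustration of that approach, not the library's implementation.

```python
import math
from collections import Counter
from typing import Iterable


def stream_first_digits(lines: Iterable[str]) -> Counter:
    """Count leading digits from an iterable of text lines, one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        try:
            value = float(line.strip())
        except ValueError:
            continue  # header, blank line, or non-numeric text
        if value == 0 or not math.isfinite(value):
            continue  # zero, NaN, and infinity have no leading digit
        counts[int(f"{abs(value):.15e}"[0])] += 1
    return counts


print(stream_first_digits(["amount\n", "123\n", "456\n", "0.078\n", "0\n"]))
```

Because an open file object is itself an iterable of lines, `stream_first_digits(open(path))` processes a large file without loading it fully.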

Legacy API (Backward Compatibility)

from lawkit import run_lawkit

# Direct command execution
result = run_lawkit(["benf", "data.csv", "--format", "csv", "--output", "json"])

if result.returncode == 0:
    print("Analysis successful")
    print(result.stdout)
else:
    print("Analysis failed")
    print(result.stderr)

# Legacy analysis functions
from lawkit.compat import run_benford_analysis, run_pareto_analysis

benford_result = run_benford_analysis("financial.csv", format="csv", output="json")
pareto_result = run_pareto_analysis("sales.csv", gini_coefficient=True)
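
The legacy example reads `returncode`, `stdout`, and `stderr` from the result, the same attributes found on `subprocess.CompletedProcess`. Under the assumption that JSON output arrives on stdout, a small helper can turn such a result into a Python object. `parse_json_stdout` is a hypothetical name, and the `risk_level` key in the stand-in payload is only an example, not a documented output schema.

```python
import json
import subprocess
from typing import Any, Optional


def parse_json_stdout(result) -> Optional[Any]:
    """Return parsed JSON from a finished process, or None on failure."""
    if result.returncode != 0:
        return None
    try:
        return json.loads(result.stdout)
    except (TypeError, json.JSONDecodeError):
        # stdout was missing or was not valid JSON.
        return None


# Stand-in for a real result object, so the sketch runs without lawkit.
fake_ok = subprocess.CompletedProcess(
    args=["lawkit"], returncode=0, stdout='{"risk_level": "Low"}', stderr=""
)
fake_failed = subprocess.CompletedProcess(
    args=["lawkit"], returncode=1, stdout="", stderr="error"
)

print(parse_json_stdout(fake_ok))
print(parse_json_stdout(fake_failed))
```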

Installation and Setup

Automatic Installation (Recommended)

pip install lawkit

The binary is pre-embedded in the wheel for your platform.

Manual Binary Installation

If the embedded binary is missing or fails to run on your platform:

lawkit-download-binary

Development Installation

git clone https://github.com/kako-jun/lawkit
cd lawkit/lawkit-python
pip install -e ".[dev]"

Verify Installation

import lawkit

# Check if lawkit is available
if lawkit.is_lawkit_available():
    print("✅ lawkit is installed and working")
    print(f"Version: {lawkit.get_version()}")
else:
    print("❌ lawkit is not available")

# Run self-test
if lawkit.selftest():
    print("✅ All tests passed")
else:
    print("❌ Self-test failed")
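
When a script should degrade gracefully on machines where the package is absent, it can check for the module before importing it. The sketch below uses only the standard library. The helper name `is_importable` is an assumption, and it handles top-level module names only.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_importable("lawkit"):
    print("lawkit module found")
else:
    print("lawkit module not found; run: pip install lawkit")
```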

Use Cases

Financial Fraud Detection

import lawkit

# Analyze invoice amounts for fraud
result = lawkit.analyze_benford(
    'invoices.csv',
    lawkit.LawkitOptions(output='json')
)

if result.risk_level in ['High', 'Critical']:
    print("🚨 Potential fraud detected in invoice data")
    print(f"Statistical significance: p={result.p_value:.6f}")
    print(f"Deviation from Benford's Law: {result.mad:.2f}%")
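
The chi-square and MAD figures reported for a Benford analysis can be reproduced by hand. The sketch below computes Pearson's chi-square statistic and the mean absolute deviation (in percentage points) between observed and expected first-digit shares. It is an independent illustration: lawkit's exact definitions and risk thresholds are not stated here, and `benford_fit` is a hypothetical name.

```python
import math
from collections import Counter


def benford_fit(values):
    """Return (chi_square, mad_percent) for the leading digits of `values`."""
    digits = [int(f"{abs(v):.15e}"[0]) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    chi_square = 0.0
    abs_dev = 0.0
    for d in range(1, 10):
        expected_p = math.log10(1 + 1 / d)
        observed_p = counts.get(d, 0) / n
        # Pearson term (O - E)^2 / E written with proportions.
        chi_square += n * (observed_p - expected_p) ** 2 / expected_p
        abs_dev += abs(observed_p - expected_p)
    return chi_square, 100 * abs_dev / 9


# With 8 degrees of freedom, the 5% critical value of chi-square is 15.507.
CRITICAL_5_PERCENT = 15.507

powers_of_two = [2.0 ** k for k in range(100)]  # known to track Benford closely
chi, mad = benford_fit(powers_of_two)
print(f"chi-square={chi:.3f}, MAD={mad:.2f}%, conforms={chi < CRITICAL_5_PERCENT}")
```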

Business Intelligence

import lawkit

# Analyze customer revenue distribution
result = lawkit.analyze_pareto('customer_revenue.csv',
                              lawkit.LawkitOptions(
                                  output='json',
                                  business_analysis=True,
                                  gini_coefficient=True
                              ))

print(f"Revenue concentration: {result.concentration_80_20:.1f}%")
print(f"Market inequality: {result.gini_coefficient:.3f}")
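
To show what these two figures measure, the following standalone sketch computes a Gini coefficient and the revenue share held by the top 20% of customers for a small list of values. The function names and sample numbers are illustrative assumptions. Whether lawkit uses the same estimator is not stated in this document.

```python
def gini_coefficient(values) -> float:
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with i = 1..n ascending.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_share(values, fraction: float = 0.2) -> float:
    """Percentage of the total held by the largest `fraction` of items."""
    xs = sorted(values, reverse=True)
    k = max(1, round(len(xs) * fraction))
    return 100 * sum(xs[:k]) / sum(xs)


revenue = [10, 1, 1, 1, 1, 1, 1, 1, 1, 2]  # hypothetical customer revenues
print(f"Gini: {gini_coefficient(revenue):.3f}")
print(f"Top 20% share: {top_share(revenue):.1f}%")
```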

Quality Control

import lawkit

# Analyze manufacturing measurements
result = lawkit.analyze_normal('measurements.csv',
                              lawkit.LawkitOptions(
                                  output='json',
                                  outlier_detection=True,
                                  test_type='shapiro'
                              ))

if result.p_value < 0.05:
    print("⚠️  Measurements deviate from a normal distribution - review the process")
    if result.outliers:
        print(f"Found {len(result.outliers)} outlying measurements")
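
One common way to flag outlying measurements is a z-score rule: mark any value more than a chosen number of standard deviations from the mean. The sketch below implements that rule with the standard library. It stands in for the general idea only and is not the method behind `outlier_detection` or the `shapiro` test in the example above. The function name and sample readings are assumptions.

```python
import statistics


def zscore_outliers(values, threshold: float = 3.0) -> list:
    """Return values lying more than `threshold` sample SDs from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / sd > threshold]


# Hypothetical measurements: 19 readings near 10.0 and one faulty reading.
readings = [
    9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1, 10.0, 10.0,
    9.8, 10.2, 10.1, 9.9, 10.0, 10.0, 9.9, 10.1, 10.0, 25.0,
]
print(zscore_outliers(readings))
```

With very small samples a single extreme value inflates the standard deviation enough that no point can exceed a threshold of 3, so this rule needs a reasonable sample size.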

Text Analysis

import lawkit

# Analyze word frequency in documents
result = lawkit.analyze_zipf('document.txt',
                            lawkit.LawkitOptions(output='json'))

print(f"Text follows Zipf's Law: {result.p_value > 0.05}")
print(f"Power law exponent: {result.exponent:.3f}")
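
Zipf's Law says word frequency is roughly proportional to 1 / rank^s, so the exponent s can be estimated as the negative slope of a least-squares line through log(rank) versus log(frequency). The sketch below does this with the standard library. It is an illustrative estimator, not necessarily the one lawkit uses, and `zipf_exponent` is a hypothetical name.

```python
import math
from collections import Counter


def zipf_exponent(text: str) -> float:
    """Estimate the Zipf exponent from whitespace-separated words."""
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the slope is negative, the exponent is its magnitude


# Synthetic text where the word at rank r appears exactly 60 / r times.
words = []
for r in range(1, 7):
    words.extend([f"w{r}"] * (60 // r))
print(f"Estimated exponent: {zipf_exponent(' '.join(words)):.3f}")
```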

API Reference

Main Functions

  • analyze_benford(input_data, options) - Benford's Law analysis
  • analyze_pareto(input_data, options) - Pareto principle analysis
  • analyze_zipf(input_data, options) - Zipf's Law analysis
  • analyze_normal(input_data, options) - Normal distribution analysis
  • analyze_poisson(input_data, options) - Poisson distribution analysis
  • analyze_laws(input_data, options) - Multi-law analysis
  • validate_laws(input_data, options) - Data validation and consistency check
  • diagnose_laws(input_data, options) - Conflict diagnosis and detailed reporting
  • generate_data(law_type, samples, **kwargs) - Generate sample data
  • analyze_string(content, law_type, options) - Analyze string data directly

Utility Functions

  • is_lawkit_available() - Check if lawkit CLI is available
  • get_version() - Get lawkit version
  • selftest() - Run self-test

Classes

  • LawkitOptions - Configuration options for analysis
  • LawkitResult - Analysis results with structured access
  • LawkitError - Exception class for lawkit errors

Platform Support

  • Windows: x86_64
  • macOS: x86_64, ARM64 (Apple Silicon)
  • Linux: x86_64, ARM64

Requirements

  • Python 3.8+
  • No additional dependencies required

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! Please read the Contributing Guide for details.

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

lawkit_python-2.6.2-cp311-cp311-win_amd64.whl (927.7 kB)
Uploaded: CPython 3.11, Windows x86-64

lawkit_python-2.6.2-cp311-cp311-musllinux_1_1_x86_64.whl (1.3 MB)
Uploaded: CPython 3.11, musllinux (musl 1.1+), x86-64

lawkit_python-2.6.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Uploaded: CPython 3.11, manylinux (glibc 2.17+), x86-64

lawkit_python-2.6.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.1 MB)
Uploaded: CPython 3.11, manylinux (glibc 2.17+), ARM64

lawkit_python-2.6.2-cp311-cp311-macosx_11_0_arm64.whl (926.2 kB)
Uploaded: CPython 3.11, macOS 11.0+, ARM64

lawkit_python-2.6.2-cp311-cp311-macosx_10_12_x86_64.whl (966.6 kB)
Uploaded: CPython 3.11, macOS 10.12+, x86-64
