A tool for verifying PDF statements from institutions in Tanzania and beyond

Statement Verification

A Python package for verifying PDF statements from financial institutions. Extracts metadata, detects the issuing institution, and provides verification scores.

Installation

# From PyPI (recommended)
pip install sverification

# Or from source
git clone https://github.com/Tausi-Africa/statement-verification.git
cd statement-verification
pip install -e .

Quick Start

# Command line usage with font comparison
verify-statement path/to/statement.pdf --brands statements_metadata.json --font-data statements_font_data.json

# Python API - Simple verification
import sverification

result = sverification.verify_statement_verbose("statement.pdf")
print(f"Brand: {result['detected_brand']}, Score: {result['combined_score']:.1f}%")
print(f"Metadata: {result['verification_score']:.1f}%, Font: {result['font_score']:.1f}%")

📚 Function Reference

1. `verify_statement_verbose()` - Complete Verification with Font Analysis

Purpose: Performs complete statement verification with metadata and font comparison.

import sverification

# Basic usage with both metadata and font verification
result = sverification.verify_statement_verbose("statement.pdf")

# With custom template files
result = sverification.verify_statement_verbose(
    pdf_path="statement.pdf",
    brands_json_path="custom_brands.json",
    font_data_json_path="custom_font_data.json"
)

# Access comprehensive results
print(f"Detected Brand: {result['detected_brand']}")
print(f"Combined Score: {result['combined_score']:.1f}%")
print(f"Metadata Score: {result['verification_score']:.1f}%")
print(f"Font Score: {result['font_score']:.1f}%")

# Check metadata fields
for field in result['field_results']:
    status = "✓" if field['match'] else "✗"
    print(f"[{status}] {field['field']}: {field['actual']} (expected: {field['expected']})")

# Check font fields
for font_field in result['font_results']:
    status = "✓" if font_field['match'] else "✗"
    print(f"[{status}] {font_field['field']}: {font_field['actual']} (expected: {font_field['expected']})")

Returns: Dictionary with complete verification data

detected_brand: Institution name
combined_score: Overall score combining metadata and font analysis
verification_score: Metadata verification score (0-100)
font_score: Font comparison score (0-100)
field_results: List of metadata field comparisons
font_results: List of font field comparisons
total_fields: Number of metadata fields checked
matched_fields: Number of matching metadata fields
total_font_fields: Number of font fields checked
matched_font_fields: Number of matching font fields
summary: Human-readable summary
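
The keys listed above are enough to build a short list of failing checks. The sketch below uses a hypothetical helper, `failed_checks`, which is not part of the package. It assumes each entry in `field_results` and `font_results` is a dict with `field`, `expected`, `actual`, and `match` keys, as in the loops shown earlier, and it runs on a hand-written sample rather than real package output.

```python
def failed_checks(result):
    """Return (source, field, expected, actual) for every non-matching check."""
    failures = []
    for source, key in (("metadata", "field_results"), ("font", "font_results")):
        for entry in result.get(key, []):
            if not entry["match"]:
                failures.append(
                    (source, entry["field"], entry["expected"], entry["actual"])
                )
    return failures


# Hand-written sample in the documented shape (assumption, not real output)
sample = {
    "field_results": [
        {"field": "pdf_version", "expected": "1.4", "actual": "1.4", "match": True},
        {"field": "creator", "expected": "Selcom", "actual": "Word", "match": False},
    ],
    "font_results": [
        {"field": "font_count", "expected": 2, "actual": 3, "match": False},
    ],
}

for source, field, expected, actual in failed_checks(sample):
    print(f"{source}: {field} expected {expected!r}, got {actual!r}")
```

In practice, `sample` would be replaced by the dictionary returned from `verify_statement_verbose()`.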

2. `print_verification_report()` - Formatted Output with Font Analysis

Purpose: Prints a formatted verification report including font comparison.

import sverification

# Get verification results
result = sverification.verify_statement_verbose("statement.pdf")

# Print formatted report (same as CLI output)
sverification.print_verification_report(result)

# Output example:
# ========================================================================
# PDF: statement.pdf
# Detected brand: selcom
# Template used: selcom
# Metadata fields checked: 5
# Metadata fields matched: 5
# Metadata score: 100.0%
# Font fields checked: 4
# Font fields matched: 3
# Font score: 75.0%
# Combined verification score: 91.7%
# ------------------------------------------------------------------------
# Metadata Comparison (expected vs. actual):
#   [✓] pdf_version      expected='1.4'  actual='1.4'
#   [✓] creator          expected='Selcom'  actual='Selcom'
# ------------------------------------------------------------------------
# Font Comparison (expected vs. actual):
#   [✗] font_pdf_version expected='PDF-1.7'  actual='PDF-1.4'
#   [✓] font_count       expected=2  actual=2
#   [✓] font_names       expected=['Helvetica']  actual=['Helvetica']
# ------------------------------------------------------------------------
# Metadata Score: 100.0% | Font Score: 75.0% | Combined: 91.7%
# ========================================================================

3. `extract_all()` - PDF Metadata Extraction

Purpose: Extracts comprehensive metadata from PDF files.

import sverification

# Extract metadata
metadata = sverification.extract_all("statement.pdf")

# Access specific metadata
print(f"PDF Version: {metadata['pdf_version']}")
print(f"Creator: {metadata['creator']}")
print(f"Producer: {metadata['producer']}")
print(f"Creation Date: {metadata['creationdate']}")
print(f"Modification Date: {metadata['moddate']}")
print(f"EOF Markers: {metadata['eof_markers']}")
print(f"PDF Versions: {metadata['pdf_versions']}")

# Check for potential issues
if metadata['eof_markers'] > 1:
    print("⚠️  Multiple EOF markers detected")

if metadata['creationdate'] != metadata['moddate']:
    print("⚠️  Creation and modification dates differ")

Returns: Dictionary with extracted metadata

pdf_version: PDF specification version
creator: Application that created the PDF
producer: Software that produced the PDF
creationdate: When PDF was created
moddate: When PDF was last modified
eof_markers: Number of EOF markers (security indicator)
pdf_versions: Number of PDF versions
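
To make the `eof_markers` field more concrete: a PDF written in one pass normally ends with a single `%%EOF` marker, and each incremental update appends a new section with its own marker. The sketch below counts markers in raw bytes. `count_eof_markers` is a hypothetical name and this is not necessarily how the package computes the value; the byte strings are synthetic stand-ins, not valid PDFs.

```python
def count_eof_markers(data: bytes) -> int:
    """Count occurrences of the %%EOF marker in raw PDF bytes."""
    return data.count(b"%%EOF")


# Synthetic stand-ins for file contents (assumption: not real PDFs)
original = b"%PDF-1.4\n...objects...\n%%EOF\n"
updated = original + b"...appended objects...\n%%EOF\n"

print(count_eof_markers(original))  # 1
print(count_eof_markers(updated))   # 2
```

A count above one is a reason to look closer rather than proof of tampering, since some legitimate producers also write more than one marker.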

4. `get_company_name()` - Institution Detection

Purpose: Automatically detects the financial institution from PDF content.

import sverification

# Detect institution
company = sverification.get_company_name("statement.pdf")
print(f"Detected Institution: {company}")

# Handle unknown institutions
if company == "unknown":
    print("⚠️  Institution not recognized")
    print("Consider adding detection rules for this institution")

# Examples of detected institutions:
# "selcom", "vodacom", "airtel", "absa", "crdb", "nmb", etc.

Returns: String with institution code

Returns standardized institution codes (e.g., "selcom", "vodacom")
Returns "unknown" if institution cannot be detected

5. `extract_pdf_font_data()` - Font Information Extraction

Purpose: Extracts comprehensive font information from PDF files.

import sverification

# Extract font data
font_data = sverification.extract_pdf_font_data("statement.pdf")

# Access font information
print(f"PDF Version: {font_data['pdf_version']}")
print(f"Total Fonts: {font_data['total_no_of_fonts']}")
print(f"Font Names: {font_data['font_names']}")
print(f"Info Object: {font_data['info_object']}")

# Example output:
# {
#   'pdf_version': 'PDF-1.4',
#   'total_no_of_fonts': 2,
#   'font_names': ['Helvetica', 'AZHGJL+ArialMT'],
#   'info_object': '20 0 R'
# }

Returns: Dictionary with font information

pdf_version: PDF version string read during font extraction (e.g. 'PDF-1.4')
total_no_of_fonts: Number of fonts used in the PDF
font_names: List of font names/identifiers
info_object: PDF info object reference
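
Font names such as 'AZHGJL+ArialMT' in the example output carry a subset tag: the PDF specification prefixes the name of a subset font with six uppercase letters and a plus sign. The sketch below separates the tag from the base name so that names can be compared across documents. `split_subset_tag` is a hypothetical helper, not a package function.

```python
import re

# Subset tag per the PDF specification: six uppercase letters, then '+'
_SUBSET_RE = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_tag(font_name):
    """Return (tag, base_name); tag is None when the name has no subset prefix."""
    match = _SUBSET_RE.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name


for name in ["Helvetica", "AZHGJL+ArialMT"]:
    tag, base = split_subset_tag(name)
    print(f"{name}: tag={tag}, base={base}")
```

The tag is typically different for each generated file, so comparing base names may be more stable than comparing the full strings.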

6. `compare_font_data()` - Font Comparison

Purpose: Compares extracted font data against expected font template.

import sverification

# Extract font data and load templates
font_data = sverification.extract_pdf_font_data("statement.pdf")
font_templates = sverification.load_font_data("statements_font_data.json")
company = sverification.get_company_name("statement.pdf")

# Get expected font template (fall back to an empty template if none is listed)
expected_font = (font_templates.get(company.lower()) or [{}])[0]

# Compare font data
font_results, font_score = sverification.compare_font_data(font_data, expected_font)

print(f"Font Score: {font_score:.1f}%")
print("\nFont comparison results:")

for field_name, expected_val, actual_val, is_match in font_results:
    status = "✓ PASS" if is_match else "✗ FAIL"
    print(f"{status} {field_name}")
    print(f"  Expected: {expected_val}")
    print(f"  Actual:   {actual_val}")
    print()

Returns: Tuple of (results_list, percentage_score)

results_list: List of tuples (field, expected, actual, match_bool)
percentage_score: Float between 0-100
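
The return shape described above can be illustrated with a plain field-by-field comparison. This is a minimal sketch under the assumption that a field matches only on exact equality; `compare_template` is a hypothetical stand-in, and the package's own matching rules may differ.

```python
def compare_template(actual, expected):
    """Compare each key in `expected` with `actual`; return (results, score)."""
    results = []
    for field, expected_val in expected.items():
        actual_val = actual.get(field)
        results.append((field, expected_val, actual_val, actual_val == expected_val))
    matched = sum(1 for *_, ok in results if ok)
    score = 100.0 * matched / len(results) if results else 0.0
    return results, score


# Hand-written values for illustration only
actual = {"pdf_version": "PDF-1.4", "total_no_of_fonts": 2, "font_names": ["Helvetica"]}
expected = {"pdf_version": "PDF-1.7", "total_no_of_fonts": 2, "font_names": ["Helvetica"]}

results, score = compare_template(actual, expected)
print(f"{score:.1f}%")  # 66.7%
```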

7. `load_font_data()` - Font Template Management

Purpose: Loads font templates for comparison.

import sverification

# Load font templates
font_data = sverification.load_font_data("statements_font_data.json")

# Check available font templates
print("Available font templates:")
for brand_code, templates in font_data.items():
    print(f"  - {brand_code}: {len(templates)} template(s)")

# Get font template for specific institution
selcom_font_templates = font_data.get("selcom", [])
if selcom_font_templates:
    template = selcom_font_templates[0]  # Use first template
    print(f"Expected PDF version: {template.get('pdf_version')}")
    print(f"Expected font count: {template.get('total_no_of_fonts')}")
    print(f"Expected fonts: {template.get('font_names')}")

Returns: Dictionary mapping institution codes to font template lists
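
Given that return shape, a custom template file can be sketched as a JSON object whose keys are institution codes and whose values are lists of template dicts. The field names below are taken from the `extract_pdf_font_data()` output; the values and the file name are made up for illustration, and the exact schema the package expects is an assumption.

```python
import json
import os
import tempfile

# Hypothetical template content: one institution code, one template
templates = {
    "selcom": [
        {"pdf_version": "PDF-1.4", "total_no_of_fonts": 2, "font_names": ["Helvetica"]}
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "custom_font_data.json")
    # Write the templates, then read them back to confirm the round trip
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(templates, fh, indent=2)
    with open(path, encoding="utf-8") as fh:
        loaded = json.load(fh)

print(loaded["selcom"][0]["font_names"])
```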

8. `load_brands()` - Metadata Template Management

Purpose: Loads institution templates for comparison.

import sverification

# Load default templates
brands = sverification.load_brands("statements_metadata.json")

# Check available institutions
print("Available institutions:")
for brand_code, templates in brands.items():
    print(f"  - {brand_code}: {len(templates)} template(s)")

# Get template for specific institution
selcom_templates = brands.get("selcom", [])
if selcom_templates:
    template = selcom_templates[0]  # Use first template
    print(f"Expected PDF version for Selcom: {template.get('pdf_version')}")
    print(f"Expected creator: {template.get('creator')}")

Returns: Dictionary mapping institution codes to template lists

9. `compare_fields()` - Metadata Field Comparison

Purpose: Compares extracted metadata against expected template.

import sverification

# Extract metadata and load templates
metadata = sverification.extract_all("statement.pdf")
brands = sverification.load_brands("statements_metadata.json")
company = sverification.get_company_name("statement.pdf")

# Get expected template (fall back to an empty template if none is listed)
expected = (brands.get(company.lower()) or [{}])[0]

# Compare fields
results, score = sverification.compare_fields(metadata, expected)

print(f"Overall Score: {score:.1f}%")
print("\nField-by-field results:")

for field_name, expected_val, actual_val, is_match in results:
    status = "✓ PASS" if is_match else "✗ FAIL"
    print(f"{status} {field_name}")
    print(f"  Expected: {expected_val}")
    print(f"  Actual:   {actual_val}")
    print()

Returns: Tuple of (results_list, percentage_score)

results_list: List of tuples (field, expected, actual, match_bool)
percentage_score: Float between 0-100
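
The examples above always use the first template, but each institution code maps to a list. One possible approach, sketched here with a hypothetical `best_template` helper and exact-equality matching, is to score every template and keep the best one. Whether the package does this internally is not stated, so treat it as an illustration only.

```python
def best_template(actual, templates):
    """Return (index, score) of the template that best matches `actual`."""
    best_index, best_score = None, 0.0
    for index, template in enumerate(templates):
        if not template:
            continue  # skip empty templates, nothing to compare
        matched = sum(1 for key, value in template.items() if actual.get(key) == value)
        score = 100.0 * matched / len(template)
        if best_index is None or score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


# Hand-written values for illustration only
actual = {"pdf_version": "1.4", "creator": "Selcom"}
templates = [
    {"pdf_version": "1.7", "creator": "Selcom"},
    {"pdf_version": "1.4", "creator": "Selcom"},
]

index, score = best_template(actual, templates)
print(f"Best template: #{index} ({score:.1f}%)")
```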

🔄 Common Workflows

Batch Processing with Font Analysis

import sverification
import os

def process_directory_with_fonts(pdf_directory):
    """Process all PDFs in a directory with font analysis"""
    results = []
    
    for filename in os.listdir(pdf_directory):
        if filename.lower().endswith('.pdf'):  # also accept .PDF
            pdf_path = os.path.join(pdf_directory, filename)
            
            try:
                result = sverification.verify_statement_verbose(pdf_path)
                results.append({
                    'file': filename,
                    'brand': result['detected_brand'],
                    'combined_score': result['combined_score'],
                    'metadata_score': result['verification_score'],
                    'font_score': result['font_score']
                })
                print(f"✓ {filename}: Combined {result['combined_score']:.1f}% (Meta: {result['verification_score']:.1f}%, Font: {result['font_score']:.1f}%)")
            except Exception as e:
                print(f"✗ {filename}: Error - {e}")
    
    return results

# Process all PDFs with enhanced analysis
results = process_directory_with_fonts("./statements/")
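
The list returned by the batch function can be saved for later review. The sketch below writes rows with the same keys to CSV, lowest combined score first. `results_to_csv` is a hypothetical helper, and the sample rows are hand-written rather than real verification results.

```python
import csv
import io


def results_to_csv(results, stream):
    """Write batch results to `stream` as CSV, lowest combined score first."""
    fieldnames = ["file", "brand", "combined_score", "metadata_score", "font_score"]
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    for row in sorted(results, key=lambda r: r["combined_score"]):
        writer.writerow(row)


# Hand-written rows in the shape built by the batch function above
sample_results = [
    {"file": "a.pdf", "brand": "selcom", "combined_score": 91.7,
     "metadata_score": 100.0, "font_score": 75.0},
    {"file": "b.pdf", "brand": "nmb", "combined_score": 60.0,
     "metadata_score": 60.0, "font_score": 60.0},
]

buffer = io.StringIO()
results_to_csv(sample_results, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.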

Font Quality Analysis

import sverification

def analyze_font_quality(pdf_path):
    """Analyze font quality and consistency"""
    try:
        font_data = sverification.extract_pdf_font_data(pdf_path)
        company = sverification.get_company_name(pdf_path)
        
        issues = []
        
        # Check for subset-embedded fonts: a six-letter tag plus '+' (e.g. 'AZHGJL+ArialMT')
        embedded_fonts = [f for f in font_data.get('font_names', []) if '+' in f]
        if embedded_fonts:
            issues.append(f"Subset-embedded fonts detected: {embedded_fonts}")
        
        # Check for unusual font count
        font_count = font_data.get('total_no_of_fonts', 0)
        if font_count > 5:
            issues.append(f"High font count: {font_count} fonts")
        elif font_count == 0:
            issues.append("No fonts detected")
        
        return {
            'company': company,
            'font_data': font_data,
            'issues': issues
        }
    except Exception as e:
        return {'error': str(e)}

# Analyze font quality
analysis = analyze_font_quality("statement.pdf")
if 'error' not in analysis:
    print(f"Institution: {analysis['company']}")
    print(f"Font Count: {analysis['font_data']['total_no_of_fonts']}")
    if analysis['issues']:
        print("⚠️  Font issues:")
        for issue in analysis['issues']:
            print(f"  - {issue}")
    else:
        print("✓ No font issues detected")

Custom Analysis

import sverification

def analyze_statement_quality(pdf_path):
    """Analyze statement quality indicators"""
    metadata = sverification.extract_all(pdf_path)
    company = sverification.get_company_name(pdf_path)
    
    issues = []
    
    # Check for multiple EOF markers (potential tampering)
    if metadata['eof_markers'] > 1:
        issues.append("Multiple EOF markers detected")
    
    # Check for date inconsistencies
    if metadata['creationdate'] != metadata['moddate']:
        issues.append("Creation and modification dates differ")
    
    # Check for unknown institution
    if company == "unknown":
        issues.append("Institution not recognized")
    
    return {
        'company': company,
        'issues': issues,
        'metadata': metadata
    }

# Analyze a statement
analysis = analyze_statement_quality("statement.pdf")
print(f"Institution: {analysis['company']}")
if analysis['issues']:
    print("⚠️  Issues found:")
    for issue in analysis['issues']:
        print(f"  - {issue}")
else:
    print("✓ No issues detected")
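
The date check above compares the two values as they are, so the same instant written with different UTC offsets would count as a difference. If `creationdate` and `moddate` hold raw PDF date strings such as `D:20250903101500+03'00'` (an assumption, since the README does not show their format), they can be parsed before comparing. `parse_pdf_date` is a hypothetical helper, and treating a missing offset as UTC is a simplification.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date layout: D:YYYYMMDDHHmmSS followed by Z or +HH'mm' / -HH'mm'
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(value):
    """Parse a PDF date string into an aware datetime, or return None."""
    match = _PDF_DATE.match(value.strip())
    if not match:
        return None
    defaults = (0, 1, 1, 0, 0, 0)  # year is always present; month/day default to 1
    parts = [int(g) if g else d for g, d in zip(match.groups()[:6], defaults)]
    sign, off_h, off_m = match.group(7), match.group(8), match.group(9)
    if sign in (None, "Z", "z"):
        tz = timezone.utc  # assumption: no offset means UTC
    else:
        delta = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(delta if sign == "+" else -delta)
    try:
        return datetime(*parts, tzinfo=tz)
    except ValueError:
        return None  # out-of-range field such as month 13


created = parse_pdf_date("D:20250903101500+03'00'")
modified = parse_pdf_date("D:20250903071500Z")
print(created == modified)  # True
```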

🔍 What's Verified

Metadata Analysis

PDF Version: Document format version
Creation/Modification Dates: Timestamp consistency
Creator/Producer: Software used to generate the PDF
EOF Markers: Security indicators (multiple markers may indicate tampering)
Document Properties: Author, subject, keywords, trapped status

Font Analysis (NEW!)

Font Count: Number of fonts used in the document
Font Names: Specific fonts and their identifiers
Font Embedding: Detection of embedded vs. system fonts
PDF Version Consistency: Cross-verification with metadata
Font Info Objects: Internal PDF reference validation

Combined Scoring

The package now provides three types of scores:

Metadata Score: Traditional metadata verification (0-100%)
Font Score: Font consistency verification (0-100%)
Combined Score: Weighted combination of both analyses
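
The weights are not documented here. In the sample report earlier, a metadata score of 100.0% and a font score of 75.0% give a combined score of 91.7%, which is what a 2:1 weighting in favour of metadata would produce. The sketch below reproduces that one figure. The weights are an inference from a single example, not a confirmed formula, and `combine_scores` is a hypothetical name.

```python
def combine_scores(metadata_score, font_score, metadata_weight=2.0, font_weight=1.0):
    """Weighted average of two 0-100 scores (weights are assumed, not documented)."""
    total = metadata_weight + font_weight
    return (metadata_weight * metadata_score + font_weight * font_score) / total


print(f"{combine_scores(100.0, 75.0):.1f}%")  # 91.7%
```

For real decisions, use the `combined_score` returned by the package rather than recomputing it.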

🏦 Supported Institutions

Banks: ABSA, CRDB, DTB, Exim, NMB, NBC, TCB, UBA
Mobile Money: Airtel, Tigo, Vodacom, Halotel, Selcom
Others: Azam Pesa, PayMaart, and more...

🧪 Testing

# Run tests
pytest

# Run with coverage
pytest --cov=sverification

📄 License

Proprietary software licensed by Black Swan AI Global. See LICENSE for details.

Release history

0.1.2 (this version): Sep 3, 2025
0.1.1: Sep 2, 2025
0.1.0: Sep 2, 2025

Download files

Source Distribution

sverification-0.1.2.tar.gz (27.7 kB), uploaded Sep 3, 2025, Source

Built Distribution

sverification-0.1.2-py3-none-any.whl (27.1 kB), uploaded Sep 3, 2025, Python 3

File details

Details for the file sverification-0.1.2.tar.gz.

File metadata

Download URL: sverification-0.1.2.tar.gz
Upload date: Sep 3, 2025
Size: 27.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for sverification-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`1c470b60ef438438e790df1f55cd5eadc39bc75d67562bc5247214d8666741e2`
MD5	`31742f398bff76d8156f75ae3bae3e16`
BLAKE2b-256	`7dd6fb67a78bc2f3a164b4f0564fb63668445a35f13b67a28cee7b5d2644c264`

File details

Details for the file sverification-0.1.2-py3-none-any.whl.

File metadata

Download URL: sverification-0.1.2-py3-none-any.whl
Upload date: Sep 3, 2025
Size: 27.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for sverification-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2eaf7a49025795f2398919cc8ec5af3b18b147338d49a4bd5b28cf6289862672`
MD5	`109a86eb472c33f986885b818db88443`
BLAKE2b-256	`fc124973e5f20f0b714f66c952210b94115000d70d44c48dd708f1fd50fef699`

