Australian-focused PII detection and anonymization for the insurance industry

Allyanonimiser

Australian-focused PII detection and anonymization for the insurance industry with support for stream processing of very large files.

Installation

# Basic installation
pip install allyanonimiser==2.0.0

# With stream processing support for large files
pip install "allyanonimiser[stream]==2.0.0"

# With LLM integration for advanced pattern generation
pip install "allyanonimiser[llm]==2.0.0"

# Complete installation with all optional dependencies
pip install "allyanonimiser[stream,llm]==2.0.0"

Prerequisites:

Python 3.10 or higher

A spaCy language model (recommended):

python -m spacy download en_core_web_lg  # Recommended
# OR for limited resources:
python -m spacy download en_core_web_sm  # Smaller model

Quick Start

from allyanonimiser import create_allyanonimiser

# Create the Allyanonimiser instance with default settings
ally = create_allyanonimiser()

# Analyze text
results = ally.analyze(
    text="Please reference your policy AU-12345678 for claims related to your vehicle rego XYZ123."
)

# Print results
for result in results:
    print(f"Entity: {result.entity_type}, Text: {result.text}, Score: {result.score}")

# Anonymize text
anonymized = ally.anonymize(
    text="Please reference your policy AU-12345678 for claims related to your vehicle rego XYZ123.",
    operators={
        "POLICY_NUMBER": "mask",  # Replace with asterisks
        "VEHICLE_REGISTRATION": "replace"  # Replace with <VEHICLE_REGISTRATION>
    }
)

print(anonymized["text"])
# Output: "Please reference your policy ********** for claims related to your vehicle rego <VEHICLE_REGISTRATION>."
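To make the operator names above concrete, here is a minimal standalone sketch of how span-based operators of this kind could behave. It does not use Allyanonimiser and is not its implementation: `Span`, `find_span`, and `apply_operators` are hypothetical names introduced for illustration, and the fallback to "replace" for unlisted entity types is an assumption.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Span:
    """A detected entity: its type plus [start, end) character offsets."""
    entity_type: str
    start: int
    end: int


def find_span(text: str, entity_type: str, value: str) -> Span:
    """Locate the first occurrence of value; stands in for a real detector."""
    start = text.index(value)
    return Span(entity_type, start, start + len(value))


def apply_operators(text: str, spans: list[Span], operators: dict[str, str]) -> str:
    """Rewrite each span according to the operator chosen for its entity type."""
    # Work right to left so earlier offsets stay valid after each substitution.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        original = text[span.start:span.end]
        operator = operators.get(span.entity_type, "replace")  # assumed default
        if operator == "mask":
            replacement = "*" * len(original)
        elif operator == "redact":
            replacement = "[REDACTED]"
        elif operator == "hash":
            replacement = hashlib.sha256(original.encode("utf-8")).hexdigest()
        else:  # "replace" and anything unrecognised
            replacement = f"<{span.entity_type}>"
        text = text[:span.start] + replacement + text[span.end:]
    return text


sample = "Policy AU-12345678, rego XYZ123."
spans = [
    find_span(sample, "POLICY_NUMBER", "AU-12345678"),
    find_span(sample, "VEHICLE_REGISTRATION", "XYZ123"),
]
masked = apply_operators(
    sample, spans, {"POLICY_NUMBER": "mask", "VEHICLE_REGISTRATION": "replace"}
)
print(masked)
```

In this sketch the mask has the same length as the matched text, so the length of a masked value depends on exactly which characters the detector includes in the span.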

New in Version 2.0.0: Comprehensive Reporting System

Allyanonimiser now includes a comprehensive reporting system that allows you to track, analyze, and visualize anonymization activities.

from allyanonimiser import create_allyanonimiser

# Create instance
ally = create_allyanonimiser()

# Start a new report session
ally.start_new_report("my_session")

# Process multiple texts
texts = [
    "Customer John Smith (DOB: 15/06/1980) called about claim CL-12345.",
    "Jane Doe (email: jane.doe@example.com) requested policy information.",
    "Claims assessor reviewed case for Robert Johnson (ID: 987654321)."
]

# Number documents from 1 so the IDs read doc_1, doc_2, doc_3
for i, text in enumerate(texts, start=1):
    ally.anonymize(
        text=text,
        operators={
            "PERSON": "replace",
            "EMAIL_ADDRESS": "mask",
            "DATE_OF_BIRTH": "age_bracket"
        },
        document_id=f"doc_{i}"
    )

# Get report summary
report = ally.get_report()
summary = report.get_summary()

print(f"Total documents processed: {summary['total_documents']}")
print(f"Total entities detected: {summary['total_entities']}")
print(f"Entities per document: {summary['entities_per_document']:.2f}")
print(f"Anonymization rate: {summary['anonymization_rate']*100:.2f}%")
print(f"Average processing time: {summary['avg_processing_time']*1000:.2f} ms")

# Export report to different formats
report.export_report("report.html", "html")  # Rich HTML visualization
report.export_report("report.json", "json")  # Detailed JSON data
report.export_report("report.csv", "csv")    # CSV statistics

# In Jupyter notebooks, display rich visualizations
# ally.display_report_in_notebook()
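The summary keys printed above are aggregates over per-document results. The following sketch shows one plausible way such figures could be derived. It is not taken from the library: `DocumentRecord` and `summarise` are hypothetical names, and the definition of `anonymization_rate` as anonymized entities divided by detected entities is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DocumentRecord:
    """Per-document counters a reporting layer might collect (hypothetical)."""
    document_id: str
    entities_detected: int
    entities_anonymized: int
    processing_seconds: float


def summarise(records: list[DocumentRecord]) -> dict[str, float]:
    """Aggregate per-document counters into session-level statistics."""
    total_documents = len(records)
    total_entities = sum(r.entities_detected for r in records)
    total_anonymized = sum(r.entities_anonymized for r in records)
    total_seconds = sum(r.processing_seconds for r in records)
    return {
        "total_documents": total_documents,
        "total_entities": total_entities,
        # Guard against division by zero for empty sessions.
        "entities_per_document": total_entities / total_documents if total_documents else 0.0,
        "anonymization_rate": total_anonymized / total_entities if total_entities else 0.0,
        "avg_processing_time": total_seconds / total_documents if total_documents else 0.0,
    }


records = [
    DocumentRecord("doc_1", entities_detected=3, entities_anonymized=2, processing_seconds=0.010),
    DocumentRecord("doc_2", entities_detected=2, entities_anonymized=2, processing_seconds=0.020),
    DocumentRecord("doc_3", entities_detected=3, entities_anonymized=2, processing_seconds=0.030),
]
session_summary = summarise(records)
print(f"Anonymization rate: {session_summary['anonymization_rate'] * 100:.2f}%")
```

Under this definition, a rate below 100% would mean some detected entity types had no operator applied to them.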

Features

Australian-Focused PII Detection: Specialized patterns for TFNs, Medicare numbers, vehicle registrations, addresses, and more
Insurance Industry Specialization: Detect policy numbers, claim references, and other industry-specific identifiers
Multiple Entity Types: Comprehensive detection of general and specialized PII
Flexible Anonymization: Multiple anonymization operators (replace, mask, redact, hash, and more)
Stream Processing: Memory-efficient processing of large files with chunking support
Reporting System: Comprehensive tracking and visualization of anonymization activities
Jupyter Integration: Rich visualization capabilities in notebook environments
DataFrame Support: Process pandas DataFrames with batch processing and multi-processing support
Configuration Export: Share settings between environments with export/import functionality
Pattern Generation: Create patterns from examples with various generalization levels
Customizable: Extend with your own patterns and entity types
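The idea behind chunked stream processing can be sketched without the library: read a bounded amount of text at a time, anonymize it, write it out, and discard it. The code below is an illustration under stated assumptions, not Allyanonimiser's stream API. `iter_line_chunks`, `anonymize_stream`, and `mask_emails` are hypothetical names, and the email regex is a deliberately simple stand-in for a real detector.

```python
import io
import re
from typing import Callable, Iterator, TextIO


def iter_line_chunks(stream: TextIO, max_chars: int) -> Iterator[str]:
    """Yield groups of whole lines whose combined length stays within max_chars
    (a single line longer than max_chars is yielded on its own)."""
    buffer: list[str] = []
    size = 0
    for line in stream:
        if buffer and size + len(line) > max_chars:
            yield "".join(buffer)
            buffer, size = [], 0
        buffer.append(line)
        size += len(line)
    if buffer:
        yield "".join(buffer)


def anonymize_stream(
    source: TextIO,
    sink: TextIO,
    anonymize: Callable[[str], str],
    max_chars: int = 64_000,
) -> int:
    """Anonymize source chunk by chunk, writing to sink; return the chunk count."""
    chunks = 0
    for chunk in iter_line_chunks(source, max_chars):
        sink.write(anonymize(chunk))
        chunks += 1
    return chunks


EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_emails(text: str) -> str:
    """Toy anonymizer: replace anything that looks like an email address."""
    return EMAIL.sub("<EMAIL_ADDRESS>", text)


source = io.StringIO("a@example.com wrote in\n" * 4)
sink = io.StringIO()
chunk_count = anonymize_stream(source, sink, mask_emails, max_chars=50)
print(chunk_count, "chunks written")
```

Splitting on line boundaries keeps memory use bounded, but an entity that spans a line break would be cut in two, so a real implementation would need overlap or smarter boundaries.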

Usage Examples

Analyze Text for PII Entities

from allyanonimiser import create_allyanonimiser

ally = create_allyanonimiser()

# Analyze text
results = ally.analyze(
    text="Customer John Smith (TFN: 123 456 789) reported an incident on 15/06/2023 at his residence in Sydney NSW 2000."
)

# Print detected entities
for result in results:
    print(f"Entity: {result.entity_type}, Text: {result.text}, Score: {result.score}")
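Pattern-based detectors for identifiers such as TFNs are often paired with a check-digit test to reduce false positives. The sketch below implements the widely documented weighted checksum for nine-digit TFNs. Whether Allyanonimiser applies this check is not stated in this README, so treat it as background only; `is_valid_tfn` is a hypothetical helper name.

```python
import re

# Publicly documented weights for the nine-digit TFN checksum.
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def is_valid_tfn(candidate: str) -> bool:
    """Return True if candidate is nine digits whose weighted sum is divisible by 11."""
    digits = re.sub(r"[\s-]", "", candidate)  # tolerate spaces and hyphens
    if not re.fullmatch(r"[0-9]{9}", digits):
        return False
    total = sum(int(d) * w for d, w in zip(digits, TFN_WEIGHTS))
    return total % 11 == 0


print(is_valid_tfn("123 456 782"))  # commonly used test TFN
print(is_valid_tfn("123 456 789"))  # placeholder value from the example above
```

The placeholder "123 456 789" used in the example text does not satisfy this checksum, which only matters if a detector validates check digits rather than matching on format alone.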

Anonymize Text with Different Operators

from allyanonimiser import create_allyanonimiser, AnonymizationConfig

ally = create_allyanonimiser()

# Using configuration object
config = AnonymizationConfig(
    operators={
        "PERSON": "replace",           # Replace with <PERSON>
        "AU_TFN": "hash",              # Replace with SHA-256 hash
        "DATE": "redact",              # Replace with [REDACTED]
        "AU_ADDRESS": "mask",          # Replace with *****
        "DATE_OF_BIRTH": "age_bracket" # Replace with age bracket (e.g., <40-45>)
    },
    age_bracket_size=5  # Size of age brackets
)

# Anonymize text
anonymized = ally.anonymize(
    text="Customer John Smith (TFN: 123 456 789) reported an incident on 15/06/2023. He was born on 05/08/1982 and lives at 42 Main St, Sydney NSW 2000.",
    config=config
)

print(anonymized["text"])
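As a rough illustration of what an age-bracket operator does with `age_bracket_size`, the sketch below converts a date of birth into a bracket string in the `<40-45>` style shown in the comment above. The rounding rule (lower bound is the age rounded down to a multiple of the bracket size) and the function names `age_on` and `age_bracket` are assumptions, not the library's documented behaviour.

```python
from datetime import date


def age_on(birth_date: date, reference: date) -> int:
    """Whole years elapsed between birth_date and reference."""
    years = reference.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_bracket(birth_date: date, reference: date, bracket_size: int = 5) -> str:
    """Format the age as a bracket, e.g. age 40 with size 5 gives '<40-45>'."""
    lower = (age_on(birth_date, reference) // bracket_size) * bracket_size
    return f"<{lower}-{lower + bracket_size}>"


dob = date(1982, 8, 5)          # 05/08/1982 read as day/month/year
incident = date(2023, 6, 15)    # fixed reference date keeps the result reproducible
print(age_bracket(dob, incident, bracket_size=5))
print(age_bracket(dob, incident, bracket_size=10))
```

A larger bracket size discloses less about the individual at the cost of analytical precision.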

Process Text with Analysis and Anonymization

from allyanonimiser import create_allyanonimiser, AnalysisConfig, AnonymizationConfig

ally = create_allyanonimiser()

# Configure analysis
analysis_config = AnalysisConfig(
    active_entity_types=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "DATE_OF_BIRTH"],
    min_score_threshold=0.7
)

# Configure anonymization
anonymization_config = AnonymizationConfig(
    operators={
        "PERSON": "replace",
        "EMAIL_ADDRESS": "mask",
        "PHONE_NUMBER": "redact",
        "DATE_OF_BIRTH": "age_bracket"
    }
)

# Process text (analyze + anonymize)
result = ally.process(
    text="Customer Jane Doe (jane.doe@example.com) called on 0412-345-678 regarding her DOB: 22/05/1990.",
    analysis_config=analysis_config,
    anonymization_config=anonymization_config
)

# Access different parts of the result
print("Anonymized text:")
print(result["anonymized"])

print("\nDetected entities:")
for entity in result["analysis"]["entities"]:
    print(f"{entity['entity_type']}: {entity['text']} (score: {entity['score']:.2f})")

print("\nPII-rich segments:")
for segment in result["segments"]:
    print(f"Original: {segment['text']}")
    print(f"Anonymized: {segment['anonymized']}")
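The two analysis settings used above, `active_entity_types` and `min_score_threshold`, amount to a filter over raw detections. A minimal sketch of that filtering step follows. `Detection` and `filter_detections` are hypothetical names, the scores are invented for illustration, and treating the threshold as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One candidate entity with a confidence score between 0 and 1."""
    entity_type: str
    text: str
    score: float


def filter_detections(
    detections: list[Detection],
    active_entity_types: list[str],
    min_score_threshold: float,
) -> list[Detection]:
    """Keep detections of an active type whose score meets the threshold."""
    active = set(active_entity_types)
    return [
        d for d in detections
        if d.entity_type in active and d.score >= min_score_threshold
    ]


raw = [
    Detection("PERSON", "Jane Doe", 0.85),
    Detection("EMAIL_ADDRESS", "jane.doe@example.com", 1.00),
    Detection("PHONE_NUMBER", "0412-345-678", 0.60),   # below the threshold
    Detection("LOCATION", "Sydney", 0.90),             # type not active
]
kept = filter_detections(
    raw,
    active_entity_types=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "DATE_OF_BIRTH"],
    min_score_threshold=0.7,
)
for d in kept:
    print(f"{d.entity_type}: {d.text} (score: {d.score:.2f})")
```

Raising the threshold trades recall for precision: fewer false positives are anonymized, but low-confidence PII may pass through unchanged.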

Working with DataFrames

import pandas as pd
from allyanonimiser import create_allyanonimiser

# Create DataFrame
df = pd.DataFrame({
    "id": [1, 2, 3],
    "notes": [
        "Customer John Smith (DOB: 15/6/1980) called about policy POL-123456.",
        "Jane Doe (email: jane.doe@example.com) requested a refund.",
        "Alex Johnson from Sydney NSW 2000 reported an incident."
    ]
})

# Create Allyanonimiser
ally = create_allyanonimiser()

# Anonymize a specific column
anonymized_df = ally.process_dataframe(
    df,
    column="notes",
    operation="anonymize",
    output_column="anonymized_notes",  # New column for anonymized text
    operators={
        "PERSON": "replace",
        "EMAIL_ADDRESS": "mask",
        "PHONE_NUMBER": "redact"
    }
)

# Display result
print(anonymized_df[["id", "notes", "anonymized_notes"]])

Generating Reports

import os

from allyanonimiser import create_allyanonimiser

# Create output directory
os.makedirs("output", exist_ok=True)

# Create an Allyanonimiser instance
ally = create_allyanonimiser()

# Start a new report session
ally.start_new_report(session_id="example_session")

# Configure anonymization (only the "operators" entry is passed to process_files below)
anonymization_config = {
    "operators": {
        "PERSON": "replace",
        "EMAIL_ADDRESS": "mask",
        "PHONE_NUMBER": "redact",
        "AU_ADDRESS": "replace",
        "DATE_OF_BIRTH": "age_bracket",
        "AU_TFN": "hash",
        "AU_MEDICARE": "mask"
    },
    "age_bracket_size": 10
}

# Process a batch of files
result = ally.process_files(
    file_paths=["data/sample1.txt", "data/sample2.txt", "data/sample3.txt"],
    output_dir="output/anonymized",
    anonymize=True,
    operators=anonymization_config["operators"],
    report=True,
    report_output="output/report.html",
    report_format="html"
)

# Display summary
print(f"Processed {result['total_files']} files")
print(f"Detected {result['report']['total_entities']} entities")
print(f"Average processing time: {result['report']['avg_processing_time']*1000:.2f} ms")

In Jupyter Notebooks

from allyanonimiser import create_allyanonimiser
import matplotlib.pyplot as plt

# Create an Allyanonimiser instance
ally = create_allyanonimiser()

# Start a report session and process some texts
# ... processing code ...

# Display rich interactive report
ally.display_report_in_notebook()

# Access report data programmatically
report = ally.get_report()
summary = report.get_summary()

# Create custom visualizations
entity_counts = summary['entity_counts']
plt.figure(figsize=(10, 6))
plt.bar(entity_counts.keys(), entity_counts.values())
plt.title('Entity Type Distribution')
plt.xlabel('Entity Type')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Documentation

For complete documentation, examples, and advanced usage, visit the GitHub repository.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

3.3.0: Apr 15, 2026
3.2.0: Apr 15, 2026
3.1.2: Apr 15, 2026
3.1.1 (yanked; reason: deprecated code included in release): Apr 15, 2026
2.5.0: Aug 14, 2025
2.4.0: Jul 25, 2025
2.3.0: Jul 23, 2025
2.2.0: Jul 21, 2025
2.1.0: Mar 3, 2025
2.0.0 (this version): Mar 3, 2025
1.2.0: Mar 3, 2025
1.1.0: Mar 3, 2025
0.3.3: Mar 3, 2025
0.3.2: Feb 28, 2025
0.2.1: Feb 28, 2025
0.2.0: Feb 28, 2025
0.1.9: Feb 28, 2025
0.1.8: Feb 28, 2025
0.1.7: Feb 28, 2025
0.1.6: Feb 28, 2025
0.1.5: Feb 28, 2025
0.1.4: Feb 27, 2025
0.1.3: Feb 27, 2025
0.1.2: Feb 27, 2025
0.1.1: Feb 27, 2025
0.1.0: Feb 27, 2025

Download files

Source Distribution

allyanonimiser-2.0.0.tar.gz (177.2 kB)
Upload date: Mar 3, 2025
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

Algorithm	Hash digest
SHA256	`db2575f30d8467a7bb5a450ad73a9485644c3a7df02422d4879cfe55e73c931b`
MD5	`626894ab8ab02882b73516cea717d7d8`
BLAKE2b-256	`ba9d637d3d7475d9be03f4580337a51995fd991e045e36640f7389e4a947586d`

Built Distribution

allyanonimiser-2.0.0-py3-none-any.whl (204.2 kB)
Upload date: Mar 3, 2025
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

Algorithm	Hash digest
SHA256	`ac4c4ada2f11f1ee364d7c2e0897fad14ae6644d439acb0ce343b2865a8929e4`
MD5	`905b13f45cb5c7a3a73b886df5061c4d`
BLAKE2b-256	`7e091b85d829414c2201ff3fe270946748a1ae2c7d2a586b70366d7d4038484f`
