Skip to main content

A library for scanning Personally Identifiable Information (PII).

Project description

PII Scanner

A Python library designed for text processing using SpaCy and custom regex pattern matching. This library is capable of processing a variety of text data formats, such as lists, plain text, PDFs, JSON, CSV, and XLSX files

Installation

pip install pii_scanner

Usage

import asyncio
from pii_scanner.scanner import PIIScanner
from pii_scanner.constants.patterns_countries import Regions

async def run_scan():
    # Start the timer
    start_time = time.time()

    pii_scanner = PIIScanner()
    # file_path = 'dummy-pii/test.json' 
    file_path = 'dummy-pii/test.xlsx' 

    data = ['Ankit Gupta', '+919140562125', 'Indian']
    results_list_data = await pii_scanner.scan(data=, sample_size=0.005, region=Regions.IN)
    # results_file_data = await pii_scanner.scan(file_path=file_path, sample_size=0.005, region=Regions.IN)

    print("Results:", results_list_data, results_list_data)

# Run the asynchronous scan
asyncio.run(run_scan())

Output

[
    {
        "text": "Ankit Gupta",
        "entity_detected": [
            {"type": "PERSON", "start": 0, "end": 11, "score": 0.85}
        ]
    },
    {
        "text": "+919140562195",
        "entity_detected": [
            {"type": "PHONE_NUMBER", "start": 0, "end": 13, "score": 0.85}
        ]
    },
    {
        "text": "Indian",
        "entity_detected": [
            {"type": "NATIONALITY", "start": 0, "end": 6, "score": 0.9}
        ]
    }
]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pii_Scanner-0.1.15-py3-none-any.whl (33.3 kB view details)

Uploaded Python 3

File details

Details for the file pii_Scanner-0.1.15-py3-none-any.whl.

File metadata

  • Download URL: pii_Scanner-0.1.15-py3-none-any.whl
  • Upload date:
  • Size: 33.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for pii_Scanner-0.1.15-py3-none-any.whl
Algorithm Hash digest
SHA256 9c05dffc05ca76a77dee7cdf78a1ffd22a4057dacf13968ece25d49174de3bce
MD5 ea62cfcf387201742e82eb80d87adb00
BLAKE2b-256 f2fa2eb048214acd2f1343cccecb7d459ba602d2996959cd8a76c18a279b203c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page