A library for scanning Personally Identifiable Information (PII).

PII Scanner

A Python library for detecting PII in text using spaCy and custom regex pattern matching. It can process a variety of text data formats, including lists, plain text, PDFs, JSON, CSV, and XLSX files.
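As an illustration of what regex-based PII matching involves, here is a minimal sketch using only Python's standard `re` module. The pattern `IN_PHONE` and the helper `regex_detect` are hypothetical and are not part of pii_scanner. The fixed score of 0.85 is a placeholder assumption. The returned dict mirrors the shape of the sample output in this README.

```python
import re
from typing import Dict, List

# Hypothetical pattern for Indian mobile numbers: an optional +91 prefix
# followed by a 10-digit number starting with 6-9. Illustration only; this
# is NOT the pattern pii_scanner uses internally.
IN_PHONE = re.compile(r"(?:\+91[\s-]?)?[6-9]\d{9}\b")


def regex_detect(text: str, score: float = 0.85) -> Dict[str, object]:
    """Return regex matches in the same shape as the library's sample output.

    The constant `score` is a placeholder assumption; a real scanner would
    derive confidence from context (e.g. combining regex hits with NER).
    """
    entities: List[Dict[str, object]] = [
        {"type": "PHONE_NUMBER", "start": m.start(), "end": m.end(), "score": score}
        for m in IN_PHONE.finditer(text)
    ]
    return {"text": text, "entity_detected": entities}


print(regex_detect("+919140562125"))
```

A pure regex approach works well for structured identifiers such as phone numbers, while free-form entities like person names are better left to an NER model such as spaCy's, which is presumably why the library combines the two.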

Installation

pip install pii_scanner

Usage

import asyncio
import time

from pii_scanner.scanner import PIIScanner
from pii_scanner.constants.patterns_countries import Regions


async def run_scan():
    # Start the timer
    start_time = time.time()

    pii_scanner = PIIScanner()

    # Scan an in-memory list of strings
    data = ['Ankit Gupta', '+919140562125', 'Indian']
    results_list_data = await pii_scanner.scan(data=data, sample_size=0.005, region=Regions.IN)

    # Alternatively, scan a file (e.g. JSON or XLSX) by passing file_path instead of data
    # file_path = 'dummy-pii/test.json'
    # file_path = 'dummy-pii/test.xlsx'
    # results_file_data = await pii_scanner.scan(file_path=file_path, sample_size=0.005, region=Regions.IN)

    print("Results:", results_list_data)
    print(f"Scan completed in {time.time() - start_time:.2f} seconds")


# Run the asynchronous scan
asyncio.run(run_scan())

Output

[
    {
        "text": "Ankit Gupta",
        "entity_detected": [
            {"type": "PERSON", "start": 0, "end": 11, "score": 0.85}
        ]
    },
    {
        "text": "+919140562125",
        "entity_detected": [
            {"type": "PHONE_NUMBER", "start": 0, "end": 13, "score": 0.85}
        ]
    },
    {
        "text": "Indian",
        "entity_detected": [
            {"type": "NATIONALITY", "start": 0, "end": 6, "score": 0.9}
        ]
    }
]
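Downstream code usually needs to filter these results by confidence. The sketch below assumes the value returned by `scan` is a list of dicts shaped exactly like the output above, and that `start`/`end` are character offsets with `end` exclusive (consistent with `0`..`11` for "Ankit Gupta"). The helper `detections_above` is hypothetical and not part of the library.

```python
from typing import Any, Dict, List, Tuple

# Sample results in the shape shown in the Output above, inlined so this
# snippet runs on its own. In practice, use the list returned by
# `await pii_scanner.scan(...)` (assumed here to have this structure).
results: List[Dict[str, Any]] = [
    {"text": "Ankit Gupta",
     "entity_detected": [{"type": "PERSON", "start": 0, "end": 11, "score": 0.85}]},
    {"text": "+919140562125",
     "entity_detected": [{"type": "PHONE_NUMBER", "start": 0, "end": 13, "score": 0.85}]},
    {"text": "Indian",
     "entity_detected": [{"type": "NATIONALITY", "start": 0, "end": 6, "score": 0.9}]},
]


def detections_above(
    results: List[Dict[str, Any]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Flatten detections into (matched_span, entity_type, score) tuples,
    keeping only those whose score is at least `threshold`."""
    return [
        # Slice the original text with the (assumed) start/end offsets
        (item["text"][ent["start"]:ent["end"]], ent["type"], ent["score"])
        for item in results
        for ent in item.get("entity_detected", [])
        if ent["score"] >= threshold
    ]


for span, entity_type, score in detections_above(results, threshold=0.9):
    print(f"{entity_type}: {span!r} (score {score})")
```

With `threshold=0.9`, only the NATIONALITY detection for "Indian" is kept; lowering the threshold to 0.85 keeps all three.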


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pii_scanner-0.1.18.tar.gz (161.4 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pii_Scanner-0.1.18-py3-none-any.whl (172.3 kB)

Uploaded Python 3
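Wheel filenames follow the convention `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. Here, `py3-none-any` means the wheel targets any Python 3 interpreter, needs no specific ABI, and runs on any platform (a pure-Python wheel). Below is a minimal standard-library sketch that splits such a name into its parts. The `WheelName` type and `parse_wheel_filename` helper are hypothetical illustrations. For real tooling, the `packaging` library's `packaging.utils.parse_wheel_filename` also handles name normalization and validation.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its dash-separated components.

    Dashes inside the distribution name are escaped as underscores in wheel
    filenames, so a plain split on "-" is enough for this sketch.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


print(parse_wheel_filename("pii_Scanner-0.1.18-py3-none-any.whl"))
```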

File details

Details for the file pii_scanner-0.1.18.tar.gz.

File metadata

  • Download URL: pii_scanner-0.1.18.tar.gz
  • Upload date:
  • Size: 161.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for pii_scanner-0.1.18.tar.gz
Algorithm     Hash digest
SHA256        9eddeee269b62f60878794c260acbabcbb5703511b6a24c165bcb06032fcfb32
MD5           1f235fec598d737bb1f2a3c6eda7d2e1
BLAKE2b-256   00c5ae317c4055e611a05af78f70a81649442793aa13113b741f9de7ceed5ca9

See more details on using hashes here.
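To check a downloaded file against the digests listed above, you can hash it locally. This is a minimal sketch using only Python's standard `hashlib`. The helper names `file_digests` and `verify_sha256` are hypothetical, and `EXPECTED_SHA256` is copied from the SHA256 row for pii_scanner-0.1.18.tar.gz.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

# SHA256 digest published above for pii_scanner-0.1.18.tar.gz
EXPECTED_SHA256 = "9eddeee269b62f60878794c260acbabcbb5703511b6a24c165bcb06032fcfb32"


def file_digests(path: Union[str, Path], chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute the three digests PyPI lists, reading the file in chunks
    so large archives need not fit in memory."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: Union[str, Path], expected: str) -> bool:
    """Return True if the file's SHA256 matches `expected` (case-insensitive)."""
    return file_digests(path)["SHA256"] == expected.strip().lower()
```

For an intact download, `verify_sha256("pii_scanner-0.1.18.tar.gz", EXPECTED_SHA256)` should return `True`. pip can also enforce this at install time through its hash-checking mode. Add `--hash=sha256:<digest>` options to the `pii_scanner==0.1.18` line of a requirements file, one for each file pip might download (sdist and wheel). Then install with `pip install --require-hashes -r requirements.txt`.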

File details

Details for the file pii_Scanner-0.1.18-py3-none-any.whl.

File metadata

  • Download URL: pii_Scanner-0.1.18-py3-none-any.whl
  • Upload date:
  • Size: 172.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for pii_Scanner-0.1.18-py3-none-any.whl
Algorithm     Hash digest
SHA256        8697addd94b2375a4ceb07a24100d4beb0ab22dcf4b3de926344c2bba8671407
MD5           fdbce6ca647eecd1ec46843b70072d29
BLAKE2b-256   b2c89b5f6fb9d092b28f62005c1e366a696ecb0494369526da56c44379fe5f0e

See more details on using hashes here.
