A library for scanning Personally Identifiable Information (PII).

Project description

PII Scanner

A Python library for scanning text for PII using spaCy and custom regex pattern matching. It can process a variety of data formats, including lists, plain text, PDFs, JSON, CSV, and XLSX files.
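To make the regex half of that description concrete, here is a minimal sketch of pattern-based detection that reports character offsets. It is not the library's implementation: the `PATTERNS` table, the `find_entities` helper, and both regular expressions are assumptions chosen for illustration, and the real patterns are likely stricter and region-specific.

```python
import re

# Hypothetical pattern table for illustration only; the patterns shipped
# with pii_scanner are not shown in this document and may differ.
PATTERNS = {
    "PHONE_NUMBER": re.compile(r"\+?\d{10,13}"),
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def find_entities(text):
    """Return one dict per regex match, with start/end character offsets."""
    entities = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(
                {"type": entity_type, "start": match.start(), "end": match.end()}
            )
    return entities


print(find_entities("+919140562125"))
```

A regex pass like this only covers entities with a fixed shape, such as phone numbers. Names and nationalities have no such shape, which is presumably why the library pairs regex matching with spaCy.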

Installation

pip install pii_scanner

Usage

import asyncio
import time

from pii_scanner.scanner import PIIScanner
from pii_scanner.constants.patterns_countries import Regions

async def run_scan():
    # Start the timer
    start_time = time.time()

    pii_scanner = PIIScanner()
    # file_path = 'dummy-pii/test.json'
    file_path = 'dummy-pii/test.xlsx'

    # Scan an in-memory list of strings
    data = ['Ankit Gupta', '+919140562125', 'Indian']
    results_list_data = await pii_scanner.scan(data=data, sample_size=0.005, region=Regions.IN)
    # To scan a file instead, pass file_path in place of data:
    # results_file_data = await pii_scanner.scan(file_path=file_path, sample_size=0.005, region=Regions.IN)

    print("Results:", results_list_data)
    print(f"Elapsed: {time.time() - start_time:.2f} s")

# Run the asynchronous scan
asyncio.run(run_scan())

Output

[
    {
        "text": "Ankit Gupta",
        "entity_detected": [
            {"type": "PERSON", "start": 0, "end": 11, "score": 0.85}
        ]
    },
    {
        "text": "+919140562125",
        "entity_detected": [
            {"type": "PHONE_NUMBER", "start": 0, "end": 13, "score": 0.85}
        ]
    },
    {
        "text": "Indian",
        "entity_detected": [
            {"type": "NATIONALITY", "start": 0, "end": 6, "score": 0.9}
        ]
    }
]
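Downstream code usually needs only some of these detections. The sketch below assumes that `scan()` returns a list of dictionaries shaped exactly like the output above, and keeps the entities whose score meets a threshold. The `high_confidence` helper and its default threshold are hypothetical and not part of the library.

```python
def high_confidence(results, threshold=0.9):
    """Return (text, entity type, score) tuples scoring at or above threshold."""
    return [
        (item["text"], entity["type"], entity["score"])
        for item in results
        for entity in item["entity_detected"]
        if entity["score"] >= threshold
    ]


# Sample data copied from the output shown above.
results = [
    {
        "text": "Ankit Gupta",
        "entity_detected": [
            {"type": "PERSON", "start": 0, "end": 11, "score": 0.85}
        ],
    },
    {
        "text": "+919140562125",
        "entity_detected": [
            {"type": "PHONE_NUMBER", "start": 0, "end": 13, "score": 0.85}
        ],
    },
    {
        "text": "Indian",
        "entity_detected": [
            {"type": "NATIONALITY", "start": 0, "end": 6, "score": 0.9}
        ],
    },
]

print(high_confidence(results))
```

Lowering the threshold to 0.85 would keep all three sample entities. The right cutoff depends on how costly a missed detection is compared with a false positive.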

Download files

Source Distribution

pii_scanner-0.1.21.tar.gz (161.3 kB view details)

Uploaded Source

Built Distribution

pii_Scanner-0.1.21-py3-none-any.whl (171.9 kB view details)

Uploaded Python 3

File details

Details for the file pii_scanner-0.1.21.tar.gz.

File metadata

  • Download URL: pii_scanner-0.1.21.tar.gz
  • Upload date:
  • Size: 161.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for pii_scanner-0.1.21.tar.gz
Algorithm Hash digest
SHA256 e7edf0abb2b87d553d989e4ecd1a7374f1d0047880450852651acf7431d7c91d
MD5 048d101d6ec07d6bb8fde9712d909f21
BLAKE2b-256 1a3f6cbbc01dff0030d5f15ac75c53114faa1ac6ab175aacb637e6b957f20feb

File details

Details for the file pii_Scanner-0.1.21-py3-none-any.whl.

File metadata

  • Download URL: pii_Scanner-0.1.21-py3-none-any.whl
  • Upload date:
  • Size: 171.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for pii_Scanner-0.1.21-py3-none-any.whl
Algorithm Hash digest
SHA256 67102bda4097b77fe62085b83ca1c0d8e196f42d5f1c3148395d7a26115e31eb
MD5 a0e86987891a327f6fe31e054087717b
BLAKE2b-256 707764849867fb277d439a784638d464fdccc6cdfaacdaae3d8edd5af0f0a1d9
