A library for scanning Personally Identifiable Information (PII).

Project description

PII Scanner

A Python library that scans text for PII using spaCy and custom regex pattern matching. It accepts a variety of input formats, including lists, plain text, PDFs, JSON, CSV, and XLSX files.
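
To give a feel for what regex-based pattern matching for PII can look like, here is a minimal sketch using only Python's standard `re` module. It is not the library's implementation: the `PATTERNS` table, the `find_pii` helper, and both regular expressions are hypothetical and deliberately simplistic.

```python
import re

# Hypothetical, simplified patterns for illustration only;
# real-world PII patterns are region-specific and far stricter.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\+?\d{10,13}"),
}

def find_pii(text):
    """Return one {type, start, end} dict per regex match found in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append({"type": label, "start": match.start(), "end": match.end()})
    return hits

print(find_pii("+919140562125"))
```

A table of named, precompiled patterns keeps each entity type independent, so a pattern can be added or tightened without touching the matching loop.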

Installation

pip install pii_scanner

Usage

import asyncio
import time

from pii_scanner.scanner import PIIScanner
from pii_scanner.constants.patterns_countries import Regions

async def run_scan():
    # Start the timer
    start_time = time.time()

    pii_scanner = PIIScanner()
    # file_path = 'dummy-pii/test.json'
    file_path = 'dummy-pii/test.xlsx'

    # Scan an in-memory list of strings
    data = ['Ankit Gupta', '+919140562125', 'Indian']
    results_list_data = await pii_scanner.scan(data=data, sample_size=0.005, region=Regions.IN)
    # To scan a file instead, pass file_path:
    # results_file_data = await pii_scanner.scan(file_path=file_path, sample_size=0.005, region=Regions.IN)

    print("Results:", results_list_data)
    # Report how long the scan took
    print(f"Elapsed: {time.time() - start_time:.2f} s")

# Run the asynchronous scan
asyncio.run(run_scan())

Output

[
    {
        "text": "Ankit Gupta",
        "entity_detected": [
            {"type": "PERSON", "start": 0, "end": 11, "score": 0.85}
        ]
    },
    {
        "text": "+919140562125",
        "entity_detected": [
            {"type": "PHONE_NUMBER", "start": 0, "end": 13, "score": 0.85}
        ]
    },
    {
        "text": "Indian",
        "entity_detected": [
            {"type": "NATIONALITY", "start": 0, "end": 6, "score": 0.9}
        ]
    }
]
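
As a sketch of how such a result might be consumed, the snippet below filters detected entities by score. It assumes the scan result is a Python list of dicts shaped like the output shown here, and that `start`/`end` are character offsets with an exclusive end (consistent with "Ankit Gupta" ending at 11). The `entities_above` helper is hypothetical and not part of the library.

```python
# Sample data mirroring the structure of the example output.
results = [
    {"text": "Ankit Gupta",
     "entity_detected": [{"type": "PERSON", "start": 0, "end": 11, "score": 0.85}]},
    {"text": "+919140562125",
     "entity_detected": [{"type": "PHONE_NUMBER", "start": 0, "end": 13, "score": 0.85}]},
    {"text": "Indian",
     "entity_detected": [{"type": "NATIONALITY", "start": 0, "end": 6, "score": 0.9}]},
]

def entities_above(results, min_score):
    """Yield (matched_text, entity_type, score) for entities scoring at least min_score."""
    for item in results:
        for ent in item["entity_detected"]:
            if ent["score"] >= min_score:
                # Slice the original text using the reported offsets.
                yield item["text"][ent["start"]:ent["end"]], ent["type"], ent["score"]

for matched, kind, score in entities_above(results, 0.9):
    print(matched, kind, score)
```

With a threshold of 0.9, only the NATIONALITY entry from the sample passes; lowering it to 0.85 would include all three.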

Download files

Source Distribution

pii_scanner-0.1.19.tar.gz (161.3 kB view details)

Uploaded Source

Built Distribution

pii_Scanner-0.1.19-py3-none-any.whl (171.9 kB view details)

Uploaded Python 3

File details

Details for the file pii_scanner-0.1.19.tar.gz.

File metadata

  • Download URL: pii_scanner-0.1.19.tar.gz
  • Upload date:
  • Size: 161.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for pii_scanner-0.1.19.tar.gz
Algorithm Hash digest
SHA256 b0d06e6ec449383fa21d9e9c6da1c18d54fe7a03506768abe82c408d33d79496
MD5 76dc98c2b472118d6456f203d82decdc
BLAKE2b-256 ef0fdaf4ec262ae23d21ba8659234f4ff904fabd0dd59805fced6ab9110ebdd6
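
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib`; the `sha256_of` and `verify` helpers are hypothetical names, and the local file path passed to them is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest published above for pii_scanner-0.1.19.tar.gz.
EXPECTED_SHA256 = "b0d06e6ec449383fa21d9e9c6da1c18d54fe7a03506768abe82c408d33d79496"

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("pii_scanner-0.1.19.tar.gz")` would return `True` only if the file in the current directory matches the published digest. Reading in chunks keeps memory use constant regardless of file size.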

File details

Details for the file pii_Scanner-0.1.19-py3-none-any.whl.

File metadata

  • Download URL: pii_Scanner-0.1.19-py3-none-any.whl
  • Upload date:
  • Size: 171.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for pii_Scanner-0.1.19-py3-none-any.whl
Algorithm Hash digest
SHA256 2627d8376d01654706f797b5e7da962b28d01552233dd03395dceb42b62ed040
MD5 448a56826aabc5f246e4746470fb8f26
BLAKE2b-256 c0eb60e19678324f331061d43f507334a801725f523c3247e6255bf955ef7fff
