A library for scanning Personally Identifiable Information (PII).

PII Scanner

A Python library that detects PII in text using spaCy and custom regex pattern matching. It can process several input formats: lists, plain text, PDFs, JSON, CSV, and XLSX files.
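To illustrate the regex side of this approach, here is a minimal sketch using only the standard library. The pattern and the helper name find_phone_numbers are hypothetical and are not taken from pii_scanner; the library's own patterns are region-specific and may differ.

```python
import re

# Hypothetical pattern (not pii_scanner's): optional "+" followed by 10 to 13 digits.
PHONE_RE = re.compile(r"\+?\d{10,13}")


def find_phone_numbers(text: str) -> list[dict]:
    """Return one dict per match with its entity type and character offsets."""
    return [
        {"type": "PHONE_NUMBER", "start": m.start(), "end": m.end()}
        for m in PHONE_RE.finditer(text)
    ]


print(find_phone_numbers("Call +919140562125 today"))
# → [{'type': 'PHONE_NUMBER', 'start': 5, 'end': 18}]
```

A pattern this loose also matches any long run of digits, which is one reason to pair regex matching with region-specific rules or a language model.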

Installation

pip install pii_scanner

Usage

import asyncio
import time

from pii_scanner.scanner import PIIScanner
from pii_scanner.constants.patterns_countries import Regions

async def run_scan():
    # Start the timer
    start_time = time.time()

    pii_scanner = PIIScanner()
    # file_path = 'dummy-pii/test.json'
    file_path = 'dummy-pii/test.xlsx'

    # Scan an in-memory list of strings
    data = ['Ankit Gupta', '+919140562125', 'Indian']
    results_list_data = await pii_scanner.scan(data=data, sample_size=0.005, region=Regions.IN)
    # To scan a file instead, pass file_path in place of data:
    # results_file_data = await pii_scanner.scan(file_path=file_path, sample_size=0.005, region=Regions.IN)

    print("Results:", results_list_data)
    print(f"Elapsed: {time.time() - start_time:.2f} s")

# Run the asynchronous scan
asyncio.run(run_scan())

Output

[
    {
        "text": "Ankit Gupta",
        "entity_detected": [
            {"type": "PERSON", "start": 0, "end": 11, "score": 0.85}
        ]
    },
    {
        "text": "+919140562125",
        "entity_detected": [
            {"type": "PHONE_NUMBER", "start": 0, "end": 13, "score": 0.85}
        ]
    },
    {
        "text": "Indian",
        "entity_detected": [
            {"type": "NATIONALITY", "start": 0, "end": 6, "score": 0.9}
        ]
    }
]
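The result list can be post-processed with ordinary Python. The following is a minimal sketch, assuming scan returns a list of dicts shaped like the sample output above; the helper name filter_entities and the 0.9 threshold are hypothetical and not part of pii_scanner.

```python
from typing import Any


def filter_entities(results: list[dict[str, Any]], min_score: float) -> list[tuple[str, str, float]]:
    """Return (matched_text, type, score) for every entity scoring at least min_score."""
    hits = []
    for item in results:
        text = item["text"]
        for entity in item.get("entity_detected", []):
            if entity["score"] >= min_score:
                # Slice the matched span out of the original text using the offsets
                matched = text[entity["start"]:entity["end"]]
                hits.append((matched, entity["type"], entity["score"]))
    return hits


# Sample data copied from the output format shown above
sample = [
    {"text": "Ankit Gupta", "entity_detected": [{"type": "PERSON", "start": 0, "end": 11, "score": 0.85}]},
    {"text": "Indian", "entity_detected": [{"type": "NATIONALITY", "start": 0, "end": 6, "score": 0.9}]},
]

print(filter_entities(sample, min_score=0.9))
# → [('Indian', 'NATIONALITY', 0.9)]
```

Filtering on score keeps only the higher-confidence detections; a suitable threshold depends on how many false positives the downstream use can tolerate.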

Download files

Source Distribution

pii_scanner-0.1.22.tar.gz (162.3 kB view details)

Uploaded Source

Built Distribution

pii_Scanner-0.1.22-py3-none-any.whl (173.2 kB view details)

Uploaded Python 3

File details

Details for the file pii_scanner-0.1.22.tar.gz.

File metadata

  • Download URL: pii_scanner-0.1.22.tar.gz
  • Upload date:
  • Size: 162.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for pii_scanner-0.1.22.tar.gz
Algorithm Hash digest
SHA256 113e569b023b4f2e3131aa17978433b180a533eba853e9fdddafa10d91f05800
MD5 288928ca5183918527686cf2206da1ad
BLAKE2b-256 3acd0ed1bcd54483940057e28e0c696aa2e6f8012225902a0ceded45077ab4d2

File details

Details for the file pii_Scanner-0.1.22-py3-none-any.whl.

File metadata

  • Download URL: pii_Scanner-0.1.22-py3-none-any.whl
  • Upload date:
  • Size: 173.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for pii_Scanner-0.1.22-py3-none-any.whl
Algorithm Hash digest
SHA256 7d2f441e9b4c1c0eb0d083cf04e539db5bd7d09255eb2d68e5ea0c83dc1258b9
MD5 73e76f4544dc2a25fdc82b7c60b3bea6
BLAKE2b-256 e303b0bf4dc3f66b9f59f1d63a86940fdbefa89671f80b9f5c328cab764e7d60
