A library for scanning Personally Identifiable Information (PII).

Project description

PII Scanner

A Python library for detecting PII in text using spaCy and custom regex pattern matching. It can process several input formats: lists, plain text, PDFs, JSON, CSV, and XLSX files.

Installation

pip install pii_scanner

Usage

import asyncio
import time

from pii_scanner.scanner import PIIScanner
from pii_scanner.constants.patterns_countries import Regions

async def run_scan():
    # Start the timer
    start_time = time.time()

    pii_scanner = PIIScanner()
    # file_path = 'dummy-pii/test.json'
    file_path = 'dummy-pii/test.xlsx'

    # Scan an in-memory list of strings
    data = ['Ankit Gupta', '+919140562125', 'Indian']
    results_list_data = await pii_scanner.scan(data=data, sample_size=0.005, region=Regions.IN)
    # To scan a file instead, pass file_path rather than data:
    # results_file_data = await pii_scanner.scan(file_path=file_path, sample_size=0.005, region=Regions.IN)

    print("Results:", results_list_data)
    print(f"Elapsed: {time.time() - start_time:.2f} s")

# Run the asynchronous scan
asyncio.run(run_scan())

Output

[
    {
        "text": "Ankit Gupta",
        "entity_detected": [
            {"type": "PERSON", "start": 0, "end": 11, "score": 0.85}
        ]
    },
    {
        "text": "+919140562125",
        "entity_detected": [
            {"type": "PHONE_NUMBER", "start": 0, "end": 13, "score": 0.85}
        ]
    },
    {
        "text": "Indian",
        "entity_detected": [
            {"type": "NATIONALITY", "start": 0, "end": 6, "score": 0.9}
        ]
    }
]
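
The result list above can be post-processed with plain Python. The following is a minimal sketch, assuming the scanner returns exactly the structure shown (a list of dicts with "text" and "entity_detected" keys, and "end" offsets that are exclusive). The helper name group_by_type and its min_score parameter are hypothetical and not part of pii_scanner; the sample data is modeled on the output above.

```python
from collections import defaultdict

# Sample results modeled on the output above (structure is an assumption).
results = [
    {"text": "Ankit Gupta",
     "entity_detected": [{"type": "PERSON", "start": 0, "end": 11, "score": 0.85}]},
    {"text": "+919140562125",
     "entity_detected": [{"type": "PHONE_NUMBER", "start": 0, "end": 13, "score": 0.85}]},
    {"text": "Indian",
     "entity_detected": [{"type": "NATIONALITY", "start": 0, "end": 6, "score": 0.9}]},
]

def group_by_type(results, min_score=0.0):
    """Hypothetical helper: map each entity type to the matched substrings.

    Entities scoring below min_score are skipped. Offsets are treated as
    [start, end), which matches the sample output.
    """
    grouped = defaultdict(list)
    for item in results:
        text = item["text"]
        for entity in item.get("entity_detected", []):
            if entity["score"] >= min_score:
                grouped[entity["type"]].append(text[entity["start"]:entity["end"]])
    return dict(grouped)

print(group_by_type(results))
print(group_by_type(results, min_score=0.9))
```

Raising min_score keeps only higher-confidence detections, which may help when reviewing results by hand.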

Download files

Source Distribution

pii_scanner-0.1.17.tar.gz (161.3 kB)

Uploaded Source

Built Distribution

pii_Scanner-0.1.17-py3-none-any.whl (172.1 kB)

Uploaded Python 3

File details

Details for the file pii_scanner-0.1.17.tar.gz.

File metadata

  • Download URL: pii_scanner-0.1.17.tar.gz
  • Size: 161.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for pii_scanner-0.1.17.tar.gz
Algorithm Hash digest
SHA256 37f7e6105219bbce697fd07b486359a72f4dcd01dde8dfac534d458e67f113c9
MD5 bd3851312bd33eee383c8a5c3bea3b0f
BLAKE2b-256 73ba6d6ae7c6de991e57c2991cbe4105ab65cd8ed2b19dd5d95c2254472996fd

File details

Details for the file pii_Scanner-0.1.17-py3-none-any.whl.

File metadata

  • Download URL: pii_Scanner-0.1.17-py3-none-any.whl
  • Size: 172.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for pii_Scanner-0.1.17-py3-none-any.whl
Algorithm Hash digest
SHA256 673f81574e62b954b4b2abd597865b8bf66a9c12d10135f02da364331c821c53
MD5 1f73c3d4f5498c23c5e3c60e732ff14b
BLAKE2b-256 c09bb043ff2a48b8940e1beeb04935a7b4401f957991b3084453b4975bca0c04
