
Presidio structured package - analyzes and anonymizes structured and semi-structured data.


Presidio structured

Description

The Presidio structured package is a flexible, customizable framework for identifying and protecting sensitive data in structured formats. It extends Presidio to tabular data and to semi-structured formats such as JSON. It uses the detection capabilities of Presidio-Analyzer to identify columns or keys that contain personally identifiable information (PII), and builds a mapping between those column or key names and the detected PII entities. Presidio-Anonymizer then applies de-identification operators to each value in the columns or keys identified as containing PII.
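The two-stage flow described above (detect an entity type per column, then apply an operator to every value in the mapped columns) can be illustrated with a small standard-library sketch. This is not Presidio's implementation: the regex detectors, the majority-vote rule, and the names `DETECTORS`, `build_mapping`, and `anonymize` are all hypothetical stand-ins chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical toy detectors; Presidio-Analyzer uses far richer recognizers.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "PHONE_NUMBER": re.compile(r"^\+?\d[\d\s-]{6,}\d$"),
}


def detect_entity(value):
    """Return the first entity type whose pattern matches the value, else None."""
    for entity, pattern in DETECTORS.items():
        if pattern.match(value):
            return entity
    return None


def build_mapping(table):
    """Map each column name to the most common entity detected among its values."""
    mapping = {}
    for column, values in table.items():
        counts = Counter(e for e in map(detect_entity, values) if e is not None)
        if counts:
            mapping[column] = counts.most_common(1)[0][0]
    return mapping


def anonymize(table, mapping, operators):
    """Apply the operator for each mapped column to every value in that column."""
    result = {}
    for column, values in table.items():
        operator = operators.get(mapping.get(column))
        result[column] = [operator(v) for v in values] if operator else list(values)
    return result


table = {
    "contact": ["john.doe@example.com", "jane.smith@example.com"],
    "city": ["Seattle", "Boston"],
}
mapping = build_mapping(table)
operators = {"EMAIL_ADDRESS": lambda value: "<EMAIL_ADDRESS>"}
anonymized = anonymize(table, mapping, operators)
print(mapping)
print(anonymized)
```

The point of the sketch is the separation of concerns: detection produces a column-to-entity mapping once, and anonymization only consults that mapping, so columns without a detected entity pass through unchanged.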

Installation

As a Python package

To install the presidio-structured package, run the following command:

pip install presidio-structured
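To confirm the installation from Python, one option is to query the installed distribution metadata with the standard library. The helper name `installed_version` is hypothetical; `importlib.metadata` is part of the standard library in Python 3.8 and later.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="presidio-structured"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found else "presidio-structured is not installed")
```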

Getting started

Anonymizing DataFrames:

import pandas as pd
from presidio_structured import StructuredEngine, PandasAnalysisBuilder
from presidio_anonymizer.entities import OperatorConfig
from faker import Faker  # optional: Faker is a separate package, used here only as an example

# Initialize the engine with a Pandas data processor (default)
pandas_engine = StructuredEngine()

# Create a sample DataFrame
sample_df = pd.DataFrame({'name': ['John Doe', 'Jane Smith'], 'email': ['john.doe@example.com', 'jane.smith@example.com']})

# Generate a tabular analysis which describes PII entities in the DataFrame.
tabular_analysis = PandasAnalysisBuilder().generate_analysis(sample_df)

# Define anonymization operators
fake = Faker()
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "REDACTED"}),
    "EMAIL_ADDRESS": OperatorConfig("custom", {"lambda": lambda x: fake.safe_email()})
}

# Anonymize DataFrame
anonymized_df = pandas_engine.anonymize(sample_df, tabular_analysis, operators=operators)
print(anonymized_df)
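The description also mentions semi-structured data such as JSON, where the mapping is keyed by key names rather than column names. The following standard-library sketch shows the idea for nested keys. It does not use or reproduce the package's own JSON support: the dot-separated key-path convention and the function name `anonymize_json` are assumptions made for this illustration.

```python
import copy


def anonymize_json(document, key_mapping, operators):
    """Return a copy of document with the mapped key paths anonymized.

    key_mapping maps dot-separated key paths to entity types (an assumed
    convention for this sketch); operators maps entity types to callables.
    """
    result = copy.deepcopy(document)  # leave the caller's data untouched
    for path, entity in key_mapping.items():
        operator = operators.get(entity)
        if operator is None:
            continue  # no operator configured for this entity type
        *parents, leaf = path.split(".")
        node = result
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break  # path does not exist in this document
        if isinstance(node, dict) and leaf in node:
            node[leaf] = operator(node[leaf])
    return result


sample = {"name": "John Doe", "address": {"city": "Seattle", "zip": "98101"}}
key_mapping = {"name": "PERSON", "address.city": "LOCATION"}
operators = {
    "PERSON": lambda value: "REDACTED",
    "LOCATION": lambda value: "<LOCATION>",
}
anonymized = anonymize_json(sample, key_mapping, operators)
print(anonymized)
```

Working on a deep copy keeps the original document available for comparison, and keys without a mapping entry (here `address.zip`) are left as they are.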



Source Distributions

No source distribution files available for this release.

Built Distribution


presidio_structured-0.0.6-py3-none-any.whl (11.4 kB)


Hashes for presidio_structured-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 f3454c86857a00db9828e684895da43411bcc7d750cac0a52e15d68f6c6455a1
MD5 932e56c267f3cd04a41ed9b73a7be337
BLAKE2b-256 ff5bc2c50b045a99ae0483fb9a1ef11e90ddc94ee0f5e59829077b40a675496e
