Presidio structured package - analyzes and anonymizes structured and semi-structured data.
Presidio structured
Status
Alpha: This package is currently in alpha, meaning it is in its early stages of development. Features and functionality may change as the project evolves.
Description
The Presidio structured package is a flexible, customizable framework for identifying and protecting sensitive data in structured formats. It extends Presidio to tabular data and to semi-structured formats such as JSON. It uses the detection capabilities of Presidio-Analyzer to identify columns or keys that contain personally identifiable information (PII), and builds a mapping from each column or key name to the detected PII entity type. Presidio-Anonymizer then applies de-identification operators to every value in the columns or keys identified as containing PII.
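The two-step flow described above (map each column to an entity type, then apply an operator to every value in the mapped columns) can be sketched in plain Python. This is only an illustration of the idea, not the package's implementation: the regex detector, the majority-vote rule, and the names `DETECTORS`, `build_entity_mapping`, and `anonymize_table` are all assumptions made for this example, and Presidio-Analyzer uses its own recognizers.

```python
import re

# Hypothetical stand-in detector; Presidio-Analyzer uses its own recognizers.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
}


def build_entity_mapping(table):
    """Map each column name to an entity type matched by most of its values."""
    mapping = {}
    for column, values in table.items():
        for entity, pattern in DETECTORS.items():
            hits = sum(1 for value in values if pattern.match(str(value)))
            if values and hits * 2 > len(values):  # strict majority of values match
                mapping[column] = entity
                break
    return mapping


def anonymize_table(table, mapping, operators):
    """Apply the operator for each mapped column to every value in that column."""
    result = {}
    for column, values in table.items():
        operator = operators.get(mapping.get(column))
        # Columns without a mapping or without an operator are copied unchanged.
        result[column] = [operator(v) for v in values] if operator else list(values)
    return result


table = {
    "email": ["john.doe@example.com", "jane.smith@example.com"],
    "city": ["Lisbon", "Oslo"],
}
mapping = build_entity_mapping(table)
operators = {"EMAIL_ADDRESS": lambda value: "<EMAIL_ADDRESS>"}
anonymized = anonymize_table(table, mapping, operators)
print(mapping)
print(anonymized)
```

Keeping the mapping as a separate object between the two steps means it can be inspected or corrected by hand before any value is changed.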
Installation
As a Python package
To install the presidio-structured package, run the following command:
pip install presidio-structured
Getting started
Anonymizing DataFrames:
import pandas as pd
from presidio_structured import StructuredEngine, PandasAnalysisBuilder
from presidio_anonymizer.entities import OperatorConfig
from faker import Faker # optionally using faker as an example
# Initialize the engine with a Pandas data processor (default)
pandas_engine = StructuredEngine()
# Create a sample DataFrame
sample_df = pd.DataFrame({'name': ['John Doe', 'Jane Smith'], 'email': ['john.doe@example.com', 'jane.smith@example.com']})
# Generate a tabular analysis which describes PII entities in the DataFrame.
tabular_analysis = PandasAnalysisBuilder().generate_analysis(sample_df)
# Define anonymization operators
fake = Faker()
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "REDACTED"}),
    "EMAIL_ADDRESS": OperatorConfig("custom", {"lambda": lambda x: fake.safe_email()}),
}
# Anonymize DataFrame
anonymized_df = pandas_engine.anonymize(sample_df, tabular_analysis, operators=operators)
print(anonymized_df)
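The description also mentions semi-structured data such as JSON, where the mapping is keyed by key names rather than column names. The following plain-Python sketch shows one way such a mapping could be applied to a nested document. It does not use the presidio-structured API: the function `anonymize_json`, the dotted-path convention for nested keys (for example `address.city`), and the sample data are assumptions made for illustration only.

```python
import copy


def anonymize_json(document, entity_mapping, operators):
    """Return a copy of `document` with mapped keys rewritten by their operators."""
    result = copy.deepcopy(document)
    for path, entity in entity_mapping.items():
        operator = operators.get(entity)
        if operator is None:
            continue  # no operator configured for this entity type
        *parents, leaf = path.split(".")
        node = result
        # Walk down the parent keys; stop if the path does not exist.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and leaf in node:
            node[leaf] = operator(node[leaf])
    return result


sample = {
    "id": 1,
    "name": "John Doe",
    "address": {"city": "Anytown", "zip": "12345"},
}
# Hypothetical mapping from (dotted) key paths to entity types.
entity_mapping = {"name": "PERSON", "address.city": "LOCATION"}
operators = {
    "PERSON": lambda value: "REDACTED",
    "LOCATION": lambda value: "<LOCATION>",
}
anonymized = anonymize_json(sample, entity_mapping, operators)
print(anonymized)
```

The sketch works on a deep copy, so the input document is left untouched and can be compared with the result.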
More information
Built Distribution
Hashes for presidio_structured-0.0.1a0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3735aaf7e9a1316ec09783e43c2a057b43ade3988f4f32f5189b5a2ea3110beb
MD5 | bfcc1fd582597bc1a7ac3d0b1391b407
BLAKE2b-256 | c97bd0b48db7503bf6d26347f6614427b5f4c2d1aaea01dda8f573c7f112dc4f