iFood Databricks Alcatraz
The iFood Databricks Alcatraz library anonymizes personally identifiable information (PII) in Spark DataFrames. It is designed for use in Databricks notebooks and jobs.
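The sketch below is a rough illustration of what entity-based anonymization means: text that matches a given entity type is replaced with a placeholder. It is not how ifood_databricks_alcatraz is implemented. The entity names mirror those used in the example (CPF, EMAIL_ADDRESS, IP_ADDRESS), but the regular expressions, the hypothetical mask_entities helper, and the <TYPE> placeholder format are assumptions made for this illustration only.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# These are NOT the detectors used by ifood_databricks_alcatraz.
PATTERNS = {
    "CPF": re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def mask_entities(text, entities):
    """Replace every match of each requested entity type with a <TYPE> placeholder.

    Entities are processed in the order given; an unknown entity name raises KeyError.
    """
    for entity in entities:
        text = PATTERNS[entity].sub(f"<{entity}>", text)
    return text
```

For example, mask_entities("My cpf is 831.756.690-00", ["CPF"]) returns "My cpf is <CPF>". Real PII detection is harder than this (phone numbers and names in particular vary widely), which is why a dedicated library is useful.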
Example Usage
For more examples, see the examples folder.
```python
from pyspark.sql import SparkSession
from ifood_databricks_alcatraz import IFoodAnonymizer, Entities

# Allocate 4 GB of memory to the Spark driver to avoid memory errors in the UDF.
# This is a common issue when running UDFs with Spark locally.
# Note: spark.driver.memory only takes effect if it is set before the driver JVM starts.
# In a Databricks notebook a `spark` session already exists, and getOrCreate() returns it.
spark = SparkSession.builder \
    .appName("AppName") \
    .config("spark.driver.memory", "4g") \
    .getOrCreate()

# Sample rows containing PII
data = [
    (
        "John Doe",
        40,
        "São Carlos",
        "My phone is 19 967288744",
        "My cpf is 831.756.690-00",
        "My ip is 200.122.22.1",
        "My email address is john@ee.com",
    ),
    (
        "André Osti",
        30,
        "Campinas",
        "My phone is 21 88367-8333",
        "My cpf is 831.756.690-01",
        "My address 10.22.22.1",
        "My address is andre.doe@gmail.com",
    ),
]
columns = ["Name", "Age", "City", "Phone", "CPF", "IP_ADDRESS", "Email"]
df = spark.createDataFrame(data, columns)
df.show(truncate=False)

anonymizer = IFoodAnonymizer()

# Anonymize each column, detecting only the entity types expected in it.
# Each call returns a new DataFrame, so the calls are chained.
anonymized_df = anonymizer.anonymize_column(df, "Phone", entities=[Entities.PHONE_NUMBER])
anonymized_df = anonymizer.anonymize_column(anonymized_df, "IP_ADDRESS", entities=[Entities.IP_ADDRESS])
anonymized_df = anonymizer.anonymize_column(anonymized_df, "CPF", entities=[Entities.CPF])
anonymized_df = anonymizer.anonymize_column(anonymized_df, "Email", entities=[Entities.EMAIL_ADDRESS])

# Show the results
anonymized_df.show(truncate=False)
```
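When many columns need anonymizing, the repeated anonymize_column calls in the example can be driven from a mapping of column names to entity lists. The helper below, anonymize_columns, is hypothetical and not part of the library. It is a minimal sketch that only assumes the call shape used in the example, anonymize_column(df, column, entities=...), returning a new DataFrame.

```python
from typing import Any, Mapping, Sequence


def anonymize_columns(anonymizer: Any, df: Any, column_entities: Mapping[str, Sequence[Any]]) -> Any:
    """Apply anonymizer.anonymize_column once per column, in mapping order.

    Hypothetical helper: assumes anonymize_column(df, column, entities=...)
    returns a new DataFrame, as in the example usage.
    """
    for column, entities in column_entities.items():
        df = anonymizer.anonymize_column(df, column, entities=list(entities))
    return df
```

For instance, anonymize_columns(anonymizer, df, {"Phone": [Entities.PHONE_NUMBER], "CPF": [Entities.CPF]}) would issue the same calls as the Phone and CPF steps of the example, in that order, because dictionaries preserve insertion order.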
Download files
Version 0.1.29 is published as a source distribution (.tar.gz) and a built distribution (a pure-Python py3-none-any wheel). Their hashes are listed below.
Hashes for ifood_databricks_alcatraz-0.1.29.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 8dc020b38654350deec502cf74c734da17ed6c35b3ea920021517543523386a9 |
| MD5 | 3abc282381ae885a6e7e6e33d707cdff |
| BLAKE2b-256 | 1cfa02ebc489dcca1227c9a21dc307157ce8a0010b3029788daccf8fe045bbe8 |

Hashes for ifood_databricks_alcatraz-0.1.29-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | fc990d2b22c44c3bf9b4c6b2ae1b497b0d9829890b1c00acbcf7bdbd2b27edca |
| MD5 | a6f34671d39a4c0eeb80095e06bde1b0 |
| BLAKE2b-256 | 4645f3e946d2cd2fe00b2f09f77bc904ee598c2bc056e4b0b1d609b54cb14931 |