iFood Databricks Alcatraz

The iFood Databricks Alcatraz library anonymizes PII data in a Spark DataFrame. It is designed to be used in Databricks notebooks and jobs.

Example Usage

For more examples, see the examples folder.

from pyspark.sql import SparkSession

from ifood_databricks_alcatraz import IFoodAnonymizer, Entities

# Allocate 4 GB of memory to the Spark driver to avoid memory issues with the UDF,
# a common problem when running UDFs in a local Spark session
spark = SparkSession.builder \
    .appName("AppName") \
    .config("spark.driver.memory", "4g") \
    .getOrCreate()


# Data to be included in the DataFrame
data = [
    (
        "John Doe",
        40,
        "São Carlos",
        "My phone is 19 967288744",
        "My cpf is 831.756.690-00",
        "My ip is 200.122.22.1",
        "My email address is john@ee.com",
    ),
    (
        "André Osti",
        30,
        "Campinas",
        "My phone is 21 88367-8333",
        "My cpf is 831.756.690-01",
        "My address 10.22.22.1",
        "My address is andre.doe@gmail.com",
    ),
]

columns = ["Name", "Age", "City", "Phone", "CPF", "IP_ADDRESS", "Email"]

df = spark.createDataFrame(data, columns)

df.show(truncate=False)

anonymizer = IFoodAnonymizer()

entities = [Entities.PHONE_NUMBER]

# Apply the anonymization UDF to each PII column, passing the entity types to detect in it
anonymized_df = anonymizer.anonymize_column(df, "Phone", entities=entities)
anonymized_df = anonymizer.anonymize_column(anonymized_df, "IP_ADDRESS", entities=[Entities.IP_ADDRESS])
anonymized_df = anonymizer.anonymize_column(anonymized_df, "CPF", entities=[Entities.CPF])
anonymized_df = anonymizer.anonymize_column(anonymized_df, "Email", entities=[Entities.EMAIL_ADDRESS])

# Show the results
anonymized_df.show(truncate=False)
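
The repeated anonymize_column calls above can be folded into a loop when many columns need the same treatment. The following is a minimal sketch, not part of the library: anonymize_columns is a hypothetical helper name, and it assumes only the anonymize_column(df, column, entities=...) call shown in the example.

```python
from typing import Any, Mapping, Sequence


def anonymize_columns(
    anonymizer: Any,
    df: Any,
    column_entities: Mapping[str, Sequence[Any]],
) -> Any:
    """Hypothetical helper (not part of ifood_databricks_alcatraz).

    Calls anonymizer.anonymize_column once per entry in column_entities,
    feeding the result of each call into the next one.
    """
    result = df
    for column, entities in column_entities.items():
        # Same call shape as in the example above: DataFrame, column name, entities list
        result = anonymizer.anonymize_column(result, column, entities=list(entities))
    return result


# Usage with the objects from the example above (left commented out so the
# sketch stays importable without Spark or the library installed):
# anonymized_df = anonymize_columns(
#     IFoodAnonymizer(),
#     df,
#     {
#         "Phone": [Entities.PHONE_NUMBER],
#         "IP_ADDRESS": [Entities.IP_ADDRESS],
#         "CPF": [Entities.CPF],
#         "Email": [Entities.EMAIL_ADDRESS],
#     },
# )
```

Keeping the column-to-entity mapping in one dictionary makes it easier to review which columns are treated as PII and which entity types are searched for in each.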
