
A package for hashing personally identifiable information (PII).


ByeByePii


What is it?

ByeByePii is a Python package for hashing personally identifiable information (PII). It was built with a focus on making data lakes that store JSON files GDPR compliant.
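A data lake usually holds many JSON files rather than one, so the per-file hashing described in the Documentation section would in practice run over a whole directory. The sketch below shows one way to do that with only the standard library. It assumes each file holds a single JSON object (not JSON Lines). `hash_directory` and `demo_hash_fn` are hypothetical names; `demo_hash_fn` is a stand-in so the sketch runs without ByeByePii installed and is not the package's hashing logic.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def hash_directory(src_dir: Path, dst_dir: Path, hash_fn: Callable[[dict], dict]) -> int:
    """Apply hash_fn to every *.json file in src_dir and write the result,
    under the same file name, to dst_dir. Returns the number of files written.

    Assumes each file holds a single JSON object (not JSON Lines).
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(src_dir.glob("*.json")):
        with src.open() as infile:
            data = json.load(infile)
        with (dst_dir / src.name).open("w") as outfile:
            json.dump(hash_fn(data), outfile)
        count += 1
    return count


def demo_hash_fn(record: dict) -> dict:
    """Hypothetical stand-in for a real PII hasher: replaces
    BuyerInfo.BuyerEmail with its SHA-256 hex digest, in place."""
    buyer = record.get("BuyerInfo", {})
    if "BuyerEmail" in buyer:
        buyer["BuyerEmail"] = hashlib.sha256(buyer["BuyerEmail"].encode("utf-8")).hexdigest()
    return record


if __name__ == "__main__":
    # Build a throwaway "raw" folder with one record and hash it into "hashed".
    with tempfile.TemporaryDirectory() as tmp:
        raw, hashed = Path(tmp) / "raw", Path(tmp) / "hashed"
        raw.mkdir()
        (raw / "order1.json").write_text('{"BuyerInfo": {"BuyerEmail": "test@test.com"}}')
        print(hash_directory(raw, hashed, demo_hash_fn))  # 1
```

With ByeByePii installed, `hash_fn` could instead be `lambda d: byebyepii.hashPii(d, keys_to_hash, subkeys_to_hash)`, following the `hashPii` call used in the Documentation section. Writing to a separate output directory keeps the raw files untouched until the hashed copies have been checked.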

Main Features

  • Analyzing Python dictionaries to identify PII
  • Hashing PII in a given Python dictionary
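ByeByePii's analysis step asks about each field interactively. To illustrate what "analyzing a dictionary" involves, here is a non-interactive sketch that walks a record one level deep (the shape of the example order used in the Documentation section) and flags fields by name. `find_candidate_keys`, `looks_like_pii` and the `PII_HINTS` list are hypothetical and not part of the package; a name-based rule is only a heuristic, and deeper nesting is not walked.

```python
from __future__ import annotations

from typing import Callable, Optional


def find_candidate_keys(
    data: dict, is_pii: Callable[[str], bool]
) -> list[tuple[str, Optional[str]]]:
    """Return (key, subkey) pairs whose field name matches is_pii.

    Top-level scalar fields are reported with subkey None; nested dicts are
    inspected one level deep only.
    """
    candidates = []
    for key, value in data.items():
        if isinstance(value, dict):
            for subkey in value:
                if is_pii(subkey):
                    candidates.append((key, subkey))
        elif is_pii(key):
            candidates.append((key, None))
    return candidates


# Hypothetical rule: flag field names containing any of these fragments.
PII_HINTS = ("email", "postal", "region", "name", "phone")


def looks_like_pii(name: str) -> bool:
    return any(hint in name.lower() for hint in PII_HINTS)


order = {
    "BuyerInfo": {"BuyerEmail": "test@test.com"},
    "SalesChannel": "Website",
    "ShippingAddress": {
        "StateOrRegion": "West Midlands",
        "PostalCode": "DY9 0TH",
        "City": "STOURBRIDGE",
    },
}
print(find_candidate_keys(order, looks_like_pii))
# [('BuyerInfo', 'BuyerEmail'), ('ShippingAddress', 'StateOrRegion'), ('ShippingAddress', 'PostalCode')]
```

An automatic rule like this can pre-select candidates, but a human review (which is what the interactive prompt provides) is still needed to catch fields such as free-text notes whose names give no hint.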

Where to get it

The source code is currently hosted on GitHub at: https://github.com/falkzeh/ByeByePii

Binary installers for the latest released version are available at the Python Package Index (PyPI).

pip install ByeByePii

Documentation

Analyzing a Python Dictionary and creating a list of keys to hash

To avoid having to look through all the keys in a Python dictionary manually, we can use the analyzeDict function, which asks about each field and returns a list of keys and a list of subkeys to hash.

import byebyepii
import json

if __name__ == '__main__':

    # Loading local JSON file
    with open('data.json') as json_file:
        data = json.load(json_file)

    # Analyzing the dictionary and creating our hash list
    key_list, subkey_list = byebyepii.analyzeDict(data)

Running the script asks, for each field it finds, whether to add it to the hash list:

$ python3 analyzeDict.py

Add BuyerInfo - BuyerEmail to hash list? (y/n) y
Add SalesChannel to hash list? (y/n) n
Add OrderStatus to hash list? (y/n) n
Add PurchaseDate to hash list? (y/n) n
Add ShippingAddress - StateOrRegion to hash list? (y/n) y
Add ShippingAddress - PostalCode to hash list? (y/n) y
Add ShippingAddress - City to hash list? (y/n) n
Add ShippingAddress - CountryCode to hash list? (y/n) n
Add LastUpdateDate to hash list? (y/n) n

Keys to hash: ['BuyerInfo', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress']
Subkeys to hash: ['BuyerEmail', 'StateOrRegion', 'PostalCode']
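Because analyzeDict is interactive, it may be convenient to answer the prompts once and reuse the resulting lists in later, unattended runs. A minimal sketch using the standard `json` module; `save_hash_lists`, `load_hash_lists` and the file name `hash_lists.json` are hypothetical, not part of ByeByePii.

```python
import json
from pathlib import Path

# Hypothetical location for the answers given during the interactive analysis.
HASH_LIST_FILE = Path("hash_lists.json")


def save_hash_lists(key_list: list, subkey_list: list, path: Path = HASH_LIST_FILE) -> None:
    """Store both lists in one JSON file so later runs can skip analyzeDict."""
    path.write_text(json.dumps({"keys": key_list, "subkeys": subkey_list}, indent=2))


def load_hash_lists(path: Path = HASH_LIST_FILE) -> tuple:
    """Read the lists back, in the same (keys, subkeys) order they were saved."""
    stored = json.loads(path.read_text())
    return stored["keys"], stored["subkeys"]


if __name__ == "__main__":
    # Values copied from the analyzeDict session above.
    keys = ['BuyerInfo', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress']
    subkeys = ['BuyerEmail', 'StateOrRegion', 'PostalCode']
    save_hash_lists(keys, subkeys)
    print(load_hash_lists() == (keys, subkeys))  # True
```

Keeping the lists in a versioned file also records which fields were treated as PII, which is useful when the data structure changes later.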

Hashing PII in a given Python Dictionary

Using the key lists we just created, we can hash the PII in the dictionary with the hashPii function.

import byebyepii
import json

if __name__ == '__main__':

    # Loading local JSON file
    with open('data.json') as json_file:
        data = json.load(json_file)

    # Hashing the PII
    keys_to_hash = ['BuyerInfo', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress']
    subkeys_to_hash = ['BuyerEmail', 'StateOrRegion', 'PostalCode']
    hashed_pii = byebyepii.hashPii(data, keys_to_hash, subkeys_to_hash)

    # Writing the updated JSON file
    with open('hashed_data.json', 'w') as outfile:
        json.dump(hashed_pii, outfile)

Before:

{
  "BuyerInfo": {
    "BuyerEmail": "test@test.com"
  },
  "EarliestShipDate": "2022-01-01T23:59:59Z",
  "SalesChannel": "Website",
  "OrderStatus": "Shipped",
  "PurchaseDate": "2022-01-01T23:59:59Z",
  "ShippingAddress": {
    "StateOrRegion": "West Midlands",
    "PostalCode": "DY9 0TH",
    "City": "STOURBRIDGE",
    "CountryCode": "GB"
  },
  "LastUpdateDate": "2022-01-01T23:59:59Z"
}

After:

{
  "BuyerInfo": {
    "BuyerEmail": "037a51cb9162f51772eaf6b0fb02e1b5d0bf8219deacf723eeedc162209bfd33"
  },
  "EarliestShipDate": "2022-01-01T23:59:59Z",
  "SalesChannel": "Website",
  "OrderStatus": "Shipped",
  "PurchaseDate": "2022-01-01T23:59:59Z",
  "ShippingAddress": {
    "StateOrRegion": "08fa57d00de1936ebea7aeaf8e36d04510a5d885cfaa4f169c2b010d36ccaca4",
    "PostalCode": "714f02c01e20988ee273776dc218f44326c2f5839618b0c117413b0cc7d91701",
    "City": "STOURBRIDGE",
    "CountryCode": "GB"
  },
  "LastUpdateDate": "2022-01-01T23:59:59Z"
}
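To make the Before/After transformation concrete, here is a minimal sketch of hashing selected nested fields. It is not ByeByePii's implementation: it assumes unsalted SHA-256 over the UTF-8 encoded value (the 64-character hex digests above are consistent with a 256-bit hash, but the package's exact algorithm and input encoding are not documented here, so its digests may differ). `hash_fields` and `sha256_hex` are hypothetical names, and the sketch takes explicit (key, subkey) pairs rather than two separate lists.

```python
from __future__ import annotations

import copy
import hashlib


def sha256_hex(value: str) -> str:
    """Hex-encoded SHA-256 digest of a string (assumed algorithm)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def hash_fields(data: dict, fields: list[tuple[str, str]]) -> dict:
    """Return a copy of data where each data[key][subkey] is replaced by its digest.

    Pairs that do not exist in data are skipped; the input is left unmodified.
    """
    result = copy.deepcopy(data)
    for key, subkey in fields:
        section = result.get(key)
        if isinstance(section, dict) and subkey in section:
            section[subkey] = sha256_hex(str(section[subkey]))
    return result


order = {
    "BuyerInfo": {"BuyerEmail": "test@test.com"},
    "SalesChannel": "Website",
    "ShippingAddress": {"PostalCode": "DY9 0TH", "City": "STOURBRIDGE"},
}
hashed = hash_fields(order, [("BuyerInfo", "BuyerEmail"), ("ShippingAddress", "PostalCode")])
print(hashed["ShippingAddress"]["City"])  # STOURBRIDGE (not selected, unchanged)
print(order["BuyerInfo"]["BuyerEmail"])  # test@test.com (original dict untouched)
```

Returning a deep copy keeps the raw record available until the hashed version has been written. Because plain SHA-256 is deterministic, the same e-mail address produces the same digest in every file, so hashed fields can still be joined or counted across the data lake.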


Download files

Source distribution: ByeByePii-1.0.1.tar.gz (3.4 kB)
Built distribution: ByeByePii-1.0.1-py3-none-any.whl (3.4 kB, Python 3)
