A package for hashing personally identifiable information (PII).

ByeByePii

What is it?

ByeByePii is a Python package for hashing personally identifiable information (PII). It was built to help make data lakes that store JSON files GDPR compliant.

Main Features

  • Analyzing Python Dictionaries in order to identify PII
  • Hashing PII in a given Python Dictionary

Where to get it

The source code is currently hosted on GitHub at: https://github.com/falkzeh/ByeByePii

Binary installers for the latest released version are available at the Python Package Index (PyPI).

pip install ByeByePii

Documentation

Analyzing a Python Dictionary and creating a list of keys to hash

To avoid looking for every key in a Python Dictionary by hand, we can use the analyzeDict function. It prompts for each key (and subkey) and collects the ones you confirm.

import byebyepii
import json

if __name__ == '__main__':

    # Loading local JSON file
    with open('data.json') as json_file:
        data = json.load(json_file)

    # Analyzing the dictionary and creating our hash list
    key_list, subkey_list = byebyepii.analyzeDict(data)

$ python3 analyzeDict.py

Add BuyerInfo - BuyerEmail to hash list? (y/n) y
Add SalesChannel to hash list? (y/n) n
Add OrderStatus to hash list? (y/n) n
Add PurchaseDate to hash list? (y/n) n
Add ShippingAddress - StateOrRegion to hash list? (y/n) y
Add ShippingAddress - PostalCode to hash list? (y/n) y
Add ShippingAddress - City to hash list? (y/n) n
Add ShippingAddress - CountryCode to hash list? (y/n) n
Add LastUpdateDate to hash list? (y/n) n

Keys to hash: ['BuyerInfo', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress']
Subkeys to hash: ['BuyerEmail', 'StateOrRegion', 'PostalCode']
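
To make the prompts above easier to follow, here is a minimal sketch of how the key and subkey pairs of a dictionary nested one level deep could be enumerated. It is not ByeByePii's implementation; the helper name list_candidate_fields and the sample record are hypothetical and used only for illustration.

```python
def list_candidate_fields(record):
    """Return (key, subkey) pairs for a dict nested at most one level deep.

    Hypothetical helper, not part of ByeByePii.
    Top-level non-dict values are reported with subkey None.
    """
    fields = []
    for key, value in record.items():
        if isinstance(value, dict):
            # One entry per subkey of a nested dictionary
            for subkey in value:
                fields.append((key, subkey))
        else:
            fields.append((key, None))
    return fields


# Illustrative sample shaped like the order data in this README
sample = {
    "BuyerInfo": {"BuyerEmail": "test@test.com"},
    "SalesChannel": "Website",
    "ShippingAddress": {"PostalCode": "DY9 0TH", "City": "STOURBRIDGE"},
}

for key, subkey in list_candidate_fields(sample):
    label = key if subkey is None else f"{key} - {subkey}"
    print(f"Add {label} to hash list? (y/n)")
```

Each printed line corresponds to one question of the kind analyzeDict asks; a real run would read the answer and record the confirmed keys.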

Hashing PII in a given Python Dictionary

Using the key lists we just created, we can proceed to hash the PII in the dictionary.

import byebyepii
import json

if __name__ == '__main__':

    # Loading local JSON file
    with open('data.json') as json_file:
        data = json.load(json_file)

    # Hashing the PII
    keys_to_hash = ['BuyerInfo', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress']
    subkeys_to_hash = ['BuyerEmail', 'StateOrRegion', 'PostalCode']
    hashed_pii = byebyepii.hashPii(data, keys_to_hash, subkeys_to_hash)

    # Writing the updated JSON file
    with open('hashed_data.json', 'w') as outfile:
        json.dump(hashed_pii, outfile)

Before:

{
  "BuyerInfo": {
    "BuyerEmail": "test@test.com"
  },
  "EarliestShipDate": "2022-01-01T23:59:59Z",
  "SalesChannel": "Website",
  "OrderStatus": "Shipped",
  "PurchaseDate": "2022-01-01T23:59:59Z",
  "ShippingAddress": {
    "StateOrRegion": "West Midlands",
    "PostalCode": "DY9 0TH",
    "City": "STOURBRIDGE",
    "CountryCode": "GB"
  },
  "LastUpdateDate": "2022-01-01T23:59:59Z"
}

After:

{
  "BuyerInfo": {
    "BuyerEmail": "037a51cb9162f51772eaf6b0fb02e1b5d0bf8219deacf723eeedc162209bfd33"
  },
  "EarliestShipDate": "2022-01-01T23:59:59Z",
  "SalesChannel": "Website",
  "OrderStatus": "Shipped",
  "PurchaseDate": "2022-01-01T23:59:59Z",
  "ShippingAddress": {
    "StateOrRegion": "08fa57d00de1936ebea7aeaf8e36d04510a5d885cfaa4f169c2b010d36ccaca4",
    "PostalCode": "714f02c01e20988ee273776dc218f44326c2f5839618b0c117413b0cc7d91701",
    "City": "STOURBRIDGE",
    "CountryCode": "GB"
  },
  "LastUpdateDate": "2022-01-01T23:59:59Z"
}
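
The hashed values above are 64 hexadecimal characters long, which matches the length of a SHA-256 digest. The algorithm and any salting ByeByePii actually uses are not stated here, so the following is only a sketch of the general technique, assuming plain unsalted SHA-256 from Python's hashlib. The names sha256_hex and hash_fields are hypothetical and not part of the package.

```python
import copy
import hashlib


def sha256_hex(value):
    """Return the SHA-256 hex digest of the value's string form (assumed algorithm)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def hash_fields(record, fields):
    """Return a copy of record with the given (key, subkey) fields hashed.

    Hypothetical helper, not ByeByePii's API.
    A subkey of None means the top-level value itself is hashed.
    Missing keys or subkeys are skipped silently.
    """
    result = copy.deepcopy(record)  # leave the caller's data untouched
    for key, subkey in fields:
        if key not in result:
            continue
        if subkey is None:
            result[key] = sha256_hex(result[key])
        elif isinstance(result[key], dict) and subkey in result[key]:
            result[key][subkey] = sha256_hex(result[key][subkey])
    return result


record = {
    "BuyerInfo": {"BuyerEmail": "test@test.com"},
    "SalesChannel": "Website",
    "ShippingAddress": {"PostalCode": "DY9 0TH", "City": "STOURBRIDGE"},
}

hashed = hash_fields(
    record,
    [("BuyerInfo", "BuyerEmail"), ("ShippingAddress", "PostalCode")],
)
print(hashed)
```

Because the same input always yields the same digest, hashed fields can still be used to join or group records. Unsalted hashes of low-entropy values such as postal codes can be reversed by brute force, so whether this is sufficient depends on your compliance requirements.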
