A package for hashing personally identifiable information (PII).
ByeByePii
What is it?
ByeByePii is a Python package for hashing personally identifiable information (PII). It was built with a focus on making data lakes that store JSON files GDPR compliant.
Main Features
- Analyzing Python dictionaries to identify PII
- Hashing PII in a given Python dictionary
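Hashing replaces a value with a fixed-length digest. The sketch below shows this for a single value, assuming SHA-256 over the value's UTF-8 bytes. The digests in the example output further down are 64 hex characters, which is consistent with SHA-256, but the package documentation does not confirm the algorithm. `hash_value` is a hypothetical name, not part of ByeByePii.

```python
import hashlib


def hash_value(value: str) -> str:
    """Return the SHA-256 hex digest of a string (assumed algorithm, see above)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    digest = hash_value("test@test.com")
    print(len(digest))  # 64 (hex characters, i.e. 256 bits)
    # Hashing is deterministic: the same input always gives the same digest
    print(digest == hash_value("test@test.com"))  # True
```

Because the digest is deterministic, the same e-mail address hashes to the same value in every file. Records can therefore still be grouped or joined on the hashed field.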
Where to get it
The source code is currently hosted on GitHub at: https://github.com/falkzeh/ByeByePii
Binary installers for the latest released version are available at the Python Package Index (PyPI).
```sh
pip install ByeByePii
```
Documentation
Analyzing a Python Dictionary and creating a list of keys to hash
To avoid manually searching a Python dictionary for every key that might contain PII, we can use the `analyzeDict` function.
```python
import byebyepii
import json

if __name__ == '__main__':
    # Loading local JSON file
    with open('data.json') as json_file:
        data = json.load(json_file)

    # Analyzing the dictionary and creating our hash list
    key_list, subkey_list = byebyepii.analyzeDict(data)
```
```console
$ python3 analyzeDict.py
Add BuyerInfo - BuyerEmail to hash list? (y/n) y
Add SalesChannel to hash list? (y/n) n
Add OrderStatus to hash list? (y/n) n
Add PurchaseDate to hash list? (y/n) n
Add ShippingAddress - StateOrRegion to hash list? (y/n) y
Add ShippingAddress - PostalCode to hash list? (y/n) y
Add ShippingAddress - City to hash list? (y/n) n
Add ShippingAddress - CountryCode to hash list? (y/n) n
Add LastUpdateDate to hash list? (y/n) n
Keys to hash: ['BuyerInfo', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress']
Subkeys to hash: ['BuyerEmail', 'StateOrRegion', 'PostalCode']
```
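The prompts above suggest the general idea behind an interactive review. We walk the top-level keys, descend one level into nested dictionaries, and ask a y/n question for each field. The following is a minimal sketch of that idea, not the package's implementation. The name `analyze_dict`, the `ask` parameter (injectable so the prompts can be scripted), and the list of `(key, subkey)` pairs it returns are all hypothetical. This pair format differs from the two separate lists that `analyzeDict` prints above.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical field format: (top-level key, nested subkey or None)
Field = Tuple[str, Optional[str]]


def analyze_dict(data: Dict, ask: Callable[[str], str] = input) -> List[Field]:
    """Ask about every top-level key and one level of subkeys; return accepted fields."""
    selected: List[Field] = []
    for key, value in data.items():
        if isinstance(value, dict):
            # Nested dictionary: ask about each subkey separately
            for subkey in value:
                answer = ask(f"Add {key} - {subkey} to hash list? (y/n) ")
                if answer.strip().lower() == "y":
                    selected.append((key, subkey))
        else:
            answer = ask(f"Add {key} to hash list? (y/n) ")
            if answer.strip().lower() == "y":
                selected.append((key, None))
    return selected


if __name__ == "__main__":
    sample = {
        "BuyerInfo": {"BuyerEmail": "test@test.com"},
        "SalesChannel": "Website",
        "ShippingAddress": {"PostalCode": "DY9 0TH", "City": "STOURBRIDGE"},
    }
    # Scripted answers instead of keyboard input, in prompt order
    answers = iter(["y", "n", "y", "n"])
    print(analyze_dict(sample, ask=lambda prompt: next(answers)))
    # [('BuyerInfo', 'BuyerEmail'), ('ShippingAddress', 'PostalCode')]
```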
Hashing PII in a given Python Dictionary
Using the key lists we just created, we can now hash the PII in the dictionary.
```python
import byebyepii
import json

if __name__ == '__main__':
    # Loading local JSON file
    with open('data.json') as json_file:
        data = json.load(json_file)

    # Hashing the PII
    keys_to_hash = ['BuyerInfo', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress']
    subkeys_to_hash = ['BuyerEmail', 'StateOrRegion', 'PostalCode']
    hashed_pii = byebyepii.hashPii(data, keys_to_hash, subkeys_to_hash)

    # Writing the updated JSON file
    with open('hashed_data.json', 'w') as outfile:
        json.dump(hashed_pii, outfile)
```
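Conceptually, the hashing step replaces the value at each selected key or subkey with a digest and leaves every other field alone. Below is a minimal sketch of that step, not the package's `hashPii`. It assumes SHA-256 hex digests (consistent with the 64-character values in the example output, but not confirmed) and uses the hypothetical `(key, subkey)` pair format. `hash_pii` is an illustrative name only.

```python
import copy
import hashlib
from typing import Dict, List, Optional, Tuple

# Hypothetical field format: (top-level key, nested subkey or None)
Field = Tuple[str, Optional[str]]


def _sha256_hex(value: object) -> str:
    # Assumption: non-string values are hashed via their str() form
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def hash_pii(data: Dict, fields: List[Field]) -> Dict:
    """Return a deep copy of `data` with each listed field replaced by its digest."""
    result = copy.deepcopy(data)  # the caller's dictionary is left untouched
    # dict.fromkeys drops duplicate fields (keeping order) so nothing is hashed twice
    for key, subkey in dict.fromkeys(fields):
        if subkey is None:
            if key in result:
                result[key] = _sha256_hex(result[key])
        elif isinstance(result.get(key), dict) and subkey in result[key]:
            result[key][subkey] = _sha256_hex(result[key][subkey])
        # fields that are not present in `data` are silently skipped
    return result


if __name__ == "__main__":
    order = {
        "BuyerInfo": {"BuyerEmail": "test@test.com"},
        "SalesChannel": "Website",
    }
    hashed = hash_pii(order, [("BuyerInfo", "BuyerEmail")])
    print(hashed["BuyerInfo"]["BuyerEmail"] != "test@test.com")  # True
    print(hashed["SalesChannel"])  # Website
    print(order["BuyerInfo"]["BuyerEmail"])  # test@test.com
```

Working on a deep copy keeps the loaded data intact, so the original and the hashed version can be compared or written to separate files.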
Before:
```json
{
  "BuyerInfo": {
    "BuyerEmail": "test@test.com"
  },
  "EarliestShipDate": "2022-01-01T23:59:59Z",
  "SalesChannel": "Website",
  "OrderStatus": "Shipped",
  "PurchaseDate": "2022-01-01T23:59:59Z",
  "ShippingAddress": {
    "StateOrRegion": "West Midlands",
    "PostalCode": "DY9 0TH",
    "City": "STOURBRIDGE",
    "CountryCode": "GB"
  },
  "LastUpdateDate": "2022-01-01T23:59:59Z"
}
```
After:
```json
{
  "BuyerInfo": {
    "BuyerEmail": "037a51cb9162f51772eaf6b0fb02e1b5d0bf8219deacf723eeedc162209bfd33"
  },
  "EarliestShipDate": "2022-01-01T23:59:59Z",
  "SalesChannel": "Website",
  "OrderStatus": "Shipped",
  "PurchaseDate": "2022-01-01T23:59:59Z",
  "ShippingAddress": {
    "StateOrRegion": "08fa57d00de1936ebea7aeaf8e36d04510a5d885cfaa4f169c2b010d36ccaca4",
    "PostalCode": "714f02c01e20988ee273776dc218f44326c2f5839618b0c117413b0cc7d91701",
    "City": "STOURBRIDGE",
    "CountryCode": "GB"
  },
  "LastUpdateDate": "2022-01-01T23:59:59Z"
}
```
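To check a result like the Before/After pair above, we can compare the two records field by field. The sketch below is a hypothetical helper (`changed_fields` is not part of ByeByePii). It reports which top-level keys or nested subkeys differ and compares only one level of nesting. The sample records are trimmed from the Before/After example.

```python
from typing import Dict, List, Optional, Tuple


def changed_fields(before: Dict, after: Dict) -> List[Tuple[str, Optional[str]]]:
    """List the (key, subkey) fields whose values differ between two records."""
    changed: List[Tuple[str, Optional[str]]] = []
    for key, old_value in before.items():
        new_value = after.get(key)
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            # Compare nested values one subkey at a time
            for subkey, old_sub in old_value.items():
                if new_value.get(subkey) != old_sub:
                    changed.append((key, subkey))
        elif new_value != old_value:
            # Top-level value changed (or is missing from `after`)
            changed.append((key, None))
    return changed


if __name__ == "__main__":
    before = {
        "BuyerInfo": {"BuyerEmail": "test@test.com"},
        "SalesChannel": "Website",
        "ShippingAddress": {"PostalCode": "DY9 0TH", "City": "STOURBRIDGE"},
    }
    after = {
        "BuyerInfo": {
            "BuyerEmail": "037a51cb9162f51772eaf6b0fb02e1b5d0bf8219deacf723eeedc162209bfd33"
        },
        "SalesChannel": "Website",
        "ShippingAddress": {
            "PostalCode": "714f02c01e20988ee273776dc218f44326c2f5839618b0c117413b0cc7d91701",
            "City": "STOURBRIDGE",
        },
    }
    print(changed_fields(before, after))
    # [('BuyerInfo', 'BuyerEmail'), ('ShippingAddress', 'PostalCode')]
```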
Download files
Source Distribution
Built Distribution
File details
Details for the file ByeByePii-1.0.1.tar.gz.
File metadata
- Download URL: ByeByePii-1.0.1.tar.gz
- Upload date:
- Size: 3.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 548f2b140a589d8f15eb12af8694f85b4ae16555b75d6d2e2c7d6ea7b68d20d3
MD5 | 1b07dddb9f8ce52fe87dc58f6203709d
BLAKE2b-256 | f3ebc2697eabd7e79b66471868e55b128725fcd4c80c0faadde7f5cc7ce8c67f
File details
Details for the file ByeByePii-1.0.1-py3-none-any.whl.
File metadata
- Download URL: ByeByePii-1.0.1-py3-none-any.whl
- Upload date:
- Size: 3.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1d5ea6b999303e7bc2b96846b2dc23fd40a2f3b86c472a4baa6348852893cfeb
MD5 | 02cd69cdeb7599adb1427e72b7c91d89
BLAKE2b-256 | 6b14be71191e81833f35f64ec431e053557fe40395cebf1d43b4d14550d330b5