Skip to main content

This is the main class that will generate masked data for light weight use cases.

Project description

Masking data in Python.

Logo.jpeg

About this package

This newly created light weight package will invoke the python class that can mask certain categories of data by invoking the right methods. However, you can get back to original data once you mask it. So, you need to use it where you want to test things without any need for getting back to the original data. This application developed using pandas & other useful libraries. This project is for the advanced Python developer & Data Science Newbi's.

How to use this package

(The following instructions apply to Posix/bash. Windows users should check here.)

First, clone this repository and open a terminal inside the root folder.

Create and activate a new virtual environment (recommended) by running the following:

python3 -m venv venv
source venv/bin/activate

Install the requirements:

pip install -r requirements.txt

Run the Augmented Reality-App:

python playPII.py

Make sure that you are properly connected with a functional WebCam or scanned images (Preferably a separate external WebCAM).

Please find the dependent package -

numpy==1.24.2
pandas==1.5.3
python-dateutil==2.8.2
FastDataMask==0.0.6

How to use this Package

We need to understand that the current class has some basic limitations. We need to create some wrapper methods to invoke the right masking classifications.

Let's look into them -

charList = ccl.clsCircularList()

def mask_email(email):
    try:
        maskedEmail = charList.maskEmail(email)
        return maskedEmail
    except:
        return ''

def mask_phone(phone):
    try:
        maskedPhone = charList.maskPhone(phone)
        return maskedPhone
    except:
        return ''

def mask_name(flname):
    try:
        maskedFLName = charList.maskFLName(flname)
        return maskedFLName
    except:
        return ''

def mask_date(dt):
    try:
        maskedDate = charList.maskDate(dt)
        return maskedDate
    except:
        return ''

def mask_uniqueid(unqid):
    try:
        maskedUnqId = charList.maskSSN(unqid)
        return maskedUnqId
    except:
        return ''

From the above, as you can see that you need pass the right text for right sort of masking. You can definitely use this for most of the known use case, which may support your desired data type.

However, this won't support any future date field in this early version.

Let's see the complete code of this config file ->

playPII.py

#####################################################
#### Written By: SATYAKI DE                      ####
#### Written On: 12-Feb-2023                     ####
#### Modified On 16-Feb-2023                     ####
####                                             ####
#### Objective: This is the main calling         ####
#### python script that will invoke the          ####
#### newly created light data masking class.     ####
####                                             ####
#####################################################

import pandas as p
import clsL as cl
from clsConfigClient import clsConfigClient as cf
import datetime
from FastDataMask import clsCircularList as ccl

# Disbling Warning
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

######################################
### Get your global values        ####
######################################
debug_ind = 'Y'
charList = ccl.clsCircularList()

CurrPath = cf.conf['SRC_PATH']
FileName = cf.conf['FILE_NAME']
######################################
####         Global Flag      ########
######################################

######################################
### Wrapper functions to invoke    ###
### the desired class from newly   ###
### built class.                   ###
######################################
def mask_email(email):
    try:
        maskedEmail = charList.maskEmail(email)
        return maskedEmail
    except:
        return ''

def mask_phone(phone):
    try:
        maskedPhone = charList.maskPhone(phone)
        return maskedPhone
    except:
        return ''

def mask_name(flname):
    try:
        maskedFLName = charList.maskFLName(flname)
        return maskedFLName
    except:
        return ''

def mask_date(dt):
    try:
        maskedDate = charList.maskDate(dt)
        return maskedDate
    except:
        return ''

def mask_uniqueid(unqid):
    try:
        maskedUnqId = charList.maskSSN(unqid)
        return maskedUnqId
    except:
        return ''

def mask_sal(sal):
    try:
        maskedSal = charList.maskSal(sal)
        return maskedSal
    except:
        return ''
######################################
### End of wrapper functions.      ###
######################################

def main():
    try:
        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        print('*'*120)
        print('Start Time: ' + str(var))
        print('*'*120)

        inputFile = CurrPath + FileName

        print('Input File: ', inputFile)

        df = p.read_csv(inputFile)

        print('*'*120)
        print('Source Data: ')
        print(df)
        print('*'*120)

        hdr = list(df.columns.values)
        print('Headers:', hdr)

        df["MaskedFirstName"] = df["FirstName"].apply(mask_name)
        df["MaskedEmail"] = df["Email"].apply(mask_email)
        df["MaskedPhone"] = df["Phone"].apply(mask_phone)
        df["MaskedDOB"] = df["DOB"].apply(mask_date)
        df["MaskedSSN"] = df["SSN"].apply(mask_uniqueid)
        df["MaskedSal"] = df["Sal"].apply(mask_sal)

        # Dropping old columns
        df.drop(['FirstName','Email','Phone','DOB','SSN', 'Sal'], axis=1, inplace=True)

        # Renaming columns
        df.rename(columns={'MaskedFirstName': 'FirstName'}, inplace=True)
        df.rename(columns={'MaskedEmail': 'Email'}, inplace=True)
        df.rename(columns={'MaskedPhone': 'Phone'}, inplace=True)
        df.rename(columns={'MaskedDOB': 'DOB'}, inplace=True)
        df.rename(columns={'MaskedSSN': 'SSN'}, inplace=True)
        df.rename(columns={'MaskedSal': 'Sal'}, inplace=True)

        # Repositioning columns of dataframe
        df = df[hdr]

        print('Masked DF: ')
        print(df)

        print('*'*120)
        var1 = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        print('End Time: ' + str(var1))

    except Exception as e:
        x = str(e)
        print('Error: ', x)

if __name__ == "__main__":
    main()

Note that the debug indicator is set to "Y". This will generate logs. If you change this to 'N'. No logs will be generated. However, the process will be faster.

You can certainly contact me to add any features. Depending upon my bandwidth, I'll add them. Please share your feedback at my Technical blog site shared below.

Resources

  • To learn more about my website, check out the blog documentation.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastdatamask-0.0.6.tar.gz (65.1 kB view details)

Uploaded Source

Built Distribution

fastdatamask-0.0.6-py3-none-any.whl (6.1 kB view details)

Uploaded Python 3

File details

Details for the file fastdatamask-0.0.6.tar.gz.

File metadata

  • Download URL: fastdatamask-0.0.6.tar.gz
  • Upload date:
  • Size: 65.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for fastdatamask-0.0.6.tar.gz
Algorithm Hash digest
SHA256 94ff4a6240666204e95af1025f89cda1d601ca238287a0cf7c2a8fc1365ac121
MD5 d7859276f8c17f8dbdd8bc50751bc616
BLAKE2b-256 0d190243b08b305f85863ba84a533afbc59f240a6fae525638632b644eb78bbe

See more details on using hashes here.

File details

Details for the file fastdatamask-0.0.6-py3-none-any.whl.

File metadata

File hashes

Hashes for fastdatamask-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 1fbeacd7a4fd55ab001713f1be3f920a6c3c20c0c43909855b62ac6121bad131
MD5 6847e679a829910a2a7db164806e7316
BLAKE2b-256 bb847458a8ead25b1d05f19fe53390cb216b2b5ece8211d0a5b21af47c3f3e07

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page