Modify a large text file line by line and encrypt it with GnuPG.

gpgodzilla - Large File Encryption with Line Modification

gpgodzilla enables developers and data scientists to encrypt and decrypt large, structured files while modifying them in memory with custom functions or methods. It is specifically designed for de-tokenizing and encrypting sensitive data such as account numbers and social insurance numbers, where raw data is not allowed to reside on the system's local storage.

Use Case

The files must be structured line by line. For example, each line contains portions that need to be modified in the same way before encryption or after decryption.

One primary example is transferring and processing PANs (primary account numbers). Given a file of customers' tokenized PANs, the requirement is to send the PANs to the receiver de-tokenized and encrypted.

However, because PANs are highly sensitive, the de-tokenized/raw PANs cannot touch the local storage of the system. Hence, de-tokenization and encryption need to happen in system memory.
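The in-memory requirement above can be illustrated with a minimal sketch that pipes each modified line straight into a gpg process, so the de-tokenized text is never written to a local file. This is not gpgodzilla's actual implementation; the helper names build_gpg_encrypt_command and stream_lines_to_command are hypothetical, and the sketch assumes a gpg binary on the PATH and a recipient key already in the keyring.

```python
import subprocess


def build_gpg_encrypt_command(recipient, output_path, gpg_binary="gpg"):
    """Build a gpg command line that encrypts stdin to output_path (hypothetical helper)."""
    return [
        gpg_binary, "--batch", "--yes",
        "--encrypt", "--recipient", recipient,
        "--output", output_path,
    ]


def stream_lines_to_command(input_path, transform, command):
    """Pipe each transformed line of input_path into the command's stdin."""
    process = subprocess.Popen(command, stdin=subprocess.PIPE)
    try:
        with open(input_path, "r", encoding="utf-8") as source:
            for line in source:
                # Only the current transformed line is held by this code;
                # nothing is written to disk before it reaches the command.
                process.stdin.write(transform(line).encode("utf-8"))
    finally:
        process.stdin.close()
        return_code = process.wait()
    return return_code
```

A call might look like stream_lines_to_command('tokens.txt', detokenize, build_gpg_encrypt_command('alice@example.com', 'pans.pgp')), where the file names, the recipient, and the detokenize function are placeholders. Only the encrypted output reaches local storage in this arrangement.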


It is essential to have GnuPG 2 installed on the system.
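One way to check this requirement before running anything is sketched below. It looks for a gpg2 or gpg executable on the PATH and reads the major version from the first line of its --version output. The function names are hypothetical and are not part of gpgodzilla.

```python
import re
import shutil
import subprocess


def parse_gpg_major_version(version_output):
    """Return the major version from `gpg --version` output, or None."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    # The first line typically looks like "gpg (GnuPG) 2.2.27".
    match = re.search(r"^gpg \(GnuPG[^)]*\)\s+(\d+)\.", first_line)
    return int(match.group(1)) if match else None


def find_gnupg2():
    """Return the path of a GnuPG 2 executable, or None if none is found."""
    for name in ("gpg2", "gpg"):
        path = shutil.which(name)
        if path is None:
            continue
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        if result.returncode == 0 and parse_gpg_major_version(result.stdout) == 2:
            return path
    return None
```

If find_gnupg2() returns None, GnuPG 2 is either missing or not on the PATH.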

Quick Start

Install via pip:

pip install gpgodzilla

Basic Example

With the input file on local storage, define the path to it and the path for the processed output file. The input file must exist.

Define the recipient of the encrypted file and the manipulation method to apply to each line (e.g., a de-tokenization method that returns the de-tokenized line).
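As an illustration of such a manipulation method, the sketch below de-tokenizes one field of a comma-delimited line using a lookup table. The TOKEN_VAULT dictionary, its dummy values, and the field layout are all assumptions made for the example. The sketch also assumes each line may arrive with its trailing newline, which it preserves.

```python
# Hypothetical token vault with dummy values; a real one would be a
# secured service or store, not a dictionary in source code.
TOKEN_VAULT = {
    "tok_0001": "4111111111111111",
    "tok_0002": "5500005555555559",
}


def detokenize_line(line, vault=TOKEN_VAULT, delimiter=",", field_index=1):
    """Replace the token in one delimited field with its raw value."""
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split(delimiter)
    if field_index < len(fields):
        token = fields[field_index]
        # Unknown tokens are left unchanged rather than raising an error.
        fields[field_index] = vault.get(token, token)
    return delimiter.join(fields) + newline
```

A function of this shape takes one line and returns one line, the same shape as tokenize_foo and detokenize_bar in the example that follows.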

from gpgodzilla import encrypt_large_file, decrypt_large_file

def tokenize_foo(line):
    # Example tokenization method:
    # replaces each "foo" with "bar" before encryption
    line = line.replace('foo', 'bar')
    return line

def detokenize_bar(line):
    # Example de-tokenization method:
    # replaces each "bar" with "foo" after decryption
    line = line.replace('bar', 'foo')
    return line

# The following code demonstrates a simple use case

file_to_encrypt = 'test.txt'  # File to manipulate and encrypt
recipient = ''  # Email of the recipient (GnuPG); its key must exist on the system running the code
PASSPHRASE = ''  # Passphrase for decryption; set this before running
output_file_encrypt = 'test.pgp'  # Path of the encrypted and manipulated file
output_file_decrypt = 'original_test.txt'  # Path of the decrypted file
encrypt_large_file(recipient, file_to_encrypt, output_file_encrypt, tokenize_foo)

# Then, to decrypt the file and de-tokenize each 'bar' back to 'foo':
decrypt_large_file(output_file_encrypt, output_file_decrypt, detokenize_bar, PASSPHRASE)
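
The foo/bar pair above restores the original file only if no line already contains "bar" before tokenization. The short check below shows this on two sample lines. The round_trips helper is hypothetical, and the two functions are repeated so the snippet runs on its own.

```python
def tokenize_foo(line):
    return line.replace("foo", "bar")


def detokenize_bar(line):
    return line.replace("bar", "foo")


def round_trips(line):
    """True if de-tokenizing the tokenized line restores the original."""
    return detokenize_bar(tokenize_foo(line)) == line


print(round_trips("foo=1\n"))    # True
print(round_trips("foo bar\n"))  # False: the original "bar" also becomes "foo"
```

Running a check like this on sample lines can show whether a pair of custom functions is reversible before it is applied to a large file.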

Download files

Source Distribution

gpgodzilla-0.0.1.tar.gz (4.2 kB)

Built Distribution

gpgodzilla-0.0.1-py3-none-any.whl (4.3 kB)
