A tool to substitute IDs in a file tree

Project description

Anonymize UU

This description can be found on GitHub here

Anonymize_UU facilitates the replacement of keywords or regex patterns within a file tree or zipped archive. It recursively traverses the tree, opens supported files and substitutes any found pattern or keyword with a replacement. Besides file contents, Anonymize_UU also substitutes keywords and patterns in file and folder paths.

The result is either a processed copy of the original file tree or an in-place replacement of it, with all substitutions made.
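To make the traversal concrete, here is a minimal sketch of the idea using only the standard library. It is not Anonymize_UU's actual implementation: the helper names replace_all and substitute_tree, the suffix list and the demo file names are assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path

# Assumed set of text-based suffixes, taken from the examples in this README.
TEXT_SUFFIXES = {".txt", ".html", ".json", ".csv"}


def replace_all(text, mapping):
    """Replace every key of `mapping` found in `text` by its value."""
    for keyword, replacement in mapping.items():
        text = text.replace(keyword, replacement)
    return text


def substitute_tree(root, mapping):
    """Substitute keywords in file contents and in file/folder names."""
    # Walk bottom-up so that entries are renamed before their parent folder.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix in TEXT_SUFFIXES:
                text = path.read_text(encoding="utf-8")
                path.write_text(replace_all(text, mapping), encoding="utf-8")
        for name in filenames + dirnames:
            new_name = replace_all(name, mapping)
            if new_name != name:
                path = Path(dirpath) / name
                path.rename(path.with_name(new_name))


# Demo on a throwaway tree (hypothetical folder and file names).
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "A1234"
    folder.mkdir()
    (folder / "report_A1234.txt").write_text("participant A1234", encoding="utf-8")
    substitute_tree(tmp, {"A1234": "aaaa"})
    renamed = Path(tmp) / "aaaa" / "report_aaaa.txt"
    renamed_exists = renamed.exists()
    new_text = renamed.read_text(encoding="utf-8")
```

Walking bottom-up matters here: renaming a folder before visiting its children would invalidate the paths still waiting to be processed.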

As of now, Anonymize_UU supports text-based files, like .txt, .html, .json and .csv. UTF-8 encoding is assumed. Besides text files, Anonymize_UU is also able to handle (nested) zip archives. These archives are unpacked in a temporary folder, processed and zipped again.
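The unpack, process and re-zip cycle can be sketched with the standard library as follows. This is an illustration under stated assumptions, not the package's own code: substitute_in_zip is a hypothetical helper, it only touches .txt files, and it does not descend into nested archives.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def substitute_in_zip(archive, mapping):
    """Unpack `archive`, substitute keywords in .txt files, re-zip it in place."""
    archive = Path(archive)
    with tempfile.TemporaryDirectory() as tmp:
        unpacked = Path(tmp) / "unpacked"
        unpacked.mkdir()
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(unpacked)
        for path in unpacked.rglob("*.txt"):
            if not path.is_file():
                continue
            text = path.read_text(encoding="utf-8")
            for keyword, replacement in mapping.items():
                text = text.replace(keyword, replacement)
            path.write_text(text, encoding="utf-8")
        # make_archive appends ".zip" itself, so pass the name without suffix.
        shutil.make_archive(str(archive.with_suffix("")), "zip", root_dir=unpacked)


# Demo with a throwaway archive (hypothetical file names).
with tempfile.TemporaryDirectory() as workdir:
    archive = Path(workdir) / "datadownload.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("notes.txt", "user A1234 logged in")
    substitute_in_zip(archive, {"A1234": "aaaa"})
    with zipfile.ZipFile(archive) as zf:
        processed = zf.read("notes.txt").decode("utf-8")
```

Handling nested archives would mean calling the same routine on every zip file found inside the unpacked folder before re-zipping the outer one.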

Installation

$ pip install anonymize_UU

Usage

Import the Anonymize class in your code and create an anonymization object like this:

from anonymize import Anonymize

# refer to a csv file in which keywords and substitutions are paired
anonymize_csv = Anonymize('/Users/casper/Desktop/keys.csv')

# using a dictionary instead of a csv file:
my_dict = {
    'A1234': 'aaaa',
    'B9876': 'bbbb',
}
anonymize_dict = Anonymize(my_dict)

# specifying a zip-format to zip unpacked archives after processing (.zip is default)
anonymize_zip = Anonymize('/Users/casper/Desktop/keys.csv', zip_format='gztar')

When using a csv-file, anonymize_UU will assume your file contains two columns: the left column contains the keywords which need to be replaced, the right column contains their substitutions. Column headers are mandatory, but don't have to follow a specific format.
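As an illustration of the expected layout, the sketch below reads such a two-column file into a dictionary. The function load_mapping and the header names id and pseudonym are assumptions for this example; how Anonymize_UU parses the file internally is not shown here.

```python
import csv
import io


def load_mapping(csv_file):
    """Read keyword/substitution pairs from an open two-column csv file."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the mandatory header row, whatever its names are
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# An in-memory stand-in for a keys.csv file (hypothetical header names).
example_csv = io.StringIO("id,pseudonym\nA1234,aaaa\nB9876,bbbb\n")
mapping = load_mapping(example_csv)
```

When reading from disk instead, open the file with newline="" as the csv module documentation recommends.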

When using a dictionary, the keys will be replaced by their values.

Performance may improve when your keywords can be generalized into a regular expression. Anonymize_UU will then search for this pattern and replace the matches, instead of matching every entry of the dictionary or csv file against file contents and file/folder paths. Example:

anonymize_regex = Anonymize(my_dict, pattern=r'[A-B]\d{4}')
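Why a pattern can help is easiest to see in a small sketch: the text is scanned once for the pattern and each match is looked up in the dictionary, rather than scanning the text once per keyword. The function substitute_pattern is hypothetical, and leaving unknown matches unchanged is an assumption, not documented Anonymize_UU behaviour.

```python
import re


def substitute_pattern(text, mapping, pattern):
    """Replace every match of `pattern` by its entry in `mapping`."""
    regex = re.compile(pattern)

    def lookup(match):
        # Assumption: matches without a dictionary entry are left unchanged.
        return mapping.get(match.group(0), match.group(0))

    return regex.sub(lookup, text)


my_dict = {
    'A1234': 'aaaa',
    'B9876': 'bbbb',
}
result = substitute_pattern("A1234 met B9876 and A0000", my_dict, r'[A-B]\d{4}')
```

Here result is "aaaa met bbbb and A0000": the last id matches the pattern but has no entry in my_dict.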

Windows usage

There is an issue with creating zip archives on Windows. Make sure you run anonymize_UU as administrator.

Copy vs. replacing

Anonymize_UU is able to create a copy of the processed file-tree or replace it. The substitute method takes a mandatory source-path argument (path to a file, folder or zip-archive, either a string or a Path object) and an optional target-path argument (again, a string or Path object). The target needs to refer to a folder. The target-folder will be created if it doesn't exist.

When the target argument is provided, anonymize_UU will create a processed copy of the source into the target-folder. When the target argument is omitted, the source will be overwritten by a processed version of it:

# process the datadownload.zip file, replace all patterns and write
# a copy to the 'bucket' folder.
anonymize_regex.substitute(
    '/Users/casper/Desktop/datadownload.zip', 
    '/Users/casper/Desktop/bucket'
)

# process the 'download' folder and replace the original by its processed 
# version
anonymize_regex.substitute('/Users/casper/Desktop/download')

# process a single file, and replace it
anonymize_regex.substitute('/Users/casper/Desktop/my_file.json')
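The copy-versus-replace decision described above can be pictured with the following sketch. It only shows how an optional target folder could be handled; resolve_working_path is a hypothetical helper and not part of the Anonymize_UU API.

```python
import shutil
import tempfile
from pathlib import Path


def resolve_working_path(source, target=None):
    """Return the path to process: a copy inside `target`, or `source` itself."""
    source = Path(source)
    if target is None:
        return source  # no target given: the source is processed in place
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    destination = target / source.name
    if source.is_dir():
        shutil.copytree(source, destination)
    else:
        shutil.copy2(source, destination)
    return destination


# Demo with a throwaway file (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "my_file.json"
    source.write_text('{"id": "A1234"}', encoding="utf-8")
    copy_path = resolve_working_path(source, Path(tmp) / "bucket")
    copy_made = copy_path.exists() and source.exists()
    copy_in_bucket = copy_path.parent.name == "bucket"
    in_place = resolve_working_path(source) == source
```

Because omitting the target overwrites the source, it is worth keeping a backup of the original data when working in place.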

Todo

Testing ;)
