
A tool to substitute patterns/names in a file tree

UUnonymous

This description can be found on GitHub here

UUnonymous replaces keywords or regex patterns within a file tree or zipped archive. It recursively traverses the tree, opens supported files and substitutes every pattern or keyword it finds with a replacement. Besides file contents, UUnonymous also substitutes keywords/patterns in file and folder paths.

The result will be either a copied or replaced version of the original file-tree with all substitutions made.

As of now, UUnonymous supports text-based files, like .txt, .html, .json and .csv. UTF-8 encoding is assumed. Besides text files, UUnonymous is also able to handle (nested) zip archives. These archives will be unpacked in a temp folder, processed and zipped again.
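The unpack, process and re-zip cycle described above can be sketched with the standard library alone. This is a minimal illustration, not UUnonymous's actual implementation: the helper name `process_zip` and its `replace_text` callback are hypothetical, it handles only `.txt` files, and it does not descend into nested archives.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def process_zip(source, target_base, replace_text):
    """Unpack `source`, apply `replace_text` to each .txt file, and zip again.

    Hypothetical helper for illustration; nested archives are not handled.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(source) as archive:
            archive.extractall(tmp)
        for path in Path(tmp).rglob('*.txt'):
            content = path.read_text(encoding='utf-8')  # UTF-8 is assumed
            path.write_text(replace_text(content), encoding='utf-8')
        # make_archive appends the extension that belongs to the format
        return shutil.make_archive(str(target_base), 'zip', tmp)


# Build a small archive, process it, and read the result back.
with tempfile.TemporaryDirectory() as work:
    src = Path(work) / 'in.zip'
    with zipfile.ZipFile(src, 'w') as archive:
        archive.writestr('notes.txt', 'id A1234')
    out = process_zip(src, Path(work) / 'out',
                      lambda s: s.replace('A1234', 'aaaa'))
    with zipfile.ZipFile(out) as archive:
        processed = archive.read('notes.txt').decode('utf-8')

print(processed)
```

Working in a temporary directory keeps the original archive untouched until the processed version has been written completely.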

Installation

$ pip install UUnonymous

Usage

Import the Anonymize class in your code and create an anonymization object like this:

from uunonymous import Anonymize

# refer to csv files in which keywords and substitutions are paired
anonymize_csv = Anonymize('/Users/casper/Desktop/keys.csv')

# using a dictionary instead of a csv file:
my_dict = {
    'A1234': 'aaaa',
    'B9876': 'bbbb',
}
anonymize_dict = Anonymize(my_dict)

# specifying a zip-format to zip unpacked archives after processing (.zip is default)
anonymize_zip = Anonymize('/Users/casper/Desktop/keys.csv', zip_format='gztar')

When using a csv-file, UUnonymous will assume your file contains two columns: the left column contains the keywords which need to be replaced, the right column contains their substitutions. Column headers are mandatory, but don't have to follow a specific format.
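To make the expected csv layout concrete, the sketch below reads a two-column file with a header row into a keyword-to-substitution dictionary. The function name `load_pairs` and the header labels are assumptions for illustration only; how UUnonymous parses the file internally is not documented here.

```python
import csv
import io


def load_pairs(csv_file):
    """Read a two-column csv (header row first) into {keyword: substitution}."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row, whatever its labels are
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# In-memory stand-in for a file such as keys.csv
sample = io.StringIO("keyword,replacement\nA1234,aaaa\nB9876,bbbb\n")
pairs = load_pairs(sample)
print(pairs)
```

The resulting dictionary has the same shape as `my_dict` above, so either form describes the same set of substitutions.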

When using a dictionary only (absence of the pattern argument), the keys will be replaced by their values.

Performance may improve when your keywords can be generalized into a regular expression. UUnonymous will then search for this pattern and replace each match, instead of matching the entire dictionary/csv-file against file contents or file/folder-paths. Example:

anonymize_regex = Anonymize(my_dict, pattern=r'[A-B]\d{4}')
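The idea behind combining a pattern with a dictionary can be sketched with `re.sub` and a callback: the regex finds candidates in a single pass and the dictionary supplies the replacement. This is an illustration of the technique, not the library's code. Leaving unknown matches unchanged is an assumption of this sketch; what UUnonymous does with matches that are not in the dictionary is not stated here.

```python
import re

my_dict = {'A1234': 'aaaa', 'B9876': 'bbbb'}
pattern = re.compile(r'[A-B]\d{4}')


def replace_match(match):
    # Look up the matched text; keep it as-is when it is not a known key
    # (an assumption made for this sketch).
    return my_dict.get(match.group(0), match.group(0))


text = 'Users A1234 and B9876 met A0000.'
result = pattern.sub(replace_match, text)
print(result)
```

With one compiled pattern the text is scanned once, regardless of how many keys the dictionary holds.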

Substitution is case-sensitive by default. The regular expressions that take care of the replacements can be modified through the flags parameter. It takes one or more of the flag constants of Python's re module, such as re.IGNORECASE. Multiple flags are combined with a bitwise OR (the | operator). Example for a case-insensitive substitution:

import re
anonymize_regex = Anonymize(my_dict, flags=re.IGNORECASE)
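As a rough sketch of what combined flags do, the snippet below builds one alternation from the dictionary keys and compiles it with two `re` flags joined by `|`. The lower-cased lookup table is an assumption of this example, needed so that a case-insensitive match can still find its replacement; it is not taken from UUnonymous itself.

```python
import re

my_dict = {'casper': 'user1'}

# Two flags combined with a bitwise OR
flags = re.IGNORECASE | re.MULTILINE

# Escape keys so they match literally; longest first so longer keys win
keys = sorted(my_dict, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(k) for k in keys), flags)

# Case-insensitive matches need a case-normalized lookup
lookup = {k.lower(): v for k, v in my_dict.items()}

text = 'Casper wrote to CASPER.'
result = pattern.sub(lambda m: lookup[m.group(0).lower()], text)
print(result)
```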

With the use_word_boundaries argument (defaults to False), the algorithm ignores substring matches. If 'ted' is a key in your dictionary, then without use_word_boundaries the algorithm will also replace the 'ted' part in, for instance, 'created_at'. Setting use_word_boundaries to True prevents this by putting the \b anchor around your regex pattern or dictionary keys. A useful property of the boundary anchor is that '@' counts as a boundary as well, so names in email addresses can still be replaced. Example:

anonymize_regex = Anonymize(my_dict, use_word_boundaries=True)
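The effect of the `\b` anchor can be checked directly with Python's `re` module. The sample text and the replacement `'xxx'` are made up for this illustration.

```python
import re

text = 'ted created_at ted@example.com'

# Without boundaries, the 'ted' inside 'created_at' is replaced too
plain = re.sub(r'ted', 'xxx', text)

# With \b on both sides, only standalone occurrences match;
# '@' is a non-word character, so the name in the address still matches
bounded = re.sub(r'\bted\b', 'xxx', text)

print(plain)
print(bounded)
```

Note that the underscore counts as a word character, which is why 'created_at' is left alone in the bounded version.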

Windows usage

On Windows, there is an issue with creating zip archives. Make sure you run UUnonymous as administrator.

Inplace replacements vs. replacements in a copy

UUnonymous is able to create a copy of the processed file-tree or replace it. The substitute method takes a mandatory source-path argument (path to a file, folder or zip-archive, either a string or a Path object) and an optional target-path argument (again, a string or Path object). The target needs to refer to a folder. The target-folder will be created if it doesn't exist.

When the target argument is provided, UUnonymous will create a processed copy of the source in the target-folder. If the source is a single file, the file path does not contain elements that will be replaced, and the target-folder is identical to the source folder, then the processed result will get a 'copy' extension to prevent overwriting.

When the target argument is omitted, the source will be overwritten by a processed version of it:

# process the datadownload.zip file, replace all patterns and write
# a copy to the 'bucket' folder.
anonymize_regex.substitute(
    '/Users/casper/Desktop/datadownload.zip', 
    '/Users/casper/Desktop/bucket'
)

# process the 'download' folder and replace the original by its processed 
# version
anonymize_regex.substitute('/Users/casper/Desktop/download')

# process a single file, and replace it
anonymize_regex.substitute('/Users/casper/Desktop/my_file.json')

Todo

Testing ;)
