A tool to substitute patterns/names in a file tree
Anonymize UU
This description can be found on GitHub here
Anonymize_UU facilitates the replacement of keywords or regex patterns within a file tree or zipped archive. It recursively traverses the tree, opens supported files and substitutes any found pattern or keyword with a replacement. Besides file contents, Anonymize_UU substitutes keywords/patterns in file and folder paths as well.
The result will be either a copied or replaced version of the original file-tree with all substitutions made.
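To make the traversal concrete, here is a minimal sketch of the idea using only the standard library. It is not the package's actual implementation: the function name `substitute_tree` is hypothetical, and the choices to visit the deepest paths first and to match the longest keys first are assumptions made for this illustration.

```python
import re
import tempfile
from pathlib import Path

TEXT_SUFFIXES = {".txt", ".html", ".json", ".csv"}


def substitute_tree(root: Path, mapping: dict) -> None:
    """Replace every key of `mapping` (assumed non-empty) in contents and path names under `root`."""
    # Longest keys first, so a short key never pre-empts a longer one.
    keys = sorted(mapping, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(key) for key in keys))

    def swap(text: str) -> str:
        return regex.sub(lambda match: mapping[match.group(0)], text)

    # Deepest paths first, so renaming a folder never invalidates
    # paths that still have to be visited.
    paths = sorted(root.rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in paths:
        if path.is_file() and path.suffix in TEXT_SUFFIXES:
            content = path.read_text(encoding="utf-8")
            path.write_text(swap(content), encoding="utf-8")
        new_name = swap(path.name)
        if new_name != path.name:
            path.rename(path.with_name(new_name))


# Demo on a throwaway tree: both the folder name, the file name
# and the file contents contain the keyword.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "user_A1234").mkdir()
    (root / "user_A1234" / "A1234.txt").write_text("id=A1234", encoding="utf-8")
    substitute_tree(root, {"A1234": "aaaa"})
    result = (root / "user_aaaa" / "aaaa.txt").read_text(encoding="utf-8")
```

After the call, the folder and the file are renamed and `result` holds the substituted contents.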
As of now, Anonymize_UU supports text-based files, like .txt, .html, .json and .csv. UTF-8 encoding is assumed. Besides text files, Anonymize_UU is also able to handle (nested) zip archives. These archives will be unpacked in a temp folder, processed and zipped again.
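The unpack, process and re-zip cycle for (nested) archives could look roughly like the sketch below. It only uses `zipfile`, `shutil` and `tempfile` from the standard library; `process_zip` and `redact` are hypothetical names, and the package itself may organise this differently.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def process_zip(archive: Path, process) -> None:
    """Unpack `archive` in a temp folder, run `process(folder)`, then zip it again in place."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "unpacked"
        work.mkdir()
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(work)
        # Nested archives are handled first, so their contents get processed too.
        for inner in list(work.rglob("*.zip")):
            process_zip(inner, process)
        process(work)
        # make_archive appends ".zip" itself, so pass the name without the suffix.
        shutil.make_archive(str(archive.with_suffix("")), "zip", root_dir=work)


def redact(folder: Path) -> None:
    """Example processing step: replace one keyword in every .txt file."""
    for file in folder.rglob("*.txt"):
        text = file.read_text(encoding="utf-8")
        file.write_text(text.replace("A1234", "aaaa"), encoding="utf-8")


# Demo: build a small archive, process it, and read the result back.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "data.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("note.txt", "user A1234")
    process_zip(archive, redact)
    with zipfile.ZipFile(archive) as zf:
        content = zf.read("note.txt").decode("utf-8")
```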
Installation
$ pip install anonymize_UU
Usage
Import the Anonymize class in your code and create an anonymization object like this:
from anonymize import Anonymize
# refer to csv files in which keywords and substitutions are paired
anonymize_csv = Anonymize('/Users/casper/Desktop/keys.csv')
# using a dictionary instead of a csv file:
my_dict = {
'A1234': 'aaaa',
'B9876': 'bbbb',
}
anonymize_dict = Anonymize(my_dict)
# specifying a zip-format to zip unpacked archives after processing (.zip is default)
anonymize_zip = Anonymize('/Users/casper/Desktop/keys.csv', zip_format='gztar')
When using a csv-file, anonymize_UU will assume your file contains two columns: the left column contains the keywords which need to be replaced, the right column contains their substitutions. Column headers are mandatory, but don't have to follow a specific format.
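As an illustration of that layout, the sketch below reads a two-column key file into a dictionary with the standard `csv` module. The header names `id` and `replacement` are made up for the example, and this is not necessarily how the package parses the file internally.

```python
import csv
import io

# Stand-in for the contents of a keys.csv file; the header names are arbitrary.
keys_csv = io.StringIO(
    "id,replacement\n"
    "A1234,aaaa\n"
    "B9876,bbbb\n"
)

reader = csv.reader(keys_csv)
next(reader)  # skip the mandatory header row
# Left column: keyword to replace; right column: its substitution.
mapping = {row[0]: row[1] for row in reader if len(row) >= 2}
```

The resulting `mapping` is equivalent to the dictionary you could pass to `Anonymize` directly.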
When using a dictionary only (without the pattern argument), the keys will be replaced by their values.
Performance might be enhanced when your keywords can be generalized into regular expressions. Anonymize_UU will search for these patterns and replace them, instead of matching the entire dictionary/csv-file against file contents or file/folder-paths. Example:
anonymize_regex = Anonymize(my_dict, pattern=r'[A-B]\d{4}')
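The idea behind the pattern argument can be sketched with plain `re`: a single regex finds the candidates, and each match is looked up in the dictionary. How the package treats a match that has no dictionary entry is not stated here; leaving it unchanged, as below, is an assumption.

```python
import re

my_dict = {"A1234": "aaaa", "B9876": "bbbb"}
pattern = re.compile(r"[A-B]\d{4}")


def replace_match(match):
    # Look the matched text up in the dictionary; unknown matches are
    # left as they are (an assumption made for this sketch).
    found = match.group(0)
    return my_dict.get(found, found)


text = "A1234 wrote to B9876 and C0000; A5555 is not in the mapping."
result = pattern.sub(replace_match, text)
# 'C0000' does not match the pattern; 'A5555' matches but has no entry.
```

One pass of a single compiled pattern replaces what would otherwise be one search per dictionary key.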
When the use_word_boundaries argument is set to True (it defaults to False), the algorithm ignores substring matches. If 'ted' is a key in your dictionary, then without use_word_boundaries the algorithm will also replace the 'ted' part in, for instance, 'created_at'. You can overcome this problem by setting use_word_boundaries to True, which puts the \b anchor around your regex pattern or dictionary keys. The beauty of the boundary anchors is that '@' is considered a boundary as well, so names in email addresses can be replaced. Example:
anonymize_regex = Anonymize(my_dict, use_word_boundaries=True)
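The effect of the `\b` anchors can be checked with plain `re`, independent of the package. The sample text and the replacement 'xxx' are made up for the illustration.

```python
import re

text = "created_at: ted@example.com"

# Without boundaries, the 'ted' inside 'created_at' is replaced as well.
plain = re.sub(r"ted", "xxx", text)

# With \b anchors, only the stand-alone 'ted' before the '@' is replaced.
bounded = re.sub(r"\bted\b", "xxx", text)
```

Note that `\b` treats the underscore as a word character, so a key such as 'ted' would not match inside 'ted_smith' when boundaries are enabled.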
Windows usage
On Windows there is an issue with creating zip archives. Make sure you run Anonymize_UU as administrator.
Copy vs. replacing
Anonymize_UU is able to create a copy of the processed file-tree or replace it. The substitute method takes a mandatory source-path argument (path to a file, folder or zip-archive, either a string or a Path object) and an optional target-path argument (again, a string or Path object). The target needs to refer to a folder. The target-folder will be created if it doesn't exist.
When the target argument is provided, anonymize_UU will create a processed copy of the source into the target-folder. When the target argument is omitted, the source will be overwritten by a processed version of it:
# process the datadownload.zip file, replace all patterns and write
# a copy to the 'bucket' folder.
anonymize_regex.substitute(
'/Users/casper/Desktop/datadownload.zip',
'/Users/casper/Desktop/bucket'
)
# process the 'download' folder and replace the original by its processed
# version
anonymize_regex.substitute('/Users/casper/Desktop/download')
# process a single file, and replace it
anonymize_regex.substitute('/Users/casper/Desktop/my_file.json')
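The difference between the two modes comes down to which path gets processed. The sketch below shows one way to resolve that path with the standard library; `prepare_workpath` is a hypothetical helper, not part of the package's API.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def prepare_workpath(source: PathLike, target: Optional[PathLike] = None) -> Path:
    """Return the path to process: a copy inside `target`, or `source` itself."""
    source = Path(source)
    if target is None:
        return source  # no target: the original will be overwritten
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    destination = target / source.name
    if source.is_dir():
        shutil.copytree(source, destination)  # fails if the copy already exists
    else:
        shutil.copy2(source, destination)
    return destination


# Demo: with a target the original stays untouched, without one it is used directly.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "my_file.json"
    original.write_text('{"id": "A1234"}', encoding="utf-8")
    copy_path = prepare_workpath(original, Path(tmp) / "bucket")
    in_place_path = prepare_workpath(original)
    copy_exists = copy_path.is_file()
    copy_is_separate = copy_path != original
    in_place_is_original = in_place_path == original
```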
Todo
Testing ;)
Hashes for anonymize_UU-0.1.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | db5a133615fad53521c599966859cd65493656391498ce186e66c377b8d6bf9f
MD5 | 338461732e41f7dd8feba200067c68ce
BLAKE2b-256 | b2f9f7572e1791c9ca507aba240987c58f54a736841c83724784adaf219f66ae