A tool to substitute patterns/names in a file tree

Project description

anonymoUUs

This description is also available in the project's GitHub repository.

anonymoUUs facilitates the replacement of keywords or regex patterns within a file tree or zipped archive. It recursively traverses the tree, opens supported files and substitutes any pattern or keyword it finds with a replacement. Besides file contents, anonymoUUs will also substitute keywords/patterns in file and folder paths.

The result will be either a copied or replaced version of the original file-tree with all substitutions made.

As of now, anonymoUUs supports text-based files, like .txt, .html, .json and .csv. UTF-8 encoding is assumed. Besides text files, anonymoUUs is also able to handle (nested) zip archives. These archives will be unpacked in a temp folder, processed and zipped again.
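
The unpack, process and re-zip cycle described above can be sketched with the standard library alone. This is a minimal illustration, not the package's own implementation: the function name process_zip is hypothetical, only .txt files are handled, and nested archives are left out for brevity.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def process_zip(source: Path, target: Path, mapping: dict) -> Path:
    """Unpack `source`, substitute keywords in .txt files, zip the result to `target`."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp) / "unpacked"
        with zipfile.ZipFile(source) as archive:
            archive.extractall(workdir)

        # Hypothetical simplification: only plain .txt files are processed.
        for path in workdir.rglob("*.txt"):
            text = path.read_text(encoding="utf-8", errors="ignore")
            for key, value in mapping.items():
                text = text.replace(key, value)
            path.write_text(text, encoding="utf-8")

        # make_archive appends the ".zip" suffix itself, so strip it first.
        archive_path = shutil.make_archive(str(target.with_suffix("")), "zip", workdir)
    return Path(archive_path)
```

The temporary folder is removed automatically when the with-block ends, so only the new archive remains.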

Installation

$ pip install anonymoUUs

Usage

In order to replace words or patterns you need a replacement-mapping in the form of:

a dictionary - the keys will be replaced by the values
the path to a csv file - the csv file will be converted into a dictionary: the first column provides the keys, the second column provides the values. The path can be a string, Path or PosixPath.
a function - a replacement function can be passed if a pattern is used. The function takes a found match and should return its replacement. The function must have at least one input argument.

Example of replacement with a dictionary

Import the Anonymize class in your code and create an anonymization object like this:

from anonymoUUs import Anonymize

# refer to csv files in which keywords and substitutions are paired
anonymize_csv = Anonymize('/Users/casper/Desktop/keys.csv')

# using a dictionary instead of a csv file:
my_dict = {
    'A1234': 'aaaa',
    'B9876': 'bbbb',
}
anonymize_dict = Anonymize(my_dict)

Putting regular expressions in a dictionary is also possible. When only a dictionary is used (the pattern argument is absent), every match of a key pattern will be replaced by its value:

import re

from anonymoUUs import Anonymize

anon = Anonymize(
    {
        'regular-key': 'replacement-1',
        re.compile('ca.*?er'): 'replacement-2'
    }
)
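
To make the behaviour of such a mixed dictionary concrete, here is a minimal sketch of how plain keys and compiled patterns could be applied to a piece of text. The helper apply_mapping is hypothetical and only illustrates the idea; it is not the code anonymoUUs runs internally.

```python
import re


def apply_mapping(text: str, mapping: dict) -> str:
    """Replace every plain key and every regex match with its value."""
    for key, value in mapping.items():
        # Plain strings are escaped so they match literally.
        pattern = key if isinstance(key, re.Pattern) else re.compile(re.escape(key))
        # A callable replacement keeps the value literal (no backslash escapes).
        text = pattern.sub(lambda _match, value=value: value, text)
    return text


result = apply_mapping(
    "regular-key belongs to casper",
    {
        "regular-key": "replacement-1",
        re.compile("ca.*?er"): "replacement-2",
    },
)
print(result)  # replacement-1 belongs to replacement-2
```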

Example of replacement with a CSV file

# refer to a csv file in which keywords and substitutions are paired
anonymize_csv = Anonymize('/Users/casper/Desktop/keys.csv')

When using a csv-file, anonymoUUs will assume your file contains two columns: the left column contains the keywords which need to be replaced, the right column contains their substitutions. Column headers are mandatory, but don't have to follow a specific format.

It is possible to use a regular expression as a keyword in the csv-file. Make sure it starts with the prefix 'r#'. Example:

r#ca.*?er, replacement_string

The key will be compiled as a regex, and every match will be replaced with 'replacement_string'.
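
The csv conventions above (a header row, two columns, an optional 'r#' prefix) can be illustrated with a small loader. This is a sketch under assumptions: load_mapping is a hypothetical helper, and stripping the prefix before compiling is a guess at how the package treats such keys.

```python
import csv
import io
import re


def load_mapping(handle) -> dict:
    """Turn a two-column csv (with header row) into a replacement dictionary."""
    reader = csv.reader(handle, skipinitialspace=True)
    next(reader, None)  # skip the mandatory header row
    mapping = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore empty or incomplete rows
        key, value = row[0].strip(), row[1].strip()
        if key.startswith("r#"):
            # Assumption: the prefix is dropped and the rest is compiled.
            key = re.compile(key[2:])
        mapping[key] = value
    return mapping


sample = io.StringIO("keyword,replacement\nA1234,aaaa\nr#ca.*?er, replacement_string\n")
mapping = load_mapping(sample)
```

The resulting dictionary has the same shape as one written by hand, with a plain string key and a compiled pattern key.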

Example of replacement by regex pattern and function

If you are replacing with a pattern you can also use a function to 'calculate' the replacement string:

def replace(match, **kwargs):
    result = 'default-replacement'
    match = int(match)
    threshold = kwargs.get("threshold", 4000)
    if match < threshold:
        result = 'special-replacement'
    return result

anon = Anonymize(replace, pattern=r'\d{4}', threshold=1000)
anon.substitute(
    '/Users/casperkaandorp/Desktop/test.json', 
    '/Users/casperkaandorp/Desktop/result-folder'
)

Note that any additional keyword arguments you provide when initializing an Anonymize object are passed on to the replacement function (in the previous example, threshold is passed to the replace function).
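
The way a replacement function and its extra arguments fit together can be sketched with re.sub and a callback. The substitute helper below is hypothetical, and it assumes the function receives the matched text as a string, which is consistent with the int(match) call in the example above.

```python
import re


def substitute(text: str, pattern: str, func, **kwargs) -> str:
    """Call `func` for every match and forward the extra keyword arguments."""
    return re.sub(pattern, lambda m: func(m.group(0), **kwargs), text)


def replace(match: str, **kwargs) -> str:
    threshold = kwargs.get("threshold", 4000)
    if int(match) < threshold:
        return "special-replacement"
    return "default-replacement"


result = substitute("ids: 0999 and 2500", r"\d{4}", replace, threshold=1000)
print(result)  # ids: special-replacement and default-replacement
```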

Other arguments

Performance is probably best when your keywords can be generalized into a single regular expression. anonymoUUs will then search for this pattern and replace its matches, instead of matching every entry of the dictionary/csv-file against file contents and file/folder paths. Example:

anonymize_regex = Anonymize(my_dict, pattern=r'[A-B]\d{4}')
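
The idea behind combining a dictionary with a single pattern can be sketched as one regex scan followed by a dictionary lookup per match. This is an illustration only; in particular, leaving matches that are not in the dictionary unchanged is an assumption, not documented behaviour.

```python
import re

my_dict = {"A1234": "aaaa", "B9876": "bbbb"}
pattern = re.compile(r"[A-B]\d{4}")


def substitute(text: str) -> str:
    # One pass over the text; each match is looked up in the dictionary.
    # Assumption: matches without a dictionary entry stay as they are.
    return pattern.sub(lambda m: my_dict.get(m.group(0), m.group(0)), text)


result = substitute("A1234 met B9876 and A0000")
print(result)  # aaaa met bbbb and A0000
```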

Substitution is case sensitive by default. The regular expressions that perform the replacements can be modified with the flags parameter. It takes one or more of the flag constants from Python's re module. Multiple flags are combined with a bitwise OR (the | operator). Example for a case-insensitive substitution:

anonymize_regex = Anonymize(my_dict, flags=re.IGNORECASE)
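
Combining several flags works the same way as in plain Python. The following short example uses only the re module and shows two flags joined with the | operator:

```python
import re

# Case-insensitive matching, and ^ matches at the start of every line.
flags = re.IGNORECASE | re.MULTILINE
pattern = re.compile(r"^a1234", flags)

matches = pattern.findall("A1234\nx\na1234")
print(matches)  # ['A1234', 'a1234']
```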

With the use_word_boundaries argument (defaults to False), the algorithm ignores substring matches. If 'ted' is a key in your dictionary, then without use_word_boundaries the algorithm will replace the 'ted' part in, for instance, 'created_at'. You can avoid this by setting use_word_boundaries to True, which puts the \b anchor around your regex pattern or dictionary keys. A convenient property of the boundary anchors is that the position next to '@' counts as a boundary as well, so names in email addresses can be replaced. Example:

anonymize_regex = Anonymize(my_dict, use_word_boundaries=True)
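
The effect of the \b anchor can be checked with the re module directly. This sketch mirrors the 'ted' example; the placeholder 'XXX' and the variable names are only for illustration.

```python
import re

key = "ted"
plain = re.compile(re.escape(key))
bounded = re.compile(r"\b" + re.escape(key) + r"\b")

without_boundaries = plain.sub("XXX", "created_at")
with_boundaries = bounded.sub("XXX", "created_at")
in_email = bounded.sub("XXX", "ted@example.com")

print(without_boundaries)  # creaXXX_at
print(with_boundaries)     # created_at
print(in_email)            # XXX@example.com
```

The underscore counts as a word character, so 'ted' inside 'created_at' has no boundary on either side, while '@' does not, which is why the name in the email address is still replaced.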

It is also possible to specify how unzipped folders are re-zipped:

# specifying a zip-format to zip unpacked archives after processing (.zip is default)
anonymize_zip = Anonymize('/Users/casper/Desktop/keys.csv', zip_format='gztar')
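
The value 'gztar' is one of the archive format names used by Python's shutil.make_archive. Assuming that zip_format accepts those names (this is not confirmed by the text above), the formats available on your system can be listed with the standard library:

```python
import shutil

# Each entry is a (name, description) tuple, for example ('zip', 'ZIP file').
available_formats = [name for name, _description in shutil.get_archive_formats()]
print(available_formats)
```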

Windows usage

There is an issue with creating zip archives on Windows. Make sure you run anonymoUUs as administrator.

Inplace replacements vs. replacements in a copy

anonymoUUs is able to create a copy of the processed file-tree or replace it. The substitute method takes a mandatory source-path argument (path to a file, folder or zip-archive, either a string or a Path object) and an optional target-path argument (again, a string or Path object). The target needs to refer to a folder, which can't be a sub-folder of the source-folder. The target-folder will be created if it doesn't exist.

When the target argument is provided, anonymoUUs will create a processed copy of the source in the target-folder. If the source is a single file, the file path does not contain elements that will be replaced, and the target-folder is identical to the source folder, then the processed result will get a 'copy' extension to prevent overwriting.
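
The rule that the target may not be a sub-folder of the source can be checked up front with pathlib. The function name below is hypothetical and the check is a sketch of the constraint, not the package's own validation code.

```python
from pathlib import Path


def target_is_inside_source(source: Path, target: Path) -> bool:
    """Return True if `target` is a (nested) sub-folder of `source`."""
    source, target = source.resolve(), target.resolve()
    # Path.parents lists every ancestor folder of the target.
    return source in target.parents


inside = target_is_inside_source(Path("/data/download"), Path("/data/download/out"))
outside = target_is_inside_source(Path("/data/download"), Path("/data/bucket"))
print(inside, outside)
```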

When the target argument is omitted, the source will be overwritten by a processed version of it:

# process the datadownload.zip file, replace all patterns and write
# a copy to the 'bucket' folder.
anonymize_regex.substitute(
    '/Users/casper/Desktop/datadownload.zip', 
    '/Users/casper/Desktop/bucket'
)

# process the 'download' folder and replace the original by its processed 
# version
anonymize_regex.substitute('/Users/casper/Desktop/download')

# process a single file, and replace it
anonymize_regex.substitute('/Users/casper/Desktop/my_file.json')

Reading contents of a file

Files will be opened depending on their extension. Unrecognized extensions will be skipped. The standard version of this package assumes 'UTF-8' encoding, and decoding errors are ignored. Since reading file contents is done by a single function, it is easy to adjust this behaviour (different encodings, etc.) by overriding that function in a subclass:

# standard reading function
def _read_file(self, source: Path):
    with open(source, 'r', encoding='utf-8', errors='ignore') as f:
        contents = list(f)
    return contents
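
Overriding the reading function could look like the sketch below. To keep the example runnable without the package installed, BaseAnonymizer is a stand-in that contains only the reading function shown above; in real use you would subclass Anonymize instead. The class name Latin1Anonymizer is hypothetical.

```python
from pathlib import Path


class BaseAnonymizer:
    """Stand-in for the package's class, reduced to its reading function."""

    def _read_file(self, source: Path):
        with open(source, 'r', encoding='utf-8', errors='ignore') as f:
            return list(f)


class Latin1Anonymizer(BaseAnonymizer):
    """Reads files as Latin-1 instead of UTF-8."""

    def _read_file(self, source: Path):
        with open(source, 'r', encoding='latin-1') as f:
            return list(f)
```

With the default reader, bytes that are not valid UTF-8 are silently dropped; the subclass decodes them as Latin-1 characters instead.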

Todo

Cleaning up this document

Testing! Sweet momma, it needs testing.

Release history

0.0.8 (this version) - Mar 12, 2021
0.0.7 - Dec 1, 2020
0.0.6 - Dec 1, 2020
0.0.5 - Nov 17, 2020
0.0.4 - Nov 15, 2020
0.0.3 - Nov 6, 2020
0.0.2 - Nov 6, 2020
0.0.1 - Oct 18, 2020

