redacted-py

Redacting classified documents

This repository holds the code base for my redacted-py library in Python.
It is mainly based on my Feistel cipher for Format-Preserving Encryption, to which I added a few tools for document, database and file manipulation to make it easier to use.

Motivation

In some fields (healthcare, for instance), protecting the privacy of data while still being able to conduct in-depth studies is both vital and mandatory. Redacting documents and databases is therefore a necessary step. With redacted-py, I provide a simple yet secure tool to help redact documents, using either a dictionary, a record layout or a tag to decide which parts should actually be redacted.

Usage

You can use either a dictionary or a tag (or both) to identify the words you want to redact in a document. The tag should be placed before any word that should be redacted. The default tag is the tilde character (~).

For example, in the following sentence, only the word "tagged" will be redacted: "This is a ~tagged sentence".
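To make the tagging rule concrete, here is a minimal sketch of how tagged words could be located in a text. It is an illustration only, not the library's implementation: the function name `find_tagged_words` is hypothetical, and it assumes a "word" is a run of letters, digits or underscores directly following the tag.

```python
import re


def find_tagged_words(text: str, tag: str = "~") -> list[str]:
    """Return the words that directly follow the tag (hypothetical helper)."""
    # re.escape protects tags that are regex metacharacters (e.g. "$" or "*")
    pattern = re.compile(re.escape(tag) + r"(\w+)")
    return pattern.findall(text)


print(find_tagged_words("This is a ~tagged sentence"))  # ['tagged']
```

A different tag can be passed as the second argument, which mirrors the idea of a configurable tag with the tilde as default.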

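$ pip install redacted-py

from redacted import DefaultRedactor, Dictionary
from feistel import FPECipher, SHA_256

source = "Some text ~tagged or using words in a dictionary"

# Placeholder secret (assumed to be a string): replace it with your own key
key = "some-32-byte-long-key-to-be-safe"
# SHA-256 round function, 10 rounds
cipher = FPECipher(SHA_256, key, 10)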
redactor = DefaultRedactor(cipher)
redacted = redactor.redact(source)

expanded = redactor.expand(redacted)
assert expanded == source, "Original data should equal ciphered then deciphered data"

cleansed = redactor.clean(expanded)
assert cleansed == "Some text tagged or using words in a dictionary", "Cleaning should remove any tag mark"

You can also run it from the command line:

usage: python3 -m redacted [-h] [-b | --both | --no-both] [-d DICTIONARY] [-H HASH] [-i INPUT] [-k KEY] [-o OUTPUT] [-r ROUNDS] [-t TAG] [-x | --expand | --no-expand]

options:
  -h, --help            show this help message and exit
  -b, --both, --no-both
                        Add to use both dictionary and tag
  -d DICTIONARY, --dictionary DICTIONARY
                        The optional path to the dictionary of words to redact
  -H HASH, --hash HASH  The hash engine for the round function [default sha-256]
  -i INPUT, --input INPUT
                        The path to the document to be redacted
  -k KEY, --key KEY     The optional key for the FPE scheme (leave it empty to use default)
  -o OUTPUT, --output OUTPUT
                        The name of the output file
  -r ROUNDS, --rounds ROUNDS
                        The number of rounds for the Feistel cipher [default 10]
  -t TAG, --tag TAG     The optional tag that prefixes words to redact [default ~]
  -x, --expand, --no-expand
                        Add to expand a redacted document
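
The hash and rounds options above are the two parameters of a Feistel network. The sketch below shows what they control in a toy, byte-level balanced Feistel cipher. It is not the algorithm used by redacted-py or by the underlying Feistel library (which works on strings and preserves their format), and it is not meant for production use; all names in it (`encrypt`, `decrypt`, `_round`, `_xor`) are hypothetical.

```python
import hashlib


def _round(data: bytes, key: bytes, i: int, size: int, hash_name: str) -> bytes:
    """Toy round function: hash (round index, counter, key, half-block) to `size` bytes."""
    out = b""
    counter = 0
    while len(out) < size:
        prefix = i.to_bytes(4, "big") + counter.to_bytes(4, "big")
        out += hashlib.new(hash_name, prefix + key + data).digest()
        counter += 1
    return out[:size]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(block: bytes, key: bytes, rounds: int = 10, hash_name: str = "sha256") -> bytes:
    if len(block) % 2:
        raise ValueError("this toy version needs an even-length block")
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for i in range(rounds):
        # Each round swaps the halves and mixes one of them with the round function
        left, right = right, _xor(left, _round(right, key, i, half, hash_name))
    return left + right


def decrypt(block: bytes, key: bytes, rounds: int = 10, hash_name: str = "sha256") -> bytes:
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for i in reversed(range(rounds)):
        # Undo the rounds in reverse order; the round function itself is never inverted
        left, right = _xor(right, _round(left, key, i, half, hash_name)), left
    return left + right


secret = encrypt(b"John Doe", b"my-key")
assert len(secret) == len(b"John Doe")          # output length equals input length
assert decrypt(secret, b"my-key") == b"John Doe"  # the operation is reversible
```

The output has the same length as the input and decryption only needs the same key, hash and number of rounds, which is why a redacted document can later be expanded back to the original.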

Tests

$ git clone https://github.com/cyrildever/redacted.git
$ cd redacted/py/
$ pip install -e .
$ python3 -m unittest discover

License

The use of the redacted libraries and executables is subject to fees for commercial purposes and to compliance with the BSD-2-Clause-Patent license.
Please contact me to get further information.

NB: It is still under development, so use it in production at your own risk for now.


© 2024 Cyril Dever. All rights reserved.
