A utility to obfuscate and mask elements in XML and JSON files

XML AND JSON MASKING

For detailed information, please visit: https://sonra.io/2019/04/01/paranoid-masking-anonymizing-and-obfuscating-pii-in-xml-and-json-data/

About

Paranoid is a data masking and obfuscation command-line tool for XML and JSON files. It is best used in combination with Flexter, Sonra's converter for complex XML and JSON based on industry data standards such as ACORD, HL7, FHIR, NDC, XBRL and FpML. Flexter converts XML to any relational database, Hadoop formats (ORC, Parquet, Avro, Hive, Impala), or text (TSV, CSV).

Features

  • Works with one or more XML/JSON documents. If the input path points to a directory, its content is processed recursively. The format of each file is detected automatically.
  • Masks all elements and attributes in the XML/JSON documents by default while preserving the exact structure of the files.
  • Can also mask only specific elements, selected by path/XPath, in XML/JSON documents.
  • Universal: runs on both Python 2.6+ and 3.6+.
  • Offline tool - runs locally on your system. No data gets transferred anywhere.
  • Open source - anyone can examine what it does to make sure the data can't be decoded back after leaving The Sausage Machine. Contributions are welcome!
  • Easy installation - download the script itself or use pip.
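
To illustrate what "masking while preserving the exact structure" means, here is a minimal Python 3 sketch. It is not Paranoid's implementation: Paranoid uses its own parser, while this sketch relies on the standard `xml.etree.ElementTree` and `json` modules, and the masking rule in `mask_text` (random letter for letter, random digit for digit) is an assumption chosen for illustration only. The names `mask_text`, `mask_xml` and `mask_json` are hypothetical.

```python
import json
import random
import string
import xml.etree.ElementTree as ET


def mask_text(value):
    """Hypothetical masking rule: swap letters and digits for random ones,
    keeping the length and any punctuation or whitespace."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(random.choice(string.digits))
        elif ch.isalpha():
            out.append(random.choice(string.ascii_letters))
        else:
            out.append(ch)
    return "".join(out)


def mask_xml(xml_string):
    """Mask element text and attribute values; tags and nesting stay as they are."""
    root = ET.fromstring(xml_string)
    for elem in root.iter():
        if elem.text and elem.text.strip():
            elem.text = mask_text(elem.text)
        for key, value in list(elem.attrib.items()):
            elem.set(key, mask_text(value))
    return ET.tostring(root, encoding="unicode")


def mask_json(node):
    """Mask string values; keys, nesting, numbers, booleans and nulls stay as they are."""
    if isinstance(node, dict):
        return {key: mask_json(value) for key, value in node.items()}
    if isinstance(node, list):
        return [mask_json(item) for item in node]
    if isinstance(node, str):
        return mask_text(node)
    return node


print(mask_xml('<Case><Id HubNo="A12">abc-123</Id></Case>'))
print(json.dumps(mask_json({"name": "Bob", "tags": ["x1"], "n": 5})))
```

The output differs on every run because the replacement characters are random, but the element names, attribute names, JSON keys and value lengths are unchanged.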

Advanced Features

  • Custom-built Parser - a simple parser that does only what needs to be done, which removes the overhead of external libraries. It's fast. It doesn't validate documents, so it can work with some rough-edged ones …to some extent.
  • Smart Buffering - easy on memory (configurable buffer, 512MB by default) while still handling huge files (e.g. 10GB), even if all the content is lumped into a single line 💪‼
  • Masking Statistics - reports the number of XML tags and the number of tags masked during the operation. These stats can also be stored in a log file.
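
The buffering idea can be sketched as follows. This is an illustrative Python 3 example, not Paranoid's own code: it reads a file in fixed-size chunks, so memory use is bounded by the buffer size even when the whole document sits on one line, and it carries a small piece of state across chunk boundaries. The names `iter_chunks` and `count_start_tags` are hypothetical, and the tag counting deliberately ignores comments and CDATA sections.

```python
import os
import tempfile

LT = 0x3C                                # byte value of "<"
NOT_START = set(bytearray(b"/!?"))       # "</", "<!" and "<?" do not open an element


def iter_chunks(path, buffer_size=1024 * 1024):
    """Yield the file content in chunks of at most buffer_size bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(buffer_size)
            if not chunk:
                break
            yield chunk


def count_start_tags(path, buffer_size=1024 * 1024):
    """Count element start tags without ever holding the whole file in memory."""
    count = 0
    pending_lt = False  # True when the previous byte was "<", even in an earlier chunk
    for chunk in iter_chunks(path, buffer_size):
        for byte in bytearray(chunk):
            if pending_lt and byte not in NOT_START:
                count += 1
            pending_lt = (byte == LT)
    return count


# Demo: a single-line document read with a deliberately tiny 4-byte buffer.
with tempfile.NamedTemporaryFile(delete=False, suffix=".xml") as tmp:
    tmp.write(b"<a><b>x</b><c/></a>")
print(count_start_tags(tmp.name, buffer_size=4))  # prints 3
os.remove(tmp.name)
```

The `pending_lt` flag is the part that matters: a tag can be split across two buffers, so the scanner must remember what it saw at the end of the previous one.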

Architecture

Installation

pip install PARANOID

Instructions

usage: paranoid [-h] -i INPUT [-b BYTESIZE] -o OUTPUTDIR [-l MASK] [-L LOG] [-v]

optional arguments:

  • -h, --help show this help message and exit
  • -i INPUT Input Directory Name / File Name
  • -b BYTESIZE Provide byte size to buffer
  • -o OUTPUTDIR Output Directory Name
  • -l MASK Input xpath or xpaths separated by ,
  • -L LOG Output in Log File
  • -v, --version show program's version number and exit

Usage Examples

Mask Single File

paranoid -i <input filename> -o <output directory name>

Mask all XML and JSON files in a Directory

paranoid -i <directory name> -o <output directory name>
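
For readers who want to preview which files a directory run would cover, the sketch below walks a directory recursively and guesses each file's format. How Paranoid detects formats is not described here, so the rule used (first non-whitespace byte is `<` for XML, `{` or `[` for JSON) is an assumption, and `detect_format` and `find_documents` are hypothetical names.

```python
import os

UTF8_BOM = b"\xef\xbb\xbf"


def detect_format(path, probe_size=4096):
    """Guess 'xml' or 'json' from the first non-whitespace byte, else None."""
    with open(path, "rb") as handle:
        head = handle.read(probe_size)
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    head = head.lstrip()
    if head[:1] == b"<":
        return "xml"
    if head[:1] in (b"{", b"["):
        return "json"
    return None


def find_documents(root_dir):
    """Yield (path, format) for every XML or JSON file under root_dir, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            fmt = detect_format(path)
            if fmt is not None:
                yield path, fmt
```

Detection by content rather than by file extension means a JSON file named `data.txt` would still be picked up under this assumed rule.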

Change Buffer size

paranoid -i <file or directory name> -o <output directory name> -b <buffer size>

This is the way to ingest big fat one-liners: Paranoid analyses your file by streaming it byte by byte, buffer by buffer.

Mask Certain Tags

paranoid -i <input filename> -o <output directory name> -l <xpath>[,<xpath>,...]

Mask Certain Attributes

paranoid -i <input filename> -o <output directory name> -l <xpath>[,<xpath>,...]

Example request:

python paranoid.py -i ~/tests/in/case.xml -o ~/tests/in/anonymized -b 2000 -l /Case/Id/@HubNo,/Case/ProductSet/Product/@ProductionUnit

The -l argument also accepts relative XPaths, which mask matching nodes at any depth of the XML tree:

python paranoid.py -i ~/tests/in/case.xml -o ~/tests/in/anonymized -b 2000 -l /Case/Id/@HubNo,//Product/@ProductionUnit
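
The difference between the two path styles can be sketched in a few lines of Python. This is an illustration of one plausible matching rule, not Paranoid's actual logic: an absolute path must equal the node's full path, while a path starting with `//` matches any node whose path ends with the given steps. The functions `parse_mask_arg` and `path_matches` are hypothetical.

```python
def parse_mask_arg(value):
    """Split the comma-separated value passed to -l into individual paths."""
    return [part.strip() for part in value.split(",") if part.strip()]


def path_matches(node_path, pattern):
    """Assumed rule: '//x/y' matches at any depth, '/a/x/y' only at that exact location."""
    if pattern.startswith("//"):
        suffix = pattern[1:]  # keep one leading "/" so only whole steps match
        return node_path.endswith(suffix)
    return node_path == pattern


patterns = parse_mask_arg("/Case/Id/@HubNo,//Product/@ProductionUnit")
node = "/Case/ProductSet/Product/@ProductionUnit"
print(any(path_matches(node, p) for p in patterns))  # prints True
```

Keeping the leading `/` in the suffix prevents `//Product/@ProductionUnit` from matching an unrelated element such as `SubProduct`.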

Generate Log File

paranoid -i <input filename> -o <output directory name> -L <log file location>
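
When Paranoid is called from a Python script, it can help to assemble the command line in one place. The helper below is a hypothetical convenience, not part of Paranoid; it only uses the flags listed in the usage text above (`-i`, `-o`, `-b`, `-l`, `-L`) and assumes the `paranoid` command is on the PATH after `pip install PARANOID`.

```python
def build_paranoid_args(input_path, output_dir, bytesize=None, masks=None, log_file=None):
    """Return an argument list suitable for subprocess, using the documented flags."""
    args = ["paranoid", "-i", input_path, "-o", output_dir]
    if bytesize is not None:
        args += ["-b", str(bytesize)]
    if masks:
        # -l takes one value: the paths joined by commas
        args += ["-l", ",".join(masks)]
    if log_file:
        args += ["-L", log_file]
    return args


args = build_paranoid_args(
    "case.xml",
    "anonymized",
    bytesize=2000,
    masks=["/Case/Id/@HubNo", "//Product/@ProductionUnit"],
    log_file="paranoid.log",
)
print(" ".join(args))
# To run it: import subprocess; subprocess.check_call(args)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.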
