PDF anonymizer/synthesizer for Cradl

Disclaimer

This code does not guarantee that PDFs will be successfully anonymized/synthesized. Use at your own risk.

Installation

$ pip install lucidtech-synthetic

Basic Usage

Docker

We recommend disabling networking and mounting /path/to/src_dir as read-only, as shown below:

docker run --network none -v /path/to/src_dir:/root/src_dir:ro -v /path/to/dst_dir:/root/dst_dir -it lucidtechai/synthetic pdf /root/src_dir /root/dst_dir
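If you launch the container from a script, the same command can be assembled as an argument list. The following is only a sketch: the helper name docker_command is hypothetical and not part of lucidtech-synthetic, and it reuses exactly the flags shown above.

```python
import shlex


def docker_command(src_dir, dst_dir, image="lucidtechai/synthetic"):
    """Build the docker invocation shown above as an argument list.

    Hypothetical helper, not part of lucidtech-synthetic.
    """
    return [
        "docker", "run",
        "--network", "none",                 # no network access in the container
        "-v", f"{src_dir}:/root/src_dir:ro",  # input directory, mounted read-only
        "-v", f"{dst_dir}:/root/dst_dir",     # output directory, writable
        "-it", image,
        "pdf", "/root/src_dir", "/root/dst_dir",
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before running it.
    print(shlex.join(docker_command("/path/to/src_dir", "/path/to/dst_dir")))
```

The list can be passed to subprocess.run from a terminal session. Keeping it as a list avoids shell quoting problems when a path contains spaces.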

CLI

synthetic pdf /path/to/src_dir /path/to/dst_dir

  • /path/to/src_dir is the input directory and should contain your PDFs and JSON ground truths.
  • /path/to/dst_dir is the output directory where the synthesized PDFs and JSON ground truths will be written.

Here is an example of the directory layout for /path/to/src_dir:

/path/to/src_dir
├── a.pdf
├── a.json
├── b.pdf
├── b.json
├── c.pdf
└── c.json
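Before running the tool, it can help to confirm that the input directory matches this layout. The sketch below assumes, based on the layout above, that each PDF is paired with a JSON ground truth of the same base name. The function find_unpaired is a hypothetical helper, not part of lucidtech-synthetic.

```python
from pathlib import Path


def find_unpaired(src_dir):
    """Return (PDFs without a JSON, JSONs without a PDF), matched by base name.

    Hypothetical helper; assumes a.pdf is paired with a.json as in the layout above.
    """
    src = Path(src_dir)
    pdf_stems = {p.stem for p in src.glob("*.pdf")}
    json_stems = {p.stem for p in src.glob("*.json")}
    pdfs_without_json = sorted(f"{stem}.pdf" for stem in pdf_stems - json_stems)
    jsons_without_pdf = sorted(f"{stem}.json" for stem in json_stems - pdf_stems)
    return pdfs_without_json, jsons_without_pdf


if __name__ == "__main__":
    pdfs, jsons = find_unpaired("/path/to/src_dir")
    for name in pdfs + jsons:
        print(f"unpaired file: {name}")
```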

The output directory will follow the same layout but with modified PDFs and JSON ground truths:

/path/to/dst_dir
├── a.pdf
├── a.json
├── b.pdf
├── b.json
├── c.pdf
└── c.json
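Since successful synthesis is not guaranteed, you may want to check after a run that every input file has a counterpart in the output directory. This is a minimal sketch that compares file names only, not file contents. The function layout_diff is a hypothetical helper, not part of lucidtech-synthetic.

```python
from pathlib import Path


def layout_diff(src_dir, dst_dir):
    """Return (names only in src_dir, names only in dst_dir).

    Hypothetical helper; compares top-level file names, not contents.
    """
    src_names = {p.name for p in Path(src_dir).iterdir() if p.is_file()}
    dst_names = {p.name for p in Path(dst_dir).iterdir() if p.is_file()}
    return sorted(src_names - dst_names), sorted(dst_names - src_names)


if __name__ == "__main__":
    missing, unexpected = layout_diff("/path/to/src_dir", "/path/to/dst_dir")
    print("missing from output:", missing)
    print("only in output:", unexpected)
```

Two empty lists mean the output directory has the same file names as the input directory. It does not mean the documents were anonymized correctly.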

Using a custom Synthesizer

CLI

synthetic pdf /path/to/src_dir /path/to/dst_dir --synthesizer-class path.to.python.Class

Make sure that the parent directory of path.to.python.Class is in your PYTHONPATH.
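To check that your class is importable before passing it to --synthesizer-class, you can resolve the dotted path yourself. The sketch below uses the standard importlib module. It illustrates how a dotted path is commonly resolved and is not necessarily how synthetic loads the class internally. The function load_class is a hypothetical helper.

```python
import importlib


def load_class(dotted_path):
    """Resolve 'package.module.Class' to the class object.

    Hypothetical helper; synthetic may resolve the path differently.
    """
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.Class', got {dotted_path!r}")
    # Raises ImportError if the module cannot be found on sys.path / PYTHONPATH.
    module = importlib.import_module(module_path)
    # Raises AttributeError if the module has no attribute with that name.
    return getattr(module, class_name)


if __name__ == "__main__":
    # Standard-library class used only to demonstrate the lookup.
    print(load_class("collections.OrderedDict"))
```

An ImportError here usually means the parent directory is missing from PYTHONPATH.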

Example using one of the example Synthesizers in the examples directory:

synthetic pdf /path/to/src_dir /path/to/dst_dir --synthesizer-class examples.exclude-words.synthesizer.ExcludeWordsSynthesizer

Docker

docker run --network none -v /path/to/synthesizer:/root/synthesizer -v /path/to/src_dir:/root/src_dir:ro -v /path/to/dst_dir:/root/dst_dir -it lucidtechai/synthetic pdf /root/src_dir /root/dst_dir --synthesizer-class mypythonfile.ExcludeWordsSynthesizer

Note that the Python module must be mounted into the Docker container at /root/synthesizer for this to work. The example above assumes that your custom synthesizer directory is structured as follows:

/path/to/synthesizer
└── mypythonfile.py

Example using one of the example Synthesizers in the examples directory. The examples directory should already exist in the image, so nothing additional needs to be mounted:

docker run --network none -v /path/to/src_dir:/root/src_dir:ro -v /path/to/dst_dir:/root/dst_dir -it lucidtechai/synthetic pdf /root/src_dir /root/dst_dir --synthesizer-class examples.exclude-words.synthesizer.ExcludeWordsSynthesizer

Help

All commands support the --help flag, which describes the purpose of the command and the arguments it accepts.

$ synthetic --help

Known Issues

PDF Synthesizer

  • Does not synthesize images
  • Replaced strings are never hexadecimal encoded
