PDF anonymizer/synthesizer for Cradl
Disclaimer
This code does not guarantee that PDFs will be successfully anonymized/synthesized. Use at your own risk.
Installation
$ pip install lucidtech-synthetic
Basic Usage
Docker
We recommend disabling networking and mounting /path/to/src_dir read-only, as shown below:
docker run --network none -v /path/to/src_dir:/root/src_dir:ro -v /path/to/dst_dir:/root/dst_dir -it lucidtechai/synthetic pdf /root/src_dir /root/dst_dir
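For scripted batch runs, the same invocation can be assembled programmatically. The following is a minimal sketch, not part of the package: the helper name `docker_argv` is hypothetical, and the `-it` flag is dropped on the assumption that a scripted run has no interactive terminal.

```python
import shlex
from pathlib import Path

IMAGE = "lucidtechai/synthetic"  # image name taken from the command above


def docker_argv(src_dir, dst_dir):
    """Build a docker command with networking disabled and a read-only source mount.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    src = Path(src_dir).resolve()  # bind mounts need absolute host paths
    dst = Path(dst_dir).resolve()
    return [
        "docker", "run", "--network", "none",
        "-v", f"{src}:/root/src_dir:ro",  # :ro keeps the originals untouched
        "-v", f"{dst}:/root/dst_dir",
        IMAGE, "pdf", "/root/src_dir", "/root/dst_dir",
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before running it.
    print(shlex.join(docker_argv("/path/to/src_dir", "/path/to/dst_dir")))
```

Passing the returned list to `subprocess.run(..., check=True)` would execute it; printing it first makes it easy to confirm that the source mount really ends in `:ro`.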
CLI
synthetic pdf /path/to/src_dir /path/to/dst_dir
- /path/to/src_dir is the input directory and should contain your PDFs and JSON ground truths
- /path/to/dst_dir is the output directory where synthesized PDFs and JSON ground truths will be written
Here is an example of the directory layout for /path/to/src_dir:
/path/to/src_dir
├── a.pdf
├── a.json
├── b.pdf
├── b.json
├── c.pdf
└── c.json
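The layout above suggests that each PDF is paired with a JSON ground truth of the same base name. A small pre-flight check along these lines can catch unpaired files before a run. This is a sketch under that assumption; `unpaired_files` is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def unpaired_files(src_dir):
    """Return (PDFs without a .json, JSON files without a .pdf), both sorted.

    Assumes pairing by base name, as in the example layout (a.pdf / a.json).
    """
    src = Path(src_dir)
    pdf_stems = {p.stem for p in src.glob("*.pdf")}
    json_stems = {p.stem for p in src.glob("*.json")}
    pdfs_without_json = sorted(f"{s}.pdf" for s in pdf_stems - json_stems)
    jsons_without_pdf = sorted(f"{s}.json" for s in json_stems - pdf_stems)
    return pdfs_without_json, jsons_without_pdf
```

Both lists being empty means every document has its ground truth and vice versa.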
The output directory will follow the same layout but with modified PDFs and JSON ground truths:
/path/to/dst_dir
├── a.pdf
├── a.json
├── b.pdf
├── b.json
├── c.pdf
└── c.json
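Since the output is expected to mirror the input layout, and synthesis is not guaranteed to succeed for every file, it can be worth checking which inputs have no counterpart in the output directory. A minimal sketch, with `missing_outputs` as a hypothetical helper name:

```python
from pathlib import Path


def missing_outputs(src_dir, dst_dir):
    """Return sorted names of input PDFs/JSON files that are absent from dst_dir."""
    wanted = {
        p.name for p in Path(src_dir).iterdir()
        if p.suffix in {".pdf", ".json"}
    }
    written = {p.name for p in Path(dst_dir).iterdir()}
    return sorted(wanted - written)
```

This only compares file names. It does not inspect the PDFs, so it cannot tell whether a written file was actually anonymized.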
Using a custom Synthesizer
CLI
synthetic pdf /path/to/src_dir /path/to/dst_dir --synthesizer-class path.to.python.Class
Make sure that the parent directory of path.to.python.Class is in your PYTHONPATH.
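To see why PYTHONPATH matters, here is one common way a dotted path such as path.to.python.Class can be resolved in Python. This is an illustrative sketch using the standard `importlib` module; it is an assumption about the general technique, not a description of how the tool itself loads the class, and `load_class` is a hypothetical name.

```python
import importlib


def load_class(dotted_path):
    """Split 'pkg.module.Class' into module and attribute, then import it."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.Class', got {dotted_path!r}")
    # import_module searches sys.path, which includes PYTHONPATH entries,
    # so the directory containing the top-level package must be listed there.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

If the top-level package's parent directory is not on the search path, the import step raises `ModuleNotFoundError`.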
Example using one of the example synthesizers in the examples directory:
synthetic pdf /path/to/src_dir /path/to/dst_dir --synthesizer-class examples.exclude-words.synthesizer.ExcludeWordsSynthesizer
Docker
docker run --network none -v /path/to/synthesizer:/root/synthesizer -v /path/to/src_dir:/root/src_dir:ro -v /path/to/dst_dir:/root/dst_dir -it lucidtechai/synthetic pdf /root/src_dir /root/dst_dir --synthesizer-class mypythonfile.ExcludeWordsSynthesizer
Note that the Python module must be mounted into the Docker container at /root/synthesizer for this to work. The example above assumes that your custom synthesizer has the following directory structure:
/path/to/synthesizer
└── mypythonfile.py
Example using one of the example synthesizers in the examples directory. The examples directory should already exist in the image, so nothing additional needs to be mounted:
docker run --network none -v /path/to/src_dir:/root/src_dir:ro -v /path/to/dst_dir:/root/dst_dir -it lucidtechai/synthetic pdf /root/src_dir /root/dst_dir --synthesizer-class examples.exclude-words.synthesizer.ExcludeWordsSynthesizer
Help
All commands support the --help flag, which describes the purpose of the command and the arguments it accepts.
$ synthetic --help
Known Issues
PDF Synthesizer
- Does not synthesize images
- Replaced strings are never hexadecimal encoded
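For context on the second issue: PDF allows a string to be written either literally, in parentheses, or as hexadecimal digits in angle brackets. One reading of the issue is that replacement text is always emitted in the literal form, even where the original string was hexadecimal; that reading is an assumption. The sketch below only illustrates the two encodings, with hypothetical helper names, and assumes the text fits in Latin-1.

```python
def pdf_literal_string(text):
    """Encode text as a PDF literal string, e.g. (Hello)."""
    # Backslashes and parentheses are escaped with a backslash.
    escaped = (
        text.replace("\\", "\\\\")
        .replace("(", "\\(")
        .replace(")", "\\)")
    )
    return f"({escaped})"


def pdf_hex_string(text):
    """Encode text as a PDF hexadecimal string, e.g. <48656C6C6F>."""
    # Assumption for this sketch: one byte per character (Latin-1).
    return f"<{text.encode('latin-1').hex().upper()}>"
```

Both forms represent the same bytes, so a viewer renders them identically; they differ only in how the string appears in the file.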
Hashes for lucidtech-synthetic-0.2.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | db122cdbe61f0fb8f200fd29a061f5ba4467378904c018996bd3387d162f85bd
MD5 | b0bc4ad204af53d2104fa257a52adc1a
BLAKE2b-256 | b6308531bcc0952a7cf992c3f924eec4f2bb6544b0fb2a3115726d282a0a6f85
Hashes for lucidtech_synthetic-0.2.2-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | fa9f2a6622207bc917f27f36b3e8e075e3806ed0d086843786b2f65f1ac73303
MD5 | 9ad822cbbe93c14cd239b84732eeba83
BLAKE2b-256 | c44a845101480a701abf48ea3f4186c93e99b2d44795dbfa13504080418a07b8