Provides an overview of the inner file structure of a PDF and extracts /URI and /JS data.
Configuration
options.json contains the rules for searching the PDF document.
To extract additional information, add a new object to this file.
The following provides an example of an object:
[
  {
    "name": "obj",               // The name of the entity to be found.
                                 // Only acts as a display name.
    "type": "tag",               // There are 5 different types:
                                 // - metadata
                                 // - action
                                 // - tag
                                 // - code
                                 // - embedded
                                 // The type determines where the entry
                                 // appears in the output (see example.txt).
    "action": "count",           // There are 2 different actions:
                                 // - count (counts all regex matches)
                                 // - value (provides the value of a
                                 //   regex match)
    "regexes": ["(?<!end)obj"]   // The regexes used to find what you need.
                                 // Make sure they match the selected action.
  }
]
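To illustrate how the two actions differ, here is a minimal sketch of evaluating such rule objects against text. It is not the project's actual implementation: the `apply_rule` helper, the `RULES` list, and the `/URI values` regex are assumptions made for this example; only the `obj` rule comes from the configuration shown above.

```python
import re

# Hypothetical rules in the shape described above. The "/URI values" regex is
# an illustrative assumption, not taken from the project's options.json.
RULES = [
    {"name": "obj", "type": "tag", "action": "count",
     "regexes": [r"(?<!end)obj"]},
    {"name": "/URI values", "type": "action", "action": "value",
     "regexes": [r"/URI\s*\((.*?)\)"]},
]


def apply_rule(rule, text):
    """Return the match count or the list of matched values for one rule."""
    matches = []
    for pattern in rule["regexes"]:
        matches.extend(re.findall(pattern, text))
    if rule["action"] == "count":
        return len(matches)
    if rule["action"] == "value":
        return matches
    raise ValueError(f"unknown action: {rule['action']!r}")


# Tiny stand-in for decoded PDF content (assumption, not a valid PDF).
SAMPLE = (
    "1 0 obj\n<< /URI (http://example.com/) >>\nendobj\n"
    "2 0 obj\nendobj\n"
)

for rule in RULES:
    print(f"{rule['name']} : {apply_rule(rule, SAMPLE)}")
```

The negative lookbehind `(?<!end)` keeps `endobj` from being counted as `obj`, which is why a `count` rule needs a regex without capture groups, while a `value` rule needs one group that captures the value to report.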
Usage
From command line:
python -m pdf_investigator [-h] --path PATH
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --path | -p | String | - | Path to the PDF directory |
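For readers who want to see how an option like this is typically declared, the following is a sketch using Python's standard `argparse` module. It mirrors the table above but is an assumption about the interface, not the project's source code; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single required --path / -p option."""
    parser = argparse.ArgumentParser(
        prog="pdf_investigator",
        description="Provides an overview of the inner file structure of a PDF",
    )
    parser.add_argument(
        "--path", "-p",
        type=str,
        required=True,  # the table lists no default, so the option is treated as mandatory here
        help="Path to the PDF directory",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-p", "path/to/pdf"])
print(args.path)
```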
Example
python -m pdf_investigator -p "path/to/pdf" > result.txt
This writes a report like the following to result.txt:
################################################################################
PDF Examiner by 5f0
Provides an overview of the inner file structure of a PDF
Current working directory: /path/to/pdfs
Investigated PDFs in: ./sample-files
Total numbers of examined PDFs: 1
Datetime: 01/01/1980 11:12:13
################################################################################
Examined file: ./sample-file/sample.pdf
MD5 Hash: 851acee02bd8d002e3b9af184d0c8959
SHA256 Hash: f723638db6e763cf4ccadad38a3d38a02d9ecab95dab1f0aebf00e801991b5f92
--> Version: %PDF-1.5
--> Metadata:
/Author : ['5f0']
/Creator : ['LaTeX with hyperref package']
/Producer : ['pdfTeX-1.40.16']
/CreationDate : ["D:19700101193102+02\\'00\\'"]
/ModDate : ["D:19700101193102+02\\'00\\'"]
--> Tags:
obj : 6
endobj : 6
stream : 5
endstream : 5
xref : 0
startxref : 1
trailer : 0
--> Actions:
/Action : 1
/URI : 2
/URI values : ['http://example.com/']
/OpenAction : 1
/Named : 0
/Launch : 0
/AcroForm : 0
---> Code:
/JavaScript : 2
/JS : 1
/JS values : ['var v = app.viewerVersion;']
---> Embedded:
/RichMedia : 0
/EmbeddedFile : 0
/Encrypt : 2
################################################################################
Execution Time: 0.048780 sec
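The hash and version lines of the report can be reproduced with the standard library. The sketch below is one possible way to compute them, not necessarily how the tool does it; `file_hashes`, `pdf_version`, and the sample bytes are hypothetical names and data introduced for illustration.

```python
import hashlib
import re


def file_hashes(data: bytes) -> dict:
    """Return the MD5 and SHA256 hex digests of the given bytes."""
    return {
        "MD5": hashlib.md5(data).hexdigest(),
        "SHA256": hashlib.sha256(data).hexdigest(),
    }


def pdf_version(data: bytes):
    """Return the %PDF-x.y header found near the start of the data, or None."""
    match = re.search(rb"%PDF-\d\.\d", data[:1024])
    return match.group(0).decode("ascii") if match else None


# Stand-in bytes for a PDF file (assumption); a real run would use
# open(path, "rb").read() instead.
sample = b"%PDF-1.5\n1 0 obj\nendobj\n"

print(f"--> Version: {pdf_version(sample)}")
for name, digest in file_hashes(sample).items():
    print(f"{name} Hash: {digest}")
```

Reading the file in binary mode matters here: hashing decoded text instead of the raw bytes would produce digests that do not match those from other tools.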
License
MIT
File details
Details for the file pdf_examiner-1.0.0.tar.gz.
File metadata
- Download URL: pdf_examiner-1.0.0.tar.gz
- Upload date:
- Size: 7.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 228d2a456f06aae7fc0af6811e25a3e6e3ae020c02efeedafd989cb0452fda96 |
| MD5 | 9d574999e697ea4a700d1f8aa36f2f6b |
| BLAKE2b-256 | d8c640beebc57f4e5e2312bad094ba079b606392ba13b5dff15a854dce4f100e |
File details
Details for the file pdf_examiner-1.0.0-py3-none-any.whl.
File metadata
- Download URL: pdf_examiner-1.0.0-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e3b9b81b1719cb7e657de8134535330cefd849906e4d59a94b9d87badc8b456 |
| MD5 | 4910808e0d7d3182e271af80ddf6d7ce |
| BLAKE2b-256 | b34d483aa3d7bd9cb5db9e42660ac1b03651df2fe9a5baff4ad62d3426142b20 |