Provides an overview of the inner file structure of a PDF and extracts /URI and /JS data.

Configuration

options.json contains the rules used to search each PDF document. To extract additional information, add a new object to this file. The following example shows one such object:

[
    {
        "name": "obj",                  // The name of the entity to be found.
                                        // Only used as a display name
        "type": "tag",                  // There are 5 different types:
                                        //      - metadata
                                        //      - action
                                        //      - tag
                                        //      - code
                                        //      - embedded
                                        // The type determines where the result
                                        // appears in the output (see example.txt)
        "action": "count",              // There are 2 different actions:
                                        //      - count (Counts all regex matches)
                                        //      - value (Provides the value matched
                                        //               by a regex)
        "regexes": ["(?<!end)obj"]      // The regexes that find what you need. Make
                                        // sure they match the selected action
    }
]
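
To illustrate how the two actions differ, here is a minimal sketch of applying such rules with Python's re module. It is not the project's actual implementation: the apply_rule helper, the second rule and its /URI regex, and the sample text are all assumptions made for this example.

```python
import re

# Hypothetical rules in the same shape as the options.json example above.
# The second rule and its regex are assumptions made for illustration.
RULES = [
    {"name": "obj", "type": "tag", "action": "count",
     "regexes": [r"(?<!end)obj"]},
    {"name": "/URI values", "type": "action", "action": "value",
     "regexes": [r"/URI\s*\((.*?)\)"]},
]


def apply_rule(rule, text):
    """Return a match count for "count" rules, or the matched values otherwise."""
    matches = []
    for pattern in rule["regexes"]:
        matches.extend(re.findall(pattern, text))
    if rule["action"] == "count":
        return len(matches)
    return matches


# A tiny stand-in for decoded PDF content (real files are binary).
SAMPLE = (
    "1 0 obj\n<< /Type /Action /S /URI /URI (http://example.com/) >>\nendobj\n"
    "2 0 obj\n<< /Length 0 >>\nendobj\n"
)

for rule in RULES:
    print(f"{rule['name']:<20}: {apply_rule(rule, SAMPLE)}")
```

The negative lookbehind (?<!end) keeps endobj from being counted as obj. A "value" rule needs a capture group so that only the interesting part of the match is reported.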

Usage

From the command line:

python -m pdf_investigator [-h] --path PATH

Option    Short   Type     Default   Description
--path    -p      String   -         Path to the PDF directory
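
For readers who want to see how an interface of this shape is typically declared, the following is a sketch using argparse. The build_parser function is hypothetical. Only the option names, the type, and the fact that --path is required are taken from the usage line and table above.

```python
import argparse


def build_parser():
    """Build a parser matching the documented usage: [-h] --path PATH."""
    parser = argparse.ArgumentParser(prog="pdf_investigator")
    parser.add_argument(
        "--path", "-p",
        type=str,
        required=True,  # the usage line shows --path without brackets
        help="Path to the PDF directory",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-p", "path/to/pdf"])
print(args.path)
```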

Example

python -m pdf_investigator -p "path/to/pdf" > result.txt

The command above writes a report like the following to result.txt:

################################################################################

PDF Examiner by 5f0
Provides an overview of the inner file structure of a PDF

Current working directory: /path/to/pdfs
Investigated PDFs in: ./sample-files

Total numbers of examined PDFs: 1

Datetime: 01/01/1980 11:12:13

################################################################################

Examined file: ./sample-file/sample.pdf

     MD5 Hash: 851acee02bd8d002e3b9af184d0c8959
  SHA256 Hash: f723638db6e763cf4ccadad38a3d38a02d9ecab95dab1f0aebf00e801991b5f92

--> Version: %PDF-1.5

--> Metadata:

/Author             : ['5f0']
/Creator            : ['LaTeX with hyperref package']
/Producer           : ['pdfTeX-1.40.16']
/CreationDate       : ["D:19700101193102+02\\'00\\'"]
/ModDate            : ["D:19700101193102+02\\'00\\'"]


--> Tags:

obj                 : 6
endobj              : 6
stream              : 5
endstream           : 5
xref                : 0
startxref           : 1
trailer             : 0

--> Actions:

/Action             : 1
/URI                : 2
/URI values         : ['http://example.com/']
/OpenAction         : 1
/Named              : 0
/Launch             : 0
/AcroForm           : 0

---> Code:

/JavaScript         : 2
/JS                 : 1
/JS values          : ['var v = app.viewerVersion;']

---> Embedded:

/RichMedia          : 0
/EmbeddedFile       : 0
/Encrypt            : 2

################################################################################

Execution Time: 0.048780 sec
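
The hash and version lines at the top of each report entry can be reproduced independently, for example to verify a file against the report. The sketch below uses only the standard library. The hash_stream and pdf_version helpers are hypothetical names, not functions of this package, and the sample bytes are a stand-in for a real file.

```python
import hashlib
import io


def hash_stream(stream, chunk_size=65536):
    """Return (md5_hex, sha256_hex) for a binary stream, read in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def pdf_version(stream):
    """Return the header line (e.g. '%PDF-1.5'), or None if it is missing."""
    stream.seek(0)
    first_line = stream.readline().strip()
    if first_line.startswith(b"%PDF-"):
        return first_line.decode("latin-1")
    return None


# Stand-in for open("sample.pdf", "rb"); the content is made up.
sample = io.BytesIO(b"%PDF-1.5\n%%EOF\n")
md5_hex, sha256_hex = hash_stream(sample)
print("MD5 Hash:", md5_hex)
print("SHA256 Hash:", sha256_hex)
print("Version:", pdf_version(sample))
```

An MD5 digest is always 32 hexadecimal characters and a SHA-256 digest is always 64, which gives a quick sanity check when comparing values by hand.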

License

MIT
