leaf-focus

Extract structured text from pdf files.

Install

Install from PyPI using pip:

pip install leaf-focus

Download the Xpdf command line tools and extract the executable files.

Provide the directory containing the executable files as --exe-dir.
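
Before running leaf-focus, it can help to confirm that the extracted directory really contains the Xpdf executables. The following is a minimal sketch, not part of leaf-focus: the helper name missing_xpdf_tools is hypothetical, and the tool list (pdfinfo, pdftotext, pdftopng) is an assumption, since this README does not say which Xpdf programs are used.

```python
from pathlib import Path

# Assumption: these are the Xpdf tools leaf-focus relies on. The README
# does not name them, so adjust this tuple if your setup differs.
ASSUMED_XPDF_TOOLS = ("pdfinfo", "pdftotext", "pdftopng")


def missing_xpdf_tools(exe_dir, tools=ASSUMED_XPDF_TOOLS):
    """Return the names in `tools` that have no matching file in `exe_dir`."""
    directory = Path(exe_dir)
    if not directory.is_dir():
        # Nothing can be found in a directory that does not exist.
        return list(tools)
    # Compare by file stem so 'pdfinfo' and 'pdfinfo.exe' both count.
    present = {p.stem.lower() for p in directory.iterdir() if p.is_file()}
    return [name for name in tools if name not in present]
```

An empty list means every assumed tool was found, and the same directory can then be passed as --exe-dir.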

Usage

usage: leaf-focus [-h] [--version] --exe-dir EXE_DIR [--page-images] [--ocr]
                  [--first FIRST] [--last LAST]
                  [--log-level {debug,info,warning,error,critical}]
                  input_pdf output_dir

Extract structured text from a pdf file.

positional arguments:
  input_pdf             path to the pdf file to read
  output_dir            path to the directory to save the extracted text files

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --exe-dir EXE_DIR     path to the directory containing xpdf executable files
  --page-images         save each page of the pdf as a separate image
  --ocr                 run optical character recognition on each page of the
                        pdf
  --first FIRST         the first pdf page to process
  --last LAST           the last pdf page to process
  --log-level {debug,info,warning,error,critical}
                        the log level: debug, info, warning, error, critical

Examples

# Extract the pdf information and embedded text.
leaf-focus --exe-dir [path-to-xpdf-exe-dir] file.pdf file-pages

# Extract the pdf information, embedded text, an image of each page, and Optical Character Recognition results of each page.
leaf-focus --exe-dir [path-to-xpdf-exe-dir] file.pdf file-pages --ocr
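
The same commands can be driven from a Python script. The sketch below only assembles the command line documented in the usage text and hands it to subprocess. The function names build_command and run_leaf_focus are hypothetical helpers, not part of the leaf-focus package, and the sketch assumes the leaf-focus executable is on PATH.

```python
import subprocess


def build_command(exe_dir, input_pdf, output_dir, ocr=False, page_images=False,
                  first=None, last=None, log_level=None):
    """Build a leaf-focus argument list from the options in the usage text."""
    cmd = ["leaf-focus", "--exe-dir", str(exe_dir)]
    if page_images:
        cmd.append("--page-images")
    if ocr:
        cmd.append("--ocr")
    if first is not None:
        cmd += ["--first", str(first)]
    if last is not None:
        cmd += ["--last", str(last)]
    if log_level is not None:
        cmd += ["--log-level", log_level]
    # The two positional arguments come last: input_pdf, then output_dir.
    cmd += [str(input_pdf), str(output_dir)]
    return cmd


def run_leaf_focus(*args, **kwargs):
    """Run leaf-focus; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(*args, **kwargs), check=True)
```

For example, run_leaf_focus("[path-to-xpdf-exe-dir]", "file.pdf", "file-pages", ocr=True) corresponds to the second command above, with the placeholder replaced by the real Xpdf directory.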
