Python interface to Apache PDFBox command-line tools.

Package Description

Provides a simple Python 3 interface to the Apache PDFBox command-line tools.

Requirements

Aside from Python 3 and the packages specified in setup.py, python-pdfbox requires Java to be available on the system path.

Some users have reported issues on macOS with certain versions of Java. If you encounter such issues, try a recent release of OpenJDK (14 or later).
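To confirm that the Java requirement is met before using the package, a check along the following lines may help. This is a minimal sketch using only the Python standard library; the helper name java_on_path is hypothetical and is not part of python-pdfbox.

```python
import shutil


def java_on_path() -> bool:
    """Return True if a `java` executable can be found on the system path.

    Hypothetical helper for illustration; not part of python-pdfbox.
    """
    return shutil.which("java") is not None


if __name__ == "__main__":
    if java_on_path():
        print("java found on the system path")
    else:
        print("java not found; install a JDK/JRE and add it to the path")
```

This only confirms that an executable named java is discoverable. It does not check the Java version.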

Installation

The package may be installed as follows:

pip install python-pdfbox

The location of the PDFBox jar file may be specified via the PDFBOX environment variable. If the variable is not set, python-pdfbox looks for the jar file in the platform-specific user cache directory. If no jar is present there, it automatically downloads the latest available version below 3.0.0 and caches it.
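The PDFBOX variable can also be set from within Python rather than in the shell. The following is a sketch under two assumptions: that the variable must be set before the PDFBox object is created, and that the jar path you pass in is one you have downloaded yourself. The helper name use_local_pdfbox_jar is hypothetical.

```python
import os
from pathlib import Path


def use_local_pdfbox_jar(jar_path: str) -> str:
    """Point the PDFBOX environment variable at a local jar file.

    Hypothetical helper for illustration; not part of python-pdfbox.
    Raises FileNotFoundError if the given path is not an existing file.
    """
    jar = Path(jar_path).expanduser()
    if not jar.is_file():
        raise FileNotFoundError(f"PDFBox jar not found: {jar}")
    os.environ["PDFBOX"] = str(jar)
    return os.environ["PDFBOX"]
```

Calling this helper before import pdfbox and pdfbox.PDFBox() should make the package use the given jar instead of the cached download, provided the assumption about when the variable is read holds.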

Usage

The interface currently exposes only a few PDFBox features (text extraction, conversion of pages to images, and extraction of embedded images):

import pdfbox
p = pdfbox.PDFBox()
p.extract_text('/path/to/my_file.pdf')   # writes text to /path/to/my_file.txt
p.pdf_to_images('/path/to/my_file.pdf')  # writes images to /path/to/my_file1.jpg, /path/to/my_file2.jpg, etc.
p.extract_images('/path/to/my_file.pdf') # writes images to /path/to/my_file-1.png, /path/to/my_file-2.png, etc.
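Because extract_text writes its output next to the input file, processing several PDFs is mostly a matter of tracking where each text file should appear. The sketch below assumes the naming shown in the comments above (my_file.pdf produces my_file.txt). The helpers expected_text_path and extract_all are hypothetical; the extraction function is passed in as an argument, so nothing beyond the documented one-argument extract_text call is assumed.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def expected_text_path(pdf_path: str) -> Path:
    """Return the .txt path assumed to be written for a given PDF path."""
    return Path(pdf_path).with_suffix(".txt")


def extract_all(pdf_paths: Iterable[str],
                extract: Callable[[str], object]) -> List[Path]:
    """Call `extract` on each PDF and collect the assumed output paths.

    `extract` is expected to behave like PDFBox.extract_text: it takes a
    PDF path and writes a .txt file alongside it.
    """
    outputs = []
    for pdf in pdf_paths:
        extract(str(pdf))
        outputs.append(expected_text_path(str(pdf)))
    return outputs
```

With python-pdfbox installed, this could be used as extract_all(['/path/to/a.pdf', '/path/to/b.pdf'], pdfbox.PDFBox().extract_text).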

Notes

Owing to a change in the PDFBox command-line interface, python-pdfbox cannot currently use PDFBox 3.0.0.

Development

The latest release of the package may be obtained from GitHub.

Author

See the included AUTHORS.rst file for more information.

License

This software is licensed under the Apache 2.0 License. See the included LICENSE.rst file for more information.
