
Python interface to Apache PDFBox command-line tools.

Project description

Package Description

Provides a simple Python 3 interface to the Apache PDFBox command-line tools.


Requirements

Aside from Python 3 and the packages specified in setup.py, python-pdfbox requires a Java runtime, with the java executable available on the system PATH.
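Every call into PDFBox starts a java process, so it can help to confirm that a Java runtime is reachable before using the package. The following is a minimal standard-library sketch. find_java and java_version_banner are hypothetical helpers and are not part of python-pdfbox.

```python
import shutil
import subprocess


def find_java():
    """Return the full path of the java executable found on PATH, or None."""
    return shutil.which("java")


def java_version_banner():
    """Return the first line of `java -version` output, or None if java is absent."""
    java = find_java()
    if java is None:
        return None
    # `java -version` prints its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    lines = result.stderr.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    if find_java() is None:
        print("java not found on PATH; install a Java runtime first")
    else:
        print("Found Java:", java_version_banner())
```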

Installation

The package may be installed as follows:

pip install python-pdfbox

One may specify the location of the PDFBox jar file via the PDFBOX environment variable. If it is not set, python-pdfbox looks for the jar file in the platform-specific user cache directory, and downloads and caches it there if it is not present.
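The lookup order described above can be sketched as follows. This is an illustration under stated assumptions and does not reproduce python-pdfbox's code. user_cache_dir only approximates common per-platform cache locations, the jar name pdfbox-app.jar is a placeholder, and the download step is omitted.

```python
import os
import sys
from pathlib import Path


def user_cache_dir(app_name):
    """Approximate a per-user cache directory (hypothetical helper)."""
    home = Path.home()
    if sys.platform == "win32":
        base = os.environ.get("LOCALAPPDATA") or home / "AppData" / "Local"
        return Path(base) / app_name / "Cache"
    if sys.platform == "darwin":
        return home / "Library" / "Caches" / app_name
    # Linux and other Unix-likes: follow the XDG base-directory convention.
    base = os.environ.get("XDG_CACHE_HOME") or home / ".cache"
    return Path(base) / app_name


def resolve_pdfbox_jar(jar_name="pdfbox-app.jar", environ=None):
    """Return the PDFBOX path if it is set, otherwise a location in the user cache dir.

    jar_name is a placeholder; python-pdfbox's real cached file name may differ.
    A real implementation would also download the jar if the cached file is missing.
    """
    environ = os.environ if environ is None else environ
    env_path = environ.get("PDFBOX")
    if env_path:
        return Path(env_path)
    return user_cache_dir("python-pdfbox") / jar_name
```

The simplest way to point python-pdfbox at a specific jar is to export PDFBOX in the shell before starting Python, for example `export PDFBOX=/path/to/pdfbox-app.jar` in a POSIX shell.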

Usage

The interface currently exposes only a few PDFBox features (text extraction, conversion of pages to images, and extraction of embedded images):

import pdfbox
p = pdfbox.PDFBox()
p.extract_text('/path/to/my_file.pdf')   # writes text to /path/to/my_file.txt
p.pdf_to_images('/path/to/my_file.pdf')  # writes images to /path/to/my_file1.jpg, /path/to/my_file2.jpg, etc.
p.extract_images('/path/to/my_file.pdf') # writes images to /path/to/my_file-1.png, /path/to/my_file-2.png, etc.
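Because python-pdfbox wraps the PDFBox command-line tools, each method above ultimately runs `java -jar <jar> <Tool> ...`. The minimal sketch below shows that pattern under one assumption: the jar is a PDFBox 2.x standalone app jar with the ExtractText, PDFToImage and ExtractImages subcommands. TOOLS, build_command, default_text_output and run_tool are hypothetical names and are not python-pdfbox's API.

```python
import subprocess
from pathlib import Path

# Assumption: subcommand names of the PDFBox 2.x standalone "pdfbox-app" jar;
# other major versions of PDFBox may name their commands differently.
TOOLS = {
    "text": "ExtractText",
    "images": "PDFToImage",
    "embedded_images": "ExtractImages",
}


def build_command(jar_path, tool, pdf_path, *options):
    """Build `java -jar <jar> <Tool> [options] <pdf>` as an argument list."""
    return ["java", "-jar", str(jar_path), TOOLS[tool], *options, str(pdf_path)]


def default_text_output(pdf_path):
    """Text file written next to the input (my_file.pdf -> my_file.txt)."""
    return Path(pdf_path).with_suffix(".txt")


def run_tool(jar_path, tool, pdf_path, *options):
    """Run the tool; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(jar_path, tool, pdf_path, *options), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces. Using check=True makes a failed conversion raise an exception instead of failing silently.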

Development

The latest release of the package may be obtained from GitHub.

Author

See the included AUTHORS.rst file for more information.

License

This software is licensed under the Apache 2.0 License. See the included LICENSE.rst file for more information.

Download files

Download the file for your platform.

Source Distribution

python-pdfbox-0.1.8.tar.gz (81.3 kB)

Built Distribution

python_pdfbox-0.1.8-py3-none-any.whl (5.9 kB)

File details

Details for the file python-pdfbox-0.1.8.tar.gz.

File metadata

  • Download URL: python-pdfbox-0.1.8.tar.gz
  • Size: 81.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200325 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for python-pdfbox-0.1.8.tar.gz
Algorithm     Hash digest
SHA256        a73619c389c7742747d456ba44bd7acb5b37aaaa3fbf683bf77006807764be44
MD5           3decc16282aad09b065e19cdaef806e6
BLAKE2b-256   2cd61583b984f0667e57b9b38763079a1c6527f1afb189c60e3b42a639bfd472
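You can compare a downloaded file against the digests listed above with the standard hashlib module. This is a minimal sketch, and the helper names are hypothetical. It assumes that BLAKE2b-256 means BLAKE2b with a 32-byte (256-bit) digest.

```python
import hashlib

# Hash constructors keyed by the algorithm names listed above (lower-cased).
ALGORITHMS = {
    "sha256": hashlib.sha256,
    "md5": hashlib.md5,
    "blake2b-256": lambda: hashlib.blake2b(digest_size=32),
}


def new_hasher(algorithm):
    return ALGORITHMS[algorithm.lower()]()


def digest_bytes(data, algorithm="sha256"):
    """Hex digest of an in-memory bytes object."""
    h = new_hasher(algorithm)
    h.update(data)
    return h.hexdigest()


def digest_file(path, algorithm="sha256", chunk_size=65536):
    """Hex digest of a file, read in chunks so large downloads use little memory."""
    h = new_hasher(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path, expected, algorithm="sha256"):
    """True if the file's digest equals the published hex digest (case-insensitive)."""
    return digest_file(path, algorithm) == expected.strip().lower()
```

For an intact download, `verify_file("python-pdfbox-0.1.8.tar.gz", "a73619c389c7742747d456ba44bd7acb5b37aaaa3fbf683bf77006807764be44")` should return True. pip can enforce the same check in its hash-checking mode, using a requirements line of the form `python-pdfbox==0.1.8 --hash=sha256:<digest>`.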


File details

Details for the file python_pdfbox-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: python_pdfbox-0.1.8-py3-none-any.whl
  • Size: 5.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200325 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for python_pdfbox-0.1.8-py3-none-any.whl
Algorithm     Hash digest
SHA256        23f3b43cf96cb66709a1a2b8cdcae98d73a65cc27363bc3fe3bcab1a9f2e79f2
MD5           8a3f47bc420f2c789675d36ca8d100c2
BLAKE2b-256   53a5400b2507284a9e68f983697245981a174aae1260552e2a0091bef070d3cc

