
Tools for reading and processing PDF content


## Tools for processing PDF files

This is a lightweight library for processing PDF files in Python. One use case is extracting PDF annotations for ML / NLP.

Support for

  • obtaining textual and visual content of PDF files

  • locating positions of words

  • fetching PDF annotations

  • adding a digital layer to image-PDFs

  • re-creating a clean PDF file with annotations removed

## Dependencies

The main tool for reading PDF files is the PyPDF2 library. Non-Python dependencies include Poppler; to install it, see the guide in the [pdf2image readme](https://pypi.org/project/pdf2image/).

## How to

Some examples of usage are shown in the [notebook](./notebook/Demo.ipynb).
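As a rough illustration of the annotation use case, here is a minimal sketch that reads annotations with PyPDF2 directly. It does not use this package's own API, which is documented only in the notebook. The function names `collect_annotations` and `read_annotations` are hypothetical, and the sketch assumes PyPDF2 >= 2.0 (`PdfReader`, `reader.pages`, `get_object`).

```python
from typing import Any, Dict, Iterable, List


def collect_annotations(pages: Iterable[Any]) -> List[Dict[str, Any]]:
    """Return one record per annotation found on the given pages.

    Each page is expected to behave like a dictionary with an optional
    "/Annots" entry, as PyPDF2 page objects do.
    """
    records = []
    for page_number, page in enumerate(pages, start=1):
        if "/Annots" not in page:
            continue  # this page has no annotation array
        for ref in page["/Annots"]:
            # Entries are usually indirect references; resolve them if possible.
            annot = ref.get_object() if hasattr(ref, "get_object") else ref
            records.append({
                "page": page_number,                   # 1-based page number
                "subtype": annot.get("/Subtype"),      # e.g. "/Highlight", "/Text"
                "contents": annot.get("/Contents"),    # note text, if any
                "rect": annot.get("/Rect"),            # bounding box in PDF units
            })
    return records


def read_annotations(path: str) -> List[Dict[str, Any]]:
    """Open a PDF with PyPDF2 and collect its annotations."""
    from PyPDF2 import PdfReader  # assumption: PyPDF2 >= 2.0 is installed

    reader = PdfReader(path)
    return collect_annotations(reader.pages)
```

Calling `read_annotations("paper.pdf")` (the path is a placeholder) would return a list of dictionaries, one per annotation. Keeping the file loading separate from the collection logic lets the latter be tried out on plain dictionaries without a PDF at hand.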

## Todo

