Crop, rotate, and extract text from your PDFs so you can delete them

Delete-Your-PDF

Delete-Your-PDF is a set of tools for exporting information from your PDFs so you can delete them.

Image files can be passed in as either base64 strings or BytesIO objects.
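As a minimal sketch of preparing those two input forms, the helpers below read a file from disk into either a base64 string or a BytesIO object using only the standard library. The names file_to_base64 and file_to_bytesio are hypothetical and not part of delete-your-pdf, and whether the library expects a bare base64 string or one with a data-URI prefix is an assumption to verify.

```python
import base64
import io


def file_to_base64(path):
    """Read a file and return its contents as a base64-encoded ASCII string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def file_to_bytesio(path):
    """Read a file into an in-memory BytesIO object positioned at the start."""
    with open(path, "rb") as f:
        return io.BytesIO(f.read())
```

Either return value could then be tried as the file argument of the functions described under How to use.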

Pip Library: https://pypi.org/project/delete-your-pdf/

Pip Repo: https://github.com/darefail/Delete-Your-PDF

Live Demo

Live Demo: https://pdf.darefail.com

Demo open-source repo: https://github.com/DareFail/AI-Video-Boilerplate-Pro

Installation

pip install delete-your-pdf

How to use

countPdfPages: Counts the number of pages in a PDF and returns an int

from deleteYourPDF import countPdfPages

numberOfPages = countPdfPages(file="PDF_FILE_HERE")

pdfToImagePages: Converts a PDF to a list of pages, each a PNG image encoded as a base64 string

from deleteYourPDF import pdfToImagePages

# Return a list containing all pages in order as images
listOfImagePages = pdfToImagePages(file="PDF_FILE_HERE")

# Return a list containing only an image of page 7
listOfImagePages = pdfToImagePages(file="PDF_FILE_HERE", page_number=7)
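Because the pages come back as base64 strings, writing them to disk takes one decoding step. The sketch below uses only the standard library; base64_to_bytes and save_pages are hypothetical helper names, and stripping a data-URI prefix is a precaution in case the strings carry one, which the README does not state.

```python
import base64


def base64_to_bytes(b64_string):
    """Decode a base64 string to bytes, dropping a data-URI prefix if present."""
    if b64_string.startswith("data:") and "," in b64_string:
        b64_string = b64_string.split(",", 1)[1]
    return base64.b64decode(b64_string)


def save_pages(pages, prefix="page"):
    """Write each base64 page to <prefix>_<n>.png and return the file names."""
    names = []
    for index, page in enumerate(pages, start=1):
        name = f"{prefix}_{index}.png"
        with open(name, "wb") as f:
            f.write(base64_to_bytes(page))
        names.append(name)
    return names
```

Passing the list returned by pdfToImagePages to save_pages would then produce page_1.png, page_2.png, and so on.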

imageWidthHeight: Gets the width and height of an image in pixels and returns a dictionary, e.g. {"width": 100, "height": 100}

from deleteYourPDF import imageWidthHeight

image_dimensions = imageWidthHeight(file="IMAGE_FILE_HERE")

width = image_dimensions["width"]
height = image_dimensions["height"]

cropRotateImage: Crop and rotate an image and return a PNG image as a base64 string

from deleteYourPDF import cropRotateImage

# Crops the top-left 100x100 square of an image and rotates it 90 degrees to the right; the new image dimensions adjust to fit the rotation
croppedAndRotatedImage = cropRotateImage(file="IMAGE_FILE_HERE", x=0, y=0, width=100, height=100, rotation_degrees=90)

# Crops the top-left 100x100 square of an image, rotates it 30 degrees, and keeps the original image dimensions
croppedAndRotatedImage = cropRotateImage(file="IMAGE_FILE_HERE", x=0, y=0, width=100, height=100, rotation_degrees=30, expand_for_rotation=False)
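To make the effect of expanding for rotation concrete, the sketch below computes the bounding box a rotated rectangle needs. It is plain geometry under the assumption that expansion means "grow the canvas to fit the rotated crop"; expanded_size is a hypothetical helper, and the library's exact rounding may differ.

```python
import math


def expanded_size(width, height, rotation_degrees):
    """Return the (width, height) of the box that encloses a rotated rectangle."""
    theta = math.radians(rotation_degrees)
    cos_t = abs(math.cos(theta))
    sin_t = abs(math.sin(theta))
    # Project both sides of the rectangle onto the horizontal and vertical axes
    new_width = width * cos_t + height * sin_t
    new_height = width * sin_t + height * cos_t
    return round(new_width), round(new_height)
```

Under this assumption a 200x100 crop rotated 90 degrees would need a 100x200 canvas, while a 100x100 crop rotated 30 degrees would need roughly 137x137, so leaving expansion off at that angle would clip the corners.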

imageToText_Roboflow: Converts an image to text with Roboflow OCR and returns a string

from deleteYourPDF import imageToText_Roboflow

# Returns the text from a local image file
text = imageToText_Roboflow(file="IMAGE_FILE_HERE", api_key="ROBOFLOW_API_KEY_HERE")
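Rather than hard-coding the key in source, one option is to read it from an environment variable. This is a sketch only: get_api_key is a hypothetical helper and the variable name ROBOFLOW_API_KEY is an assumption, not something the library reads on its own.

```python
import os


def get_api_key(env_var="ROBOFLOW_API_KEY"):
    """Return the API key stored in the given environment variable.

    Raises RuntimeError if the variable is unset or empty.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The result could then be passed as api_key=get_api_key() in the call above.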

Example 1: Convert the top 100 pixels of all pages of a PDF to a list of text

from deleteYourPDF import pdfToImagePages, imageToText_Roboflow, cropRotateImage, imageWidthHeight

listOfText = []

# Convert every page of the PDF to an image
listOfImagePages = pdfToImagePages(file="PDF_FILE_HERE")

for imagePage in listOfImagePages:
    # Use the full page width for the crop
    image_dimensions = imageWidthHeight(file=imagePage)
    width = image_dimensions["width"]

    # Crop the top 100 pixels of the page, then run OCR on that strip
    croppedAndRotatedImage = cropRotateImage(file=imagePage, x=0, y=0, width=width, height=100)
    listOfText.append(imageToText_Roboflow(file=croppedAndRotatedImage, api_key="ROBOFLOW_API_KEY_HERE"))

print(listOfText)

Example 2: Crop a 100x100 box from the center of page 7 of a PDF, rotate it 90 degrees to the right, and print the text

from deleteYourPDF import countPdfPages, pdfToImagePages, imageToText_Roboflow, cropRotateImage, imageWidthHeight

# Make sure the PDF has enough pages before requesting page 7
if countPdfPages(file="PDF_FILE_HERE") > 7:
    # pdfToImagePages returns a list, so take its single element
    imagePage = pdfToImagePages(file="PDF_FILE_HERE", page_number=7)[0]
    image_dimensions = imageWidthHeight(file=imagePage)

    width = image_dimensions["width"]
    height = image_dimensions["height"]
    # Integer division keeps the crop origin on whole pixels
    x = (width - 100) // 2
    y = (height - 100) // 2

    croppedAndRotatedImage = cropRotateImage(file=imagePage, x=x, y=y, width=100, height=100, rotation_degrees=90)
    print(imageToText_Roboflow(file=croppedAndRotatedImage, api_key="ROBOFLOW_API_KEY_HERE"))
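The centering arithmetic in Example 2 can be pulled into a small helper that also guards against pages smaller than the requested box. This is a sketch with a hypothetical name, centered_box; it returns keyword arguments shaped like the x, y, width, and height parameters used above.

```python
def centered_box(image_width, image_height, box_width, box_height):
    """Return a crop box centered in the image, clamped to the image size."""
    # Shrink the box if the image is smaller than the requested size
    box_width = min(box_width, image_width)
    box_height = min(box_height, image_height)
    # Integer division keeps the origin on whole pixels
    x = (image_width - box_width) // 2
    y = (image_height - box_height) // 2
    return {"x": x, "y": y, "width": box_width, "height": box_height}
```

With the dimensions from imageWidthHeight, the result could be unpacked as cropRotateImage(file=imagePage, rotation_degrees=90, **centered_box(width, height, 100, 100)).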

Acknowledgements

Thanks to Roboflow for sponsoring this project. Get your free API key at: Roboflow

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

Contact

Twitter: @darefailed

YouTube: How-to video coming soon

Project Link: https://github.com/darefail/Delete-Your-PDF

Update Package

python3 -m build
python3 -m twine upload dist/*
