Extract high-quality images from PDF files while preserving metadata

PDF Image Extractor

A tool to extract high-quality images from PDF files while preserving metadata and positioning information.

Features

  • Extracts images in their original quality without recompression
  • Preserves image metadata including DPI and positioning
  • Detects and skips duplicate images
  • Generates a detailed JSON metadata file
  • Sorts images by their position on the page
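The duplicate detection and position sorting listed above could work roughly as follows. This is a minimal sketch, not the tool's actual implementation: the ImageRecord type, its fields, and the choice of SHA-256 over the raw image bytes are all assumptions made for illustration, as is the convention that y is measured from the top of the page.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """Hypothetical record for one extracted image (not the tool's real schema)."""
    page: int    # 1-based page number
    x: float     # left edge of the image on the page
    y: float     # top edge of the image, assumed to be measured from the top
    data: bytes  # raw image bytes as stored in the PDF


def dedupe_and_sort(records):
    """Drop images whose bytes were already seen, then sort by page, y, x."""
    seen = set()
    unique = []
    for rec in records:
        # Identical bytes give identical digests, so a repeated digest
        # marks a duplicate image.
        digest = hashlib.sha256(rec.data).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        unique.append(rec)
    # Reading order under the stated assumption: page first, then
    # top-to-bottom, then left-to-right.
    return sorted(unique, key=lambda r: (r.page, r.y, r.x))
```

Hashing the stored bytes only catches exact duplicates; the same picture embedded twice with different encodings would be treated as two images in this sketch.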

Installation

pip install pdfimageextractor

Usage

pdfextractimages <PDF_FILE> [OUTPUT_FOLDER]

Arguments:

  • PDF_FILE: Path to the PDF file to process
  • OUTPUT_FOLDER: Optional directory to save extracted images (defaults to PDF_FILE_images)
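To call the command from a script, a thin wrapper around the documented invocation may help. This is a sketch under assumptions: build_command and extract_images are hypothetical helper names, the pdfextractimages command is assumed to be on PATH after installation, and the tool is assumed to signal failure with a non-zero exit code.

```python
import subprocess


def build_command(pdf_file, output_folder=None):
    """Build the argument list for: pdfextractimages <PDF_FILE> [OUTPUT_FOLDER]."""
    cmd = ["pdfextractimages", str(pdf_file)]
    if output_folder is not None:
        # OUTPUT_FOLDER is optional; when omitted the tool picks its default.
        cmd.append(str(output_folder))
    return cmd


def extract_images(pdf_file, output_folder=None):
    """Run the CLI; raises CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_command(pdf_file, output_folder), check=True)
```

For example, extract_images("report.pdf", "report_images") would run the command with both arguments, while extract_images("report.pdf") leaves the output folder to the tool.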

Output

The tool creates:

  • Original-quality images extracted from the PDF
  • An image_metadata.json file containing detailed information about each image
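The metadata file can be inspected with the standard json module. The sketch below assumes the file is written inside the output folder, and it deliberately makes no assumption about the schema, which is not documented here; load_image_metadata and summarize are hypothetical helper names.

```python
import json
from pathlib import Path


def load_image_metadata(output_folder):
    """Load image_metadata.json, assuming it sits inside the output folder."""
    path = Path(output_folder) / "image_metadata.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def summarize(metadata):
    """Describe the top-level shape of the parsed JSON without assuming a schema."""
    if isinstance(metadata, list):
        return f"list with {len(metadata)} entries"
    if isinstance(metadata, dict):
        return f"object with keys: {', '.join(sorted(metadata))}"
    return type(metadata).__name__
```

Running summarize(load_image_metadata("report_images")) is a quick way to see what fields the tool actually records before writing code that depends on them.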

Download files

Source Distribution

pdfimageextractor-0.1.0.tar.gz (6.2 kB view details)

Uploaded Source

Built Distribution

pdfimageextractor-0.1.0-py3-none-any.whl (4.9 kB view details)

Uploaded Python 3

File details

Details for the file pdfimageextractor-0.1.0.tar.gz.

File metadata

  • Download URL: pdfimageextractor-0.1.0.tar.gz
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for pdfimageextractor-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        fea11ffa092e2d84be93b6e999730965164b3f23c39d0d09b106aa6c33f706a8
MD5           99aebd16d31b3a8b5d1eba7913b71755
BLAKE2b-256   2c6636e414c931c656aba25b392b9c074e6177f0344a66c73fabd958b6c4e3cc
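A downloaded archive can be checked against the SHA256 digest listed above using hashlib from the standard library. This is a minimal sketch; sha256_of_file and matches_published_hash are hypothetical helper names, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for pdfimageextractor-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "fea11ffa092e2d84be93b6e999730965164b3f23c39d0d09b106aa6c33f706a8"
)


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SDIST_SHA256):
    """Compare a file's digest with the expected one, ignoring hex letter case."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_published_hash("pdfimageextractor-0.1.0.tar.gz") should return True for an unmodified copy of the source distribution.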

File details

Details for the file pdfimageextractor-0.1.0-py3-none-any.whl.

File hashes

Hashes for pdfimageextractor-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        f561be841dda15b5fbca729302b091e417a61b28e91d89def14e51f71da5457f
MD5           03e64dd1a9b160f7220aa26ba76fa818
BLAKE2b-256   e0bcfcebfb373723a99e6cf4b4cc88b54b9db104b2ead34d5b837c1f1e48e3e7
