A library for processing PDFs with Florence

PDF Processing Florence

PDF Processing Florence is a Python library for converting, analyzing, and extracting information from PDF documents.

Features

  • PDF to JPG Conversion: Easily convert PDF files to high-quality JPG images.
  • AI-Powered Document Analysis: Utilize the Florence model to analyze document structure and content.
  • Intelligent Text Extraction: Extract text from PDFs while preserving document structure and layout information.

Installation

You can install PDF Processing Florence using pip:

pip install pdf_processing_florence
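
To confirm that the install succeeded, the standard library can report the installed version. This is a minimal sketch: the helper name `installed_version` is illustrative and not part of the library, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "pdf_processing_florence") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or a short notice when the package is missing.
    print(installed_version() or "pdf_processing_florence is not installed")
```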

Usage

Here are some basic examples of how to use PDF Processing Florence:

Converting PDF to JPG

from pdf_processing_florence import convertpdf_jpg

input_directory = "/path/to/your/pdfs"
convertpdf_jpg(input_directory)
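
Because `convertpdf_jpg` takes a directory rather than a single file, it can help to check what that directory contains before converting. The sketch below uses only `pathlib`. `list_pdfs` is a hypothetical helper, not a library function, and it looks only at the top level of the folder, since this page does not say whether the library recurses into subfolders.

```python
from pathlib import Path
from typing import List


def list_pdfs(directory: str) -> List[Path]:
    """Return the PDF files found directly inside `directory`, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {directory}")
    # Match the extension case-insensitively and skip folders named like PDFs.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `list_pdfs(input_directory)` first and skipping the conversion when the list is empty avoids a silent no-op on a mistyped path.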

Applying Florence AI Analysis

from pdf_processing_florence import apply_florence

image_directory = "/path/to/your/images"
output_file = "/path/to/output/annotations.json"
checkpoint = "/path/to/florence/model"

apply_florence(image_directory, output_file, checkpoint)
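
The output file written here is later passed to `extract_text` as `coco_file`, which suggests COCO-style JSON. Assuming the standard COCO top-level keys (`images`, `annotations`, `categories`), which this page does not confirm, a quick sanity check of the output might look like the following. `summarize_coco` is an illustrative helper, not part of the library.

```python
import json
from collections import Counter
from typing import Dict


def summarize_coco(path: str) -> Dict[str, object]:
    """Count images and annotations in a COCO-style JSON file.

    Assumes the standard COCO top-level keys; missing keys count as empty.
    """
    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)

    # Map category ids to readable names when the file provides them.
    names = {c["id"]: c.get("name", str(c["id"])) for c in data.get("categories", [])}
    per_category = Counter(
        names.get(a.get("category_id"), "unknown")
        for a in data.get("annotations", [])
    )
    return {
        "images": len(data.get("images", [])),
        "annotations": len(data.get("annotations", [])),
        "per_category": dict(per_category),
    }
```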

Extracting Text with Structure

from pdf_processing_florence import extract_text

document_folder = "/path/to/your/pdfs"
coco_file = "/path/to/annotations.json"
output_json_file = "/path/to/output/extracted_text.json"

extract_text(document_folder, coco_file, output_json_file)
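
How `extract_text` combines the annotations with the PDF is not described on this page. One step any such pipeline needs is mapping a box detected on a rendered page image back to PDF coordinates, because PDF pages are measured in points (72 per inch) while the image was rendered at some DPI. The sketch below assumes COCO `[x, y, width, height]` boxes in pixels and a known render DPI (200 is the pdf2image default). The function name is hypothetical.

```python
from typing import Sequence, Tuple


def coco_bbox_to_pdf_rect(
    bbox: Sequence[float], dpi: float = 200.0
) -> Tuple[float, float, float, float]:
    """Convert a COCO [x, y, w, h] pixel box to (x0, y0, x1, y1) in PDF points."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    x, y, w, h = bbox
    scale = 72.0 / dpi  # PDF points per image pixel
    return (x * scale, y * scale, (x + w) * scale, (y + h) * scale)
```

The result uses a top-left origin, which matches PyMuPDF's rectangle convention, so it could plausibly be passed as the `clip` argument of `page.get_text("text", clip=...)` to read only the text inside the region.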

Requirements

  • Python 3.6+
  • pdf2image
  • Pillow
  • PyTorch
  • Transformers
  • PyMuPDF
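
Some of these packages are imported under a different name than they are installed under (Pillow as `PIL`, PyTorch as `torch`, PyMuPDF traditionally as `fitz`). A small sketch for checking which requirements are importable in the current environment, using only the standard library; `check_requirements` and `REQUIRED_MODULES` are illustrative names.

```python
import sys
from importlib.util import find_spec
from typing import Dict

# Display name -> import name; these differ for some packages.
REQUIRED_MODULES = {
    "pdf2image": "pdf2image",
    "Pillow": "PIL",
    "PyTorch": "torch",
    "Transformers": "transformers",
    "PyMuPDF": "fitz",
}


def check_requirements() -> Dict[str, bool]:
    """Report which required packages are importable, without importing them."""
    status = {
        name: find_spec(module) is not None
        for name, module in REQUIRED_MODULES.items()
    }
    status["Python 3.6+"] = sys.version_info >= (3, 6)
    return status


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```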
