A library for processing PDFs with Florence
PDF Processing Florence
PDF Processing Florence is a Python library for converting PDF documents to images, analyzing them with the Florence model, and extracting structured text from them.
Features
- PDF to JPG Conversion: Easily convert PDF files to high-quality JPG images.
- AI-Powered Document Analysis: Utilize the Florence model to analyze document structure and content.
- Intelligent Text Extraction: Extract text from PDFs while preserving document structure and layout information.
Installation
You can install PDF Processing Florence using pip:
pip install pdf_processing_florence
Usage
Here are some basic examples of how to use PDF Processing Florence:
Converting PDF to JPG
from pdf_processing_florence import convertpdf_jpg
input_directory = "/path/to/your/pdfs"
convertpdf_jpg(input_directory)
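Before handing a directory to `convertpdf_jpg`, it can help to confirm that it exists and actually contains PDFs. The sketch below is a hypothetical pre-check and not part of the library: `find_pdfs` is an illustrative name, and it only looks at the top level of the directory, which may differ from how the library itself scans for files.

```python
from pathlib import Path


def find_pdfs(directory):
    """Return the PDF files directly inside `directory`, sorted by path.

    Hypothetical helper for illustration; not part of pdf_processing_florence.
    """
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    # Match the extension case-insensitively so "report.PDF" is included.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

If the returned list is empty, there is probably nothing for the conversion step to do.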
Applying Florence AI Analysis
from pdf_processing_florence import apply_florence
image_directory = "/path/to/your/images"
output_file = "/path/to/output/annotations.json"
checkpoint = "/path/to/florence/model"
apply_florence(image_directory, output_file, checkpoint)
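The extraction step below refers to the annotation file as `coco_file`, which suggests the output uses a COCO-style layout. Assuming that is the case (the project description does not spell out the schema), a minimal sketch for summarizing detected regions per category could look like this. The function names and the keys `categories`, `annotations`, and `category_id` come from the general COCO convention, not from this library's documentation.

```python
import json
from collections import Counter


def load_annotations(path):
    """Load an annotation file written as JSON."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def count_regions_by_category(coco):
    """Count annotations per category name in a COCO-style dict.

    Assumes the conventional COCO keys; unknown category ids are
    grouped under "unknown" rather than raising.
    """
    names = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}
    counts = Counter()
    for ann in coco.get("annotations", []):
        counts[names.get(ann.get("category_id"), "unknown")] += 1
    return dict(counts)
```

If the real file uses different keys, only the lookups inside `count_regions_by_category` would need to change.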
Extracting Text with Structure
from pdf_processing_florence import extract_text
document_folder = "/path/to/your/pdfs"
coco_file = "/path/to/annotations.json"
output_json_file = "/path/to/output/extracted_text.json"
extract_text(document_folder, coco_file, output_json_file)
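The three steps above can be chained into a single run. The sketch below is one possible arrangement, not the library's own API: `run_pipeline` is a hypothetical wrapper that receives the three functions as arguments, and `image_dir` is passed explicitly because the project description does not say where `convertpdf_jpg` writes its JPG files.

```python
from pathlib import Path


def run_pipeline(pdf_dir, image_dir, checkpoint, out_dir,
                 convert, analyze, extract):
    """Run conversion, analysis, and extraction in order.

    `convert`, `analyze`, and `extract` are expected to behave like
    convertpdf_jpg, apply_florence, and extract_text respectively.
    Returns the path of the extracted-text JSON file.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    coco_file = out / "annotations.json"
    text_file = out / "extracted_text.json"

    convert(str(pdf_dir))                                   # PDFs -> JPGs
    analyze(str(image_dir), str(coco_file), str(checkpoint))  # JPGs -> annotations
    extract(str(pdf_dir), str(coco_file), str(text_file))   # annotations -> text
    return text_file


# Example call with the real functions (requires the package to be installed):
# from pdf_processing_florence import convertpdf_jpg, apply_florence, extract_text
# run_pipeline("/path/to/your/pdfs", "/path/to/your/images",
#              "/path/to/florence/model", "/path/to/output",
#              convertpdf_jpg, apply_florence, extract_text)
```

Passing the functions in keeps the wrapper independent of the package, so it can be exercised with stand-in functions before the model checkpoint is available.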
Requirements
- Python 3.6+
- pdf2image
- Pillow
- PyTorch
- Transformers
- PyMuPDF
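To check whether these requirements are available in the current environment, a small sketch using only the standard library is shown below. The import names are the usual ones for each distribution (`PIL` for Pillow, `torch` for PyTorch, `fitz` for PyMuPDF); the helper names are illustrative and not part of the library.

```python
import sys
from importlib.util import find_spec

# Distribution name -> top-level module name used at import time.
REQUIRED_MODULES = {
    "pdf2image": "pdf2image",
    "Pillow": "PIL",
    "PyTorch": "torch",
    "Transformers": "transformers",
    "PyMuPDF": "fitz",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the distribution names whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


def python_is_supported(minimum=(3, 6)):
    """Check the running interpreter against the minimum Python version."""
    return sys.version_info >= minimum
```

Calling `missing_packages()` with no arguments checks the list above; an empty result means every module was found. Note that pdf2image also relies on the external Poppler utilities, which this check does not cover.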
File details
Details for the file pdf_processing_florence-0.1.0.tar.gz.
File metadata
- Download URL: pdf_processing_florence-0.1.0.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | a9e29a881be6f2ddbb0e8010b333f5aad890c35f76a4a618be36276db5fe5203
MD5 | 6952a0097588e6130dffa76be63caaf8
BLAKE2b-256 | 26e338f1359d94f732134b4ba64a7145a60aa5d068ba5524de3ad31fdf938947
File details
Details for the file pdf_processing_florence-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pdf_processing_florence-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 807f6e632d46fd723bf288dff7a221c5b0c0be4536c51bd5412fac7d5f6b7f34
MD5 | cfa5e600f9ff26da85deabe46077794c
BLAKE2b-256 | 55b5aca1c35ae993ad5db1e7a51dc8a498ffe15add4a4194fe2d31516f2a29fb
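
A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's standard `hashlib`; the function names are illustrative, and the file path in the usage note assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("pdf_processing_florence-0.1.0.tar.gz", "a9e29a881be6f2ddbb0e8010b333f5aad890c35f76a4a618be36276db5fe5203")` should return `True` for an intact copy of the source distribution.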