This package splits PDF and TIFF files into separate PNGs and extracts text from input files.
@axa-fr/axa-fr-splitter
About
The axa-fr-splitter package provides Python tools for converting several types of documents (PDF, TIFF, ...) into images.
Quick Start
pip install axa-fr-splitter
from pathlib import Path

from splitter import FileHandler
from splitter.image.tiff_handler import TifHandler
from splitter.pdf.pdf_handler import FitzPdfHandler


def create_file_handler() -> FileHandler:
    """Factory to create a customized file handler."""
    # Create the file handler
    file_handler = FileHandler()

    # Create the PDF handler
    pdf_handler = FitzPdfHandler()

    # Create the TIFF handler
    tiff_handler = TifHandler()

    # Register the PDF handler
    file_handler.register_converter(
        pdf_handler,
        extensions=['.pdf'],
        mime_types=['application/pdf']
    )

    # Register the TIFF handler
    file_handler.register_converter(
        tiff_handler,
        extensions=['.tif', '.tiff'],
        mime_types=['image/tiff']
    )

    return file_handler


def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)

    for file_or_exception in file_handler.split_document(filepath):
        # unwrap() returns the file, or raises if splitting this page failed
        file = file_or_exception.unwrap()

        print(file.metadata)
        # {
        #     'original_filename': 'specimen.tiff',
        #     'page_number': 1,
        #     'total_pages': 4,
        #     'width': 1554,
        #     'height': 2200,
        #     'resized_ratio': 0.9405728943993159
        # }

        # Export the file bytes:
        export_path = output_path.joinpath(file.relative_path)
        export_path.write_bytes(file.file_bytes)


if __name__ == '__main__':
    # MY_OUTPUT_PATH is a placeholder: replace it with your output directory
    main(r"tests/inputs/specimen.tiff", MY_OUTPUT_PATH)
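The registration calls above associate each handler with file extensions and MIME types. As a rough illustration of how such a lookup could work, here is a minimal sketch that uses only the standard library. The HandlerRegistry class and its resolve method are hypothetical names invented for this example; they are not part of axa-fr-splitter, and the package's actual dispatch logic may differ.

```python
import mimetypes
from pathlib import Path
from typing import Callable, Dict, Iterable

# Hypothetical stand-in for a converter: takes a path and returns a label.
Converter = Callable[[Path], str]


class HandlerRegistry:
    """Toy registry mapping extensions and MIME types to converters.

    Illustration only; not the axa-fr-splitter implementation.
    """

    def __init__(self) -> None:
        self._by_extension: Dict[str, Converter] = {}
        self._by_mime_type: Dict[str, Converter] = {}

    def register_converter(
        self,
        converter: Converter,
        extensions: Iterable[str],
        mime_types: Iterable[str],
    ) -> None:
        # Store keys in lowercase so lookups are case-insensitive.
        for extension in extensions:
            self._by_extension[extension.lower()] = converter
        for mime_type in mime_types:
            self._by_mime_type[mime_type.lower()] = converter

    def resolve(self, filepath) -> Converter:
        path = Path(filepath)
        # Try the file extension first.
        converter = self._by_extension.get(path.suffix.lower())
        if converter is None:
            # Fall back to the MIME type guessed from the file name.
            mime_type, _ = mimetypes.guess_type(path.name)
            converter = self._by_mime_type.get(mime_type or "")
        if converter is None:
            raise ValueError(f"No converter registered for {path.name!r}")
        return converter
```

In this sketch, a file whose extension and guessed MIME type are both unregistered raises ValueError, which keeps unsupported inputs from being silently ignored.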
You can use the match statement (Python 3.10+) to handle exceptions differently:
from returns.result import Failure, Success

...

def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)

    for file_or_exception in file_handler.split_document(filepath):
        match file_or_exception:
            case Success(file):
                print(file.metadata)
                export_path = output_path.joinpath(file.relative_path)
                export_path.write_bytes(file.file_bytes)
            case Failure(exception):
                # Handle Exception ...
                raise exception
Contribute