This package splits PDF and TIFF files into separate PNGs and extracts text from input files.
@axa-fr/axa-fr-splitter
About
The axa-fr-splitter package provides Python tools for converting several types of documents (PDF, TIFF, ...) into images.
Quick Start
pip install axa-fr-splitter
from pathlib import Path

from splitter import FileHandler
from splitter.image.tiff_handler import TifHandler
from splitter.pdf.pdf_handler import FitzPdfHandler


def create_file_handler() -> FileHandler:
    """Factory to create a customized file handler."""
    # Create the file handler
    file_handler = FileHandler()

    # Create the PDF handler
    pdf_handler = FitzPdfHandler()

    # Create the TIFF handler
    tiff_handler = TifHandler()

    # Register the PDF handler
    file_handler.register_converter(
        pdf_handler,
        extensions=['.pdf'],
        mime_types=['application/pdf']
    )

    # Register the TIFF handler
    file_handler.register_converter(
        tiff_handler,
        extensions=['.tif', '.tiff'],
        mime_types=['image/tiff']
    )

    return file_handler


def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)

    for file_or_exception in file_handler.split_document(filepath):
        # unwrap() returns the file, or raises if this page failed
        file = file_or_exception.unwrap()

        print(file.metadata)
        # {
        #     'original_filename': 'specimen.tiff',
        #     'page_number': 1,
        #     'total_pages': 4,
        #     'width': 1554,
        #     'height': 2200,
        #     'resized_ratio': 0.9405728943993159
        # }

        # Export the file bytes, creating the target directory if needed
        export_path = output_path.joinpath(file.relative_path)
        export_path.parent.mkdir(parents=True, exist_ok=True)
        export_path.write_bytes(file.file_bytes)


if __name__ == '__main__':
    # MY_OUTPUT_PATH is a placeholder: set it to your own output directory
    main(r"tests/inputs/specimen.tiff", MY_OUTPUT_PATH)
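To illustrate the idea behind registering converters by extension and MIME type, here is a minimal sketch of how such a lookup could work. It is not the FileHandler implementation: the MiniRegistry class, its find_converter method, and the string stand-ins used in place of real handlers are all hypothetical and only mirror the shape of the register_converter call shown above.

```python
import mimetypes
from pathlib import Path


class MiniRegistry:
    """Hypothetical stand-in for illustration, not splitter.FileHandler."""

    def __init__(self):
        self._by_extension = {}
        self._by_mime_type = {}

    def register_converter(self, converter, extensions=(), mime_types=()):
        # Index the same converter under every extension and MIME type given
        for ext in extensions:
            self._by_extension[ext.lower()] = converter
        for mime in mime_types:
            self._by_mime_type[mime.lower()] = converter

    def find_converter(self, filepath):
        # Try the file extension first, then fall back to a guessed MIME type
        suffix = Path(filepath).suffix.lower()
        if suffix in self._by_extension:
            return self._by_extension[suffix]
        guessed, _ = mimetypes.guess_type(str(filepath))
        if guessed in self._by_mime_type:
            return self._by_mime_type[guessed]
        raise LookupError(f"No converter registered for {filepath!r}")


registry = MiniRegistry()
# Plain strings stand in for real handler objects in this sketch
registry.register_converter(
    "pdf-converter", extensions=[".pdf"], mime_types=["application/pdf"]
)
registry.register_converter(
    "tiff-converter", extensions=[".tif", ".tiff"], mime_types=["image/tiff"]
)
```

With this sketch, registry.find_converter("tests/inputs/specimen.tiff") selects the TIFF stand-in, and a path with an unregistered extension raises LookupError.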
You can use the match statement (Python 3.10 or later) to handle the exceptions in a different way:
from returns.result import Failure, Success
...

def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)

    for file_or_exception in file_handler.split_document(filepath):
        match file_or_exception:
            case Success(file):
                print(file.metadata)
                # Export the file bytes, creating the target directory if needed
                export_path = output_path.joinpath(file.relative_path)
                export_path.parent.mkdir(parents=True, exist_ok=True)
                export_path.write_bytes(file.file_bytes)
            case Failure(exception):
                # Handle Exception ...
                raise exception
Contribute