This package splits PDF and TIFF files into separate PNGs and extracts text from input files.
@axa-fr/axa-fr-splitter
About
The axa-fr-splitter package provides Python tools for converting several types of documents (PDF, TIFF, ...) into images.
Quick Start
pip install axa-fr-splitter
from pathlib import Path

from splitter import FileHandler
from splitter.image.tiff_handler import TifHandler
from splitter.pdf.pdf_handler import FitzPdfHandler


def create_file_handler() -> FileHandler:
    """Factory to create a customized file handler"""
    # Create the file handler
    file_handler = FileHandler()

    # Create the PDF handler
    pdf_handler = FitzPdfHandler()

    # Create the TIFF handler
    tiff_handler = TifHandler()

    # Register the PDF handler
    file_handler.register_converter(
        pdf_handler,
        extensions=['.pdf'],
        mime_types=['application/pdf']
    )

    # Register the TIFF handler
    file_handler.register_converter(
        tiff_handler,
        extensions=['.tif', '.tiff'],
        mime_types=['image/tiff']
    )

    return file_handler


def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)

    for file_or_exception in file_handler.split_document(filepath):
        file = file_or_exception.unwrap()

        print(file.metadata)
        # {
        #     'original_filename': 'specimen.tiff',
        #     'page_number': 1,
        #     'total_pages': 4,
        #     'width': 1554,
        #     'height': 2200,
        #     'resized_ratio': 0.9405728943993159
        # }

        # Export the file bytes:
        export_path = output_path.joinpath(file.relative_path)
        export_path.write_bytes(file.file_bytes)


if __name__ == '__main__':
    # MY_OUTPUT_PATH is a placeholder: replace it with your output directory
    main(r"tests/inputs/specimen.tiff", MY_OUTPUT_PATH)
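The export loop above can also be wrapped in a small reusable helper. The following is a minimal sketch, not part of the package: `export_pages` is a hypothetical name, and it assumes only the members already used in the Quick Start (`split_document`, `unwrap`, `relative_path`, `file_bytes`). It additionally creates missing parent folders before writing, on the assumption that `relative_path` may contain sub-directories.

```python
from pathlib import Path


def export_pages(file_handler, filepath, output_dir):
    """Split `filepath` and write every page under `output_dir`.

    Hypothetical helper: relies only on the members shown in the Quick Start
    (`split_document`, `unwrap`, `relative_path`, `file_bytes`).
    """
    output_dir = Path(output_dir)
    written = []
    for file_or_exception in file_handler.split_document(filepath):
        # unwrap() returns the page, or raises if this page could not be converted
        file = file_or_exception.unwrap()
        export_path = output_dir.joinpath(file.relative_path)
        # Create parent folders in case they do not exist yet
        export_path.parent.mkdir(parents=True, exist_ok=True)
        export_path.write_bytes(file.file_bytes)
        written.append(export_path)
    return written
```

Called as `export_pages(create_file_handler(), "tests/inputs/specimen.tiff", "outputs")`, it would return the list of paths it wrote (the `"outputs"` directory name is only an example).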
You can use the match statement (Python 3.10+) to handle exceptions in a different way:
from returns.result import Failure, Success

...

def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)

    for file_or_exception in file_handler.split_document(filepath):
        match file_or_exception:
            case Success(file):
                print(file.metadata)
                export_path = output_path.joinpath(file.relative_path)
                export_path.write_bytes(file.file_bytes)
            case Failure(exception):
                # Handle Exception ...
                raise exception
Contribute