This package splits PDF and TIFF files into separate PNGs and extracts text from input files.
Reason this release was yanked: Publish Error
@axa-fr/axa-fr-splitter
About
The axa-fr-splitter package aims to provide Python tools for converting several types of documents (pdf, tiff, ...) into images.
Quick Start
```shell
pip install axa-fr-splitter
```
```python
from pathlib import Path

from splitter import FileHandler
from splitter.image.tiff_handler import TifHandler
from splitter.pdf.pdf_handler import FitzPdfHandler

# Hypothetical placeholder: replace with the directory where the PNGs should be written.
MY_OUTPUT_PATH = "output"


def create_file_handler() -> FileHandler:
    """Factory to create a customized file handler."""
    # Create the file handler
    file_handler = FileHandler()
    # Create the PDF handler
    pdf_handler = FitzPdfHandler()
    # Create the TIFF handler
    tiff_handler = TifHandler()
    # Register the PDF handler
    file_handler.register_converter(
        pdf_handler,
        extensions=['.pdf'],
        mime_types=['application/pdf']
    )
    # Register the TIFF handler
    file_handler.register_converter(
        tiff_handler,
        extensions=['.tif', '.tiff'],
        mime_types=['image/tiff']
    )
    return file_handler


def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)
    for file_or_exception in file_handler.split_document(filepath):
        # unwrap() returns the file on success and raises on failure
        file = file_or_exception.unwrap()
        print(file.metadata)
        # {
        #     'original_filename': 'specimen.tiff',
        #     'page_number': 1,
        #     'total_pages': 4,
        #     'width': 1554,
        #     'height': 2200,
        #     'resized_ratio': 0.9405728943993159
        # }
        # Export the file bytes, creating parent directories if needed:
        export_path = output_path.joinpath(file.relative_path)
        export_path.parent.mkdir(parents=True, exist_ok=True)
        export_path.write_bytes(file.file_bytes)


if __name__ == '__main__':
    main(r"tests/inputs/specimen.tiff", MY_OUTPUT_PATH)
```
Alternatively, you can use the `match` statement (Python 3.10+) to handle exceptions differently:
```python
from returns.result import Failure, Success

...

def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)
    for file_or_exception in file_handler.split_document(filepath):
        match file_or_exception:
            case Success(file):
                print(file.metadata)
                export_path = output_path.joinpath(file.relative_path)
                export_path.write_bytes(file.file_bytes)
            case Failure(exception):
                # Handle the exception ...
                raise exception
```
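`FileHandler.register_converter` associates each handler with a list of file extensions and MIME types. The class below is a hypothetical, standard-library-only sketch of how such a registry could dispatch on those keys. The names `ConverterRegistry`, `register` and `find` are assumptions for illustration; they are not the splitter package's API or its implementation.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional

# Hypothetical converter type: takes a path and returns a list of page images.
Converter = Callable[[Path], list]


class ConverterRegistry:
    """Hypothetical registry illustrating extension/MIME-type dispatch.

    This is NOT the splitter library's FileHandler; it only sketches the idea.
    """

    def __init__(self) -> None:
        self._by_extension: Dict[str, Converter] = {}
        self._by_mime_type: Dict[str, Converter] = {}

    def register(self, converter: Converter, extensions: Iterable[str],
                 mime_types: Iterable[str]) -> None:
        # Store keys in lowercase so '.PDF' and '.pdf' resolve to the same converter.
        for ext in extensions:
            self._by_extension[ext.lower()] = converter
        for mime in mime_types:
            self._by_mime_type[mime.lower()] = converter

    def find(self, path: Path, mime_type: Optional[str] = None) -> Converter:
        # Prefer an explicit, known MIME type; otherwise fall back to the file suffix.
        if mime_type is not None and mime_type.lower() in self._by_mime_type:
            return self._by_mime_type[mime_type.lower()]
        try:
            return self._by_extension[path.suffix.lower()]
        except KeyError:
            raise ValueError(f"No converter registered for {path.name!r}") from None
```

Keying on both the suffix and the MIME type lets a caller dispatch files whose names lack a reliable extension (uploads, for instance), provided the MIME type is known.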
Contribute
Download files
Source distribution

Hashes for axa_fr_splitter-2.0.0.dev0.tar.gz:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 04eb4a2ff904d18d6aaab9c59be31843166fee1085a5f46cb87ccb0d2be07141 |
| MD5 | c4514f94fc3708c3c87145ada8939ec8 |
| BLAKE2b-256 | 49b373be4775fc11b0f17f6ef316ee428bd112d26ce52c33d76b3b12e27c9dd0 |

Built distribution

Hashes for axa_fr_splitter-2.0.0.dev0-py3-none-any.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ebd34ab103be2345f9b65eef0f105e3e7d89e875053b4be00e5120ff22b25af |
| MD5 | f4113d9bdb5436c0bf8287b518c53928 |
| BLAKE2b-256 | fbb629d1be90110af0755c926053e36bdce6ba504aa787e6917b4de6e8202bc7 |
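The published digests can be checked locally before installing. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `file_digest_hex` and `verify` are hypothetical, and it assumes the archive has already been downloaded.

```python
import hashlib
from pathlib import Path


def _new_hasher(algorithm: str):
    # "BLAKE2b-256" in the tables is BLAKE2b with a 32-byte (256-bit) digest.
    if algorithm.lower() == "blake2b-256":
        return hashlib.blake2b(digest_size=32)
    return hashlib.new(algorithm)


def file_digest_hex(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    hasher = _new_hasher(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path: Path, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return file_digest_hex(path, algorithm) == expected_hex.lower()


# SHA256 published above for the source distribution.
EXPECTED_SDIST_SHA256 = "04eb4a2ff904d18d6aaab9c59be31843166fee1085a5f46cb87ccb0d2be07141"
```

For example, after downloading the source distribution to the current directory, `verify(Path("axa_fr_splitter-2.0.0.dev0.tar.gz"), EXPECTED_SDIST_SHA256)` should return `True` if the file is intact.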