
This package splits PDF and TIFF files into separate PNGs and extracts text from input files.


@axa-fr/axa-fr-splitter


Supported Python versions: 3.10, 3.11, 3.12

About

The axa-fr-splitter package provides Python tools for converting several types of documents (PDF, TIFF, ...) into images.

Quick Start

pip install axa-fr-splitter

from pathlib import Path
from splitter import FileHandler
from splitter.image.tiff_handler import TifHandler
from splitter.pdf.pdf_handler import FitzPdfHandler


def create_file_handler() -> FileHandler:
    """Factory that creates a file handler with PDF and TIFF converters registered."""

    # Create File Handler
    file_handler = FileHandler()

    # Create pdf Handler
    pdf_handler = FitzPdfHandler()

    # Create tiff Handler
    tiff_handler = TifHandler()

    # Register PDF Handler
    file_handler.register_converter(
        pdf_handler,
        extensions=['.pdf'],
        mime_types=['application/pdf']
    )

    # Register tiff Handler
    file_handler.register_converter(
        tiff_handler,
        extensions=['.tif', '.tiff'],
        mime_types=['image/tiff']
    )

    return file_handler


def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)

    for file_or_exception in file_handler.split_document(filepath):
        # unwrap() returns the page on success and raises if this page failed
        file = file_or_exception.unwrap()

        print(file.metadata)
        # {
        #     'original_filename': 'specimen.tiff',
        #     'page_number': 1,
        #     'total_pages': 4,
        #     'width': 1554,
        #     'height': 2200,
        #     'resized_ratio': 0.9405728943993159
        # }

        # Export the page image bytes, creating parent folders as needed
        export_path = output_path.joinpath(file.relative_path)
        export_path.parent.mkdir(parents=True, exist_ok=True)
        export_path.write_bytes(file.file_bytes)


if __name__ == '__main__':
    MY_OUTPUT_PATH = "output"  # placeholder output directory; change as needed
    main(r"tests/inputs/specimen.tiff", MY_OUTPUT_PATH)
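
The Quick Start registers each handler against a list of file extensions and MIME types. How FileHandler actually chooses a handler is not documented here, so the following is only a hypothetical sketch of extension/MIME dispatch using the standard library. The class name ConverterRegistry and its resolve() method are illustrative assumptions, not part of the splitter API.

```python
import mimetypes
from pathlib import Path


class ConverterRegistry:
    """Hypothetical registry mapping extensions and MIME types to handlers.

    Illustrative only: this is NOT the implementation of splitter.FileHandler.
    """

    def __init__(self) -> None:
        self._by_extension: dict[str, object] = {}
        self._by_mime_type: dict[str, object] = {}

    def register_converter(self, handler, extensions=(), mime_types=()) -> None:
        # Store extensions in lowercase so ".PDF" and ".pdf" resolve the same way
        for ext in extensions:
            self._by_extension[ext.lower()] = handler
        for mime in mime_types:
            self._by_mime_type[mime] = handler

    def resolve(self, filepath):
        path = Path(filepath)
        # 1. Look the handler up by file extension
        handler = self._by_extension.get(path.suffix.lower())
        if handler is not None:
            return handler
        # 2. Fall back to a MIME type guessed from the file name (not its content)
        mime_type, _ = mimetypes.guess_type(path.name)
        if mime_type in self._by_mime_type:
            return self._by_mime_type[mime_type]
        raise ValueError(f"No converter registered for {path.name!r}")


registry = ConverterRegistry()
registry.register_converter("pdf-handler", extensions=[".pdf"], mime_types=["application/pdf"])
registry.register_converter("tiff-handler", extensions=[".tif", ".tiff"], mime_types=["image/tiff"])
print(registry.resolve("specimen.TIFF"))
```

Checking the suffix first keeps the common case a single dictionary lookup; the MIME fallback here only guesses from the file name, whereas a real library might also inspect the file's bytes.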

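split_document yields returns Result containers (Success or Failure). Instead of calling unwrap(), you can use a match statement (Python 3.10+) to handle successes and failures explicitly: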

from returns.result import Failure, Success

...

def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)

    for file_or_exception in file_handler.split_document(filepath):
        match file_or_exception:
            case Success(file):
                print(file.metadata)
                export_path = output_path.joinpath(file.relative_path)
                export_path.parent.mkdir(parents=True, exist_ok=True)
                export_path.write_bytes(file.file_bytes)
            case Failure(exception):
                # Handle Exception ...
                raise exception

Contribute



Download files


Source Distribution

axa_fr_splitter-1.0.0.tar.gz (1.2 MB)

Built Distribution

axa_fr_splitter-1.0.0-py3-none-any.whl (16.9 kB, Python 3)

File details

Details for the file axa_fr_splitter-1.0.0.tar.gz.

File metadata

  • Download URL: axa_fr_splitter-1.0.0.tar.gz
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for axa_fr_splitter-1.0.0.tar.gz
Algorithm    Hash digest
SHA256       1b7dc685de4a769207d62d8d0309df8996cd99155eea1c4702e2c9636c5162a9
MD5          dfe45bae393fb49db5aaa5274b3b4ba7
BLAKE2b-256  48247db02fef5c9b7ba550e3fea3eda4bc7938305f78dca0c6fa008a4631a505
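
To check that a downloaded archive matches the published digest, you can hash it locally. A minimal sketch with the standard library's hashlib, assuming the file has already been downloaded; the helper names sha256_of and verify are illustrative. pip can also enforce this itself in hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).

```python
import hashlib
from pathlib import Path

# Published SHA256 digest of axa_fr_splitter-1.0.0.tar.gz (copied from the table above)
EXPECTED_SHA256 = "1b7dc685de4a769207d62d8d0309df8996cd99155eea1c4702e2c9636c5162a9"


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


# Example: verify("axa_fr_splitter-1.0.0.tar.gz")
```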


File details

Details for the file axa_fr_splitter-1.0.0-py3-none-any.whl.

File hashes

Hashes for axa_fr_splitter-1.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       341f8bca5ea8792c6bb2aad704f583bc38f0eb122b5bd6711db53ee499c51e7c
MD5          81962d8cafc11b8755690636971e7021
BLAKE2b-256  245aa5f25af013d4d586344544bf1ad1ac22edb346e94097bf591eee32d7184e
