Utilities for Azure document processing

azure-doc-processing

azure-doc-processing is a Python library designed to simplify and standardize the use of common Azure services in document-processing workflows.
It reduces repeated boilerplate code across projects and provides a clean, consistent, DRY development experience.

The library offers convenient wrappers and utilities around Azure SDK components, focusing primarily on document processing.


Features

Supported Azure Services

Service                          Module
Azure Storage (Blob Storage)     azure_doc_processing.blob_storage
Azure Document Intelligence      azure_doc_processing.document_intelligence
Azure Key Vault                  azure_doc_processing.keyvault
Azure OpenAI                     azure_doc_processing.openai_service
Azure Storage (Table Storage)    azure_doc_processing.table_storage

Additional Functionality

Standardized Logger

from azure_doc_processing.logger import Logger

logger = Logger(__name__)

Document Utilities

Generic helper functions for document-processing tasks are available under:

import azure_doc_processing.utils as utils

Installation

Install from PyPI:

pip install azure-doc-processing

Or with Poetry:

poetry add azure-doc-processing

Quick Example

Using key-based authentication:

import os

from azure_doc_processing.blob_storage import AzureDataLake
from azure_doc_processing.logger import Logger

logger = Logger(__name__)

# Account name and account key are read from environment variables.
datalake = AzureDataLake(
    os.getenv("AZURE_STORAGE_ACCOUNT_NAME"),
    os.getenv("AZURE_STORAGE_ACCOUNT_KEY"),
)

# List the blobs under a prefix and log how many were found.
blob_files = datalake.list_blob_files(container="my-container", prefix="my-prefix/")
logger.info(f"Found {len(blob_files)} files in container")

Using DefaultAzureCredential authentication:

import os

from azure_doc_processing.blob_storage import AzureDataLake
from azure_doc_processing.logger import Logger

logger = Logger(__name__)

# Only the account name is passed; no account key is needed here.
datalake = AzureDataLake(
    os.getenv("AZURE_STORAGE_ACCOUNT_NAME"),
)

# List the blobs under a prefix and log how many were found.
blob_files = datalake.list_blob_files(container="my-container", prefix="my-prefix/")
logger.info(f"Found {len(blob_files)} files in container")
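
Both examples read their settings with os.getenv, which returns None when a variable is unset. A minimal sketch of a guard that fails early with a readable message is shown below. The helper name require_env is hypothetical and is not part of azure-doc-processing.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not provided by azure-doc-processing.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example use (assumes the variable is exported in your shell):
# account_name = require_env("AZURE_STORAGE_ACCOUNT_NAME")
```

Passing the result of require_env to AzureDataLake instead of a bare os.getenv call would surface a missing variable before any Azure request is attempted.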

Development

This repository uses:

  • pyenv for Python version management
  • Poetry 1.8.5 for dependency & environment management
  • poethepoet for task automation
  • pytest for testing

Setup

Install all dependencies:

poetry install

Development Dependencies

Add development-only packages with:

poetry add <package> --group dev

Available Tasks

Using poethepoet:

Command       Description
poe test      Run the full test suite
poe format    Format code (autoflake, black, isort)

Before committing changes, run:

poe format
poe test

Publishing the Package

This project includes automated publishing scripts.

Prerequisites

Make sure you have:

  • An API token for TestPyPI and/or PyPI

  • A valid ~/.pypirc file, for example:

    [testpypi]
    username = __token__
    password = pypi-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    
    [pypi]
    username = __token__
    password = pypi-yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
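
Before publishing, it can help to confirm that the file really contains both sections. The sketch below uses Python's standard configparser module; the function name missing_pypirc_sections is hypothetical and the check is not part of this project's publishing scripts.

```python
import configparser


def missing_pypirc_sections(text, required=("testpypi", "pypi")):
    """Return the required sections that are absent or have no password entry.

    Hypothetical helper for illustration; `text` is the content of a .pypirc file.
    """
    # RawConfigParser avoids interpolation errors if a token contains '%'.
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    missing = []
    for section in required:
        if not parser.has_section(section) or not parser.has_option(section, "password"):
            missing.append(section)
    return missing
```

To check a real file, read it first, for example with pathlib.Path.home().joinpath(".pypirc").read_text(), and pass the text to the function. An empty list means both sections were found.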
    

Publish to TestPyPI (dry run)

Use this when testing new releases. This publishes the package under a temporary test name and performs an install + import smoke test.

poe publish-testpypi

Publish to PyPI (real release)

Use this when releasing an official version. Make sure you’ve bumped the version in pyproject.toml.

poe publish-pypi

Publishing will fail if the version already exists on PyPI. Real releases should use unique, semantic version increments.
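
As an illustration of what a semantic version increment means, the sketch below computes the next MAJOR.MINOR.PATCH string. The function bump_version is hypothetical and is not part of this project; it assumes a plain three-part version with no pre-release or build suffix.

```python
def bump_version(version: str, part: str = "patch") -> str:
    """Return the next semantic version for a 'MAJOR.MINOR.PATCH' string.

    Hypothetical helper for illustration only.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # A major bump resets minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # A minor bump resets patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")
```

In practice, Poetry's own version command (for example poetry version patch) applies the same kind of bump directly to pyproject.toml.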


License

This project is licensed under the Apache License 2.0.
See the LICENSE and NOTICE files for details.


Contributing

Contributions are welcome!
Please open an issue or submit a pull request via GitHub.


Acknowledgements

Maintained by Verdel Digitaal Partner

Download files

Source Distribution

azure_doc_processing-1.1.2.tar.gz (16.9 kB view details)

Uploaded Source

Built Distribution

azure_doc_processing-1.1.2-py3-none-any.whl (18.7 kB view details)

Uploaded Python 3

File details

Details for the file azure_doc_processing-1.1.2.tar.gz.

File metadata

  • Download URL: azure_doc_processing-1.1.2.tar.gz
  • Size: 16.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for azure_doc_processing-1.1.2.tar.gz
Algorithm      Hash digest
SHA256         3dc41fb7ef396fde181e485cd984ce9e26f5312a692bda62131ba785ad479c91
MD5            82e2fdfbf748f41b6551404a5db16880
BLAKE2b-256    2904ba49b21652a9f87c2fb60b5fcb9a33bc16291a1d69452b0f1e1270e9ab76
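
A downloaded file can be checked against the SHA256 digest listed above. The sketch below uses only Python's standard hashlib module; the function name sha256_of_file is hypothetical, and the local path in the usage comment assumes the file was saved to the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example use (assumes the sdist is in the current directory):
# expected = "3dc41fb7ef396fde181e485cd984ce9e26f5312a692bda62131ba785ad479c91"
# assert sha256_of_file("azure_doc_processing-1.1.2.tar.gz") == expected
```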

File details

Details for the file azure_doc_processing-1.1.2-py3-none-any.whl.

File hashes

Hashes for azure_doc_processing-1.1.2-py3-none-any.whl
Algorithm      Hash digest
SHA256         d0d00dd395980da98dc3bcd518743045ee064dfff2a1fabf1239a5fb6b4014cf
MD5            5d3853acbc33cf712750c5242c04d241
BLAKE2b-256    47e4ba2fc1bee8975a69d3695ac7dfa51bc75ba837be168dd15cd2902bc1f5fb
