
Utilities for Azure document processing

Project description

azure-doc-processing

azure-doc-processing is a Python library designed to simplify and standardize the use of common Azure services in document-processing workflows.
It reduces repeated boilerplate code across projects and provides a clean, consistent, DRY development experience.

The library offers convenient wrappers and utilities around Azure SDK components, focusing primarily on document processing.


Features

Supported Azure Services

Service                          Module
Azure Storage (Blob Storage)     azure_doc_processing.blob_storage
Azure Document Intelligence      azure_doc_processing.document_intelligence
Azure Key Vault                  azure_doc_processing.keyvault
Azure OpenAI                     azure_doc_processing.openai_service
Azure Storage (Table Storage)    azure_doc_processing.table_storage

Additional Functionality

Standardized Logger

from azure_doc_processing.logger import Logger

logger = Logger(__name__)
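The internals of the library's `Logger` class are not documented here. As a rough illustration of what a "standardized" logger usually means, the sketch below builds a consistently formatted logger with only the standard `logging` module. `get_standard_logger` and `LOG_FORMAT` are hypothetical names, and this is not the azure-doc-processing implementation.

```python
import logging
import sys

# Hypothetical format string; the real library may use a different layout.
LOG_FORMAT = "%(asctime)s | %(levelname)s | %(name)s | %(message)s"


def get_standard_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Hypothetical helper: return a logger with one stdout handler and a shared format."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach the handler only once, so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    # Stop records from also being printed by handlers on the root logger.
    logger.propagate = False
    return logger


logger = get_standard_logger(__name__)
logger.info("Logger initialized")
```

Because `logging.getLogger` returns the same object for the same name, calling the helper from several modules with `__name__` gives each module its own named logger while keeping the output format identical.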

Document Utilities

Generic helper functions for document-processing tasks are available under:

import azure_doc_processing.utils as utils

Installation

Install from PyPI with pip:

pip install azure-doc-processing

Or with Poetry:

poetry add azure-doc-processing

Quick Example

Using key-based authentication:

import os

from azure_doc_processing.blob_storage import AzureDataLake
from azure_doc_processing.logger import Logger

logger = Logger(__name__)

# Authenticate with the storage account name and account key
datalake = AzureDataLake(
    os.getenv("AZURE_STORAGE_ACCOUNT_NAME"),
    os.getenv("AZURE_STORAGE_ACCOUNT_KEY"),
)

blob_files = datalake.list_blob_files(container="my-container", prefix="my-prefix/")
logger.info(f"Found {len(blob_files)} files in container")
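The Blob Storage listing API filters blobs by a plain name prefix: "folders" are just parts of blob names, so `my-prefix/` matches every blob whose name starts with that string, including names in deeper virtual folders. Whether `list_blob_files` returns such nested paths is not stated in this README. The stdlib-only sketch below, with the hypothetical helpers `filter_by_prefix` and `direct_children` and made-up blob names, illustrates plain prefix matching and how to keep only one virtual-folder level.

```python
def filter_by_prefix(blob_names: list[str], prefix: str) -> list[str]:
    """Hypothetical helper: plain string prefix match, no directory semantics."""
    return [name for name in blob_names if name.startswith(prefix)]


def direct_children(blob_names: list[str], prefix: str) -> list[str]:
    """Hypothetical helper: keep only blobs directly inside the virtual folder."""
    return [
        name
        for name in filter_by_prefix(blob_names, prefix)
        if "/" not in name[len(prefix):]  # no further "/" after the prefix
    ]


# Made-up blob names for illustration
names = [
    "my-prefix/invoice-001.pdf",
    "my-prefix/2024/invoice-002.pdf",
    "my-prefix-old/invoice-003.pdf",
    "other/invoice-004.pdf",
]

print(filter_by_prefix(names, "my-prefix/"))
# ['my-prefix/invoice-001.pdf', 'my-prefix/2024/invoice-002.pdf']
print(direct_children(names, "my-prefix/"))
# ['my-prefix/invoice-001.pdf']
```

The trailing slash matters: a prefix of `my-prefix` without the slash would also match `my-prefix-old/invoice-003.pdf`.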

Using DefaultAzureCredential authentication:

import os

from azure_doc_processing.blob_storage import AzureDataLake
from azure_doc_processing.logger import Logger

logger = Logger(__name__)

# Only the account name is passed; authentication uses DefaultAzureCredential
datalake = AzureDataLake(
    os.getenv("AZURE_STORAGE_ACCOUNT_NAME")
)

blob_files = datalake.list_blob_files(container="my-container", prefix="my-prefix/")
logger.info(f"Found {len(blob_files)} files in container")
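The two examples differ only in whether an account key is passed. If one script should support both modes, a small helper can read the environment and fall back to credential-based authentication when no key is set. `StorageSettings` and `load_storage_settings` are hypothetical names; the sketch only prepares the constructor arguments and does not contact Azure.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageSettings:
    """Hypothetical container for the AzureDataLake constructor arguments."""

    account_name: str
    account_key: Optional[str]  # None means: rely on DefaultAzureCredential

    @property
    def uses_key(self) -> bool:
        return self.account_key is not None


def load_storage_settings(env: Mapping[str, str] = os.environ) -> StorageSettings:
    """Hypothetical helper: read storage settings from environment variables."""
    name = env.get("AZURE_STORAGE_ACCOUNT_NAME")
    if not name:
        raise ValueError("AZURE_STORAGE_ACCOUNT_NAME is not set")
    # Treat an empty key the same as a missing one.
    key = env.get("AZURE_STORAGE_ACCOUNT_KEY") or None
    return StorageSettings(account_name=name, account_key=key)
```

Pass `settings.account_name` alone for DefaultAzureCredential authentication, or together with `settings.account_key` for key-based authentication, matching the two constructor calls shown in this section.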

Development

This repository uses:

  • pyenv for Python version management
  • Poetry 1.8.5 for dependency and environment management
  • poethepoet for task automation
  • pytest for testing

Setup

Install all dependencies:

poetry install

Development Dependencies

Add development-only packages with:

poetry add <package> --group dev

Available Tasks

Using poethepoet:

Command       Description
poe test      Run the full test suite
poe format    Format code (autoflake, black, isort)

Before committing changes, run:

poe format
poe test

Publishing the Package

This project includes automated publishing scripts.

Prerequisites

Make sure you have:

  • An API token for TestPyPI and/or PyPI

  • A valid ~/.pypirc file, for example:

    [testpypi]
    username = __token__
    password = pypi-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    
    [pypi]
    username = __token__
    password = pypi-yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
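Before publishing, a quick local check can catch a malformed `~/.pypirc`. The sketch below parses the INI-style file with the standard `configparser` module and verifies that each expected section exists, uses the `__token__` username, and has a password that looks like a PyPI API token (these start with `pypi-`). `check_pypirc` is a hypothetical helper, not one of the project's publishing scripts.

```python
import configparser
from pathlib import Path


def check_pypirc(text: str, sections: tuple[str, ...] = ("testpypi", "pypi")) -> list[str]:
    """Hypothetical helper: return problems found in .pypirc content (empty list = OK)."""
    # Disable interpolation so tokens containing "%" are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    for section in sections:
        if not parser.has_section(section):
            problems.append(f"missing [{section}] section")
            continue
        if parser.get(section, "username", fallback="") != "__token__":
            problems.append(f"[{section}] username should be __token__")
        if not parser.get(section, "password", fallback="").startswith("pypi-"):
            problems.append(f"[{section}] password does not look like an API token")
    return problems


if __name__ == "__main__":
    pypirc = Path.home() / ".pypirc"
    if pypirc.exists():
        print(check_pypirc(pypirc.read_text()) or "~/.pypirc looks OK")
    else:
        print("~/.pypirc not found")
```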
    

Publish to TestPyPI (dry run)

Use this when testing a new release. It publishes the package to TestPyPI under a temporary test name, then installs it and runs an import smoke test.

poe publish-testpypi
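The publishing script itself is not shown in this README. As a rough idea of what an import smoke test checks, the sketch below tries to import each module from the "Supported Azure Services" table with `importlib` and reports any failures. `smoke_test_imports` is a hypothetical helper, not the project's script.

```python
import importlib

# Module names taken from the "Supported Azure Services" table in this README.
MODULES = [
    "azure_doc_processing.blob_storage",
    "azure_doc_processing.document_intelligence",
    "azure_doc_processing.keyvault",
    "azure_doc_processing.openai_service",
    "azure_doc_processing.table_storage",
]


def smoke_test_imports(modules: list[str]) -> dict[str, str]:
    """Hypothetical helper: import each module, return {module: error} for failures."""
    failures = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # report any import-time error, not only ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    failed = smoke_test_imports(MODULES)
    print("all imports OK" if not failed else failed)
```

In a CI job you would typically exit with a non-zero status when `failed` is not empty, so a broken release candidate stops the pipeline.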

Publish to PyPI (real release)

Use this when releasing an official version. Make sure you’ve bumped the version in pyproject.toml.

poe publish-pypi

Publishing will fail if the version already exists on PyPI. Real releases should use unique, semantic version increments.
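Because PyPI rejects an upload whose version already exists, it can help to compute the next version and confirm it is new before running `poe publish-pypi`. The stdlib-only sketch below handles plain `MAJOR.MINOR.PATCH` strings only (no pre-release or build suffixes). `bump_version`, `is_new_release`, and the example list of released versions are hypothetical; in practice the released versions would come from PyPI.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_version(version: str, part: str) -> str:
    """Hypothetical helper: next semantic version for part in {'major', 'minor', 'patch'}."""
    major, minor, patch = parse_version(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def is_new_release(candidate: str, released: list[str]) -> bool:
    """Hypothetical helper: True if candidate is higher than every released version."""
    parsed = parse_version(candidate)
    return all(parsed > parse_version(v) for v in released)


released = ["1.0.0", "1.1.0", "1.1.1"]  # example list only
print(bump_version("1.1.1", "patch"))     # 1.1.2
print(is_new_release("1.1.2", released))  # True
print(is_new_release("1.1.1", released))  # False
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as `"1.10.0" < "1.9.0"`.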


License

This project is licensed under the Apache License 2.0.
See the LICENSE and NOTICE files for details.


Contributing

Contributions are welcome!
Please open an issue or submit a pull request via GitHub.


Acknowledgements

Maintained by Verdel Digitaal Partner


Download files


Source Distribution

azure_doc_processing-1.1.1.tar.gz (16.6 kB)

Uploaded Source

Built Distribution


azure_doc_processing-1.1.1-py3-none-any.whl (18.5 kB)

Uploaded Python 3

File details

Details for the file azure_doc_processing-1.1.1.tar.gz.

File metadata

  • Download URL: azure_doc_processing-1.1.1.tar.gz
  • Upload date:
  • Size: 16.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for azure_doc_processing-1.1.1.tar.gz
Algorithm Hash digest
SHA256 942cf2a68a1c3ed5b2e5d91f07b6d872261dedac4d324e8e63d067133efbb3a0
MD5 49341966a5b8a39b1f2fdabd1ae1e638
BLAKE2b-256 2e94b95fb3435bc723b55d9559483d6ad639efa7a4561845d71f3a7b2abf1c2e


File details

Details for the file azure_doc_processing-1.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for azure_doc_processing-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 cf4f40a17882da51a50f9847d688eaf0d1c020a872b51f3012681a60d0876aa6
MD5 21872db2c0f1b7b402100f34888fc807
BLAKE2b-256 914c6cd76b8193cc8d6dc709e1e765d1506113d7b0b68ed65b659d6cab243178

