
Utilities for Azure document processing


azure-doc-processing

azure-doc-processing is a Python library designed to simplify and standardize the use of common Azure services in document-processing workflows.
It reduces repeated boilerplate code across projects and provides a clean, consistent, DRY development experience.

The library offers convenient wrappers and utilities around Azure SDK components, focusing primarily on document processing.


Features

Supported Azure Services

Service                          Module
Azure Storage (Blob Storage)     azure_doc_processing.blob_storage
Azure Document Intelligence      azure_doc_processing.document_intelligence
Azure Key Vault                  azure_doc_processing.keyvault
Azure OpenAI                     azure_doc_processing.openai_service
Azure Storage (Table Storage)    azure_doc_processing.table_storage

Additional Functionality

Standardized Logger

from azure_doc_processing.logger import Logger

logger = Logger(__name__)
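The README does not say what `Logger` configures under the hood. If you want comparable, consistent output in a script that does not depend on this package, a minimal stdlib sketch could look like the following. The helper name `get_standard_logger` and the format string are hypothetical; this is not the library's implementation.

```python
import logging
import sys

# Hypothetical format string; the library's actual format is not documented here.
LOG_FORMAT = "%(asctime)s | %(levelname)s | %(name)s | %(message)s"


def get_standard_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger with exactly one stdout handler (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate log lines.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    # Keep records from also being emitted by the root logger's handlers.
    logger.propagate = False
    return logger


logger = get_standard_logger(__name__)
logger.info("Logger configured")
```

Because `logging.getLogger` returns the same object for the same name, calling the helper from several modules with `__name__` gives each module its own logger while the guard prevents duplicate handlers.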

Document Utilities

Generic helper functions for document-processing tasks are available under:

import azure_doc_processing.utils as utils

Installation

Install from PyPI:

pip install azure-doc-processing

Or with Poetry:

poetry add azure-doc-processing

Quick Example

Using key-based authentication:

import os

from azure_doc_processing.blob_storage import AzureDataLake
from azure_doc_processing.logger import Logger

logger = Logger(__name__)

# Authenticate with the storage account name and an account key
datalake = AzureDataLake(
    os.getenv("AZURE_STORAGE_ACCOUNT_NAME"),
    os.getenv("AZURE_STORAGE_ACCOUNT_KEY")
)

# List the blobs under a prefix ("virtual folder") in the container
blob_files = datalake.list_blob_files(container="my-container", prefix="my-prefix/")
logger.info(f"Found {len(blob_files)} blob files in container")
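`os.getenv` returns `None` when a variable is unset, so a missing setting only surfaces later as an authentication error. A small guard, sketched below under the assumption that you want to fail before constructing `AzureDataLake`, makes the problem explicit. `require_env` is a hypothetical helper, not part of azure-doc-processing.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested variables, or raise listing every unset or empty one."""
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return values


# Usage (assumes both variables are exported in your shell or loaded from a .env file):
# settings = require_env("AZURE_STORAGE_ACCOUNT_NAME", "AZURE_STORAGE_ACCOUNT_KEY")
# datalake = AzureDataLake(
#     settings["AZURE_STORAGE_ACCOUNT_NAME"],
#     settings["AZURE_STORAGE_ACCOUNT_KEY"],
# )
```

Collecting all missing names before raising means one failed run tells you everything that needs to be set, rather than one variable at a time.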

Using DefaultAzureCredential authentication (pass only the account name):

import os

from azure_doc_processing.blob_storage import AzureDataLake
from azure_doc_processing.logger import Logger

logger = Logger(__name__)

# No account key: the client authenticates with DefaultAzureCredential instead
datalake = AzureDataLake(
    os.getenv("AZURE_STORAGE_ACCOUNT_NAME")
)

blob_files = datalake.list_blob_files(container="my-container", prefix="my-prefix/")
logger.info(f"Found {len(blob_files)} blob files in container")
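The two examples suggest that omitting the account key switches `AzureDataLake` to `DefaultAzureCredential`, which tries sources such as environment variables, a managed identity, or an Azure CLI login. The branching can be sketched as follows; `resolve_credential` is hypothetical, and the library's actual selection logic may differ. The credential factory is injected so the sketch runs without the Azure SDK installed.

```python
from typing import Callable, Optional


def resolve_credential(
    account_key: Optional[str],
    credential_factory: Callable[[], object],
) -> object:
    """Prefer an explicit account key; otherwise fall back to a token credential.

    Hypothetical helper mirroring the two authentication modes shown above.
    """
    if account_key:
        # Key-based auth: Azure Storage clients accept the account key string as a credential.
        return account_key
    # Keyless auth: build the credential lazily, only when no key was supplied.
    return credential_factory()


# Usage with the real SDK (requires the azure-identity package):
# from azure.identity import DefaultAzureCredential
# credential = resolve_credential(os.getenv("AZURE_STORAGE_ACCOUNT_KEY"), DefaultAzureCredential)
```

Passing the class itself as the factory defers constructing `DefaultAzureCredential` until it is actually needed, so key-based runs never touch the identity chain.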

Development

This repository uses:

  • pyenv for Python version management
  • Poetry 1.8.5 for dependency and environment management
  • poethepoet for task automation
  • pytest for testing

Setup

Install all dependencies:

poetry install

Development Dependencies

Add development-only packages with:

poetry add <package> --group dev

Available Tasks

Using poethepoet:

Command       Description
poe test      Run the full test suite
poe format    Format code (autoflake, black, isort)

Before committing changes, run:

poe format
poe test
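If you prefer a single command that runs both tasks in order and stops at the first failure, a small script along these lines would work. It is a hypothetical convenience, not one of the project's poethepoet tasks, and it assumes Poetry is on your PATH so `poetry run poe ...` resolves.

```python
import subprocess
import sys
from typing import Sequence


def run_checks(commands: Sequence[Sequence[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, or 0."""
    for command in commands:
        print("$", " ".join(command), file=sys.stderr)
        result = subprocess.run(list(command))
        if result.returncode != 0:
            # Stop at the first failure, keeping the format-then-test order.
            return result.returncode
    return 0


# Usage from the repository root (hypothetical script, e.g. saved as check.py):
# sys.exit(run_checks([["poetry", "run", "poe", "format"], ["poetry", "run", "poe", "test"]]))
```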

Publishing the Package

This project includes automated publishing scripts.

Prerequisites

Make sure you have:

  • An API token for TestPyPI and/or PyPI

  • A valid ~/.pypirc file, for example:

    [testpypi]
    username = __token__
    password = pypi-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    
    [pypi]
    username = __token__
    password = pypi-yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
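Before publishing, a quick sanity check of ~/.pypirc can catch a missing section or a password that is not an API token. The sketch below uses only `configparser` and checks just the two sections shown in the example; `check_pypirc` is a hypothetical helper.

```python
import configparser


def check_pypirc(text: str, sections: tuple[str, ...] = ("testpypi", "pypi")) -> list[str]:
    """Return a list of problems found in .pypirc content (an empty list means OK)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []
    for section in sections:
        if not parser.has_section(section):
            problems.append(f"[{section}] section is missing")
            continue
        # Token-based uploads use the literal username __token__.
        if parser.get(section, "username", fallback="") != "__token__":
            problems.append(f"[{section}] username should be __token__")
        # PyPI API tokens start with the "pypi-" prefix.
        if not parser.get(section, "password", fallback="").startswith("pypi-"):
            problems.append(f"[{section}] password does not look like an API token")
    return problems


# Usage:
# from pathlib import Path
# print(check_pypirc((Path.home() / ".pypirc").read_text()))
```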
    

Publish to TestPyPI (dry run)

Use this to test a new release. It publishes the package to TestPyPI under a temporary test name, then installs it and runs an import smoke test.

poe publish-testpypi
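The smoke-test script itself is not shown here. Conceptually, an import smoke test imports each public module in a clean environment and reports any failure; the sketch below assumes that approach, and `smoke_test_imports` is a hypothetical name. The module list in the usage comment comes from the Supported Azure Services table.

```python
import importlib


def smoke_test_imports(module_names: list[str]) -> dict[str, str]:
    """Import each module; return {module_name: error} for every module that fails."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # report any import-time error, not only ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Usage after installing the freshly published package into a clean virtualenv:
# failures = smoke_test_imports([
#     "azure_doc_processing.blob_storage",
#     "azure_doc_processing.document_intelligence",
#     "azure_doc_processing.keyvault",
#     "azure_doc_processing.openai_service",
#     "azure_doc_processing.table_storage",
# ])
# assert not failures, failures
```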

Publish to PyPI (real release)

Use this when releasing an official version. Make sure you’ve bumped the version in pyproject.toml.

poe publish-pypi

Publishing will fail if the version already exists on PyPI. Real releases should use unique, semantic version increments.
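Because PyPI rejects a version that already exists, every release needs a new number. A minimal sketch of a MAJOR.MINOR.PATCH bump follows; `bump_version` is hypothetical and handles only plain X.Y.Z strings (no pre-release or build suffixes). You would still write the result into pyproject.toml yourself, or let Poetry do it with `poetry version patch`.

```python
import re

_SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def bump_version(version: str, part: str = "patch") -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given part."""
    match = _SEMVER.fullmatch(version)
    if match is None:
        raise ValueError(f"Not a plain X.Y.Z version: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown part: {part!r} (expected major, minor, or patch)")


print(bump_version("1.2.1"))           # 1.2.2
print(bump_version("1.2.1", "minor"))  # 1.3.0
print(bump_version("1.2.1", "major"))  # 2.0.0
```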


License

This project is licensed under the Apache License 2.0.
See the LICENSE and NOTICE files for details.


Contributing

Contributions are welcome!
Please open an issue or submit a pull request via GitHub.


Acknowledgements

Maintained by Verdel Digitaal Partner



Download files


Source Distribution

azure_doc_processing-1.2.1.tar.gz (16.9 kB)

Uploaded Source

Built Distribution


azure_doc_processing-1.2.1-py3-none-any.whl (18.7 kB)

Uploaded Python 3
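Wheel file names follow the binary distribution format (PEP 427): {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. For this release, py3-none-any means any Python 3 interpreter, no specific ABI, any platform, i.e. a pure-Python wheel. A small parser is sketched below with a hypothetical `parse_wheel_name` helper; for production use, the `packaging` library provides `packaging.utils.parse_wheel_filename`.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"Not a wheel file: {filename!r}")
    # Dashes inside the distribution name are escaped to underscores in wheel
    # file names, so "-" only separates the fields.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"Unexpected wheel name layout: {filename!r}")
    return WheelName(name, version, build, py, abi, plat)


print(parse_wheel_name("azure_doc_processing-1.2.1-py3-none-any.whl"))
```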

File details

Details for the file azure_doc_processing-1.2.1.tar.gz.

File metadata

  • Download URL: azure_doc_processing-1.2.1.tar.gz
  • Upload date:
  • Size: 16.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for azure_doc_processing-1.2.1.tar.gz
Algorithm      Hash digest
SHA256         a1261a28c3d23664b868f541f866f8851bd719deaf8e58a00f875298473605ab
MD5            84c414cde49bb8c9f28320b33de3d2c2
BLAKE2b-256    70de943b56d661b46c36f2d0b5036bcac9d42824b77b8657e137dec2032fd15c


File details

Details for the file azure_doc_processing-1.2.1-py3-none-any.whl.


File hashes

Hashes for azure_doc_processing-1.2.1-py3-none-any.whl
Algorithm      Hash digest
SHA256         cb80e08dfd800ebed0828ab7eacb31d33fc228d23597b5b29c7b6c73fba6450c
MD5            a0f14ad192d7ed233c794b273ca8dc09
BLAKE2b-256    317b91082ed14a3143b02f8e185089ce278cd35405f1c813fe275903baef161d

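To check a downloaded file against the published SHA256 digests, hash it locally and compare. A minimal stdlib sketch with hypothetical helper names follows. pip can also enforce the check at install time in hash-checking mode (`pip install --require-hashes -r requirements.txt`, with `--hash=sha256:...` entries in the requirements file).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage with the published wheel digest:
# verify_sha256(
#     Path("azure_doc_processing-1.2.1-py3-none-any.whl"),
#     "cb80e08dfd800ebed0828ab7eacb31d33fc228d23597b5b29c7b6c73fba6450c",
# )
```

Reading in fixed-size chunks keeps memory use flat regardless of file size.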
