Utilities for Azure document processing
azure-doc-processing
azure-doc-processing is a Python library designed to simplify and
standardize the use of common Azure services in document-processing
workflows.
It reduces repeated boilerplate code across projects and provides a
clean, consistent, DRY development experience.
The library offers convenient wrappers and utilities around Azure SDK components, focusing primarily on document processing.
Features
Supported Azure Services
| Service | Module |
|---|---|
| Azure Storage (Blob Storage) | azure_doc_processing.blob_storage |
| Azure Document Intelligence | azure_doc_processing.document_intelligence |
| Azure Key Vault | azure_doc_processing.keyvault |
| Azure OpenAI | azure_doc_processing.openai_service |
| Azure Storage (Table Storage) | azure_doc_processing.table_storage |
Additional Functionality
Standardized Logger
from azure_doc_processing.logger import Logger
logger = Logger(__name__)
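For readers wondering what a standardized logger typically provides, the following is a minimal sketch built only on the standard library's logging module. It is not the library's implementation: the function name make_standard_logger and the format string are assumptions made for illustration.

```python
import logging
import sys

# Hypothetical format string; the library's actual format may differ.
LOG_FORMAT = "%(asctime)s | %(name)s | %(levelname)s | %(message)s"


def make_standard_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger with a single stdout handler and a shared format."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach the handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    # Avoid double logging through the root logger.
    logger.propagate = False
    return logger


logger = make_standard_logger("example")
logger.info("Logger configured")
```

Configuring the handler in one place is what keeps log output consistent across projects, which is the point of sharing a logger through the library.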
Document Utilities
Generic helper functions for document-processing tasks are available under:
import azure_doc_processing.utils as utils
Installation
Install from PyPI with pip:
pip install azure-doc-processing
Or with Poetry:
poetry add azure-doc-processing
Quick Example
Using key-based authentication:
import os

from azure_doc_processing.blob_storage import AzureDataLake
from azure_doc_processing.logger import Logger

logger = Logger(__name__)

# Pass both the account name and the account key for key-based access.
datalake = AzureDataLake(
    os.getenv("AZURE_STORAGE_ACCOUNT_NAME"),
    os.getenv("AZURE_STORAGE_ACCOUNT_KEY"),
)
blob_files = datalake.list_blob_files(container="my-container", prefix="my-prefix/")
logger.info(f"Found {len(blob_files)} files in container")
Using DefaultAzureCredential authentication:
import os

from azure_doc_processing.blob_storage import AzureDataLake
from azure_doc_processing.logger import Logger

logger = Logger(__name__)

# Pass only the account name; no account key is supplied.
datalake = AzureDataLake(
    os.getenv("AZURE_STORAGE_ACCOUNT_NAME"),
)
blob_files = datalake.list_blob_files(container="my-container", prefix="my-prefix/")
logger.info(f"Found {len(blob_files)} files in container")
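The two examples differ only in whether an account key is passed. A small helper can pick the mode from the environment, so the same code runs locally with a key and in Azure with DefaultAzureCredential. This is a sketch: datalake_args_from_env is a hypothetical helper, not part of the library, and it assumes the constructor accepts the positional arguments shown in the examples above.

```python
import os
from typing import Mapping, Tuple


def datalake_args_from_env(env: Mapping[str, str] = os.environ) -> Tuple[str, ...]:
    """Build positional arguments for AzureDataLake from environment variables.

    Hypothetical helper: returns (account_name, account_key) when a key is set,
    otherwise (account_name,), which corresponds to the DefaultAzureCredential
    example.
    """
    account_name = env.get("AZURE_STORAGE_ACCOUNT_NAME")
    if not account_name:
        # Fail early with a clear message instead of passing None along.
        raise ValueError("AZURE_STORAGE_ACCOUNT_NAME is not set")
    account_key = env.get("AZURE_STORAGE_ACCOUNT_KEY")
    return (account_name, account_key) if account_key else (account_name,)


# Usage, assuming the constructor signature from the examples above:
# datalake = AzureDataLake(*datalake_args_from_env())
```

Raising on a missing account name surfaces configuration mistakes before any Azure call is made.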
Development
This repository uses:
- pyenv for Python version management
- Poetry 1.8.5 for dependency & environment management
- poethepoet for task automation
- pytest for testing
Setup
Install all dependencies:
poetry install
Development Dependencies
Add development-only packages with:
poetry add <package> --group dev
Available Tasks
Using poethepoet:
| Command | Description |
|---|---|
| poe test | Run the full test suite |
| poe format | Format code (autoflake, black, isort) |
Before committing changes, run:
poe format
poe test
Publishing the Package
This project includes automated publishing scripts.
Prerequisites
Make sure you have:
- An API token for TestPyPI and/or PyPI
- A valid ~/.pypirc file, for example:
[testpypi]
username = __token__
password = pypi-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[pypi]
username = __token__
password = pypi-yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
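Before publishing, it can help to confirm that the .pypirc file has the expected shape. The sketch below uses the standard library's configparser to check that each repository section exists, uses __token__ as the username, and has a password that looks like an API token. The function name check_pypirc is an assumption for illustration and is not part of this project's publishing scripts.

```python
import configparser
from typing import List, Sequence


def check_pypirc(text: str, repositories: Sequence[str] = ("testpypi", "pypi")) -> List[str]:
    """Return a list of problems found in .pypirc content (empty if none)."""
    # Disable interpolation so a literal '%' in a token is not misread.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    for repo in repositories:
        if not parser.has_section(repo):
            problems.append(f"missing [{repo}] section")
            continue
        if parser.get(repo, "username", fallback="") != "__token__":
            problems.append(f"[{repo}] username should be __token__")
        if not parser.get(repo, "password", fallback="").startswith("pypi-"):
            problems.append(f"[{repo}] password should be an API token starting with pypi-")
    return problems


sample = "[testpypi]\nusername = __token__\npassword = pypi-xxxx\n"
print(check_pypirc(sample))
```

To check the real file, pass in its contents, for example `check_pypirc(pathlib.Path("~/.pypirc").expanduser().read_text())`.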
Publish to TestPyPI (dry run)
Use this when testing new releases. This publishes the package under a temporary test name and performs an install + import smoke test.
poe publish-testpypi
Publish to PyPI (real release)
Use this when releasing an official version.
Make sure you’ve bumped the version in pyproject.toml.
poe publish-pypi
Publishing will fail if the version already exists on PyPI. Real releases should use unique, semantic version increments.
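The version rule above can be illustrated with a short sketch that computes the next semantic version and checks it against versions that already exist. Both function names are hypothetical and the list of released versions is supplied by the caller; the sketch handles plain MAJOR.MINOR.PATCH strings only.

```python
from typing import Iterable


def bump_version(version: str, part: str = "patch") -> str:
    """Return the next semantic version for a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def is_publishable(version: str, released: Iterable[str]) -> bool:
    """A version can be published only if it has not been released before."""
    return version not in set(released)


next_version = bump_version("1.1.2")
print(next_version, is_publishable(next_version, ["1.1.1", "1.1.2"]))
```

In day-to-day use, Poetry's own `poetry version patch` (or `minor`, `major`) command performs the same bump directly in pyproject.toml.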
License
This project is licensed under the Apache License 2.0.
See the LICENSE and NOTICE files for details.
Contributing
Contributions are welcome!
Please open an issue or submit a pull request via GitHub.
Acknowledgements
Maintained by Verdel Digitaal Partner
File details
Details for the file azure_doc_processing-1.1.2.tar.gz.
File metadata
- Download URL: azure_doc_processing-1.1.2.tar.gz
- Upload date:
- Size: 16.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3dc41fb7ef396fde181e485cd984ce9e26f5312a692bda62131ba785ad479c91 |
| MD5 | 82e2fdfbf748f41b6551404a5db16880 |
| BLAKE2b-256 | 2904ba49b21652a9f87c2fb60b5fcb9a33bc16291a1d69452b0f1e1270e9ab76 |
File details
Details for the file azure_doc_processing-1.1.2-py3-none-any.whl.
File metadata
- Download URL: azure_doc_processing-1.1.2-py3-none-any.whl
- Upload date:
- Size: 18.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0d00dd395980da98dc3bcd518743045ee064dfff2a1fabf1239a5fb6b4014cf |
| MD5 | 5d3853acbc33cf712750c5242c04d241 |
| BLAKE2b-256 | 47e4ba2fc1bee8975a69d3695ac7dfa51bc75ba837be168dd15cd2902bc1f5fb |