
Introduction

Generic batch processing framework for managing the orchestration, dispatch, fault tolerance, and monitoring of arbitrary work items against many endpoints. Extensible via dependency injection. Worker endpoints can be local, remote, containers, cloud APIs, different processes, or even just different listener sockets in the same process.
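To make the idea of dispatch with fault tolerance concrete, here is a minimal, self-contained sketch. It is not batchkit's implementation: `run_batch`, `flaky_upper`, the endpoint names, and the `max_attempts` parameter are all hypothetical names chosen for illustration. One worker thread per endpoint pulls items from a shared queue, and a failed attempt is requeued until it runs out of attempts.

```python
import queue
import threading
from typing import Callable, Dict, Hashable, List, Tuple


def run_batch(
    items: List[Hashable],
    endpoints: List[str],
    process: Callable[[str, Hashable], object],
    max_attempts: int = 3,
) -> Tuple[Dict[Hashable, object], Dict[Hashable, str]]:
    """Process every item on some endpoint, retrying failed attempts.

    Returns (results, failures). results maps item -> value returned by
    `process`; failures maps item -> last error message for items that
    used up all of their attempts.
    """
    pending: queue.Queue = queue.Queue()
    for item in items:
        pending.put((item, 1))  # (work item, attempt number)

    results: Dict[Hashable, object] = {}
    failures: Dict[Hashable, str] = {}
    lock = threading.Lock()

    def worker(endpoint: str) -> None:
        while True:
            try:
                item, attempt = pending.get_nowait()
            except queue.Empty:
                return  # nothing left for this worker to pick up
            try:
                value = process(endpoint, item)
            except Exception as exc:
                if attempt < max_attempts:
                    # Requeue so that any endpoint may retry the item.
                    pending.put((item, attempt + 1))
                else:
                    with lock:
                        failures[item] = str(exc)
            else:
                with lock:
                    results[item] = value

    # One worker thread per endpoint (hypothetical design choice).
    threads = [threading.Thread(target=worker, args=(ep,)) for ep in endpoints]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failures


# Usage: a processor that fails the first time it sees each item.
seen = set()
seen_lock = threading.Lock()


def flaky_upper(endpoint: str, item: str) -> str:
    with seen_lock:
        first_time = item not in seen
        seen.add(item)
    if first_time:
        raise ConnectionError(f"{endpoint} dropped {item}")
    return item.upper()


results, failures = run_batch(
    ["a", "b", "c"], ["endpoint-1", "endpoint-2"], flaky_upper
)
```

In this sketch every item fails once and succeeds on its retry, so `failures` ends up empty. A real framework would additionally track endpoint health and report progress while the batch runs, which this sketch leaves out.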

Includes examples against Azure Cognitive Service containers for ML eval workloads.

Consuming

You build on the framework through the template method pattern and dependency injection, by providing concrete implementations of the following types:

WorkItemRequest: Encapsulates all the details needed by the WorkItemProcessor to process a work item.

WorkItemResult: Representation of the outcome of an attempt to process a WorkItemRequest.

WorkItemProcessor: Implements how a WorkItemRequest is processed against an endpoint.

BatchRequest: Represents a batch of work items to do. Produces a collection of WorkItemRequests.

BatchConfig: Details needed for a BatchRequest to produce the collection of WorkItemRequests.

BatchRunSummarizer: Implements a near-real-time status updater based on WorkItemResults as the batch progresses.

EndpointStatusChecker: Specifies how to determine whether an endpoint is healthy and ready to take on work from a WorkItemProcessor.
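The way these types fit together can be sketched as follows. This is a hedged illustration of the template method pattern combined with dependency injection, not batchkit's API: the classes below are local stand-ins that reuse some of the type names from the list purely for readability, and every method name and signature (`process`, `process_impl`, `update`, `run`) as well as `ShoutProcessor` is an assumption made for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class WorkItemRequest:
    """Everything a processor needs for one work item (here: just a text)."""
    text: str


@dataclass
class WorkItemResult:
    """Outcome of one attempt to process a WorkItemRequest."""
    request: WorkItemRequest
    passed: bool
    output: str = ""
    error: str = ""


class WorkItemProcessor(ABC):
    """Template method: process() is fixed, process_impl() is the hook."""

    def process(self, request: WorkItemRequest, endpoint: str) -> WorkItemResult:
        try:
            output = self.process_impl(request, endpoint)
        except Exception as exc:
            return WorkItemResult(request, passed=False, error=str(exc))
        return WorkItemResult(request, passed=True, output=output)

    @abstractmethod
    def process_impl(self, request: WorkItemRequest, endpoint: str) -> str:
        """Subclasses supply the endpoint-specific work here."""


class BatchRunSummarizer:
    """Keeps running pass/fail counts as results arrive."""

    def __init__(self) -> None:
        self.passed = 0
        self.failed = 0

    def update(self, result: WorkItemResult) -> None:
        if result.passed:
            self.passed += 1
        else:
            self.failed += 1


class ShoutProcessor(WorkItemProcessor):
    """Hypothetical concrete processor: upper-cases the text."""

    def process_impl(self, request: WorkItemRequest, endpoint: str) -> str:
        if not request.text:
            raise ValueError("empty work item")
        return f"{endpoint}:{request.text.upper()}"


def run(
    requests: List[WorkItemRequest],
    processor: WorkItemProcessor,
    summarizer: BatchRunSummarizer,
    endpoint: str,
) -> List[WorkItemResult]:
    """The processor and summarizer are injected rather than hard-coded."""
    results = []
    for request in requests:
        result = processor.process(request, endpoint)
        summarizer.update(result)
        results.append(result)
    return results


summarizer = BatchRunSummarizer()
results = run(
    [WorkItemRequest("hello"), WorkItemRequest("")],
    ShoutProcessor(),
    summarizer,
    endpoint="local",
)
```

The fixed `process` method owns error handling, so a concrete processor only describes the work itself, and swapping in a different processor or summarizer requires no change to `run`.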

The Speech Batch Kit is currently our prime example for consuming the framework.

The batchkit package is available as an ordinary PyPI package. See the available versions here: https://pypi.org/project/batchkit

Dev Environment

This project is developed for and consumed in Linux environments. Consumers also use WSL2. Other POSIX platforms may be compatible but are untested. For development and deployment outside of a container, we recommend installing the dependencies listed in requirements.txt into a Python virtual environment. The Speech Batch Kit example builds a container.

Tests

This project uses both unit tests (run-tests) and stress tests (run-stress-tests) for functional verification.

Building

There are currently three artifacts:

  • The PyPI package for the batchkit framework.

  • The PyPI package for batchkit-examples-speechsdk.

  • The Docker container image for speech-batch-kit.

Examples

Speech Batch Kit

The Speech Batch Kit (batchkit_examples/speech_sdk) uses the framework to produce a tool that can be used for transcription of very large numbers of audio files against Azure Cognitive Service Speech containers or cloud endpoints.

For an introduction, see the Azure Cognitive Services page.

For detailed information, see the Speech Batch Kit's README.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
