Introduction

Generic batch processing framework for managing the orchestration, dispatch, fault tolerance, and monitoring of arbitrary work items against many endpoints. Extensible via dependency injection. Worker endpoints can be local, remote, containers, cloud APIs, different processes, or even just different listener sockets in the same process.

Includes examples against Azure Cognitive Service containers for ML eval workloads.

Consuming

You build on the framework through the template method pattern and dependency injection, by providing concrete implementations of the following types:

WorkItemRequest: Encapsulates all the details needed by the WorkItemProcessor to process a work item.

WorkItemResult: Representation of the outcome of an attempt to process a WorkItemRequest.

WorkItemProcessor: Implements how to process a WorkItemRequest against an endpoint.

BatchRequest: Represents a batch of work items to do. Produces a collection of WorkItemRequests.

BatchConfig: Details needed for a BatchRequest to produce the collection of WorkItemRequests.

BatchRunSummarizer: Implements a near-real-time status updater based on WorkItemResults as the batch progresses.

EndpointStatusChecker: Specifies how to determine whether an endpoint is healthy and ready to take on work from a WorkItemProcessor.
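The division of labor among these types can be illustrated with a minimal, self-contained sketch. It is not the batchkit API: the class names mirror the list above, but every method name, field, and the BatchRunner class are hypothetical, chosen only to show how a template method and injected collaborators fit together. Consult the batchkit source for the real base classes and signatures.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class WorkItemRequest:
    """Hypothetical shape: everything needed to process one work item."""
    item_id: str
    payload: str


@dataclass
class WorkItemResult:
    """Hypothetical shape: outcome of one attempt at a WorkItemRequest."""
    item_id: str
    succeeded: bool
    output: str


class EndpointStatusChecker(ABC):
    @abstractmethod
    def is_healthy(self, endpoint: str) -> bool:
        """Return True if the endpoint can take work."""


class BatchRunSummarizer(ABC):
    @abstractmethod
    def update(self, result: WorkItemResult) -> None:
        """Fold one result into the running status."""


class WorkItemProcessor(ABC):
    def process(self, request: WorkItemRequest, endpoint: str) -> WorkItemResult:
        # Template method: the error handling is fixed here,
        # while subclasses supply only the processing step.
        try:
            return self.process_impl(request, endpoint)
        except Exception as exc:
            return WorkItemResult(request.item_id, False, str(exc))

    @abstractmethod
    def process_impl(self, request: WorkItemRequest, endpoint: str) -> WorkItemResult:
        """Do the actual work against the endpoint."""


class BatchRunner:
    """Hypothetical orchestrator that receives its collaborators by injection."""

    def __init__(self, processor: WorkItemProcessor,
                 checker: EndpointStatusChecker,
                 summarizer: BatchRunSummarizer) -> None:
        self.processor = processor
        self.checker = checker
        self.summarizer = summarizer

    def run(self, requests: Iterable[WorkItemRequest],
            endpoints: List[str]) -> List[WorkItemResult]:
        healthy = [e for e in endpoints if self.checker.is_healthy(e)]
        if not healthy:
            raise RuntimeError("no healthy endpoints")
        results = []
        for i, request in enumerate(requests):
            endpoint = healthy[i % len(healthy)]  # simple round-robin dispatch
            result = self.processor.process(request, endpoint)
            self.summarizer.update(result)
            results.append(result)
        return results


# Concrete implementations a consumer would write (all illustrative).
class UppercaseProcessor(WorkItemProcessor):
    def process_impl(self, request, endpoint):
        if not request.payload:
            raise ValueError("empty payload")
        return WorkItemResult(request.item_id, True, request.payload.upper())


class StaticStatusChecker(EndpointStatusChecker):
    def __init__(self, down: Set[str]) -> None:
        self.down = down

    def is_healthy(self, endpoint):
        return endpoint not in self.down


class CountingSummarizer(BatchRunSummarizer):
    def __init__(self) -> None:
        self.passed = 0
        self.failed = 0

    def update(self, result):
        if result.succeeded:
            self.passed += 1
        else:
            self.failed += 1


summarizer = CountingSummarizer()
runner = BatchRunner(UppercaseProcessor(), StaticStatusChecker(down={"b"}), summarizer)
results = runner.run(
    [WorkItemRequest("1", "hello"), WorkItemRequest("2", "")],
    endpoints=["a", "b"],
)
print(summarizer.passed, summarizer.failed)
```

In this sketch the orchestration loop never changes. Swapping the processor, status checker, or summarizer is enough to target a different kind of endpoint or workload.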

The Speech Batch Kit is currently our prime example for consuming the framework.

The batchkit package is available as an ordinary PyPI package. See the available versions at https://pypi.org/project/batchkit

Dev Environment

This project is developed for and consumed in Linux environments. Consumers also use WSL2; other POSIX platforms may be compatible but are untested. For development and deployment outside of a container, we recommend installing the dependencies listed in requirements.txt into a Python virtual environment. The Speech Batch Kit example builds a container.

Tests

This project uses both unit tests (run-tests) and stress tests (run-stress-tests) for functional verification.

Building

There are currently three artifacts:

  • The PyPI package for the batchkit framework.

  • The PyPI package for batchkit-examples-speechsdk.

  • The Docker container image for speech-batch-kit.

Examples

Speech Batch Kit

The Speech Batch Kit (batchkit_examples/speech_sdk) uses the framework to produce a tool that can be used for transcription of very large numbers of audio files against Azure Cognitive Service Speech containers or cloud endpoints.

For an introduction, see the Azure Cognitive Services page.

For detailed information, see the Speech Batch Kit's README.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
