Batch Inference

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Batch Inference Toolkit

Batch Inference Toolkit (batch-inference) is a Python package that dynamically batches model input tensors coming from multiple users, executes the model, un-batches the output tensors, and returns each result to the user who sent the request. Batching improves system throughput through better compute parallelism and better cache locality. The entire process is transparent to developers.
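
To make the batch, execute, un-batch cycle concrete, here is a minimal sketch of dynamic batching built only on the standard library and NumPy. It is an illustration under stated assumptions, not the batch-inference implementation: ToyBatchingHost and its internals are hypothetical names, it assumes every request has the same shape after the first dimension, and it omits shutdown handling and batching timeouts.

```python
import queue
import threading
from concurrent.futures import Future, ThreadPoolExecutor

import numpy as np


class ToyBatchingHost:
    """Hypothetical illustration of dynamic batching (not the library's code)."""

    def __init__(self, predict_batch, max_batch_size=32):
        self.predict_batch = predict_batch
        self.max_batch_size = max_batch_size
        self.requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def predict(self, x):
        # Called concurrently by request threads; blocks until the result is ready.
        future = Future()
        self.requests.put((x, future))
        return future.result()

    def _loop(self):
        while True:
            # Wait for one request, then drain whatever else is already queued.
            pending = [self.requests.get()]
            while len(pending) < self.max_batch_size:
                try:
                    pending.append(self.requests.get_nowait())
                except queue.Empty:
                    break
            inputs = [x for x, _ in pending]
            sizes = [x.shape[0] for x in inputs]
            try:
                # Batch: concatenate along the first dimension, run the model once.
                outputs = self.predict_batch(np.concatenate(inputs, axis=0))
            except Exception as exc:
                # Report the failure to every caller in this batch.
                for _, future in pending:
                    future.set_exception(exc)
                continue
            # Un-batch: give each caller the rows that belong to its request.
            offsets = np.cumsum(sizes)[:-1]
            for (_, future), y in zip(pending, np.split(outputs, offsets, axis=0)):
                future.set_result(y)


# Usage: eight concurrent callers share one model behind the toy host.
weights = np.ones((3, 2), dtype="f")
host = ToyBatchingHost(lambda x: np.matmul(x, weights))
queries = [np.full((1, 4, 3), i, dtype="f") for i in range(8)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(host.predict, queries))
```

Each caller sees an ordinary blocking call, while the worker thread decides how many queued requests to merge into a single model invocation. How the requests get grouped depends on timing, but each caller always receives the rows for its own query.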

When to use

Use it when you host deep learning model inference on cloud servers, especially on GPUs.

Why to use

It can multiply your server throughput: the benchmarks below report 4.7x and 16x over the baseline.

Advantages of batch-inference

- Platform-independent, lightweight Python library
- Only a few lines of code need to change to onboard with the built-in batching algorithms
- Flexible APIs that support customized batching algorithms and input types
- Multi-process remote mode that avoids the Python GIL bottleneck
- Tutorials and benchmarks on popular models:

Model	Throughput Compared to Baseline	Links
Bert Embedding	4.7x	Tutorial
GPT Completion	16x	Tutorial

Installation

Install from Pip

python -m pip install batch-inference --upgrade

Build and Install from Source (for developers)

git clone https://github.com/microsoft/batch-inference.git
cd batch-inference
python -m pip install -e ".[docs,testing]"

# if you want to format the code before commit
pip install pre-commit
pre-commit install

# run the unit tests
python -m unittest discover tests

Example

Let's start with a toy model to learn the APIs. First, define a predict_batch method in your model class, then add the batching decorator to the class.

The batching decorator adds a host() method that creates a ModelHost object. The predict method of ModelHost takes a single query as input and merges multiple queries into a batch before calling predict_batch. It then splits the outputs of predict_batch and returns to each caller its own result.

import numpy as np
from batch_inference import batching
from batch_inference.batcher.concat_batcher import ConcatBatcher

@batching(batcher=ConcatBatcher(), max_batch_size=32)
class MyModel:
    def __init__(self, k, n):
        # np.random.randn takes the dimensions as separate ints, not a tuple
        self.weights = np.random.randn(k, n).astype("f")

    # shape of x: [batch_size, m, k]
    def predict_batch(self, x):
        y = np.matmul(x, self.weights)
        return y

# initialize MyModel with k=3 and n=3
host = MyModel.host(3, 3)
host.start()

# shape of x: [1, 3, 3]
def process_request(x):
    y = host.predict(x)
    return y

A Batcher is responsible for merging queries and splitting outputs. In this case, ConcatBatcher concatenates the input tensors into one batched tensor along the first dimension. We provide a set of built-in Batchers for common scenarios, and you can also implement your own. See What is Batcher for more information.
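
As an example of a merge/split strategy other than plain concatenation, the sketch below pads variable-length inputs to a common length before stacking them, then trims the padding from each output. The function names pad_and_batch and unbatch_and_trim are hypothetical and are not the Batcher interface of batch-inference. The sketch also assumes each query is shaped [1, m, k] and that the model treats rows independently, so zero padding does not change the real rows.

```python
import numpy as np


def pad_and_batch(inputs):
    """Merge queries shaped [1, m_i, k] into one array shaped [B, max_m, k]."""
    lengths = [x.shape[1] for x in inputs]
    max_len = max(lengths)
    # Zero-pad the second dimension of every query up to the longest one.
    padded = [
        np.pad(x, ((0, 0), (0, max_len - x.shape[1]), (0, 0))) for x in inputs
    ]
    return np.concatenate(padded, axis=0), lengths


def unbatch_and_trim(outputs, lengths):
    """Split [B, max_m, n] back into per-query arrays shaped [1, m_i, n]."""
    return [outputs[i:i + 1, :length] for i, length in enumerate(lengths)]


# Usage with the toy matmul model from the example above (k=3, n=2 here).
rng = np.random.default_rng(0)
weights = rng.standard_normal((3, 2)).astype("f")
queries = [rng.standard_normal((1, m, 3)).astype("f") for m in (2, 5, 3)]

batched, lengths = pad_and_batch(queries)
outputs = unbatch_and_trim(np.matmul(batched, weights), lengths)
```

The merge step returns the original lengths alongside the batch because the split step needs that context to undo the padding. A plain concatenating strategy would record first-dimension sizes instead.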

Project details

Release history

- 1.0 (this version): May 16, 2023
- 1.0rc1 (pre-release): May 15, 2023

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

batch_inference-1.0-py3-none-any.whl (23.9 kB view details)

Uploaded May 16, 2023 Python 3

File details

Details for the file batch_inference-1.0-py3-none-any.whl.

File metadata

Download URL: batch_inference-1.0-py3-none-any.whl
Upload date: May 16, 2023
Size: 23.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: RestSharp/106.13.0.0

File hashes

Hashes for batch_inference-1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d04164e65d6939521ecdd1d9a38089a856da0562eb31fea9b6af4b404dcd4878`
MD5	`3cc00ccbeb0080a7aa87ee0c82248bdc`
BLAKE2b-256	`c3c192e0956d4c4c21b871e63c1451a173ad6fcb48300abc0993a5b46ef1745b`
