Abstractions for AI tasks following the Open Inference Protocol v2, with a HuggingFace Transformers reference implementation
Task Inference
A Python library that provides task-oriented abstractions for AI inference, bridging HuggingFace Transformers pipeline tasks with the Open Inference Protocol v2 (OIP v2) tensor format.
Key Design
Each task is modelled in three layers:
| Layer | Role |
|---|---|
| Protocol (protocol/v2.py) | Pydantic models for OIP v2 InferenceRequest / InferenceResponse |
| Task (tasks/) | Domain-specific input/output Pydantic schemas; process left abstract |
| Implementation (implementations/transformers/) | HuggingFace Transformers reference backend that fulfils process |
This separation allows swapping backends (ONNX, TensorRT, REST endpoint, …) without changing the task schema or OIP conversion logic.
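The layering can be pictured with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in built on the standard library (plain dataclasses instead of Pydantic, dict-shaped requests instead of the real InferenceRequest, and made-up `Echo*` names); it is not the library's code, only an illustration of how an abstract process lets backends be swapped while the conversion logic stays put.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# --- Task layer: hypothetical schemas that own the protocol conversion ---
@dataclass
class EchoInput:
    text: str

    @classmethod
    def from_request(cls, request: dict) -> "EchoInput":
        # Read the first element of the first tensor in an OIP-v2-shaped dict.
        return cls(text=request["inputs"][0]["data"][0])


@dataclass
class EchoOutput:
    text: str

    def to_response(self, model_name: str) -> dict:
        return {
            "model_name": model_name,
            "outputs": [
                {"name": "text", "shape": [1], "datatype": "BYTES", "data": [self.text]}
            ],
        }


class EchoTask(ABC):
    """Handles request/response conversion; leaves process() abstract."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    @abstractmethod
    def process(self, inputs: EchoInput) -> EchoOutput: ...

    def __call__(self, request: dict) -> dict:
        inputs = EchoInput.from_request(request)
        return self.process(inputs).to_response(self.model_name)


# --- Implementation layer: two interchangeable backends ---
class UpperBackend(EchoTask):
    def process(self, inputs: EchoInput) -> EchoOutput:
        return EchoOutput(text=inputs.text.upper())


class ReverseBackend(EchoTask):
    def process(self, inputs: EchoInput) -> EchoOutput:
        return EchoOutput(text=inputs.text[::-1])


request = {"inputs": [{"name": "text", "shape": [1], "datatype": "BYTES", "data": ["cat"]}]}
upper_response = UpperBackend("upper")(request)
reverse_response = ReverseBackend("reverse")(request)
```

Both backends accept the same request and return the same response shape; only the body of process differs.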
Input / Output schemas
Every task exposes a pair of Pydantic models (e.g. ImageClassificationInput / ImageClassificationOutput). These models own the conversion to and from OIP v2 tensors:
| Method | Direction |
|---|---|
| XxxInput.to_inference_request() | Python input → InferenceRequest |
| XxxInput.from_inference_request(request) | InferenceRequest → Python input |
| XxxOutput.to_inference_response(model_name) | Python output → InferenceResponse |
| XxxOutput.from_inference_response(response) | InferenceResponse → Python output |
The process(inputs: XxxInput) -> XxxOutput method on task classes works entirely with these domain objects — no raw OIP tensors required.
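As a rough illustration of what such a conversion involves, the sketch below maps a classification input to an OIP v2-shaped request and back. It rests on several assumptions: the class name `SketchClassificationInput`, the tensor names `image` and `top_k`, and the base64 encoding of the image bytes are hypothetical choices, and plain dicts stand in for the library's Pydantic models. Only the tensor fields (`name`, `shape`, `datatype`, `data`) come from the OIP v2 format itself.

```python
import base64
from dataclasses import dataclass


@dataclass
class SketchClassificationInput:
    image: bytes
    top_k: int = 5

    def to_inference_request(self) -> dict:
        # One BYTES tensor for the image, one INT64 tensor for top_k.
        return {
            "inputs": [
                {
                    "name": "image",
                    "shape": [1],
                    "datatype": "BYTES",
                    # Assumption: bytes are base64-encoded so they survive JSON.
                    "data": [base64.b64encode(self.image).decode("ascii")],
                },
                {"name": "top_k", "shape": [1], "datatype": "INT64", "data": [self.top_k]},
            ]
        }

    @classmethod
    def from_inference_request(cls, request: dict) -> "SketchClassificationInput":
        # Index tensors by name so their order in the request does not matter.
        tensors = {tensor["name"]: tensor for tensor in request["inputs"]}
        return cls(
            image=base64.b64decode(tensors["image"]["data"][0]),
            top_k=int(tensors["top_k"]["data"][0]),
        )


original = SketchClassificationInput(image=b"\x89PNG-placeholder", top_k=3)
request = original.to_inference_request()
restored = SketchClassificationInput.from_inference_request(request)
```

The round trip is lossless in this sketch: `restored` compares equal to `original`.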
Supported Tasks
Vision
| Task | Class | Default Model |
|---|---|---|
| Image Classification | TransformersImageClassificationTask | google/vit-base-patch16-224 |
| Image Segmentation | TransformersImageSegmentationTask | facebook/mask2former-swin-large-coco-panoptic |
| Object Detection | TransformersObjectDetectionTask | facebook/detr-resnet-50 |
| Depth Estimation | TransformersDepthEstimationTask | Intel/dpt-large |
| Mask Generation | TransformersMaskGenerationTask | facebook/sam-vit-base |
| Visual Question Answering | TransformersVQATask | dandelin/vilt-b32-finetuned-vqa |
| Image Anonymization | TransformersImageAnonymizationTask | hustvl/yolos-tiny |
Audio
| Task | Class | Default Model |
|---|---|---|
| Automatic Speech Recognition | TransformersASRTask | openai/whisper-base |
| Audio Classification | TransformersAudioClassificationTask | superb/wav2vec2-base-superb-ks |
Installation
# Core (schemas + OIP v2 protocol only)
pip install task-inference
# With HuggingFace Transformers backend
pip install "task-inference[transformers]"
# With audio support
pip install "task-inference[all]"
Quick Start
Via the factory (recommended)
from task_inference import create_task

with open("cat.jpg", "rb") as f:
    image_bytes = f.read()

task = create_task(
    backend="transformers",
    task_name="image-classification",
    model_name="google/vit-base-patch16-224",
    model_params={"device": "cpu"},
)

from task_inference.tasks.vision.image_classification import (
    ImageClassificationInput,
    ImageClassificationOutput,
)

inp = ImageClassificationInput(image=image_bytes, top_k=3)
resp = task(inp.to_inference_request())
result = ImageClassificationOutput.from_inference_response(resp)
for r in result.results:
    print(r.label, r.score)
model_params is forwarded directly to the backend constructor, so any
backend-specific keyword argument (e.g. device, chunk_length_s,
points_per_batch) can be passed here.
To discover what backends and task names are available:
from task_inference import supported_tasks
print(supported_tasks())
# {'transformers': ['audio-classification', 'automatic-speech-recognition', ...]}
Direct instantiation
from task_inference.implementations.transformers.vision import (
    TransformersImageClassificationTask,
)
from task_inference.tasks.vision.image_classification import (
    ImageClassificationInput,
    ImageClassificationOutput,
)

task = TransformersImageClassificationTask(model_name="google/vit-base-patch16-224")

# image_bytes holds the raw image bytes, read as in the factory example above

# Call via OIP v2 round-trip
inp = ImageClassificationInput(image=image_bytes, top_k=3)
resp = task(inp.to_inference_request())
result = ImageClassificationOutput.from_inference_response(resp)
for r in result.results:
    print(r.label, r.score)

# Convenience wrapper - builds the input from keyword arguments, returns InferenceResponse
resp = task.run(image=image_bytes, top_k=3)
result = ImageClassificationOutput.from_inference_response(resp)
OIP v2 round-trip
The input/output models handle all serialisation, so you can integrate with any OIP v2-compatible server without touching the task implementation:
from task_inference.tasks.vision.image_classification import (
    ImageClassificationInput,
    ImageClassificationOutput,
)

# image_bytes and task are defined as in the Quick Start examples

# --- Client side ---
inputs = ImageClassificationInput(image=image_bytes, top_k=3)
request = inputs.to_inference_request()  # → InferenceRequest (send over HTTP)

# --- Server side ---
response = task(request)  # returns InferenceResponse directly

# --- Client side (parse response) ---
output = ImageClassificationOutput.from_inference_response(response)
for r in output.results:
    print(r.label, r.score)
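When client and server run in separate processes, the request has to travel over HTTP. The sketch below builds, but does not send, a POST to the standard OIP v2 inference endpoint `/v2/models/{model_name}/infer` using only the standard library. The helper name `build_infer_request`, the base URL, and the model name `image-classifier` are hypothetical, and the payload is a plain dict; with the library's Pydantic InferenceRequest you would presumably serialise the model to JSON before sending.

```python
import json
import urllib.request


def build_infer_request(base_url: str, model_name: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for the OIP v2 inference endpoint (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A tiny OIP-v2-shaped payload; a real one would come from to_inference_request().
payload = {"inputs": [{"name": "top_k", "shape": [1], "datatype": "INT64", "data": [3]}]}
http_request = build_infer_request("http://localhost:8080/", "image-classifier", payload)

# Sending requires a running OIP v2 server, so it is left commented out:
# with urllib.request.urlopen(http_request) as resp:
#     response_body = json.loads(resp.read())
```

The decoded response body could then be turned back into the library's response model and passed to from_inference_response.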
Factory reference
create_task(backend, task_name, model_name=None, model_params=None)
| Parameter | Type | Description |
|---|---|---|
| backend | str | Backend name; currently "transformers" |
| task_name | str | Task identifier (see table below) |
| model_name | str \| None | HuggingFace model id or local path. When None the backend's built-in default model is used. |
| model_params | dict \| None | Extra keyword arguments passed to the constructor (e.g. device, chunk_length_s) |
Raises ValueError for unknown backends or task names.
supported_tasks(backend=None)
Returns a dict[str, list[str]] mapping each backend to its supported task names. Pass a backend name to filter to a single backend.
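One plausible way to picture this behaviour is a nested registry keyed by backend and task name. The sketch below is not the library's implementation: the registry, the `sketch_` function names, and the `DummyTask` class are hypothetical and only mirror the documented contract (keyword forwarding via model_params, a default model when model_name is None, ValueError on unknown names, optional backend filtering). Raising ValueError from the filtered lookup is an additional assumption.

```python
from typing import Any, Optional


class DummyTask:
    """Hypothetical stand-in for a backend task class."""

    DEFAULT_MODEL = "dummy/default-model"

    def __init__(self, model_name: Optional[str] = None, **kwargs: Any) -> None:
        # Fall back to the built-in default when no model is given.
        self.model_name = model_name or self.DEFAULT_MODEL
        self.kwargs = kwargs


# backend name -> task name -> task class
_REGISTRY = {
    "transformers": {"image-classification": DummyTask},
}


def sketch_create_task(backend, task_name, model_name=None, model_params=None):
    if backend not in _REGISTRY:
        raise ValueError(f"Unknown backend: {backend!r}")
    tasks = _REGISTRY[backend]
    if task_name not in tasks:
        raise ValueError(f"Unknown task {task_name!r} for backend {backend!r}")
    # model_params is unpacked straight into the constructor.
    return tasks[task_name](model_name=model_name, **(model_params or {}))


def sketch_supported_tasks(backend=None):
    if backend is None:
        return {name: sorted(tasks) for name, tasks in _REGISTRY.items()}
    if backend not in _REGISTRY:
        # Assumption: the source does not say what happens here.
        raise ValueError(f"Unknown backend: {backend!r}")
    return {backend: sorted(_REGISTRY[backend])}


task = sketch_create_task(
    "transformers", "image-classification", model_params={"device": "cpu"}
)
```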
Task name reference
| Task name | Input class | Output class |
|---|---|---|
| image-classification | ImageClassificationInput | ImageClassificationOutput |
| object-detection | ObjectDetectionInput | ObjectDetectionOutput |
| depth-estimation | DepthEstimationInput | DepthEstimationOutput |
| image-segmentation | ImageSegmentationInput | ImageSegmentationOutput |
| image-anonymization | ImageAnonymizationInput | ImageAnonymizationOutput |
| mask-generation | MaskGenerationInput | MaskGenerationOutput |
| visual-question-answering | VQAInput | VQAOutput |
| image-text-to-text | ImageTextToTextInput | ImageTextToTextOutput |
| zero-shot-image-classification | ZeroShotImageClassificationInput | ZeroShotImageClassificationOutput |
| zero-shot-object-detection | ZeroShotObjectDetectionInput | ZeroShotObjectDetectionOutput |
| audio-classification | AudioClassificationInput | AudioClassificationOutput |
| automatic-speech-recognition | ASRInput | ASROutput |
Project Structure
src/task_inference/
├── factory.py # create_task() / supported_tasks() entry points
├── protocol/ # OIP v2 Pydantic models
│ └── v2.py
├── tasks/ # Abstract task definitions + input/output schemas
│ ├── base.py
│ ├── vision/ # Image-based tasks
│ └── audio/ # Audio-based tasks
├── implementations/
│ ├── transformers/ # HuggingFace reference backend
│ │ ├── base.py # Shared image/audio helpers
│ │ ├── vision/
│ │ └── audio/
│ └── onnxruntime/ # ONNX Runtime backend
│ ├── base.py # Shared ORT helpers
│ ├── vision/
│ ├── audio/
│ └── adapters/ # Dialect adapters (auto-detected from model I/O)
└── utils.py # Image/audio encode-decode helpers
Extending
Implement a new backend by subclassing the relevant task and overriding process:
from task_inference.tasks.vision.image_classification import (
    ImageClassificationInput,
    ImageClassificationOutput,
    ImageClassificationTask,
)

class MyOnnxImageClassificationTask(ImageClassificationTask):
    def process(self, inputs: ImageClassificationInput) -> ImageClassificationOutput:
        # your ONNX / TensorRT / remote-endpoint logic here
        ...
ONNX Runtime adapters
The built-in onnxruntime backend supports multiple model families through a dialect-adapter layer that auto-detects the correct tensor contract from the model's I/O tensor names at load time. See docs/onnx-adapters.md for:
- The full dialect reference (tensor signatures, detection rules) for all 14 built-in dialects
- Model acquisition instructions (optimum-cli / torch.onnx.export) for each task
- Step-by-step guide for adding a custom adapter dialect
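Detection from I/O tensor names can be pictured as matching a model's input and output names against a list of known signatures. The following sketch is purely illustrative: the dialect names, the tensor names, and the first-match rule are hypothetical, and the actual detection rules are the ones documented in docs/onnx-adapters.md.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Dialect:
    """Hypothetical signature: tensor names a model must expose to match."""

    name: str
    required_inputs: frozenset
    required_outputs: frozenset

    def matches(self, input_names: Iterable[str], output_names: Iterable[str]) -> bool:
        return (
            self.required_inputs <= set(input_names)
            and self.required_outputs <= set(output_names)
        )


# Ordered from most to least specific, because the first match wins.
DIALECTS = [
    Dialect(
        "hypothetical-detector",
        frozenset({"pixel_values"}),
        frozenset({"logits", "pred_boxes"}),
    ),
    Dialect(
        "hypothetical-classifier",
        frozenset({"pixel_values"}),
        frozenset({"logits"}),
    ),
]


def detect_dialect(
    input_names: Iterable[str], output_names: Iterable[str]
) -> Optional[Dialect]:
    """Return the first dialect whose signature matches, or None."""
    input_names, output_names = list(input_names), list(output_names)
    for dialect in DIALECTS:
        if dialect.matches(input_names, output_names):
            return dialect
    return None


detector = detect_dialect(["pixel_values"], ["logits", "pred_boxes"])
classifier = detect_dialect(["pixel_values"], ["logits"])
unknown = detect_dialect(["input_ids"], ["logits"])
```

With ONNX Runtime, the names fed into such a function could be read from an InferenceSession via the `name` attribute of the entries returned by `get_inputs()` and `get_outputs()`.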
Security Policy
The current release is the supported version. Security fixes are released together with all other fixes in each new release.
If you discover a security vulnerability in this project, please do not open a public issue.
Instead, report it privately by emailing us at digitalhub@fbk.eu. Include as much detail as possible to help us understand and address the issue quickly and responsibly.
Contributing
To report a bug or request a feature, please first check the existing issues to avoid duplicates. If none exist, open a new issue with a clear title and a detailed description, including any steps to reproduce if it's a bug.
To contribute code, start by forking the repository. Clone your fork locally and create a new branch for your changes. Make sure your commits follow the Conventional Commits v1.0 specification to keep history readable and consistent.
Once your changes are ready, push your branch to your fork and open a pull request against the main branch. Be sure to include a summary of what you changed and why. If your pull request addresses an issue, mention it in the description (e.g., “Closes #123”).
Please note that new contributors may be asked to sign a Contributor License Agreement (CLA) before their pull requests can be merged. This helps us ensure compliance with open source licensing standards.
We appreciate contributions and help in improving the project!
Authors
This project is developed and maintained by DSLab – Fondazione Bruno Kessler, with contributions from the open source community. A complete list of contributors is available in the project’s commit history and pull requests.
For questions or inquiries, please contact: digitalhub@fbk.eu
Copyright and license
Copyright © 2025 DSLab – Fondazione Bruno Kessler and individual contributors.
This project is licensed under the Apache License, Version 2.0. You may not use this file except in compliance with the License. Ownership of contributions remains with the original authors and is governed by the terms of the Apache 2.0 License, including the requirement to grant a license to the project.