Batch

Generic Python module for handling dictionary-based batch data.



Purpose

Do you work with data of similar modalities and often have to apply the same function to multiple elements? Does your code look something like this?

batch = {
    "image_a": image_a,
    "image_b": image_b,
    "image_c": image_c
}

# Move to another device
for key in batch:
    batch[key] = batch[key].to(device)

# Transform
for key in batch:
    batch[key] = batch[key] * 2 + 1
    
# Combine
for key in batch:
    batch[key] = batch[key] + batch_2[key]

# Process
for key in batch:
    batch[key] = batch[key].max()

If the answer is yes, then this module is for you!

Batch is a generic wrapper for dictionary-based batch data. It provides a simple way to apply the same function or operator to the whole batch. The module is completely device- and container-independent: you can use it with PyTorch, NumPy, or any other library.

batch = Batch(
    image_a=image_a, 
    image_b=image_b, 
    image_c=image_c)

# Move to another device
batch = batch.to(device)

# Transform
batch = batch * 2 + 1

# Combine
batch = batch + batch_2

# Process
batch = batch.max()

Installation

pip install batch-dev

Usage

The examples below demonstrate a few basic use cases with NumPy. PyTorch or other containers can be used in the same way.

Import

import numpy as np  # used by the examples below

from batch import Batch

Instantiation

Direct

# Create a batch directly
batch = Batch(
    image_a=np.random.rand(256, 256, 3), 
    image_b=np.random.rand(256, 256, 3), 
    image_c=np.random.rand(256, 256, 3))

From dictionary

batch = {
    "image_a": np.random.rand(256, 256, 3), 
    "image_b": np.random.rand(256, 256, 3), 
    "image_c": np.random.rand(256, 256, 3)}
# Create a batch from a dictionary
batch = Batch.from_dict(batch)

From tensor

image = np.random.rand(256, 256, 9)

# Create a batch from a tensor by splitting it along one dimension and storing the splits
data_splits = {
    "image_a": 1, 
    "image_b": 1, 
    "image_c": 1}
dim = 2
batch = Batch.from_tensor(image, data_splits, dim=dim)
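
To make the splitting step concrete, here is a minimal pure-Python sketch that cuts one sequence into named, consecutive pieces. It assumes each value in data_splits is the length of that piece along the split dimension. That reading and the helper name split_by_sizes are assumptions made for illustration, not something taken from the package.

```python
from itertools import accumulate


def split_by_sizes(values, sizes):
    """Split `values` into named consecutive chunks.

    `sizes` maps each name to the length of its chunk. Hypothetical helper,
    not part of the batch package.
    """
    if sum(sizes.values()) != len(values):
        raise ValueError("split sizes must add up to the length of `values`")
    ends = list(accumulate(sizes.values()))  # exclusive end of each chunk
    starts = [0] + ends[:-1]                 # each chunk starts where the previous ended
    return {name: values[start:end]
            for name, start, end in zip(sizes, starts, ends)}


channels = ["r", "g", "b"]
splits = split_by_sizes(channels, {"image_a": 1, "image_b": 1, "image_c": 1})
```

With NumPy arrays the same idea would apply along the chosen dimension instead of along a flat list.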

Indexing

A Batch is a string-keyed dictionary whose values may themselves be mappings or iterables.

String index

A string index is always interpreted as a key.

Single key

Querying a single key returns the associated value:

image_a = batch["image_a"]

You can also index into nested batches, using . as a separator:

batch_2 = Batch(input=batch)
image_a = batch_2["input.image_a"]
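
The dotted lookup can be pictured as walking through nested dictionaries one key segment at a time. The sketch below uses plain dictionaries and a hypothetical helper called get_nested. It illustrates the idea only and is not the package's implementation.

```python
def get_nested(mapping, dotted_key, sep="."):
    """Follow `dotted_key` through nested mappings, one segment at a time.

    Hypothetical helper for illustration, not part of the batch package.
    """
    value = mapping
    for part in dotted_key.split(sep):
        value = value[part]  # raises KeyError if a segment is missing
    return value


inner = {"image_a": 1, "image_b": 2}
outer = {"input": inner}
image_a = get_nested(outer, "input.image_a")
```

One consequence of this scheme is that a key containing the separator itself cannot be told apart from a nested path.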

Multiple keys

Querying multiple keys (as a tuple or list) returns a new batch with the selected keys:

batch_out = batch["image_a", "image_b"]

Wildcard query

Wildcard queries are also supported and return a new batch with the matching keys:

batch_out = batch.query_wildcard("image_*")
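
For intuition, a wildcard query over a plain dictionary could look like the sketch below. It assumes shell-style patterns as implemented by the standard library's fnmatch module; the pattern syntax that the package actually accepts may differ, and filter_keys is a hypothetical name.

```python
from fnmatch import fnmatchcase


def filter_keys(mapping, pattern):
    """Keep only the entries whose key matches a shell-style pattern.

    Hypothetical helper for illustration, not part of the batch package.
    """
    return {key: value for key, value in mapping.items()
            if fnmatchcase(key, pattern)}  # case-sensitive on every platform


data = {"image_a": 1, "image_b": 2, "mask_a": 3}
images = filter_keys(data, "image_*")
```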

Integer index

An integer index is always interpreted as an index into the elements and returns a new batch with the indexed elements:

batch_out = batch[:,:,0]
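
Conceptually, the index is applied to every element and the keys stay as they are. A minimal sketch with plain lists and a single integer index, using a hypothetical helper named index_elements:

```python
def index_elements(mapping, index):
    """Apply the same index to every value and keep the keys unchanged.

    Hypothetical helper for illustration, not part of the batch package.
    """
    return {key: value[index] for key, value in mapping.items()}


data = {"image_a": [10, 11, 12], "image_b": [20, 21, 22]}
first = index_elements(data, 0)
last = index_elements(data, -1)
```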

Processing a batch

Operators

You can use the following unary, binary, reverse and in-place operators:

# Unary operators
"__not__", "__abs__", "__index__", "__inv__", "__invert__", "__neg__", "__pos__",

# Binary operators
"__add__", "__and__", "__concat__", "__floordiv__", "__lshift__", "__mod__", "__mul__",
"__or__", "__pow__", "__rshift__", "__sub__", "__truediv__", "__xor__", "__eq__",

# Reverse operators
"__radd__", "__rand__", "__rmul__",
"__ror__", "__rsub__",  "__rxor__",

# In-place operators
"__iadd__", "__iand__", "__iconcat__", "__ifloordiv__", "__ilshift__", "__imod__", "__imul__",
"__ior__", "__ipow__", "__irshift__", "__isub__", "__itruediv__", "__ixor__"

Example:

# Use operators
batch_out = batch + batch_2 * 2
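
If you are curious how element-wise operators on a dictionary can work, the toy class below shows one possible approach: each operator is applied per key, pairing values with the matching key of another batch or reusing a scalar operand. MiniBatch is a hypothetical stand-in written for this illustration. It is not the package's Batch class and covers only + and *.

```python
import operator


class MiniBatch(dict):
    """Toy stand-in for illustration only; not the package's Batch class."""

    def _apply(self, op, other, reverse=False):
        result = MiniBatch()
        for key, value in self.items():
            # Pair with the matching element of another batch, or reuse a scalar.
            rhs = other[key] if isinstance(other, dict) else other
            result[key] = op(rhs, value) if reverse else op(value, rhs)
        return result

    def __add__(self, other):
        return self._apply(operator.add, other)

    def __radd__(self, other):  # handles `scalar + batch`
        return self._apply(operator.add, other, reverse=True)

    def __mul__(self, other):
        return self._apply(operator.mul, other)

    def __rmul__(self, other):  # handles `scalar * batch`
        return self._apply(operator.mul, other, reverse=True)


batch = MiniBatch(image_a=1, image_b=2)
batch_2 = MiniBatch(image_a=10, image_b=20)
batch_out = batch + batch_2 * 2  # {"image_a": 21, "image_b": 42}
```

The reverse variants matter because Python only falls back to them when the left operand, such as a plain number, does not know how to handle the batch.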

Member functions

You can use any member functions of the underlying container, for example:

# Use member functions
batch_out = batch.mean(axis=2)
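
One way such forwarding can be built in Python is with __getattr__, which is only consulted when normal attribute lookup fails. The sketch below uses strings as elements so that it runs without NumPy. ForwardingBatch is a hypothetical class for illustration, and the package may implement this differently.

```python
class ForwardingBatch(dict):
    """Toy stand-in for illustration only; not the package's Batch class."""

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so dict methods such as
        # keys() and items() keep working as usual.
        if name.startswith("_"):
            raise AttributeError(name)  # let private/special lookups fail normally

        def call_on_each(*args, **kwargs):
            # Call the same-named method on every element with the same arguments.
            return ForwardingBatch(
                (key, getattr(value, name)(*args, **kwargs))
                for key, value in self.items()
            )

        return call_on_each


words = ForwardingBatch(image_a="left", image_b="right")
shouted = words.upper()
replaced = words.replace("t", "T")
```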

Map

You can easily apply a function to the whole batch:

batch = batch.map(list)  # Converts all elements to list
batch = batch.map(np.stack, axis=0)  # Concatenates all elements to a single tensor
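
On a plain dictionary, the equivalent of map is a comprehension over the values. The sketch below assumes that extra positional and keyword arguments are passed on to the function for each element, as the axis=0 example above suggests. map_values is a hypothetical helper name.

```python
def map_values(mapping, func, *args, **kwargs):
    """Apply `func` to every value, forwarding any extra arguments.

    Hypothetical helper for illustration, not part of the batch package.
    """
    return {key: func(value, *args, **kwargs) for key, value in mapping.items()}


data = {"image_a": (3, 1, 2), "image_b": (6, 5, 4)}
as_lists = map_values(data, list)                    # tuples become lists
descending = map_values(data, sorted, reverse=True)  # keyword argument is forwarded
```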

Map keys

You can also apply a function to the keys:

batch = batch.map_keys(lambda x: f"{x}_2")  # Add suffix
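
Renaming keys has one pitfall worth keeping in mind: two different keys can map to the same new name. The sketch below shows the renaming on a plain dictionary and raises an error on such a collision. Whether the package detects collisions is not documented here, so treat this check, and the helper name rename_keys, as assumptions.

```python
def rename_keys(mapping, func):
    """Build a new dictionary whose keys are `func(key)`.

    Hypothetical helper for illustration, not part of the batch package.
    """
    renamed = {}
    for key, value in mapping.items():
        new_key = func(key)
        if new_key in renamed:
            # Two old keys produced the same new key; refuse to drop data silently.
            raise ValueError(f"duplicate key after mapping: {new_key!r}")
        renamed[new_key] = value
    return renamed


data = {"image_a": 1, "image_b": 2}
suffixed = rename_keys(data, lambda x: f"{x}_2")  # add suffix
```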

Limitations

A few limitations to consider when using this module:

  • Use only string keys for the batch.
  • Don't use keys starting with underscore (_).
  • Slice indexing is not implemented yet.
  • Generic iterable indexing is not implemented, only tuple and list.
  • Code documentation is in progress.
  • Some features are not yet documented here; please refer to the code directly.
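
If you build batches from dictionaries that come from elsewhere, a small check for the two key-related rules in the list above can catch problems early. check_keys is a hypothetical helper written for this illustration and is not provided by the package.

```python
def check_keys(mapping):
    """Return the keys that are not strings or that start with an underscore.

    Hypothetical helper for illustration, not part of the batch package.
    """
    return [key for key in mapping
            if not isinstance(key, str) or key.startswith("_")]


data = {"image_a": 1, "_tmp": 2, 3: 4}
bad_keys = check_keys(data)  # ["_tmp", 3]
```

An empty result means the dictionary satisfies both rules.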

If you have any ideas or requests, feel free to open an issue or a pull request.
