
Ultra-simple command-line tool for Docker-scaled batch processing


quick_batch

quick_batch is an ultra-simple command-line tool for large batch python-driven processing and transformation. It was designed to be fast to deploy, transparent, and portable. This allows you to scale any processor function that needs to be run over a large set of input data, enabling batch/parallel processing of the input with minimal setup and teardown.

Getting started

All you need to scale batch transformations with quick_batch is:

  • a transformation function in a processor.py file
  • a Dockerfile describing a container build appropriate to your processor
  • an optional requirements.txt file listing required Python modules

Document the paths to these objects, along with other parameters, in a config.yaml file of the form below.

Under processor, define either a dockerfile_path pointing to your Dockerfile or an image_name for a pre-built image to pull.

data:
  input_path: /path/to/your/input/data
  output_path: /path/to/your/output/data
  log_path: /path/to/your/log/file

queue:
  feed_rate: <int - number of examples processed per processor instance>
  order_files: <boolean - whether or not to order input files by size>

processor:
  dockerfile_path: /path/to/your/Dockerfile OR
  image_name: <image_name_to_pull>
  requirements_path: /path/to/your/requirements.txt
  processor_path: /path/to/your/processor/processor.py
  num_processors: <int - instances of processor to run in parallel>

quick_batch will point your processor.py at the input_path defined in this config.yaml and process the files it contains in parallel, at the scale given by your choice of num_processors.

Output will be written to the output_path specified in the configuration file.
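For instance, a filled-in config.yaml might look like the following; all paths and values here are hypothetical placeholders, not defaults:

```yaml
data:
  input_path: /data/images/raw
  output_path: /data/images/processed
  log_path: /data/logs/quick_batch.log

queue:
  feed_rate: 10        # 10 files handed to each processor instance at a time
  order_files: true    # process input files in size order

processor:
  dockerfile_path: /home/user/project/Dockerfile
  requirements_path: /home/user/project/requirements.txt
  processor_path: /home/user/project/processor.py
  num_processors: 4    # four processor containers running in parallel
```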

See the examples directory for examples of valid configs, processors, requirements files, and Dockerfiles.

Usage

To start processing with your config.yaml, use quick_batch's config command at the terminal:

quick_batch config /path/to/your/config.yaml

This will start the build and deploy process for processing your data as defined in your config.yaml.

Scaling

Use the scale command to manually scale the number of processors / containers running your process:

quick_batch scale <num_processors> 

Here <num_processors> is an integer >= 1. For example, to scale to 3 parallel processors / containers: quick_batch scale 3

Installation

To install quick_batch, simply use pip:

pip install quick-batch

The processor.py file

Create a processor.py file with the following basic pattern:

import ...

def processor(todos):
    for file_name in todos.file_paths_to_process:
        # processing code
        ...

The todos object carries feed_rate file names per batch in its .file_paths_to_process attribute.

Note: the function name processor is mandatory.

Why use quick_batch

quick_batch aims to be

  • dead simple to use: versus standard cloud batch transformation services, which require significant configuration / service understanding

  • ultra fast setup: versus setup of heavier orchestration tools like Airflow or MLflow, which may be a hindrance due to time / familiarity / organisational constraints

  • 100% portable: use quick_batch on any machine, anywhere

  • processor-invariant: quick_batch works with arbitrary processes, not just machine learning or deep learning tasks

  • transparent and open source: quick_batch uses Docker under the hood and only abstracts away the not-so-fun stuff, including instantiation, scaling, and teardown. You can still monitor your processing using familiar Docker commands (like docker service ls, docker service logs, etc.)

