quick_batch

quick_batch is an ultra-simple command-line tool for large batch processing and transformation. It allows you to scale any processor function that needs to be run over a large set of input data, enabling batch/parallel processing of the input with minimal setup and teardown.

Why use quick_batch

quick_batch aims to be:

  • dead simple to use: unlike standard cloud batch-transformation services, it requires minimal configuration and no service-specific expertise

  • ultra-fast setup: versus heavier orchestration tools like Airflow or MLflow, whose adoption may be a hindrance due to time, familiarity, or organisational constraints

  • 100% portable: use quick_batch on any machine, anywhere

  • processor-invariant: quick_batch works with arbitrary processes, not just machine learning or deep learning tasks.

  • transparent and open source: quick_batch uses Docker under the hood and abstracts away only the not-so-fun stuff - instantiation, scaling, and teardown. You can still monitor your processing with familiar Docker commands (like docker service ls, docker service logs, etc.).

Installation

To install quick_batch, simply use pip:

pip install quick-batch

Usage

To use quick_batch, you need to define a processor.py file and a config.yaml file containing the necessary paths and parameters.

processor.py

Create a processor.py file with the following pattern:

import ...

def processor(todos):
    # your processing logic here
    ...

quick_batch points your processor.py at the input_path defined in your config.yaml and processes that input in parallel, with the degree of parallelism set by num_processors. Output is written to the output_path specified in the configuration file.
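As an illustration, here is a minimal processor.py sketch. It assumes that todos is an iterable of input file paths handed to each processor instance (an assumption based on the pattern above, not a documented contract); the name and the ".out" output convention are hypothetical - in a real run, quick_batch directs results to output_path.

```python
# Hypothetical processor.py sketch. Assumptions: quick_batch calls
# processor() with `todos`, an iterable of input file paths. For
# illustration only, results are written to a ".out" file next to each
# input; in practice output would land under the configured output_path.
import os

def processor(todos):
    for path in todos:
        with open(path) as f:
            text = f.read()
        # Example transformation: upper-case the file contents.
        with open(path + ".out", "w") as f:
            f.write(text.upper())
```

Each of the num_processors instances runs this same function over its own slice of the input, so the function should be self-contained and safe to run concurrently.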

config.yaml

Create a config.yaml file with the following structure:

data:
  input_path: /path/to/your/input/data
  output_path: /path/to/your/output/data
  log_path: /path/to/your/log/file

queue:
  feed_rate: <int - number of examples processed per processor instance>
  order_files: <boolean - whether or not to order input files by size>

processor:
  processor_path: /path/to/your/processor/processor.py
  num_processors: <int - instances of processor to run in parallel>
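For concreteness, a filled-in config.yaml might look like the following; all paths and values here are illustrative, not defaults.

```yaml
data:
  input_path: /data/raw
  output_path: /data/processed
  log_path: /data/logs/quick_batch.log

queue:
  feed_rate: 10        # examples handed to each processor instance at a time
  order_files: true    # order input files by size

processor:
  processor_path: /home/user/project/processor.py
  num_processors: 4    # four processor instances run in parallel
```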

Running quick_batch

To run quick_batch, execute the following command in your terminal:

quick_batch /path/to/your/config.yaml
