ultra simple command line tool for docker-scaling batch processing
quick_batch
quick_batch is an ultra-simple command-line tool for large batch processing and transformation. It allows you to scale any processor
function that needs to be run over a large set of input data, enabling batch/parallel processing of the input with minimal setup and teardown.
Why use quick_batch
quick_batch aims to be
- dead simple to use: versus standard cloud batch transformation services that require significant configuration / service understanding
- ultra fast setup: versus setup of heavier orchestration tools like airflow or mlflow, which may be a hindrance due to time / familiarity / organisational constraints
- 100% portable: use quick_batch on any machine, anywhere
- processor-invariant: quick_batch works with arbitrary processes, not just machine learning or deep learning tasks
- transparent and open source: quick_batch uses Docker under the hood and only abstracts away the not-so-fun stuff, including instantiation, scaling, and teardown. You can still monitor your processing using familiar Docker commands (like docker service ls, docker service logs, etc.).
Installation
To install quick_batch, use pip:
pip install quick-batch
Usage
To use quick_batch, you need to define a processor.py file and a config.yaml file containing the necessary paths and parameters.
processor.py
Create a processor.py file with the following pattern:
import ...

def processor(todos):
    # Processor code
quick_batch will essentially point your processor.py at the input_path defined in your config.yaml and process this input in parallel at a scale given by your choice of num_processors. Output will be written to the output_path specified in the configuration file.
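As an illustration of the processor pattern, here is a minimal sketch of what a processor.py might look like. The source does not say what todos contains or how a processor should write its results, so this sketch makes two assumptions: that todos is an iterable of input file paths, and that results go to a directory named by a hypothetical OUTPUT_DIR constant (the optional output_dir argument is also an addition made for illustration). Check both against the quick_batch documentation before relying on them.

```python
from pathlib import Path

# Hypothetical output location: an assumption for this sketch,
# not part of quick_batch's documented interface.
OUTPUT_DIR = Path("/path/to/your/output/data")


def processor(todos, output_dir=OUTPUT_DIR):
    """Write an upper-cased copy of each input file (illustrative only).

    Assumes `todos` is an iterable of input file paths.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for todo in todos:
        src = Path(todo)
        dst = output_dir / src.name
        # Replace this line with the real per-file transformation.
        dst.write_text(src.read_text().upper())
        written.append(dst)
    return written
```

Keeping the per-file work inside a single loop body makes it easy to swap in any transformation, which fits the processor-invariant goal described earlier.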
config.yaml
Create a config.yaml file with the following structure:
data:
  input_path: /path/to/your/input/data
  output_path: /path/to/your/output/data
  log_path: /path/to/your/log/file
queue:
  feed_rate: <int - number of examples processed per processor instance>
  order_files: <boolean - whether or not to order input files by size>
processor:
  processor_path: /path/to/your/processor/processor.py
  num_processors: <int - instances of processor to run in parallel>
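To make the queue settings more concrete, the following sketch shows one way feed_rate and order_files could shape the work handed to processor instances. It is a conceptual illustration, not quick_batch's actual implementation: the make_batches helper is hypothetical, it assumes feed_rate is the number of input files given to a processor instance at a time, and it assumes ordering means smallest files first.

```python
import os
from typing import Iterable, Iterator, List


def make_batches(
    paths: Iterable[str], feed_rate: int, order_files: bool
) -> Iterator[List[str]]:
    """Yield lists of at most `feed_rate` paths, optionally sorted by size.

    Hypothetical helper; quick_batch's internal queue may behave differently.
    """
    if feed_rate < 1:
        raise ValueError("feed_rate must be a positive integer")
    items = list(paths)
    if order_files:
        # Smallest files first; the sort direction is an assumption.
        items.sort(key=os.path.getsize)
    for start in range(0, len(items), feed_rate):
        yield items[start:start + feed_rate]
```

Under these assumptions, a smaller feed_rate produces more, smaller batches to spread across the num_processors instances, while a larger one reduces hand-off overhead at the cost of coarser load balancing.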
Running quick_batch
To run quick_batch, execute the following command in your terminal:
quick_batch /path/to/your/config.yaml
Hashes for quick_batch-0.1.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7e8b741eb1ecd8da2952829e21ace555c2b3a016ff00961871c5000d2673e7bf
MD5 | 32b9adae2e52d7869a1f5d90acb011b7
BLAKE2b-256 | 7786b02e7127e4ca10a0920181425cf55188583c08b3f824485a7f4b82480b77