quick_batch

An ultra-simple command-line tool for Docker-scaled batch processing.

Project description

quick_batch is an ultra-simple command-line tool for large, Python-driven batch processing and transformation. It was designed to be fast to deploy, transparent, and portable. It lets you scale any processor function that needs to be run over a large set of input data, enabling batch/parallel processing of the input with minimal setup and teardown.
Getting started
All you need to scale batch transformations with quick_batch is:

- your transformation function(s) in a processor.py file
- a Dockerfile containing a container build appropriate to your processor
- an optional requirements.txt file containing required Python modules (quick_batch itself requires flask, requests, and pyyaml)
- a custom config.yaml file
Document the paths to these objects, as well as other parameters, in a config.yaml file of the form below:
```yaml
data:
  input_path: /path/to/your/input/data
  output_path: /path/to/your/output/data
  log_path: /path/to/your/log/file

queue:
  feed_rate: <int - number of examples processed per processor instance>
  order_files: <boolean - whether or not to order input files by size>

processor:
  image_name: <pre-built-image-name>
  dockerfile_path: /path/to/your/Dockerfile
  requirements_path: /path/to/your/requirements.txt
  processor_path: /path/to/your/processor/processor.py
  num_processors: <int - instances of processor to run in parallel>
```
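For concreteness, a filled-in config might look like the sketch below. All paths, values, and the image name are hypothetical, chosen purely for illustration:

```yaml
# Hypothetical example config.yaml - paths and image name are illustrative,
# not defaults shipped with quick_batch.
data:
  input_path: /home/user/batch_job/input
  output_path: /home/user/batch_job/output
  log_path: /home/user/batch_job/logs/run.log

queue:
  feed_rate: 10       # each processor instance receives 10 files at a time
  order_files: true   # order input files by size

processor:
  image_name: my-processor-image   # or build from the Dockerfile below
  dockerfile_path: /home/user/batch_job/Dockerfile
  requirements_path: /home/user/batch_job/requirements.txt
  processor_path: /home/user/batch_job/processor.py
  num_processors: 8                # run 8 processor instances in parallel
```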
quick_batch will point your processor.py at the input_path defined in this config.yaml and process the files listed there in parallel, at a scale given by your choice of num_processors. Output will be written to the output_path specified in the configuration file.

See tests/config_files for examples of valid configs.
Usage

With your config.yaml defined, you can use quick_batch at the terminal by typing:

```shell
quick_batch /path/to/your/config.yaml
```
Installation

To install quick_batch, simply use pip:

```shell
pip install quick-batch
```
The processor.py file

Create a processor.py file with the following basic pattern:

```python
import ...

def processor(todos):
    for file_name in todos.file_paths_to_process:
        # processing code
        ...
```

The todos object will carry in feed_rate number of file names to process in its file_paths_to_process attribute.
Note: the function name processor is mandatory.
Why use quick_batch

quick_batch aims to be:

- dead simple to use: versus standard cloud batch transformation services that require significant configuration / service understanding
- ultra fast to set up: versus heavier orchestration tools like airflow or mlflow, which may be a hindrance due to time / familiarity / organisational constraints
- 100% portable: use quick_batch on any machine, anywhere
- processor-invariant: quick_batch works with arbitrary processes, not just machine learning or deep learning tasks
- transparent and open source: quick_batch uses Docker under the hood and only abstracts away the not-so-fun stuff - including instantiation, scaling, and teardown. You can still monitor your processing using familiar Docker commands (like docker service ls, docker service logs, etc.)