quick_batch

quick_batch is an ultra-simple command-line tool for large batch processing and transformation. It allows you to scale any processor function that needs to be run over a large set of input data, enabling batch/parallel processing of the input with minimal setup and teardown.

Why use quick_batch

quick_batch aims to be:

  • dead simple to use: unlike standard cloud batch-transformation services, it requires minimal configuration and no service-specific expertise

  • ultra-fast setup: versus heavier orchestration tools like Airflow or MLflow, whose adoption may be a hindrance due to time, familiarity, or organisational constraints

  • 100% portable: use quick_batch on any machine, anywhere

  • processor-invariant: quick_batch works with arbitrary processes, not just machine learning or deep learning tasks.

  • transparent and open source: quick_batch uses Docker under the hood and abstracts away only the not-so-fun stuff - instantiation, scaling, and teardown. You can still monitor your processing with familiar Docker commands (like docker service ls, docker service logs, etc.).

Installation

To install quick_batch, simply use pip:

pip install quick-batch

Usage

To use quick_batch, you need to define a processor.py file and a config.yaml file containing the necessary paths and parameters.

processor.py

Create a processor.py file with the following pattern:

import ...

def processor(todos):
    # your processing logic here
    ...

quick_batch points your processor.py at the input_path defined in your config.yaml and processes that input in parallel, with the degree of parallelism set by num_processors. Output is written to the output_path specified in the configuration file.
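As an illustration, here is a minimal processor.py sketch. It assumes that todos is an iterable of input file paths handed to each processor instance (an assumption based on the pattern above, not a documented contract); the name and the ".out" output convention are hypothetical - in a real run, quick_batch directs results to output_path.

```python
# Hypothetical processor.py sketch. Assumptions: quick_batch calls
# processor() with `todos`, an iterable of input file paths. For
# illustration only, results are written to a ".out" file next to each
# input; in practice output would land under the configured output_path.
import os

def processor(todos):
    for path in todos:
        with open(path) as f:
            text = f.read()
        # Example transformation: upper-case the file contents.
        with open(path + ".out", "w") as f:
            f.write(text.upper())
```

Each of the num_processors instances runs this same function over its own slice of the input, so the function should be self-contained and safe to run concurrently.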

config.yaml

Create a config.yaml file with the following structure:

data:
  input_path: /path/to/your/input/data
  output_path: /path/to/your/output/data
  log_path: /path/to/your/log/file

queue:
  feed_rate: <int - number of examples processed per processor instance>
  order_files: <boolean - whether or not to order input files by size>

processor:
  processor_path: /path/to/your/processor/processor.py
  num_processors: <int - instances of processor to run in parallel>
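For concreteness, a filled-in config.yaml might look like the following; all paths and values here are illustrative, not defaults.

```yaml
data:
  input_path: /data/raw
  output_path: /data/processed
  log_path: /data/logs/quick_batch.log

queue:
  feed_rate: 10        # examples handed to each processor instance at a time
  order_files: true    # order input files by size

processor:
  processor_path: /home/user/project/processor.py
  num_processors: 4    # four processor instances run in parallel
```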

Running quick_batch

To run quick_batch, execute the following command in your terminal:

quick_batch /path/to/your/config.yaml
