Job queue with DAG workflows, PostgreSQL backend, and choice of job executors.

PostQ = Cloud-Native Job Queue and DAG Workflow System

PostQ is a job queue system with

  • workflows that are directed acyclic graphs, with tasks that depend on other tasks
  • parallel task execution
  • shared files among tasks
  • a PostgreSQL database backend
  • choice of task executors: {shell, docker, [coming soon: kubernetes]}
  • easy on-ramp for developers: git clone https://github.com/kruxia/postq; cd postq; docker-compose up, and you're running PostQ

Features

  • A PostQ Job Workflow is a DAG (Directed Acyclic Graph) of Tasks.

    Many existing job queue systems define jobs as single tasks, so it's up to the user to stitch together more complex workflows. But many applications (CI/CD pipelines, data pipelines) need to define workflows at a higher level: as a DAG of tasks, in which a given task might depend on earlier tasks that must be completed first, and might run in parallel with other tasks in the workflow.

    PostQ defines Job workflows as a DAG of tasks. For each named task, you list the other tasks that must be completed first, and PostQ will work out (using snazzy graph calculations) the simplest and most direct version of the workflow (i.e., the transitive reduction of the graph). It runs the tasks in the order indicated by the graph of their dependencies, and finishes when all tasks have been either completed or cancelled due to a preceding failure.
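
    For illustration, here's a minimal sketch of a diamond-shaped workflow, assuming the task schema shown in the usage example below plus a depends field listing each task's prerequisites (the field name is an assumption; check the postq models for the exact schema):

    from postq import models

    # 'b' and 'c' both depend on 'a' and can run in parallel; 'd' waits for both.
    # (The 'depends' field name is an assumption for illustration.)
    job = models.Job(
        tasks={
            'a': {'command': 'echo start'},
            'b': {'command': 'echo left', 'depends': ['a']},
            'c': {'command': 'echo right', 'depends': ['a']},
            'd': {'command': 'echo done', 'depends': ['b', 'c']},
        }
    )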

  • Workflow Tasks Are Executed in Parallel.

    When a PostQ Job is started, it begins by launching all the tasks that don't depend on other tasks. Then, as each task finishes, it launches every additional task whose predecessors have all completed successfully.

    At any given moment, many of a Job's tasks might be running concurrently on different processors. The more you break down your workflows into tasks that can happen in parallel, the more CPUs your tasks can utilize, and the more quickly your jobs can be completed, limited only by the available resources.
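
    The scheduling rule can be pictured with a small sketch (illustrative only, not PostQ's internal code): a task becomes ready to launch as soon as all of its predecessors are done.

    def ready_tasks(depends: dict, done: set) -> set:
        """Return the tasks whose predecessors have all completed."""
        return {
            name for name, preds in depends.items()
            if name not in done and all(p in done for p in preds)
        }

    depends = {'a': [], 'b': ['a'], 'c': ['a'], 'd': ['b', 'c']}
    assert ready_tasks(depends, set()) == {'a'}           # start: only 'a' is ready
    assert ready_tasks(depends, {'a'}) == {'b', 'c'}      # 'b' and 'c' run in parallel
    assert ready_tasks(depends, {'a', 'b', 'c'}) == {'d'}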

  • Tasks in a Job Workflow Can Share Files.

    For workflows that process large amounts of data stored in files, it's important to be able to share those files among all the tasks in a workflow. PostQ creates a shared temporary storage directory for each job, and each task runs with that directory as its current working directory.

    So, for example, you can start your workflow with a task that pulls files from permanent storage; then other tasks can process the data in those files, create new files, and so on. At the end of the job, the files that need to be saved as artifacts can be pushed back to permanent storage.
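
    Here's a sketch of that pull → process → push pattern (the URL is a placeholder, and the depends field is the same assumption as above); all three tasks see the same shared directory as their working directory:

    job = models.Job(
        tasks={
            # pull a data file into the job's shared working directory
            'pull': {'command': 'curl -sO https://example.com/data.csv',
                     'params': {'image': 'curlimages/curl'}},
            # process it, writing a new file alongside it
            'count': {'command': 'wc -l data.csv > counts.txt', 'depends': ['pull'],
                      'params': {'image': 'debian:bullseye-slim'}},
            # save/report the artifact (here just printed; in practice, upload it)
            'report': {'command': 'cat counts.txt', 'depends': ['count'],
                       'params': {'image': 'debian:bullseye-slim'}},
        }
    )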

  • A PostgreSQL Database Is the (Default) Job Queue.

    PostgreSQL provides persistence and ACID transaction guarantees: it is the simplest way to ensure that a job is never lost and is processed exactly once. PostgreSQL is also already running in many web and microservice application clusters, so building on Postgres lets developers add a job queue to their application without substantially increasing its complexity, while getting excellent speed together with fantastic reliability and durability.
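
    Under the hood, a Postgres-backed queue typically relies on the FOR UPDATE SKIP LOCKED pattern: competing workers poll the same table inside transactions, and each queued job is handed to exactly one of them. A sketch of the standard pattern follows (this is the common idiom, not necessarily PostQ's actual SQL or schema):

    # Table and column names here are placeholders, not PostQ's schema.
    DEQUEUE = """
    DELETE FROM job_queue
    WHERE id = (
        SELECT id FROM job_queue
        WHERE qname = $1
        ORDER BY queued_at
        FOR UPDATE SKIP LOCKED
        LIMIT 1
    )
    RETURNING *;
    """
    row = await connection.fetchrow(DEQUEUE, 'playq')  # e.g., with an asyncpg connection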

  • The Docker Executor Runs Each Task in a Container Using Any Image.

    Many existing task queue systems assume that the programming environment in which the queue worker is written is also available to execute each task. For example, Celery tasks are written and run in Python.

    Instead, PostQ can run each task in a separate container. This enables a task to use any software, not just the software that is available in the queue worker's environment.

    (Author's Note: This was one of the primary motivations for writing PostQ. I am building an application with workflows whose tasks require NodeJS, or Java, or Python, or Chromium. It's possible to build an image that includes all of these requirements, but it weighs in at over a gigabyte! It's much more maintainable to separate the different task programs into different images, with each image including only the software it needs to complete its task.)
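
    For example, a single job can mix toolchains by giving each task its own image (a sketch using public images; the task schema follows the usage example below):

    job = models.Job(
        tasks={
            # each task runs in its own container, from its own image
            'js': {'command': 'node --version', 'params': {'image': 'node:18-slim'}},
            'py': {'command': 'python --version', 'params': {'image': 'python:3.11-slim'}},
        }
    )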

  • Easy On-ramp for Developers.

    git clone https://github.com/kruxia/postq.git
    cd postq
    docker-compose up
    

    The default docker-compose.yml cluster definition uses the docker executor (so every task must define an image), with a maximum queue sleep time of 5 seconds and the default qname=''. Note that the default cluster doesn't expose any ports to the outside world, but you can, for example, shell into the running cluster from a second terminal and start pushing jobs into the queue. More commonly, your PostgreSQL instance is available inside your application cluster, so you can push jobs into postq directly from your application.

Usage Examples

Here is an example in Python using the running postq container itself. The Python stack is Databases, SQLAlchemy Core, and data models written in Pydantic; the example below talks to the database directly with asyncpg:

$ docker-compose exec postq ipython
# (Using the ipython shell, which allows async/await without an explicit event loop.)
import os
import time
import asyncpg
from postq import models

# Connect to the cluster's database and define a one-task job.
queue = models.Queue(qname='playq')
database = await asyncpg.create_pool(dsn=os.getenv('DATABASE_URL'))
connection = await database.acquire()
job = models.Job(
    tasks={'a': {'command': 'echo Hey!', 'params': {'image': 'debian:bullseye-slim'}}}
)

# Put the job on the queue; the returned record updates the job's fields (e.g., its id).
job.update(
    **await connection.fetchrow(
        *queue.put(job)
    )
)

# Wait for a worker to pick up and run the job (the max queue sleep is 5 seconds).
time.sleep(5)

# Fetch the job log entry, which records the completed job and its task results.
joblog = models.Job(
    **await connection.fetchrow(
        *queue.get_log(id=job.id)
    )
)

print(joblog.tasks['a'].results)  # Hey!

Now you have a job log entry with the output of your command in the task results. :tada:
