A job queue with data dependencies

parallex

System Requirements

Python >= 3.8

install

Default object store

pip install tx-parallex

Plasma store https://arrow.apache.org/

pip install tx-parallex[arrow]

Install from source

  1. Clone the repo.
  2. Install the dependencies:
# Create a virtual environment called 'px'
conda create -n px python=3.8
# activate the environment you just created
conda activate px
# install the remaining tx-parallex prerequisites
pip install -r requirements.txt
  3. Run the tests:
# run the tests (a number of test 'specs')
PYTHONPATH=src pytest -x -vv --full-trace -s --timeout 60
# deactivate the environment (if desired)
conda deactivate

set log level

Set the environment variable LOG_LEVEL to one of the level names accepted by setLevel in Python's logging library (for example DEBUG, INFO, WARNING, ERROR, or CRITICAL).
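A minimal sketch of how a program could honour LOG_LEVEL, assuming the variable is read once at start-up and handed to Logger.setLevel. The function configure_logging and the logger name "tx.parallex" are hypothetical illustrations, not tx-parallex's own code.

```python
import logging
import os


def configure_logging(logger_name: str = "tx.parallex") -> logging.Logger:
    """Apply LOG_LEVEL to a logger if it is set (hypothetical helper).

    The logger name "tx.parallex" is an assumption made for this sketch.
    """
    logger = logging.getLogger(logger_name)
    level = os.environ.get("LOG_LEVEL")
    if level:
        # Logger.setLevel accepts a level name such as "DEBUG" as well as an int;
        # an unknown name raises ValueError. Upper-casing is this sketch's choice.
        logger.setLevel(level.upper())
    return logger


# e.g. with LOG_LEVEL=DEBUG in the environment, configure_logging().level is logging.DEBUG
```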

Introduction

A queue with dependencies

Usage

from tx.parallex import run_python

ret = run_python(number_of_workers=4, pyf="spec.py", dataf="data.yml")

Spec

tx-parallex specs can be written in YAML or a Python-like DSL. The Python-like DSL is translated to YAML by tx-parallex. Each object in a spec specifies a task. When the task is executed, it is given a dict called data. The pipeline will return a dictionary.

YAML

The examples below assume a function sqr, defined in a module math, that returns the square of its argument:

def sqr(x):
  return x * x

let

The let task sets data for its subtask. It adds a new var/value pair to data, scoped to its subtask, and then executes that subtask.

Syntax:

type: let
var: <var>
obj: <value>
sub: <subtask>

Example:

type: let
var: a
obj:
  data: 1
sub:
  type: python
  name: y
  mod: math
  func: sqr
  params: 
    x:
      name: a
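To make the scoping rule concrete, here is a minimal pure-Python sketch of a let evaluator. It is not tx-parallex's implementation: eval_value, run_let and the run_sub callback are hypothetical names, and specs are plain dicts, as they would look after the YAML is loaded.

```python
from typing import Any, Callable, Dict

Data = Dict[str, Any]
Spec = Dict[str, Any]


def eval_value(value: Spec, data: Data) -> Any:
    """Resolve a <value>: {"data": c} is the constant c, {"name": v} looks v up in data."""
    if "data" in value:
        return value["data"]
    return data[value["name"]]


def run_let(spec: Spec, data: Data, run_sub: Callable[[Spec, Data], Any]) -> Any:
    """Bind var to obj, visible only to sub (hypothetical evaluator)."""
    scoped = dict(data)  # copy, so the new binding does not leak to the caller
    scoped[spec["var"]] = eval_value(spec["obj"], data)
    return run_sub(spec["sub"], scoped)


# Usage: a stand-in subtask that squares the value referenced by its x parameter.
outer: Data = {}
let_spec = {"type": "let", "var": "a", "obj": {"data": 3}, "sub": {"x": {"name": "a"}}}
result = run_let(let_spec, outer, lambda sub, d: eval_value(sub["x"], d) ** 2)
```

After the call, result holds 9 while outer is still empty, which is the "within the scope of its subtask" behaviour described above.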

map

The map task reads a list coll from data and applies a subtask to each member of the list. Each member is assigned to var in the data passed to its subtask.

Syntax:

type: map
coll: <value>
var: <variable name>
sub: <subtask>

<value> is an object of one of the following forms:

A reference to an entry in data or to the name of a task:

"name": <variable name>

A constant:

"data": <constant>

Example:

type: map
coll: 
  data:
  - 1
  - 2
  - 3
var: a
sub:
  type: python
  name: y
  mod: math
  func: sqr
  params: 
    x:
      name: a
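The same idea for map, as a hedged sketch: each member of coll is bound to var in its own copy of data, and the subtasks run concurrently. A ThreadPoolExecutor stands in for tx-parallex's workers here; run_map, eval_value and run_sub are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Data = Dict[str, Any]
Spec = Dict[str, Any]


def eval_value(value: Spec, data: Data) -> Any:
    """{"data": c} is a constant; {"name": v} is looked up in data."""
    if "data" in value:
        return value["data"]
    return data[value["name"]]


def run_map(spec: Spec, data: Data, run_sub: Callable[[Spec, Data], Any],
            workers: int = 4) -> List[Any]:
    """Run sub once per member of coll, each with var bound to that member."""
    coll = eval_value(spec["coll"], data)

    def run_one(item: Any) -> Any:
        scoped = dict(data)  # each subtask sees its own binding of var
        scoped[spec["var"]] = item
        return run_sub(spec["sub"], scoped)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so result i belongs to coll[i]
        return list(pool.map(run_one, coll))


map_spec = {"type": "map", "coll": {"data": [1, 2, 3]}, "var": "a",
            "sub": {"x": {"name": "a"}}}
squares = run_map(map_spec, {}, lambda sub, d: eval_value(sub["x"], d) ** 2)
```

Keeping results in input order is what lets each result be tied back to the index of its element, which the ret section relies on.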

cond

The cond task reads a boolean value on. If it is true, it executes the then subtask; otherwise it executes the else subtask.

Syntax:

type: cond
on: <value>
then: <subtask>
else: <subtask>

Example:

type: cond
on: 
  data:
    true
then:
  type: ret
  obj:
    data: 1
else:
  type: ret
  obj:
    data: 0
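A minimal sketch of cond together with a trivial ret, assuming specs are plain dicts. Only the selected branch is evaluated. run_cond, run_ret and eval_value are hypothetical names, and the sketch uses Python truthiness for on.

```python
from typing import Any, Callable, Dict

Data = Dict[str, Any]
Spec = Dict[str, Any]


def eval_value(value: Spec, data: Data) -> Any:
    """{"data": c} is a constant; {"name": v} is looked up in data."""
    if "data" in value:
        return value["data"]
    return data[value["name"]]


def run_cond(spec: Spec, data: Data, run_sub: Callable[[Spec, Data], Any]) -> Any:
    # Pick one branch; the other subtask is never executed.
    branch = "then" if eval_value(spec["on"], data) else "else"
    return run_sub(spec[branch], data)


def run_ret(spec: Spec, data: Data) -> Any:
    """A stand-in ret: just evaluate obj."""
    return eval_value(spec["obj"], data)


cond_spec = {
    "type": "cond",
    "on": {"name": "flag"},
    "then": {"type": "ret", "obj": {"data": 1}},
    "else": {"type": "ret", "obj": {"data": 0}},
}
```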

python

You can use any Python module.

The python task runs a Python function. It reads parameters from data. The return value must be pickleable.

Syntax:

type: python
name: <name>
mod: <module>
func: <function>
params: <parameters>

<parameters> is an object of the form:

<param> : <value>
...
<param> : <value>

where <param> is either a parameter name (passed as a keyword argument) or a position (passed as a positional argument).

Example:

  type: python
  name: y
  mod: math
  func: sqr
  params: 
    x:
      data: 1
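A sketch of how a python task might be resolved and called, assuming integer keys in params (as YAML parses `0:`) denote positions and string keys denote parameter names. run_python_task is a hypothetical name; the pickle.dumps call mirrors the requirement that return values be pickleable. The usage relies on standard-library functions (operator.mul, int) rather than math.sqr so it runs anywhere.

```python
import importlib
import pickle
from typing import Any, Dict

Data = Dict[str, Any]
Spec = Dict[str, Any]


def eval_value(value: Spec, data: Data) -> Any:
    """{"data": c} is a constant; {"name": v} is looked up in data."""
    if "data" in value:
        return value["data"]
    return data[value["name"]]


def run_python_task(spec: Spec, data: Data) -> Any:
    """Import mod, look up func, and call it with resolved params (hypothetical).

    Assumption: int keys are positions, str keys are keyword names.
    """
    func = getattr(importlib.import_module(spec["mod"]), spec["func"])
    positional: Dict[int, Any] = {}
    keyword: Dict[str, Any] = {}
    for param, value in spec.get("params", {}).items():
        if isinstance(param, int):
            positional[param] = eval_value(value, data)
        else:
            keyword[param] = eval_value(value, data)
    args = [positional[i] for i in sorted(positional)]
    result = func(*args, **keyword)
    pickle.dumps(result)  # fails early if the return value is not pickleable
    return result


mul_task = {"type": "python", "name": "y", "mod": "operator", "func": "mul",
            "params": {0: {"data": 3}, 1: {"name": "b"}}}
```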

top

The top task topologically sorts its subtasks by their dependencies and ensures that they are executed in parallel, in an order compatible with those dependencies.

Syntax:

type: top
sub: <subtasks>

A subtask depends on another subtask when it references, via name, a value that is not in data but is the name of that subtask.

Example:

type: top
sub:
- type: python
  name: y
  mod: math
  func: sqr
  params: 
    x:
      data: 1
- type: python
  name: z
  mod: math
  func: sqr
  params: 
    x:
      name: y
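The ordering that top relies on can be sketched as a level-by-level topological sort (a variant of Kahn's algorithm). This is an illustration under simplifying assumptions, not the library's scheduler: only {"name": ...} references directly under params count as dependencies, and references satisfied by data are ignored. Tasks within one level do not depend on each other and could run in parallel. dependencies and topo_levels are hypothetical names.

```python
from typing import Any, Dict, List, Set

Spec = Dict[str, Any]


def dependencies(task: Spec, data: Dict[str, Any], names: Set[str]) -> Set[str]:
    """Sibling task names that `task` references through {"name": ...} params."""
    refs = {v["name"] for v in task.get("params", {}).values() if "name" in v}
    # A reference satisfied by data is not a dependency on another task.
    return {r for r in refs if r not in data and r in names}


def topo_levels(subtasks: List[Spec], data: Dict[str, Any]) -> List[List[str]]:
    """Group task names into levels; every task's dependencies are in earlier levels."""
    names = {t["name"] for t in subtasks}
    remaining = {t["name"]: dependencies(t, data, names) for t in subtasks}
    levels: List[List[str]] = []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cyclic dependency among: " + ", ".join(sorted(remaining)))
        levels.append(ready)
        for n in ready:
            del remaining[n]
        for deps in remaining.values():
            deps.difference_update(ready)  # those dependencies are now satisfied
    return levels


example = [
    {"type": "python", "name": "y", "mod": "math", "func": "sqr", "params": {"x": {"data": 1}}},
    {"type": "python", "name": "z", "mod": "math", "func": "sqr", "params": {"x": {"name": "y"}}},
]
```

For the spec above, topo_levels(example, {}) gives [["y"], ["z"]]: y must finish before z starts.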

seq

The seq task forces all subtasks to be run sequentially.

Syntax:

type: seq
sub: <subtasks>

As with top, name references that are not in data refer to other subtasks.

Example:

type: seq
sub:
- type: python
  name: y
  mod: math
  func: sqr
  params: 
    x:
      data: 1
- type: python
  name: z
  mod: math
  func: sqr
  params: 
    x:
      name: y

ret

The ret task specifies a value to return. The pipeline returns a dictionary. When a ret task appears under a map task, its key is prefixed with the index of the element in that collection, as follows:

<index>

For nested maps, the indices are chained together as follows:

<index>. ... .<index>

Syntax:

type: ret
obj: <value>

Example:

type: ret
obj: 
    name: z
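A small sketch of the chained index prefix, assuming the outermost map's index comes first. index_prefix is a hypothetical helper, and the sketch only builds the prefix for two nested maps; it does not model the rest of the returned dictionary.

```python
from itertools import product
from typing import Iterable


def index_prefix(indices: Iterable[int]) -> str:
    """Chain map indices as <index>. ... .<index> (outermost map first, by assumption)."""
    return ".".join(str(i) for i in indices)


# Simulate a map over `outer` containing a map over `inner`, where each
# iteration returns a product; keys are the chained indices of both maps.
outer = [1, 2]
inner = [10, 20, 30]
returns = {
    index_prefix((i, j)): a * b
    for (i, a), (j, b) in product(enumerate(outer), enumerate(inner))
}
```

Here returns["1.2"] is the value produced for outer[1] and inner[2], that is 2 * 30.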

Python

A DSL block contains a subset of Python.

  • Unlike in Python, an assignment in a block is not visible outside that block.
  • Assignments within a block are unordered.
  • return statement

Available syntax:

import

from <module> import *
from <module> import <func>, ..., <func>

Imports names from a module.

<module> must be an absolute module name.

assignment

<var> = <const>

where

<const> = <integer> | <number> | <boolean> | <string> | <list> | <dict>

This translates to let.

Example:

a = 1
y = sqr(x=a)
yield y
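The constant-to-let translation can be sketched with Python's ast module. translate_const_assignment is a hypothetical helper, not the tx-parallex translator, and it leaves the let's sub (the rest of the block) to the caller.

```python
import ast
from typing import Any, Dict


def translate_const_assignment(src: str) -> Dict[str, Any]:
    """Translate `<var> = <const>` into a let task dict without its sub (hypothetical)."""
    stmt = ast.parse(src).body[0]
    if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)):
        raise ValueError("expected a single `<var> = <const>` assignment")
    # literal_eval accepts numbers, booleans, strings, and lists/dicts of those;
    # anything else (a name, a call, ...) raises ValueError.
    return {"type": "let", "var": stmt.targets[0].id,
            "obj": {"data": ast.literal_eval(stmt.value)}}
```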

function application

<var> = [<module>.]<func>(<param>=<expr>, ...) | <expr>

This translates to a python task, where <var> is the task's name and <expr> is:

<expr> = <expr> if <expr> else <expr> | <expr> <binop> <expr> | <expr> <boolop> <expr> | <expr> <compare> <expr> | <unaryop> <expr> | <var> | <const>

<binop>, <boolop>, <compare>, and <unaryop> are Python's ast node types BinOp, BoolOp, Compare, and UnaryOp. Depending on its content, an <expr> is translated to a set of assignments, a name, or data.
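A hedged sketch of that classification using ast node types: literals become data, bare variables become a name, calls become a python task, and operator expressions (including conditional expressions) become intermediate assignments. classify_expr and its return labels are hypothetical; tx-parallex's actual translation may differ in detail.

```python
import ast

# Composite expressions that would be broken into intermediate assignments.
OPERATOR_NODES = (ast.BinOp, ast.BoolOp, ast.Compare, ast.UnaryOp, ast.IfExp)


def classify_expr(src: str) -> str:
    """Return "data", "name", "python", or "assignments" for a DSL expression."""
    node = ast.parse(src, mode="eval").body
    try:
        ast.literal_eval(node)  # succeeds only for constants such as 1, [1, 2], -3
        return "data"
    except (ValueError, TypeError, SyntaxError):
        pass
    if isinstance(node, ast.Name):
        return "name"
    if isinstance(node, ast.Call):
        return "python"
    if isinstance(node, OPERATOR_NODES):
        return "assignments"
    raise ValueError("unsupported expression: " + src)
```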

Example:

y = math.sqr(1)
z = math.sqr(y)
return z
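Translating a single function application into a python task dict might look like the following. The sketch assumes positional arguments map to integer param keys and keyword arguments to their names. For a bare imported function such as sqr it leaves mod as None, because resolving names from import statements is omitted. translate_call, to_value and dotted are hypothetical helpers.

```python
import ast
from typing import Any, Dict, Union


def to_value(node: ast.expr) -> Dict[str, Any]:
    """A bare variable becomes {"name": ...}; a literal becomes {"data": ...}."""
    if isinstance(node, ast.Name):
        return {"name": node.id}
    return {"data": ast.literal_eval(node)}


def dotted(node: ast.expr) -> str:
    """Rebuild a dotted name such as pkg.mod from Name/Attribute nodes."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return dotted(node.value) + "." + node.attr
    raise ValueError("not a dotted name")


def translate_call(src: str) -> Dict[str, Any]:
    """Translate `<var> = [<module>.]<func>(...)` into a python task dict."""
    stmt = ast.parse(src).body[0]
    if not (isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Call)):
        raise ValueError("expected `<var> = <func>(...)`")
    call = stmt.value
    if isinstance(call.func, ast.Attribute):
        mod, func = dotted(call.func.value), call.func.attr
    else:
        mod, func = None, dotted(call.func)  # imported name; module not resolved here
    params: Dict[Union[int, str], Any] = {i: to_value(a) for i, a in enumerate(call.args)}
    params.update({kw.arg: to_value(kw.value) for kw in call.keywords})
    return {"type": "python", "name": stmt.targets[0].id,
            "mod": mod, "func": func, "params": params}
```

Applied to `y = math.sqr(1)`, this yields the same shape as the YAML python example, with params {0: {"data": 1}}.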

parallel for

for <var> in <expr>:
    ...

This translates to map.

Example:

for a in [1, 2, 3]:
  y = math.sqr(a)
  yield y
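A sketch of the for-to-map translation with the ast module. translate_for and the translate_body callback are hypothetical, and to_value here only handles a bare variable or a literal as the iterable, which is narrower than the full <expr> grammar.

```python
import ast
from typing import Any, Callable, Dict, List


def to_value(node: ast.expr) -> Dict[str, Any]:
    """A bare variable becomes {"name": ...}; a literal becomes {"data": ...}."""
    if isinstance(node, ast.Name):
        return {"name": node.id}
    return {"data": ast.literal_eval(node)}


def translate_for(src: str,
                  translate_body: Callable[[List[ast.stmt]], Any]) -> Dict[str, Any]:
    """Translate `for <var> in <expr>: ...` into a map task dict (hypothetical)."""
    stmt = ast.parse(src).body[0]
    if not (isinstance(stmt, ast.For) and isinstance(stmt.target, ast.Name)):
        raise ValueError("expected `for <var> in <expr>:`")
    return {"type": "map", "coll": to_value(stmt.iter), "var": stmt.target.id,
            "sub": translate_body(stmt.body)}


loop_src = "for a in [1, 2, 3]:\n    y = math.sqr(a)\n"
```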

if

if <expr>:
    ...
else:
    ...

This translates to cond.

Example:

if z:
    yield 1
else:
    yield 0

The semantics of if differ from Python's: variables assigned inside an if block are not visible outside it.
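A sketch of the if-to-cond translation. Each branch becomes its own subtask, which is consistent with assignments inside a branch staying local to it. translate_if, to_value and the translate_body callback are hypothetical helpers, and the test uses assignments in the branches to keep the parsed snippet simple.

```python
import ast
from typing import Any, Callable, Dict, List


def to_value(node: ast.expr) -> Dict[str, Any]:
    """A bare variable becomes {"name": ...}; a literal becomes {"data": ...}."""
    if isinstance(node, ast.Name):
        return {"name": node.id}
    return {"data": ast.literal_eval(node)}


def translate_if(src: str,
                 translate_body: Callable[[List[ast.stmt]], Any]) -> Dict[str, Any]:
    """Translate `if <expr>: ... else: ...` into a cond task dict (hypothetical)."""
    stmt = ast.parse(src).body[0]
    if not isinstance(stmt, ast.If):
        raise ValueError("expected an if statement")
    return {"type": "cond", "on": to_value(stmt.test),
            "then": translate_body(stmt.body),     # then-branch statements
            "else": translate_body(stmt.orelse)}   # else-branch statements
```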

with

with Seq:
    ...

This translates to seq.

Example:

with Seq:
    y = math.sqr(1)
    return y
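A sketch of the with Seq translation. Unlike an ordinary block, whose assignments are unordered, the statements are kept in source order. translate_with and the translate_stmt callback are hypothetical helpers.

```python
import ast
from typing import Any, Callable, Dict


def translate_with(src: str,
                   translate_stmt: Callable[[ast.stmt], Any]) -> Dict[str, Any]:
    """Translate `with Seq: ...` into a seq task dict (hypothetical)."""
    stmt = ast.parse(src).body[0]
    if not (isinstance(stmt, ast.With) and len(stmt.items) == 1
            and isinstance(stmt.items[0].context_expr, ast.Name)
            and stmt.items[0].context_expr.id == "Seq"):
        raise ValueError("expected `with Seq:`")
    # Source order is preserved, which is what makes the subtasks sequential.
    return {"type": "seq", "sub": [translate_stmt(s) for s in stmt.body]}


seq_src = "with Seq:\n    y = math.sqr(1)\n    z = math.sqr(y)\n"
```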

yield

yield <expr>

This translates to ret.

Example:

y = math.sqr(1)
yield y

Data

The data (passed as dataf) can be arbitrary YAML.
