
task-ruleset

A Python Package that acts as a scheduler for task and ruleset based parallel computation. Useful for highly parallel applications.

This package was created to help speed up time-consuming activities in large software projects, such as waiting for HTTP responses or performing DB operations.
Most large software projects consist of chunks of code which are independent, but are still made to run in sequence.

Any algorithm can be divided into a set of steps, called Tasks. Some Tasks may need to run in a particular order, while others may run in any order.

We allow users to define Tasks of several types, and a Rule for acting on each type of Task. A Rule may perform some computation and end its Task, then create one or more new Tasks of the same or different types, or create no new Tasks and end the chain. All Tasks which exist at a given point in time must not depend on each other, and will run in parallel.
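The rule model above can be illustrated with a small, package-free sketch. The names (rule_split, rule_double) and the dispatch loop are hypothetical and run sequentially; they only show how a Rule may fan out new Tasks or end a chain, not how this package schedules them in parallel.

```python
# Hypothetical sketch of the rule model: each task type maps to a rule;
# a rule may yield follow-up tasks, or end the chain by yielding nothing.

def rule_split(key, data):
    # fan out: one follow-up task of type 'double' per item
    for item in data:
        yield ("double", f"double_{item}", [item])

results = []

def rule_double(key, data):
    results.append(data[0] * 2)
    return []  # no follow-up tasks: this chain ends here

rules = {"split": rule_split, "double": rule_double}

# sequential stand-in for the scheduler: drain pending tasks one by one
pending = [("split", "split", [1, 2, 3])]
while pending:
    kind, key, data = pending.pop()
    pending.extend(rules[kind](key, data) or [])

print(sorted(results))  # [2, 4, 6]
```

In the real package, the tasks produced by a rule are independent by construction, so the scheduler is free to run them on separate processes instead of draining them in a loop.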

Parallelism is achieved by creating a fixed number of sub-processes and scheduling the Tasks on those sub-processes.
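The fixed-pool idea is the same general pattern as Python's own multiprocessing pool. The sketch below is illustrative only (it is not this package's internals, and run_task is a made-up stub): a fixed number of worker processes drain a shared list of independent tasks.

```python
# Illustrative only: a fixed pool of worker processes consuming
# independent tasks, the general pattern this package builds on.
from multiprocessing import Pool

def run_task(task):
    kind, arg = task
    # a per-type Rule would run here; this stub just echoes the task
    return f"{kind}:{arg}"

if __name__ == "__main__":
    tasks = [("proc", n) for n in range(1, 9)]
    with Pool(processes=4) as pool:  # fixed number of sub-processes
        results = pool.map(run_task, tasks)
    print(results)  # ['proc:1', 'proc:2', ..., 'proc:8']
```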

Usage Example

For detailed examples of how this project can be used, see the folder named Examples on this project's GitHub page.
It also contains full details on the example discussed below.

For now, let's consider a simple use case. We wish to run a Google search for each whole number from 1 to 64 and save the search result locally as an HTML page.
This represents a use case where one needs to make several HTTP requests which are independent of each other.

The code for a parallel implementation of such use cases utilizing this package is shown below:

import task_ruleset as trs
import __helper as h
import time; startTimeStamp = time.time()


# the list of numbers we wish to google
numsToPull = range(1, 64 + 1)


# The Generator which states how tasks of type 'init' need to be performed
# It yields when a new task is ready to start execution
def rule_init(TaskKey, TaskData):

    # create output folder
    outputPath = h.prepareOutputFolder()
    # record output path in a location accessible by all processes
    trs.CommonDict['OUTPUT_PATH'] = outputPath

    # for each number
    for num in numsToPull:
        
        # create new task of type 'proc', pass name of file to process as a param
        yield trs.Task("proc", f"proc_{num}", [num])

    # mark this task as completed
    return


# The Function which states how tasks of type 'proc' need to be performed
# It returns an empty list since it does not schedule more tasks
def rule_proc(TaskKey, TaskData):

    # get the number to google, from the params passed when creating the task
    num = TaskData[0]
    # get the output path recorded during execution of the initial task
    outputPath = trs.CommonDict['OUTPUT_PATH']
    
    # search for the number on google and get the search results
    processedData = h.googleSearchNumber(num)
    
    # save the search results to a file
    h.saveOutputFile(outputPath, num, processedData)

    # mark this task as completed
    return []


# Details about task organisation
trs.NGuests = 8                                 # State that the tasks need to be performed on 8 processes
trs.Rules["init"] = (0, rule_init)              # Declare that there is a task of type 'init' which needs 0 params
trs.Rules["proc"] = (1, rule_proc)              # Declare that there is a task of type 'proc' which needs 1 param
trs.InitTask = trs.Task("init", "init", [])     # State that initial task is of type 'init', and does not take any params

# Only the main thread should run this code
if __name__ == '__main__':
    
    # Start execution of the tasks, starting with trs.InitTask
    trs.main()
    
    # Record the number of tasks completed by each process
    print("main - Tasks completed by each process :", trs.TasksAssignedToProcess)

    # Record execution time
    print(f"Completed Execution in: {time.time() - startTimeStamp} secs")

The initial task creates 64 new tasks, each of which performs a Google search.
These 64 tasks are independent of each other, and are scheduled in parallel on 8 processes.

While testing, we observed a roughly 7x speed-up when using this parallelized program in place of the single-threaded implementation.
