Skip to main content

OPoly: a simple OpenMP polyhedral compilator

Project description

OPoly

Polyhedral compilation library for automatic code parallelization in OpenMP written in python.

Dependencies

Python 3.9+ is required in order to use OPoly and to install it.

The python dependecies are included in the installation with pip and are listed here:

Minizinc

The only non-python dependency which you have to take care of is the Minizinc CLI tool that is used to model and solve the constraint programming problems of the polyhedral approach. The easiest way to install MiniZinc is to download the MiniZincIDE package, which contains both the MiniZinc binaries and several solvers. To install it, follow this guide. After installation, be sure that the executables are set in your PATH environment variable. One way to make sure about this is to try to execute the minizinc command from the console.

Installation

The recommended way to install OPoly is to install it with pip from the PyPI repository:

pip install opoly

You can also install OPoly from the source code available on GitHub also with pip:

pip install .

or the old way:

python setup.py install

Note: consider using a virtualenv or a conda environment to install OPoly into.

Usage

After installing OPoly, you will have access to the command line tool opoly. This script reads a file containing code that needs to be parallelized. The code must be given in a properly formatted, pseudocode language that will be described later.

The script will parse this file, detect the loop-carried dependencies and perform the polyhedral optimization using the method described by Lamport in his famous article The Parallel Execution of DO Loops.

The script will then generate the parallelized version of the code in the same pseudocode language or in C syntax, adding OpenMP directives where needed.

Suppose that the file containing the input code is named example1.psc, then the command required to generate the parallel code is:

opoly example1.psc

By default, opoly will output in the standard output the generated code in C syntax (with OMP directives). To output the code in pseudocode syntax, one can use the -f or --format option to specify the format of the resulting code to PSEUDO:

opoly example1.psc -f PSEUDO

The -o or --output argument can be used to specify a custom file in which the resulting code should be written, instead of the standard output:

opoly example1.psc -o omp-example1.c

For more information about the opoly command line tool, read the help with:

opoly -h

Description

The most common usage of the opoly tool is to read a file containing the nested loop(s) that needs to be parallelized. Let's say that the file contains this code:

FOR k FROM 1 TO q {
    FOR i FROM 1 TO n-2 {
        STM a[i] = (a[i-1] + a[i] + a[i+1]) / 3.0;
    }
}

Let's not care too much now about the syntax of the pseudocode as it will be described later, but to get a grasp of what this code is representing, we are going to describe it in a few words.

First, there are 2 nested loops: the outer loop iterates through variable k that assumes integer values from 1 to q; the inner loop iterates through variable i from 1 to n-2. There is a statement regarding vector a that assigns the element in position i to the sum of the elements in position i-1, i and i+1 divided by 3, at iteration i of the inner loop.

We can see that this code is not parallelizable just yet, because there are loop-carried dependecies regarding the computation of certain values of vector a. For example, we cannot compute a[i] and a[i-1] in parallel, since one value depends on the other.

opoly is able to rewrite the loops in a way that the second one can be parallelized without changing the result of the computation. The pseudocode version of the rewritten code is:

FOR new_k FROM 3 TO n + 2 * q - 2 STEP 1 {
    FOR CONC new_i FROM fmax(1, ceil((1.0 / 2.0) * (-n + new_k + 2))) TO fmin(q, floor((1.0 / 2.0) * (new_k - 1))) STEP 1 {
        VAR k = new_i;
        VAR i = -2 * new_i + new_k;
        STM a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0;
    }
}

Two new index variables are introduced: new_k and new_i. These new indexes assume values in new ranges, which alltogether form the lattice of points inside the resulting polyhedron.

The values of the original variables are recomputed inside the innermost loop with the statements:

...
VAR k = new_i;
VAR i = -2 * new_i + new_k;
...

The statement that perform the computation remains unchanged.

There are a few functions in the loop range that compute the maximum (fmax), minimum (fmin), ceiling (ceil) and floor (floor) of their arguments.

opoly can also rewrite the code in C syntax and add OMP directives to perform parallel loops. The C version of the rewritten code is:

for(int new_k = 3; new_k <= n + 2 * q - 2; new_k++) {
    int new_i_lb = fmax(1, ceil((1.0 / 2.0) * (-n + new_k + 2)));
    int new_i_ub = fmin(q, floor((1.0 / 2.0) * (new_k - 1)));
    #pragma omp parallel for
    for(int new_i = new_i_lb; new_i <= new_i_ub; new_i++) {
        int k = new_i;
        int i = -2 * new_i + new_k;
        a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0;
    }
}

Notice that the values of the lower and upper bounds of the inner loop are computed before the loop statement in this version. The result is a more readable code, not impacting performance.

Pseudocode syntax

We will now describe the syntax of the pseudocode.

We must first define the difference between indexes, parameters and vectors:

  • an index is a for loop index variable (eg. i or k in the examples above).
  • a parameter is every variable in the program for which the value is know at the time of execution and its value does not change during the execution (eg. q or n in the examples above).
  • a vector is a multi-dimensional variable that appears in assigments and it is used to store computation results (eg. a in the examples above).

We will discuss more about the restrictions each variable has in order to extract the dependecies and rewrite the loops later in this section.


The for loop statement syntax is:

FOR var_idx FROM lb_const TO ub_expr {

loop_body

}

where:

  • var_idx is the name of the variable representing the loop's index (eg. i, j or k).
  • lb_const is a positive constant integer (eg. 0, 1 or 42) representing the initial value of the index variable.
  • ub_expr is an expression representing the last value of the index variable. There are 3 possible kind of expression that can be written:
    • a positive constant integer
    • a parameter name
    • a two-term expression with a parameter name, an operator (+ or -) and a constant integer (eg. n-2).
  • loop_body are other statements (possibly for loops) contained in the loop body. If there are nested loops, the loops must be perfectly nested, which means that no other statements must be present in the loop body containing another for loop.

NOTE: each index variable name must be unique for every for loop.


The assigment statement syntax is:

STM left_term = right_term ;

where:

  • left_term is a vector expression, indexed by one or more index variables (eg. a[i]). A two-term expression can be present as index, but it must contain only one index variable, an operator (+ or -) and a constant integer (eg. a[i+2]).
  • right_term is a general expression containing vectors, parameters, indexes and constants.

NOTE: please note the ; at the end of the statement.

The vector expression syntax is:

vector_name[index_expr]...

where:

  • vector_name is the name of the vector (eg. a)
  • index_expr is an expression containing only one index variable (eg. i, i+1, i-2, etc...). There can be multiple indexes expression, one for each vector's dimension.

For example, a correct vector expression is a[i+1][j-1] and an incorrect vector expression is a[i+j][j] because i and j are both present in the first dimension index of the vector.

Also, vectors with the same name cannot be indexed with different index variables or in different order. For example, if x[i][j] is present, x[j][i] cannot be present as well as x[i].


A general expression is a list of other expression with an operator between each of them. For example, a + 1 is an expression of two terms a and 1 with an operator + in between them. In general, an expression can be described by the following syntax:

expr [op other_expr]...

where:

  • expr and other_expr are expressions
  • op is an operator. Supported operators are + (addition), - (subtraction), * (multiplication) and / (division).

There can be also round brackets () around an expression.

An example of a correct expression is:

a + v[i] * (1 - v[i-1]) / 3.0

Examples

Let's look at a simplified relaxation algorithm also used as an example in The Parallel Execution of DO Loops. The algorithm takes every point of an input matrix u of size M by N and compute its value as the mean of the adjacent points' value. This computation is done L times.

We can describe this algorithm with the following pseudocode:

FOR i FROM 1 TO L {
    FOR j FROM 2 TO M-1 {
        FOR k FROM 2 TO N-1 {
            STM u[j][k] = (u[j+1][k] + u[j][k+1] + u[j-1][k] + u[j][k-1]) * 0.25;
        }
    }
}

supposing that the matrix u is accessed with one-based index fashion.

If we let opoly rewrite this code, we will get this (in C syntax with OMP directives):

for(int new_i = 4; new_i <= 2 * L + M + N - 2; new_i++) {
    int new_j_lb = fmax(1, ceil((1.0 / 2.0) * (-M - N + new_i + 2)));
    int new_j_ub = fmin(L, floor((1.0 / 2.0) * (new_i - 2)));
    #pragma omp parallel for
    for(int new_j = new_j_lb; new_j <= new_j_ub; new_j++) {
        int new_k_lb = fmax(1, -M + new_i - 2 * new_j + 1);
        int new_k_ub = fmin(N - 1, new_i - 2 * new_j - 1);
        for(int new_k = new_k_lb; new_k <= new_k_ub; new_k++) {
            int i = new_j;
            int j = new_i - 2 * new_j - new_k;
            int k = new_k;
            u[j][k] = (u[j + 1][k] + u[j][k + 1] + u[j - 1][k] + u[j][k - 1]) * 0.25;
        }
    }
}


Let's look at another example. We have a computation involving three one-dimensional vectors a, b and c that needs to be done q times. Each vector is containing n values (zero-based indexing). We iterate through each i from 1 to n-2 and we compute the following three statements:

  • firstly, we compute the sum of b[i+1] and c[i] and assign it to a[i].
  • then, we compute the inverse of a[i] and assign it to a[i-1].
  • lastly, we update b[i] to the value of a[i]

There are a lot of dependecies in this code, but we don't have to worry since opoly will do the job of detecting them for us. We only need to write the pseudocode for this algorithm:

FOR k FROM 1 TO q {
    FOR i FROM 1 TO n-2 {
        STM a[i] = b[i+1] + c[i];
        STM a[i-1] = 1.0 / a[i];
        STM b[i] = a[i];
    }
}

And the resulting code will be:

for(int new_k = 3; new_k <= n + 2 * q - 2; new_k++) {
    int new_i_lb = fmax(1, ceil((1.0 / 2.0) * (-n + new_k + 2)));
    int new_i_ub = fmin(q, floor((1.0 / 2.0) * (new_k - 1)));
    #pragma omp parallel for
    for(int new_i = new_i_lb; new_i <= new_i_ub; new_i++) {
        int k = new_i;
        int i = -2 * new_i + new_k;
        a[i] = b[i + 1] + c[i];
        a[i - 1] = 1.0 / a[i];
        b[i] = a[i];
    }
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

opoly-0.1.3.tar.gz (36.5 kB view details)

Uploaded Source

Built Distribution

opoly-0.1.3-py3-none-any.whl (56.5 kB view details)

Uploaded Python 3

File details

Details for the file opoly-0.1.3.tar.gz.

File metadata

  • Download URL: opoly-0.1.3.tar.gz
  • Upload date:
  • Size: 36.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.6.0.post20201009 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.9.1

File hashes

Hashes for opoly-0.1.3.tar.gz
Algorithm Hash digest
SHA256 d5ec4fec3792ad1476c264233e0bfe6f644a9513910eafa8a192ae6832d6fb38
MD5 6696612eac9cf51daef893dad6b61ddb
BLAKE2b-256 66517e5d6c1e8c43c47edf422af874a7f17210dd00da901a970b1bfafc0e33b4

See more details on using hashes here.

File details

Details for the file opoly-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: opoly-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 56.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.6.0.post20201009 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.9.1

File hashes

Hashes for opoly-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 f51d2f9149537b94132c18c95fc1931291351789f7e7d7908f2abf16fc4c3d4e
MD5 e1314575e112d7ac793355f8e34a54f8
BLAKE2b-256 8e6874af254a30be97357ee63494aee6ca06a2d6d167610340c1d80c911533f8

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page