Python true-shared memory parallel computation
While the multi-thread parallel performance is limited in Python because of the Global Interpreter Lock, Pydsm provided parallel computation that truly shares memory between processes in a distributed way.
- POSIX (Linux, variants of Unix including macOS, etc.)
- Python 2 is supported, but Python 3 is recommended
You can install py-dsm via the
pip install py-dsm
To import pydsm in your program, say
foo.py, simply do
# foo.py import pydsm
Start the processes
To create a cluster instance of 4 worker processes, one will do:
with pydsm.Cluster(4) as cluster:
In pydsm, all global arrays in shared memory are numpy arrays.
To create a global array, one needs to first create a cluster instance
and then call
a = cluster.createShared(name = "A", shape = 10, dataType = int)
The user needs to provide the global array with an arbitrary but unique name,
the shape (i.e., dimension) of the array, and the datatype (default is int).
The shape and datatype arguments will follow the same format as those typical
numpy functions such as
numpy.zeros(). The returned array is an array with
all elements initialized to zeros. All names of the global arrays should be
Run the processes
runProcesses(func, paras) is invoked
to run the user's function in parallel.
func is the name of the
user-defined parallel function.
paras, defaulted to an empty tuple,
is a tuple consisting of parameters passed in to the user's function.
Each worker process will get a copy of those parameters in
foo implemented by the user will be executed
by 4 processes simultaneously.
The first argument of a user's parallel function is mendatory.
This mendatory parameter is a dictionary that contains resource
you may need for your parallel function.
The resource has references to the id of each process,
the global array(s), and a lock.
def foo(res, ...): # res is the resource # ... means any extra parameters you may need # Code
Locks and barriers
To use a lock in the parallelized function, the lock first needs to be created in the main process.
Then retrieve the lock from 'resource' (the first parameter) in the parallel function.
lock = res['foo_lock']
Then, the user can apply the lock anywhere they want in the function.
lock.apply() # Critical section lock.release()
The barrier is invoked in the following way.
Split the tasks
The user can invoke
splitIndices(n, id, random=True) inside the user's
parallel function to distribute the tasks.
splitIndices will return to each worker process a list of indices (numbers).
Say we want to compute Y = AX in parallel by splitting the rows of A into chunks. We have 100 rows in A and 10 processors. We can let each process take 10 rows.
myid = resource['id'] myidxs = pydsm.Cluster.splitIndices(100, myid, random=True) Y[myidxs, ] = np.matmul(A[myidxs,], X)
In this scenario,
n will be 100.
It is recommended to set
random to True to
split the rows randomly and thus achieve better load balancing.
Otherwise, process 0 will take the first ten rows, process 1 will take the
second ten rows, and so on, which may result in inefficient load balancing.
This is an embarrassingly parallel example that illustrates the usage
of the pydsm. Two vectors are added in parallel.
The idea is to break A and B into chunks, and have each worker process
work on one chunk.
The complete source code is in
A and B are the two vectors to be added. The result will be stored in C, as the computation is carried out. Thus, all of them need to be shared variables. They are set up in the following way.
with pydsm.Cluster(4) as cluster: A = cluster.createShared(name = "A", shape = 10, dataType = int) B = cluster.createShared("B", 10, int) C = cluster.createShared("C", 10, int)
Now A and B are shared arrays backed by numpy. We can use and treat them the same way we do to numpy arrays.
A[:] = np.arange(10) # A is now "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" B[:] = np.arange(10) B += 1 # B is now "array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])"
We then call
runProcesses(), with the first parameter being
the name of the parallel function, which we will define later.
The second parameter is a tuple of parameters used by parallel function.
Here we pass in 10 as the length of each vector.
Next, we define the parallel function
def add(resource, n): myid = resource['id'] A = pydsm.Cluster.getShared("A") B = pydsm.Cluster.getShared("B") C = pydsm.Cluster.getShared("C") myidxs = pydsm.Cluster.splitIndices(n, myid, random=False) C[myidxs] = A[myidxs] + B[myidxs]
Each worker process first obtains an unique id, and then
attaches A, B, and C to the corresponding shared variable.
splitIndices() is then called to give each worker a list of indices.
Each worker will then compute elements in the array corresponding to those
Note that C is a shared variable, so
add does not need to return C.
For illustrative purpose of the usage of barriers,
we can have the following code at the end of
The process 1 will print out C after the computation is done.
# Below is just for illustrative purpose pydsm.Cluster.barrier() if myid == 1: # In some versions of python, printing C directly may cause issues. # It is better to first convert the SharedArray into an numpy array # and then print it. So do np.array(C) before printing print("Check out vector C in processes: ", np.array(C)) # C will be '[ 1 3 5 7 9 11 13 15 17 19]'
For more sample applications using pydsm such as finding prime numbers and
matrix multiplication, they are under the directory
If your program is terminated abnormally, py-dsm may not compeletely delete your shared variables. When you run your program next time, you may encounter the following error message because your program is trying to create a shared variable that is still alive.
# You are trying to create a shared variable named 'A', but there is # already a shared var named 'A'. FileExistsError: [Errno 17] File exists: 'shm://A'
In this scenario, you need to delete the shared variable yourself in the Python interpreter.
>>> import SharedArray as sa >>> sa.delete("A")
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.