Distributed Numpy-like arrays in Python
Project description
Ramba is a Python project that provides a fast, distributed, NumPy-like array API using compiled Numba functions and a Ray or MPI-based distributed backend. It also provides a way to easily integrate Numba-compiled remote functions and remote Actor methods in Ray.
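The general pattern that this integration streamlines can be sketched in plain Ray and Numba. The sketch below is illustrative only; the function and variable names are not part of the Ramba API:
# ray-numba-sketch.py  (illustrative only; not the Ramba API)
import numpy as np
import numba
import ray

@numba.njit
def sum_of_squares(x):
    # Numba-compiled kernel; compiled on first call on each worker
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return total

@ray.remote
def remote_sum_of_squares(x):
    # Plain Python wrapper so Ray ships the work to a remote worker
    return sum_of_squares(x)

ray.init()
data = np.arange(1_000_000, dtype=np.float64)
print(ray.get(remote_sum_of_squares.remote(data)))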
The main use case for Ramba is as a fast, drop-in replacement for NumPy. Although NumPy typically uses C libraries to implement array functions, it is still largely single-threaded: most functions do not make use of multiple cores, and NumPy cannot make use of multiple nodes in a cluster.
Ramba lets NumPy programs make use of multiple cores and multiple nodes with little to no code changes.
Example
Consider this simple example of a large computation in NumPy:
# test-numpy.py
import numpy as np
import time
t0 = time.time()
A = np.arange(1000*1000*1000)/1000.0
B = np.sin(A)
C = np.cos(A)
D = B*B + C**2
t1 = time.time()
print(t1-t0)
Let us try running this code on a dual-socket server with 36 cores/72 threads and 128GB of DRAM:
% python test-numpy.py
47.55583119392395
This takes over 47 seconds, but if we monitor resource usage, we will see that only a single core is used. All others remain idle.
We can very easily modify the code to use Ramba instead of NumPy:
# test-ramba.py
import ramba as np # Use ramba in place of numpy
import time
t0 = time.time()
A = np.arange(1000*1000*1000)/1000.0
B = np.sin(A)
C = np.cos(A)
D = B*B + C**2
np.sync() # Ensure any remote work is complete to get accurate times
t1 = time.time()
print(t1-t0)
Note that the only changes are the import line and the addition of np.sync(). The latter is only needed to wait for all remote work to complete, so that we get an accurate measure of execution time.
Now let us try running the Ramba version:
% python test-ramba.py
3.860828161239624
The Ramba version saturates all of the cores and achieves roughly a 12x speedup over the original NumPy version. Why only 12x? Three factors contribute: 1) the total includes some of the initialization time; 2) JIT compilation takes about 1 second here; 3) the code is memory-bandwidth bound, so beyond a point additional cores just end up waiting on memory. Importantly, this performance gain is achieved with no significant change to the code.
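To separate the one-time JIT-compilation cost from steady-state execution time, one way is to time the same operations a second time after a warm-up pass. This is a hypothetical sketch that reuses only the Ramba calls already shown above:
# test-ramba-warm.py  (hypothetical timing sketch)
import ramba as np
import time

def run():
    A = np.arange(1000*1000*1000)/1000.0
    B = np.sin(A)
    C = np.cos(A)
    D = B*B + C**2
    np.sync()    # wait for all remote work to finish

run()            # first pass pays the one-time JIT-compilation cost
t0 = time.time()
run()            # second pass measures steady-state execution
t1 = time.time()
print(t1-t0)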