Distributed computing for everyone in modern Python.
achilles
Distributed/parallel computing in modern Python based on the multiprocessing.Pool API (map, imap, imap_unordered).
What is it?
achilles is built using a server/node/controller architecture. The achilles_server, achilles_node and achilles_controller are designed to run cross-platform/cross-architecture and may be hosted on a single machine (for development) or deployed across heterogeneous resources.
achilles is comparable to excellent Python packages like pathos/pyina, Parallel Python and SCOOP. Similar in some ways, different in others:
- Designed for developers familiar with the multiprocessing module in the standard library, with simplicity and ease of use in mind.
- In addition to the blocking map API that requires developers to wait for all computation to be finished before accessing results (common in such packages), imap/imap_unordered allow developers to process results as they are returned to the achilles_controller by the achilles_server.
- achilles allows for composable scalability and novel design patterns as:
  - Lists, lists of lists and generator functions are accepted as arguments.
    - TIP: Use generator functions together with imap or imap_unordered to perform distributed computation on arbitrarily large data.
  - The dill serializer is used to transfer data between the server/node/controller, and multiprocess (a fork of multiprocessing that uses the dill serializer instead of pickle) is used to perform Pool.map on the achilles_nodes, so developers are freed from some of the constraints of the pickle serializer.
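To make the generator tip concrete, here is a minimal sketch of a lazy argument source. It assumes that args_counter is the 0-based position of each argument, which is how the ARGS_COUNTER key is described under Caveats; the function name numbered_args and its limit parameter are hypothetical. See examples\square_nums for how the package itself expects a generator to be passed.

```python
from itertools import islice


def numbered_args(limit):
    """Lazily yield (args_counter, arg) pairs for the integers 1..limit.

    Assumption: args_counter is the 0-based index of the argument.
    """
    for args_counter, arg in enumerate(range(1, limit + 1)):
        yield args_counter, arg


# Peek at the first few pairs locally; nothing beyond them is materialized,
# so a very large limit costs no extra memory.
first_pairs = list(islice(numbered_args(10**9), 3))
print(first_pairs)  # [(0, 1), (1, 2), (2, 3)]
```

Because the pairs are produced on demand, the full argument set never has to fit in memory on the machine running the controller.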
Install
pip install achilles
Quick Start
Start an achilles_server listening for connections from achilles_nodes at a certain endpoint specified as arguments or in an .env file in the achilles package's directory.
Then simply import map, imap, and/or imap_unordered from achilles_main and use them dynamically in your own code.
map, imap and imap_unordered will distribute your function to each achilles_node connected to the achilles_server. Then, the achilles_server will distribute arguments to each achilles_node (load balanced if the arguments' type is not already a list) which will then perform your function on the arguments using multiprocess.Pool.map.
Each achilles_node finishes its work, returns the results to the achilles_server and waits to receive another argument from it. This process is repeated until all of the arguments have been exhausted.
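The load balancing described above can be pictured with a small local sketch. It does not use achilles itself; it only assumes, per the Caveats section, that a node receives cpu_count * chunk_size arguments at a time. The function name batch_arguments is hypothetical.

```python
from itertools import islice


def batch_arguments(args, cpu_count, chunk_size=1):
    """Yield (args_counter, batch) pairs of at most cpu_count * chunk_size arguments."""
    batch_size = cpu_count * chunk_size
    iterator = iter(args)
    args_counter = 0  # index of the first argument in the current batch
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield args_counter, batch
        args_counter += len(batch)


# Ten arguments sent to a node that reports 8 cores, with chunk_size=1.
batches = list(batch_arguments(range(1, 11), cpu_count=8, chunk_size=1))
print(batches)  # [(0, [1, 2, 3, 4, 5, 6, 7, 8]), (8, [9, 10])]
```

With 8 cores and chunk_size=1, ten arguments split into one batch of eight and one batch of two, which matches the shape of the example output shown later in this README.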
- runAchillesServer(host=None, port=None, username=None, secret_key=None) -> run on your local machine or on another machine connected to your network

  in:

      from achilles.lineReceiver.achilles_server import runAchillesServer

      # host = IP address of the achilles_server
      # port = port to listen on for connections from achilles_nodes (must be an int)
      # username, secret_key used for authentication with achilles_controller
      runAchillesServer(host='127.0.0.1', port=9999, username='foo', secret_key='bar')

      # OR generate an .env file with a default configuration so that
      # arguments are no longer required to runAchillesServer()
      # use genConfig() to overwrite
      from achilles.lineReceiver.achilles_server import runAchillesServer, genConfig

      genConfig(host='127.0.0.1', port=9999, username='foo', secret_key='bar')
      runAchillesServer()

  out:

      ALERT: achilles_server initiated at 127.0.0.1:9999
      Listening for connections...
- runAchillesNode(host=None, port=None) -> run on your local machine or on another machine connected to your network

  in:

      from achilles.lineReceiver.achilles_node import runAchillesNode

      # genConfig() is also available in achilles_node, but only expects host and port arguments
      runAchillesNode(host='127.0.0.1', port=9999)

  out:

      GREETING: Welcome! There are currently 1 open connections.
      Connected to achilles_server running at 127.0.0.1:9999
      CLIENT_ID: 0
- Examples of how to use the 3 most commonly used multiprocessing.Pool methods in achilles:

  Note: map, imap and imap_unordered currently accept lists, lists of lists, and generator functions as achilles_args.

  Generator functions must yield an args_counter with each arg (i.e. yield args_counter, arg -> see the examples\square_nums directory for an example of how to use).

  Also note: if there isn't already a .env configuration file in the achilles package directory, you must either call genConfig(host, port, username, secret_key) beforehand or pass host, port, username and secret_key as arguments to map, imap or imap_unordered.

  - map(achilles_function, achilles_args, achilles_callback=None, chunk_size=1, host=None, port=None, username=None, secret_key=None)

    in:

        from achilles.lineReceiver.achilles_main import map

        def achilles_function(arg):
            return arg ** 2

        # The callback receives a results packet and squares each value again in place
        def achilles_callback(result):
            result_data = result["RESULT"]
            for i in range(len(result_data)):
                result_data[i] = result_data[i] ** 2
            return result

        if __name__ == "__main__":
            results = map(achilles_function, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], achilles_callback, chunk_size=1)
            print(results)

    out:

        ALERT: Connection to achilles_server at 127.0.0.1:9999 and authentication successful.
        [[1, 16, 81, 256, 625, 1296, 2401, 4096], [6561, 10000]]
  - imap(achilles_function, achilles_args, achilles_callback=None, chunk_size=1, host=None, port=None, username=None, secret_key=None)

    in:

        from achilles.lineReceiver.achilles_main import imap

        def achilles_function(arg):
            return arg ** 2

        def achilles_callback(result):
            result_data = result["RESULT"]
            for i in range(len(result_data)):
                result_data[i] = result_data[i] ** 2
            return result

        if __name__ == "__main__":
            for result in imap(achilles_function, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], achilles_callback, chunk_size=1):
                print(result)

    out:

        ALERT: Connection to achilles_server at 127.0.0.1:9999 and authentication successful.
        {'ARGS_COUNTER': 0, 'RESULT': [1, 16, 81, 256, 625, 1296, 2401, 4096]}
        {'ARGS_COUNTER': 8, 'RESULT': [6561, 10000]}
  - imap_unordered(achilles_function, achilles_args, achilles_callback=None, chunk_size=1, host=None, port=None, username=None, secret_key=None)

    in:

        from achilles.lineReceiver.achilles_main import imap_unordered

        def achilles_function(arg):
            return arg ** 2

        def achilles_callback(result):
            result_data = result["RESULT"]
            for i in range(len(result_data)):
                result_data[i] = result_data[i] ** 2
            return result

        if __name__ == "__main__":
            for result in imap_unordered(achilles_function, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], achilles_callback, chunk_size=1):
                print(result)

    out:

        ALERT: Connection to achilles_server at 127.0.0.1:9999 and authentication successful.
        {'ARGS_COUNTER': 8, 'RESULT': [6561, 10000]}
        {'ARGS_COUNTER': 0, 'RESULT': [1, 16, 81, 256, 625, 1296, 2401, 4096]}
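Since the results in the examples above arrive in batches, client code often needs to flatten them or map them back to argument positions. The following is a small post-processing sketch that runs locally on data copied from the example output; the helper name index_results is hypothetical and is not part of achilles.

```python
from itertools import chain

# Batched output copied from the map example above.
map_results = [[1, 16, 81, 256, 625, 1296, 2401, 4096], [6561, 10000]]
flat_results = list(chain.from_iterable(map_results))
print(flat_results)  # [1, 16, 81, 256, 625, 1296, 2401, 4096, 6561, 10000]


def index_results(packets):
    """Map each argument index to its result, whatever order the packets arrive in."""
    indexed = {}
    for packet in packets:
        start = packet["ARGS_COUNTER"]  # index of the first argument in this packet
        for offset, value in enumerate(packet["RESULT"]):
            indexed[start + offset] = value
    return indexed


# Packets copied from the imap_unordered example output (out of order).
packets = [
    {"ARGS_COUNTER": 8, "RESULT": [6561, 10000]},
    {"ARGS_COUNTER": 0, "RESULT": [1, 16, 81, 256, 625, 1296, 2401, 4096]},
]
indexed = index_results(packets)
print(indexed[9])  # 10000
```

Keying results by ARGS_COUNTER plus offset lets imap_unordered output be matched to the original arguments without waiting for ordered delivery.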
How achilles works
Under the hood
- Twisted
  - An event-driven networking engine written in Python and MIT licensed.
- dill
  - dill extends Python's pickle module for serializing and de-serializing Python objects to the majority of the built-in Python types.
- multiprocess
  - multiprocess is a fork of multiprocessing that uses dill instead of pickle for serialization. multiprocessing is a package for the Python language which supports the spawning of processes using the API of the standard library's threading module.
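One practical consequence of using dill instead of pickle is that functions pickle cannot handle, such as lambdas, can still be shipped between processes. The sketch below shows the pickle limitation with the standard library alone and, only if dill happens to be installed, the dill round trip; the helper name can_pickle is hypothetical.

```python
import pickle

square = lambda x: x ** 2  # lambdas have no importable name for pickle to reference


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


print(can_pickle(square))  # False

try:
    import dill  # optional here; installed as a dependency of multiprocess
except ImportError:
    dill = None

if dill is not None:
    # dill serializes the function body itself, so the round trip works.
    restored = dill.loads(dill.dumps(square))
    print(restored(4))  # 16
```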
Examples
See the examples directory for tutorials on various use cases, including:
- Square numbers/run multiple jobs sequentially
- Word count (TO DO)
How to kill cluster
from achilles.lineReceiver.achilles_main import killCluster
# simply use the killCluster() command and verify your intent at the prompt
# killCluster() will search for an .env configuration file in the achilles package's directory
# if it does not exist, specify host, port, username and secret_key as arguments
killCluster()
Caveats/Things to know
- achilles_nodes use all of the CPU cores available on the host machine to perform multiprocess.Pool.map (pool = multiprocess.Pool(multiprocess.cpu_count())).
- The achilles_server is designed to handle one job at a time for now (TO DO: add fault tolerance and ability to handle multiple jobs).
- achilles_callback_error has yet to be implemented, so detailed information regarding errors can only be gleaned from the interpreter used to launch the achilles_server, achilles_node or achilles_controller. Deploying the server/node/controller on a single machine is recommended for development.
- Generator functions used to define args must yield an args_counter along with the arg (i.e. yield args_counter, arg -> see examples\square_nums) in the current version of achilles.
- achilles performs load balancing at runtime and assigns achilles_nodes arguments by cpu_count * chunk_size.
  - Default chunk_size is 1.
  - Increasing the chunk_size is an easy way to speed up computation and reduce the amount of time spent transferring data between the server/node/controller.
- If your arguments are already lists, the chunk_size argument is not used.
  - Instead, one argument/list will be distributed to the connected achilles_nodes at a time.
- If your arguments are load balanced, the results returned are contained in lists of length achilles_node's cpu_count * chunk_size.
  - map:
    - Final result of map is an ordered list of load balanced lists (the final result is not flattened).
  - imap:
    - Results are returned as computation is finished in dictionaries that include the following keys:
      - RESULT: load balanced list of results.
      - ARGS_COUNTER: index of first argument (0-indexed).
    - Results are ordered.
      - The first result in each packet follows directly after the last result in the preceding results packet's list of results.
      - Likely to be slower than imap_unordered due to achilles_controller yielding ordered results. imap_unordered (see below) yields results as they are received, while imap yields results as they are received only if the packet's ARGS_COUNTER is the one expected based on the length of the RESULT list in the preceding results packet. Otherwise, the results packet is added to a result_buffer and the result_buffer is checked for the results packet with the expected ARGS_COUNTER. If it is not found, achilles_controller will not yield results until a results packet with the expected ARGS_COUNTER is received.
  - imap_unordered:
    - Results are returned as computation is finished in dictionaries that include the following keys:
      - RESULT: load balanced list of results.
      - ARGS_COUNTER: index of first argument (0-indexed).
    - Results are not ordered.
      - Results packets are yielded as they are received (after any achilles_callback has been performed on them).
      - Fastest way of consuming results received from the achilles_server.
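The ordering behaviour described for imap can be sketched as a small re-ordering generator. This is an illustration of the buffering idea only, not the achilles_controller source; the function name yield_in_order is hypothetical, and the buffer is modelled as a dict keyed by ARGS_COUNTER.

```python
def yield_in_order(packets):
    """Yield results packets in argument order, buffering any that arrive early."""
    result_buffer = {}
    expected = 0  # ARGS_COUNTER of the next packet that may be yielded
    for packet in packets:
        result_buffer[packet["ARGS_COUNTER"]] = packet
        # Release every buffered packet that is now next in line.
        while expected in result_buffer:
            ready = result_buffer.pop(expected)
            yield ready
            expected += len(ready["RESULT"])


# Packets in the arrival order shown in the imap_unordered example output.
arrived = [
    {"ARGS_COUNTER": 8, "RESULT": [6561, 10000]},
    {"ARGS_COUNTER": 0, "RESULT": [1, 16, 81, 256, 625, 1296, 2401, 4096]},
]
ordered = list(yield_in_order(arrived))
print([packet["ARGS_COUNTER"] for packet in ordered])  # [0, 8]
```

The packet with ARGS_COUNTER 8 is held back until the packet starting at 0 arrives, which is why imap can lag behind imap_unordered when nodes finish out of order.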
achilles is in the early stages of active development and your suggestions/contributions are kindly welcomed.
achilles is written and maintained by Alejandro Peña. Email me at adpena at gmail dot com.