A parallel execution engine that doesn't know anything about serialization
Multi-core parallelism in pure Python
spacecadet is a library designed to enable parallel code execution where each task
can use multiple cores. Additionally, instead of defining custom serialization methods,
spacecadet leverages lazyscribe to track and manage artifacts.
Multithreaded execution
Variable substitution via lazyscribe
To use this functionality, please install the lazyscribe extra:
uv pip install spacecadet[lazyscribe]
Suppose you are building a model and tracking every experiment with lazyscribe:
from lazyscribe import Project
project = Project("project.json", mode="w")
with project.log("my-experiment") as exp:
    exp.log_artifact(name="features", value=["a", "b", "c"], handler="json")
    ...
project.save()
Suppose you have the following function to report the outcome of this experiment:
import logging
LOG = logging.getLogger(__name__)
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )
We can use spacecadet to directly connect the artifact from our experiment to the
function.
from spacecadet.threading import cadet
project = Project("project.json", mode="r")
@cadet(source=project["my-experiment"])
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )
This decorator converts model_report into a lightly customized extension of threading.Thread.
Let's execute this function:
thread = model_report("My model name", "features")
thread.start()
thread.join()
The log output shows that the literal string "features" has been replaced with the value
["a", "b", "c"] from the artifact named "features" in our experiment. If you ship
application-side code that should not be tied to a particular lazyscribe experiment or
repository, you can define the source later:
# Application-side code
@cadet
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )
# User-side code
thread = model_report("My model name", "features")
thread.options(source=project["my-experiment"])
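To make the substitution step concrete, here is a minimal sketch of how string arguments could be swapped for artifact values before a function runs. It is not spacecadet's implementation: the helper name substitute_artifacts and the plain dictionary standing in for an experiment's artifacts are assumptions made for illustration.

```python
from typing import Any, Mapping


def substitute_artifacts(
    args: tuple[Any, ...],
    kwargs: dict[str, Any],
    artifacts: Mapping[str, Any],
) -> tuple[tuple[Any, ...], dict[str, Any]]:
    """Replace string arguments that name an artifact with the artifact value.

    Hypothetical helper: the real library may resolve names differently.
    """

    def resolve(value: Any) -> Any:
        # Only strings can be artifact names; everything else passes through.
        if isinstance(value, str) and value in artifacts:
            return artifacts[value]
        return value

    new_args = tuple(resolve(arg) for arg in args)
    new_kwargs = {key: resolve(val) for key, val in kwargs.items()}
    return new_args, new_kwargs


# Hypothetical stand-in for the artifacts logged in "my-experiment".
artifacts = {"features": ["a", "b", "c"]}
args, kwargs = substitute_artifacts(("My model name", "features"), {}, artifacts)
print(args)  # ('My model name', ['a', 'b', 'c'])
```

Note that in this sketch "My model name" is left untouched because no artifact carries that name, which is why only the second argument changes.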
Managing thread allocation
Additionally, you can use spacecadet to run functions that each require multiple threads.
Suppose you have a 12-core system. If each of your functions consumes 4 cores, you want to
limit the number of functions running concurrently (here, at most 3 at a time) to reduce thread contention.
With spacecadet, we use a semaphore-like object to acquire and release multiple threads at once.
@cadet(num_threads=4)
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )
thread = model_report("My model name", "features")
thread.options(source=project["my-experiment"])
thread.start()
thread.join()
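The idea of a semaphore-like object that hands out several threads at once can be sketched with the standard library alone. The class below is a hypothetical illustration built on threading.Condition, not the ThreadedSemaphore shipped with spacecadet; the names MultiPermitSemaphore, task, and usage are assumptions.

```python
import threading
import time


class MultiPermitSemaphore:
    """Hypothetical counting semaphore that grants several permits at once."""

    def __init__(self, total: int) -> None:
        self._total = total
        self._available = total
        self._cond = threading.Condition()

    def acquire(self, n: int) -> None:
        if n > self._total:
            raise ValueError("cannot acquire more permits than exist")
        with self._cond:
            # Block until all n permits are free, then take them together.
            self._cond.wait_for(lambda: self._available >= n)
            self._available -= n

    def release(self, n: int) -> None:
        with self._cond:
            self._available += n
            self._cond.notify_all()


sem = MultiPermitSemaphore(total=12)
lock = threading.Lock()
usage = {"in_use": 0, "peak": 0}


def task(cores: int = 4) -> None:
    sem.acquire(cores)
    try:
        with lock:
            usage["in_use"] += cores
            usage["peak"] = max(usage["peak"], usage["in_use"])
        time.sleep(0.01)  # stand-in for real multi-core work
        with lock:
            usage["in_use"] -= cores
    finally:
        sem.release(cores)


# Five tasks of 4 cores each on a 12-core budget: at most three run at once.
threads = [threading.Thread(target=task) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(usage["peak"] <= 12)  # True
```

Taking all permits in one step under the condition lock avoids the deadlock that can occur when two tasks each hold part of what they need and wait for the rest.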
By default, spacecadet will use os.cpu_count to detect the number of cores on your machine.
This value will represent the total number of threads available. When
spacecadet.threading.CadetThread.start is called, we will "acquire" 4 threads from the available
pool. If you don't want to use os.cpu_count to determine the total number of threads,
you have a few options.
- Specify the number of available threads through the environment variable
  SPACECADET_MAX_THREADS:

      export SPACECADET_MAX_THREADS=12

- Use a context manager to temporarily set the number of threads:

      from spacecadet.semaphore import ThreadedSemaphore

      with ThreadedSemaphore(12):
          ...

      # outside of the context handler, os.cpu_count will be used again
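The fallback from the environment variable to os.cpu_count can be pictured roughly as follows. This is a sketch under assumptions: the function name resolve_max_threads is hypothetical, and how spacecadet actually parses or validates SPACECADET_MAX_THREADS is not shown in this document.

```python
import os
from typing import Mapping, Optional


def resolve_max_threads(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the thread budget: the environment variable if set, else the core count."""
    env = os.environ if env is None else env
    raw = env.get("SPACECADET_MAX_THREADS")
    if raw is not None:
        return int(raw)
    # os.cpu_count() may return None when the count cannot be determined.
    return os.cpu_count() or 1


print(resolve_max_threads({"SPACECADET_MAX_THREADS": "12"}))  # 12
```

Passing the environment as a parameter keeps the sketch easy to test without mutating os.environ.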
NOTE: spacecadet.semaphore.ThreadedSemaphore is designed as a singleton.
No matter how many times you instantiate the class, the parameters from the first instantiation
remain in effect until all references to that instance have been deleted:
>>> from spacecadet.semaphore import ThreadedSemaphore
>>> alloc = ThreadedSemaphore(12)
>>> print(alloc)
Allocator: 12 available threads
>>> alloc = ThreadedSemaphore(1)
>>> print(alloc)
Allocator: 12 available threads
Deleting the instance will allow you to change the total available threads:
>>> from spacecadet.semaphore import ThreadedSemaphore
>>> alloc = ThreadedSemaphore(12)
>>> print(alloc)
Allocator: 12 available threads
>>> del alloc
>>> alloc = ThreadedSemaphore(1)
>>> print(alloc)
Allocator: 1 available threads
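One way to get a singleton whose settings last only while something still references it is to keep a weak reference to the live instance. The class below is a hypothetical sketch of that pattern, not the source of ThreadedSemaphore; SingletonAllocator and its attributes are names invented for illustration.

```python
import gc
import weakref


class SingletonAllocator:
    """Hypothetical singleton that lives only while something references it."""

    _instance_ref = None  # weak reference to the live instance, if any

    def __new__(cls, total: int):
        existing = cls._instance_ref() if cls._instance_ref is not None else None
        if existing is not None:
            # A live instance exists: ignore the new arguments and reuse it.
            return existing
        instance = super().__new__(cls)
        instance.total = total
        cls._instance_ref = weakref.ref(instance)
        return instance

    def __repr__(self) -> str:
        return f"Allocator: {self.total} available threads"


first = SingletonAllocator(12)
second = SingletonAllocator(1)
same_instance = first is second
total_seen = second.total
print(second)  # Allocator: 12 available threads

# Dropping every reference lets the weak reference expire.
del first, second
gc.collect()  # only needed on interpreters without reference counting

third = SingletonAllocator(1)
print(third)  # Allocator: 1 available threads
```

Because the class holds only a weak reference, it never keeps the instance alive by itself, which is what allows a later instantiation to pick up new parameters.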