A parallel execution engine that doesn't know anything about serialization

Multi-core parallelism in pure Python

spacecadet is a library designed to enable parallel code execution where each task can use multiple cores. Additionally, instead of defining custom serialization methods, spacecadet leverages lazyscribe to track and manage artifacts.

Multithreaded execution

Variable substitution via lazyscribe

To use this functionality, please install the lazyscribe extra:

uv pip install spacecadet[lazyscribe]

Suppose you are building a model. Every experiment is tracked via lazyscribe:

from lazyscribe import Project

project = Project("project.json", mode="w")
with project.log("my-experiment") as exp:
    exp.log_artifact(name="features", value=["a", "b", "c"], handler="json")
    ...

project.save()

Suppose you have the following function to report the outcome of this experiment:

import logging

LOG = logging.getLogger(__name__)

def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

We can use spacecadet to directly connect the artifact from our experiment to the function.

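import logging

from lazyscribe import Project
from spacecadet.threading import cadet

LOG = logging.getLogger(__name__)

# Re-open the project saved above in read-only mode
project = Project("project.json", mode="r")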


@cadet(source=project["my-experiment"])
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

This decorator converts model_report to a lightly customized extension of threading.Thread. Let's execute this function:

thread = model_report("My model name", "features")
thread.start()
thread.join()

The log output will show that the literal string "features" has been replaced with the value ["a", "b", "c"] from the artifact named "features" in our experiment. If you ship application-side code that should not be tied to a specific lazyscribe experiment or repository, you can also define the source later:

# Application-side code

@cadet
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

# User-side code
thread = model_report("My model name", "features")
thread.options(source=project["my-experiment"])
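The substitution described in this section can be illustrated with the standard library alone. The following is a minimal sketch, not spacecadet's implementation: `SubstitutingThread`, `substitute`, and the plain `dict` standing in for a lazyscribe experiment are all hypothetical names chosen for this example. It assumes that any positional string argument matching an artifact name is replaced by that artifact's value when the thread runs.

```python
import threading
from typing import Any, Callable, Mapping, Optional


class SubstitutingThread(threading.Thread):
    """Hypothetical thread that swaps string arguments for named artifacts."""

    def __init__(
        self,
        func: Callable[..., Any],
        args: tuple,
        source: Optional[Mapping[str, Any]] = None,
    ):
        super().__init__()
        self._func = func
        self._call_args = args
        self._source = source
        self.result = None

    def options(self, source: Mapping[str, Any]) -> "SubstitutingThread":
        """Set the artifact source after the thread object has been created."""
        self._source = source
        return self

    def run(self) -> None:
        source = self._source or {}
        # Replace a string argument only when it names a known artifact.
        resolved = [
            source[arg] if isinstance(arg, str) and arg in source else arg
            for arg in self._call_args
        ]
        self.result = self._func(*resolved)


def substitute(func=None, *, source=None):
    """Hypothetical decorator usable as @substitute or @substitute(source=...)."""

    def decorate(f):
        def factory(*args):
            return SubstitutingThread(f, args, source)

        return factory

    return decorate if func is None else decorate(func)


# A plain dict stands in for the experiment's artifacts (assumption).
artifacts = {"features": ["a", "b", "c"]}


@substitute
def describe(model_name, features):
    return f"{model_name}: {', '.join(features)}"


thread = describe("My model name", "features").options(artifacts)
thread.start()
thread.join()
print(thread.result)
```

Because "My model name" does not match any artifact name, it is passed through unchanged, while "features" is resolved to the stored list.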

Managing thread allocation

Additionally, you can use spacecadet to run functions in threads when each function itself requires multiple cores. Suppose you have a 12-core system. If each of your functions consumes 4 cores, you want to limit the number of functions running concurrently (here, to 3) to reduce thread contention. With spacecadet, we use a semaphore-like object to acquire and release multiple threads at once.

@cadet(num_threads=4)
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

thread = model_report("My model name", "features")
thread.options(source=project["my-experiment"])
thread.start()
thread.join()
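To make the idea of acquiring several threads at once concrete, here is a minimal sketch of a counting semaphore that hands out multiple units per call, built on `threading.Condition`. `MultiSemaphore` and the `task` function are hypothetical names for illustration; this is not the code behind spacecadet's semaphore, only one way such an object could behave.

```python
import threading
import time


class MultiSemaphore:
    """Hypothetical semaphore that acquires and releases several units at once."""

    def __init__(self, total: int):
        self._total = total
        self._available = total
        self._cond = threading.Condition()

    def acquire(self, n: int) -> None:
        if n > self._total:
            raise ValueError("cannot acquire more units than the pool holds")
        with self._cond:
            # Block until at least n units are free, then take them together.
            self._cond.wait_for(lambda: self._available >= n)
            self._available -= n

    def release(self, n: int) -> None:
        with self._cond:
            self._available += n
            self._cond.notify_all()

    @property
    def available(self) -> int:
        with self._cond:
            return self._available


pool = MultiSemaphore(12)  # 12 cores in total
stats = {"running": 0, "peak": 0}
stats_lock = threading.Lock()


def task() -> None:
    pool.acquire(4)  # each task claims 4 cores
    try:
        with stats_lock:
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
        time.sleep(0.01)  # stand-in for real work
        with stats_lock:
            stats["running"] -= 1
    finally:
        pool.release(4)


threads = [threading.Thread(target=task) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With 12 units and 4 units per task, `stats["peak"]` never exceeds 3, and all 12 units are back in the pool once every thread has joined.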

By default, spacecadet will use os.cpu_count to detect the number of cores on your machine. This value will represent the total number of threads available. When spacecadet.threading.CadetThread.start is called, we will "acquire" 4 threads from the available pool. If you don't want to use os.cpu_count to determine the total number of threads, you have a few options.

  1. Specify the number of available threads through the environment variable SPACECADET_MAX_THREADS:

    export SPACECADET_MAX_THREADS=12
    
  2. Use a context manager to temporarily set the number of threads:

    from spacecadet.semaphore import ThreadedSemaphore
    
    with ThreadedSemaphore(12):
        ...
    
    # outside of the context manager, os.cpu_count will be used again
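The lookup order described above (an explicit environment variable first, then `os.cpu_count`) can be sketched as follows. `resolve_total_threads` is a hypothetical helper, not part of spacecadet; the integer parsing and the fallback to 1 when `os.cpu_count` returns `None` are assumptions made for this example.

```python
import os
from typing import Mapping, Optional


def resolve_total_threads(environ: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: pick the total thread count for the pool."""
    environ = os.environ if environ is None else environ
    raw = environ.get("SPACECADET_MAX_THREADS")
    if raw is not None:
        # Assumption: the variable holds a plain base-10 integer.
        return int(raw)
    # os.cpu_count() may return None on some platforms; fall back to 1.
    return os.cpu_count() or 1


print(resolve_total_threads({"SPACECADET_MAX_THREADS": "12"}))
print(resolve_total_threads({}))
```

Passing the environment as an argument keeps the helper easy to test without mutating `os.environ`.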
    

NOTE: we have designed spacecadet.semaphore.ThreadedSemaphore as a singleton object. This means that, no matter how many times you instantiate the class, the parameters used in the first instantiation of the semaphore will be used until all references to that instance have been deleted:

>>> from spacecadet.semaphore import ThreadedSemaphore
>>> alloc = ThreadedSemaphore(12)
>>> print(alloc)
Allocator: 12 available threads
>>> alloc = ThreadedSemaphore(1)
>>> print(alloc)
Allocator: 12 available threads

Deleting the instance will allow you to change the total available threads:

>>> from spacecadet.semaphore import ThreadedSemaphore
>>> alloc = ThreadedSemaphore(12)
>>> print(alloc)
Allocator: 12 available threads
>>> del alloc
>>> alloc = ThreadedSemaphore(1)
>>> print(alloc)
Allocator: 1 available threads
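One way to get this "first instantiation wins until all references are gone" behavior is to hold the live instance through a weak reference. The sketch below is an illustration under that assumption, not spacecadet's source; `SingletonAllocator` is a hypothetical class, and the explicit `gc.collect()` is there only so the example does not depend on immediate reference-count cleanup.

```python
import gc
import weakref


class SingletonAllocator:
    """Hypothetical singleton that lives only while someone references it."""

    _instance_ref = None  # weak reference to the live instance, if any

    def __new__(cls, total: int):
        existing = cls._instance_ref() if cls._instance_ref is not None else None
        if existing is not None:
            # An instance is still alive: ignore the new parameters.
            return existing
        instance = super().__new__(cls)
        instance.total = total
        # A weak reference does not keep the instance alive on its own.
        cls._instance_ref = weakref.ref(instance)
        return instance

    def __str__(self) -> str:
        return f"Allocator: {self.total} available threads"


first = SingletonAllocator(12)
second = SingletonAllocator(1)
same_object = second is first
shared_label = str(second)

# Drop every strong reference so the weak reference goes dead.
del first, second
gc.collect()

fresh = SingletonAllocator(1)
fresh_label = str(fresh)
```

Setting the state in `__new__` rather than `__init__` matters here: `__init__` would run again on every call and overwrite the parameters of the existing instance.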
