A parallel execution engine that doesn't know anything about serialization

Multi-core parallelism in pure Python

spacecadet is a library designed to enable parallel code execution where each task can use multiple cores. Additionally, instead of defining custom serialization methods, spacecadet leverages lazyscribe to track and manage artifacts.

Multithreaded execution

Variable substitution via lazyscribe

To use this functionality, please install the lazyscribe extra:

uv pip install "spacecadet[lazyscribe]"

Suppose you are building a model and tracking every experiment with lazyscribe:

from lazyscribe import Project

project = Project("project.json", mode="w")
with project.log("my-experiment") as exp:
    exp.log_artifact(name="features", value=["a", "b", "c"], handler="json")
    ...

project.save()

Suppose you have the following function to report the outcome of this experiment:

import logging

LOG = logging.getLogger(__name__)

def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

We can use spacecadet to directly connect the artifact from our experiment to the function.

from spacecadet.threading import cadet

project = Project("project.json", mode="r")


@cadet(source=project["my-experiment"])
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

This decorator converts model_report to a lightly customized extension of threading.Thread. Let's execute this function:

thread = model_report("My model name", "features")
thread.start()
thread.join()

The log output will show that the literal string "features" has been replaced with the value ["a", "b", "c"] from the artifact named "features" in our experiment. If you're shipping application-side code that should not be tied to a particular lazyscribe experiment or repository, you can also define the source later:

# Application-side code

@cadet
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

# User-side code
thread = model_report("My model name", "features")
thread.options(source=project["my-experiment"])
thread.start()
thread.join()
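To make the substitution idea concrete, here is a minimal standard-library sketch. It is not spacecadet's implementation: SubstitutingThread and substituting are hypothetical names, and a plain dict stands in for the lazyscribe experiment. Any string argument that matches an artifact name is swapped for the artifact's value just before the function runs.

```python
import threading


class SubstitutingThread(threading.Thread):
    """Hypothetical thread that swaps string arguments for same-named artifacts."""

    def __init__(self, func, args, kwargs, source=None):
        super().__init__()
        self._func = func
        self._call_args = args
        self._call_kwargs = kwargs
        self._source = source  # assumption: a mapping of artifact name -> value
        self.result = None

    def options(self, source):
        """Set (or replace) the artifact source after construction."""
        self._source = source
        return self

    def _resolve(self, value):
        artifacts = self._source or {}
        # Only strings that name a known artifact are substituted.
        if isinstance(value, str) and value in artifacts:
            return artifacts[value]
        return value

    def run(self):
        args = [self._resolve(a) for a in self._call_args]
        kwargs = {k: self._resolve(v) for k, v in self._call_kwargs.items()}
        self.result = self._func(*args, **kwargs)


def substituting(func=None, *, source=None):
    """Decorator usable both bare and with a source keyword."""

    def decorate(f):
        def factory(*args, **kwargs):
            return SubstitutingThread(f, args, kwargs, source=source)

        return factory

    return decorate if func is None else decorate(func)


@substituting
def model_report(model_name, features):
    return f"Model {model_name} uses the following features: {', '.join(features)}"


artifacts = {"features": ["a", "b", "c"]}  # stand-in for an experiment's artifacts
thread = model_report("My model name", "features").options(source=artifacts)
thread.start()
thread.join()
print(thread.result)
```

Because "My model name" does not match any artifact name, it is passed through unchanged, while "features" is replaced by the list.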

Managing thread allocation

Additionally, you can use spacecadet to run functions in threads when each function needs multiple cores itself. Suppose you have a 12-core system. If each of your functions consumes 4 cores, you want to limit the number of functions running concurrently to reduce thread contention. With spacecadet, we use a semaphore-like object to acquire and release multiple threads at once.

@cadet(num_threads=4)
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

thread = model_report("My model name", "features")
thread.options(source=project["my-experiment"])
thread.start()
thread.join()
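The "semaphore-like object" can be pictured as a counter that hands out several units per request. The sketch below is an illustration built on threading.Condition, not spacecadet's own class: WeightedSemaphore and its methods are hypothetical names. With 12 units in total and 4 units per task, no more than three tasks can hold the pool at the same time.

```python
import threading
import time


class WeightedSemaphore:
    """Hypothetical semaphore that acquires and releases several units at once."""

    def __init__(self, total):
        self._total = total
        self._available = total
        self._cond = threading.Condition()

    @property
    def available(self):
        with self._cond:
            return self._available

    def acquire(self, n):
        if n > self._total:
            raise ValueError("request exceeds the total number of units")
        with self._cond:
            # Block until enough units are free, then take them atomically.
            self._cond.wait_for(lambda: self._available >= n)
            self._available -= n

    def release(self, n):
        with self._cond:
            self._available += n
            self._cond.notify_all()


pool = WeightedSemaphore(12)
stats = {"running": 0, "peak": 0}
stats_lock = threading.Lock()


def task():
    pool.acquire(4)
    try:
        with stats_lock:
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
        time.sleep(0.05)  # stand-in for work that uses 4 cores
        with stats_lock:
            stats["running"] -= 1
    finally:
        pool.release(4)


threads = [threading.Thread(target=task) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrent tasks:", stats["peak"])  # never more than 3
```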

By default, spacecadet uses os.cpu_count to detect the number of cores on your machine, and that value is the total number of threads available. When spacecadet.threading.CadetThread.start is called, the thread "acquires" 4 threads from the available pool, so on a 12-core machine at most three of these functions run at once. If you don't want os.cpu_count to determine the total number of threads, you have a few options.

  1. Specify the number of available threads through the environment variable SPACECADET_MAX_THREADS:

    export SPACECADET_MAX_THREADS=12
    
  2. Use a context manager to temporarily set the number of threads:

    from spacecadet.semaphore import ThreadedSemaphore
    
    with ThreadedSemaphore(12):
        ...
    
    # outside of the context manager, os.cpu_count will be used again
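The options above amount to a lookup with fallbacks. The helper below is a hypothetical sketch of that lookup, not a spacecadet function; the precedence order (explicit value, then SPACECADET_MAX_THREADS, then os.cpu_count) is an assumption made for illustration.

```python
import os


def resolve_max_threads(explicit=None, environ=None):
    """Hypothetical helper: pick the total number of available threads.

    Assumed precedence: explicit value, then the SPACECADET_MAX_THREADS
    environment variable, then os.cpu_count().
    """
    env = os.environ if environ is None else environ
    if explicit is not None:
        return explicit
    raw = env.get("SPACECADET_MAX_THREADS")
    if raw is not None:
        return int(raw)
    # os.cpu_count() may return None when the count cannot be determined.
    return os.cpu_count() or 1


from_env = resolve_max_threads(environ={"SPACECADET_MAX_THREADS": "12"})
from_arg = resolve_max_threads(explicit=8, environ={"SPACECADET_MAX_THREADS": "12"})
from_cpu = resolve_max_threads(environ={})

print(from_env)  # 12
print(from_arg)  # 8
print(from_cpu)  # depends on the machine
```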
    

NOTE: we have designed spacecadet.semaphore.ThreadedSemaphore as a singleton object. This means that, no matter how many times you instantiate the class, the parameters used in the first instantiation of the semaphore will be used until all references to that instance have been deleted:

>>> from spacecadet.semaphore import ThreadedSemaphore
>>> alloc = ThreadedSemaphore(12)
>>> print(alloc)
Allocator: 12 available threads
>>> alloc = ThreadedSemaphore(1)
>>> print(alloc)
Allocator: 12 available threads

Deleting the instance will allow you to change the total available threads:

>>> from spacecadet.semaphore import ThreadedSemaphore
>>> alloc = ThreadedSemaphore(12)
>>> print(alloc)
Allocator: 12 available threads
>>> del alloc
>>> alloc = ThreadedSemaphore(1)
>>> print(alloc)
Allocator: 1 available threads
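One way to get a singleton that lasts only as long as someone holds a reference is to keep a weak reference to the instance. The sketch below shows that pattern with hypothetical names (WeakSingleton, Allocator); it is not taken from spacecadet's source. It also relies on CPython's reference counting to free the instance as soon as the last reference is deleted, which other interpreters may delay.

```python
import weakref


class WeakSingleton(type):
    """Hypothetical metaclass: one shared instance while any reference is alive."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._instance_ref = None

    def __call__(cls, *args, **kwargs):
        instance = cls._instance_ref() if cls._instance_ref is not None else None
        if instance is None:
            # No live instance: build one and remember it weakly, so the
            # metaclass itself does not keep it alive.
            instance = super().__call__(*args, **kwargs)
            cls._instance_ref = weakref.ref(instance)
        return instance


class Allocator(metaclass=WeakSingleton):
    def __init__(self, total):
        self.total = total

    def __str__(self):
        return f"Allocator: {self.total} available threads"


first = Allocator(12)
second = Allocator(1)  # arguments are ignored: the live instance is reused
same = first is second
shared_text = str(second)
print(shared_text)  # Allocator: 12 available threads

del first, second  # drop every reference so the instance can be collected
third = Allocator(1)
print(third)  # Allocator: 1 available threads (on CPython)
```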
