A parallel execution engine that doesn't know anything about serialization

Parallel execution using lazyscribe

spacecadet is a library designed to enable parallel code execution without defining custom serialization. Instead, we leverage lazyscribe to track and manage experimentation as well as usage of current, "deployment" artifacts.

Multithreaded execution

Variable substitution via lazyscribe

Suppose you are building a model and tracking every experiment with lazyscribe:

from lazyscribe import Project

project = Project("project.json", mode="w")
with project.log("my-experiment") as exp:
    exp.log_artifact(name="features", value=["a", "b", "c"], handler="json")
    ...

project.save()

Suppose you have the following function to report the outcome of this experiment:

import logging

LOG = logging.getLogger(__name__)

def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

We can use spacecadet to directly connect the artifact from our experiment to the function.

from spacecadet.threading import cadet

project = Project("project.json", mode="r")


@cadet(source=project["my-experiment"])
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

This decorator converts model_report into a lightly customized extension of threading.Thread. Calling the decorated function returns a thread object. Let's execute it:

thread = model_report("My model name", "features")
thread.start()
thread.join()

The log output will show that the literal string "features" has been replaced with the value ["a", "b", "c"] from the artifact named "features" in our experiment. If you're shipping application-side code that needs to stay flexible about its source lazyscribe experiment or repository, you can also define the source later:

# Application-side code

@cadet
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

# User-side code
thread = model_report("My model name", "features")
thread.options(source=project["my-experiment"])
thread.start()
thread.join()
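
To make the substitution idea concrete, here is a minimal standard-library sketch. It is not spacecadet's implementation: `substitute`, `SubstitutingThread`, and the plain dict standing in for a lazyscribe experiment are hypothetical names chosen for illustration. It assumes that a string argument matching an artifact name is swapped for that artifact's value, and that every other argument is passed through unchanged.

```python
import threading
from typing import Any, Callable, Mapping, Sequence


def substitute(args: Sequence[Any], artifacts: Mapping[str, Any]) -> list[Any]:
    """Replace string arguments that name an artifact with the artifact's value."""
    return [
        artifacts[arg] if isinstance(arg, str) and arg in artifacts else arg
        for arg in args
    ]


class SubstitutingThread(threading.Thread):
    """Hypothetical thread that resolves its arguments when it runs."""

    def __init__(self, func: Callable[..., Any], *args: Any) -> None:
        super().__init__()
        self._func = func
        self._raw_args = args
        self._artifacts: Mapping[str, Any] = {}

    def options(self, source: Mapping[str, Any]) -> None:
        """Set the artifact source after construction, before start()."""
        self._artifacts = source

    def run(self) -> None:
        # Resolution happens here, so the source can be supplied late.
        self._func(*substitute(self._raw_args, self._artifacts))


received = []  # collects what the target function actually saw
thread = SubstitutingThread(
    lambda name, features: received.append((name, features)),
    "My model name",
    "features",
)
thread.options(source={"features": ["a", "b", "c"]})  # dict stands in for an experiment
thread.start()
thread.join()
print(received)  # [('My model name', ['a', 'b', 'c'])]
```

Because the arguments are resolved inside `run`, the source only has to be known by the time the thread starts, which is what lets application-side code stay independent of any particular experiment.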

Managing thread allocation

Additionally, you can use spacecadet to run functions that each require multiple threads. Suppose you have a 12-core system. If each of your functions consumes 4 cores, you want to limit the number of functions running concurrently (here, to 3) to reduce thread contention. With spacecadet, we use a semaphore-like object to acquire and release multiple threads at once.

@cadet(num_threads=4)
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

thread = model_report("My model name", "features")
thread.options(source=project["my-experiment"])
thread.start()
thread.join()
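
The "semaphore-like object" idea can be sketched with the standard library alone. The class below is a hypothetical stand-in, not spacecadet's ThreadedSemaphore: `MultiPermitPool` and `peak_concurrency` are illustrative names, and the sketch assumes that a task blocks until all of the permits it asks for are free.

```python
import threading
import time


class MultiPermitPool:
    """Hypothetical pool of thread "permits" that can be taken several at a time."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.available = total
        self._cond = threading.Condition()

    def acquire(self, n: int) -> None:
        if n > self.total:
            raise ValueError(f"cannot acquire {n} of {self.total} permits")
        with self._cond:
            # Block until n permits are free, then take them all at once.
            self._cond.wait_for(lambda: self.available >= n)
            self.available -= n

    def release(self, n: int) -> None:
        with self._cond:
            self.available += n
            self._cond.notify_all()


def peak_concurrency(total: int, per_task: int, tasks: int) -> int:
    """Run `tasks` workers and return the most that ever ran at once."""
    pool = MultiPermitPool(total)
    lock = threading.Lock()
    running = 0
    peak = 0

    def worker() -> None:
        nonlocal running, peak
        pool.acquire(per_task)
        try:
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(0.05)  # simulate work that uses per_task cores
            with lock:
                running -= 1
        finally:
            pool.release(per_task)

    threads = [threading.Thread(target=worker) for _ in range(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


# With 12 permits and 4 per task, no more than 12 // 4 == 3 tasks overlap.
print(peak_concurrency(total=12, per_task=4, tasks=6) <= 3)  # True
```

A condition variable is used instead of calling `threading.Semaphore.acquire` four times in a row, because taking permits one by one can deadlock when several tasks each hold part of what they need.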

By default, spacecadet uses os.cpu_count to detect the number of cores on your machine. This value represents the total number of threads available. When spacecadet.threading.CadetThread.start is called, we "acquire" 4 threads from the available pool. If you don't want to use os.cpu_count to determine the total number of threads, you have two options:

  1. Specify the number of available threads through the environment variable SPACECADET_MAX_THREADS:

    export SPACECADET_MAX_THREADS=12
    
  2. Use a context manager to temporarily set the number of threads:

    from spacecadet.semaphore import ThreadedSemaphore
    
    with ThreadedSemaphore(12):
        ...
    
    # outside of the context manager, os.cpu_count will be used again
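
The fallback order for the environment variable can be sketched as follows. This is an illustration under assumptions, not spacecadet's code: `resolve_total_threads` is a hypothetical helper, and parsing the variable with `int` is a guess at how the value is read. It covers only the choice between the environment variable and os.cpu_count, and does not model the context manager.

```python
import os


def resolve_total_threads(env_var: str = "SPACECADET_MAX_THREADS") -> int:
    """Return the total thread budget: the environment variable if set, else the core count."""
    raw = os.environ.get(env_var)
    if raw:
        # Assumption: the variable holds a plain integer such as "12".
        return int(raw)
    # os.cpu_count() may return None when the count cannot be determined.
    return os.cpu_count() or 1


os.environ["SPACECADET_MAX_THREADS"] = "12"
print(resolve_total_threads())  # 12
```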
    

NOTE: We have designed spacecadet.semaphore.ThreadedSemaphore as a singleton. No matter how many times you instantiate the class, the parameters from the first instantiation remain in effect until all references to that instance have been deleted:

>>> from spacecadet.semaphore import ThreadedSemaphore
>>> alloc = ThreadedSemaphore(12)
>>> print(alloc)
Allocator: 12 available threads
>>> alloc = ThreadedSemaphore(1)
>>> print(alloc)
Allocator: 12 available threads

Deleting the instance will allow you to change the total available threads:

>>> from spacecadet.semaphore import ThreadedSemaphore
>>> alloc = ThreadedSemaphore(12)
>>> print(alloc)
Allocator: 12 available threads
>>> del alloc
>>> alloc = ThreadedSemaphore(1)
>>> print(alloc)
Allocator: 1 available threads

Return values

threading.Thread does not return the value of its target function. spacecadet stores the output of your function in the thread's result attribute, which you can read once the thread has finished:

@cadet(source=project["my-experiment"])
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    return f"Model {model_name} uses the following features: {', '.join(features)}"

thread = model_report("My model name", "features")
thread.start()
thread.join()
print(thread.result)
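
Capturing a return value on a thread can be sketched with a small threading.Thread subclass. `ResultThread` is a hypothetical name, and this is an illustration of the general pattern rather than spacecadet's CadetThread. The features are passed directly here, with no artifact substitution.

```python
import threading
from typing import Any, Callable


class ResultThread(threading.Thread):
    """Hypothetical thread that keeps its function's return value."""

    def __init__(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        super().__init__()
        self._func = func
        self._func_args = args
        self._func_kwargs = kwargs
        self.result: Any = None  # stays None until run() completes

    def run(self) -> None:
        # Store the return value instead of discarding it.
        self.result = self._func(*self._func_args, **self._func_kwargs)


def model_report(model_name: str, features: list[str]) -> str:
    return f"Model {model_name} uses the following features: {', '.join(features)}"


thread = ResultThread(model_report, "My model name", ["a", "b", "c"])
thread.start()
thread.join()  # wait for completion before reading the result
print(thread.result)  # Model My model name uses the following features: a, b, c
```

Reading `result` before `join()` returns is a race: the attribute may still hold its initial `None`.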
