
A parallel execution engine that doesn't know anything about serialization


Parallel execution using lazyscribe

spacecadet is a library for running code in parallel without writing custom serialization. Instead, it relies on lazyscribe to track and manage experiments and to manage the use of current "deployment" artifacts.

Multithreaded execution

Variable substitution via lazyscribe

Suppose you are building a model and every experiment is tracked via lazyscribe:

from lazyscribe import Project

project = Project("project.json", mode="w")
with project.log("my-experiment") as exp:
    exp.log_artifact(name="features", value=["a", "b", "c"], handler="json")
    ...

project.save()

Suppose you have the following function to report the outcome of this experiment:

import logging

LOG = logging.getLogger(__name__)

def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

We can use spacecadet to directly connect the artifact from our experiment to the function.

from lazyscribe import Project
from spacecadet.threading import cadet

# Re-open the project from the first snippet in read-only mode
project = Project("project.json", mode="r")


@cadet(source=project["my-experiment"])
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

This decorator converts model_report into a lightly customized extension of threading.Thread. Let's execute this function:

thread = model_report("My model name", "features")
thread.start()
thread.join()

The log output shows that the literal string "features" has been replaced with the value ["a", "b", "c"] from the artifact named "features" in our experiment. If you're shipping application-side code that must stay agnostic of the source lazyscribe experiment or repository, you can also define the source later:

# Application-side code

@cadet
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

# User-side code
thread = model_report("My model name", "features")
thread.options(source=project["my-experiment"])
thread.start()
thread.join()
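
To make the substitution step concrete, here is a minimal standard-library sketch of the idea. It is not spacecadet's implementation: `toy_cadet` and `substitute` are hypothetical names, and a plain dictionary stands in for the artifacts of a lazyscribe experiment, since this README does not show how spacecadet resolves them.

```python
import functools
import threading
from typing import Any, Callable, Mapping


def substitute(artifacts: Mapping[str, Any], args: tuple, kwargs: dict) -> tuple[tuple, dict]:
    """Swap every string argument that names an artifact for the artifact's value."""

    def swap(value: Any) -> Any:
        if isinstance(value, str) and value in artifacts:
            return artifacts[value]
        return value

    return tuple(swap(a) for a in args), {k: swap(v) for k, v in kwargs.items()}


def toy_cadet(artifacts: Mapping[str, Any]) -> Callable:
    """Hypothetical stand-in for a decorator that turns a call into a thread."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> threading.Thread:
            new_args, new_kwargs = substitute(artifacts, args, kwargs)
            # Nothing runs yet; the caller decides when to start the thread.
            return threading.Thread(target=func, args=new_args, kwargs=new_kwargs)

        return wrapper

    return decorator


results: list[str] = []


# The dictionary is an assumption standing in for experiment artifacts.
@toy_cadet({"features": ["a", "b", "c"]})
def model_report(model_name: str, features: list[str]) -> None:
    results.append(
        f"Model {model_name} uses the following features: {', '.join(features)}"
    )


thread = model_report("My model name", "features")
thread.start()
thread.join()
print(results[0])  # Model My model name uses the following features: a, b, c
```

"My model name" passes through unchanged because no artifact has that name, while "features" matches an artifact and is replaced by its value.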

Managing thread allocation

You can also use spacecadet to run functions that themselves need multiple threads. Suppose you have a 12-core system. If each function consumes 4 cores, you want to limit how many functions run concurrently to reduce thread contention. spacecadet uses a semaphore-like object to acquire and release multiple threads at once:

@cadet(num_threads=4)
def model_report(model_name: str, features: list[str]):
    """Print the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

thread = model_report("My model name", "features")
thread.options(source=project["my-experiment"])
thread.start()
thread.join()

By default, spacecadet will use os.cpu_count to detect the number of cores on your machine. This value will represent the total number of threads available. When spacecadet.threading.CadetThread.start is called, we will "acquire" 4 threads from the available pool. If you don't want to use os.cpu_count to determine the total number of threads, you have a few options.

  1. Specify the number of available threads through the environment variable SPACECADET_MAX_THREADS:

    export SPACECADET_MAX_THREADS=12
    
  2. Use a context manager to temporarily set the number of threads:

    from spacecadet.semaphore import ThreadedSemaphore
    
    with ThreadedSemaphore(12):
        ...
    
    # outside of the context manager, os.cpu_count will be used again
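
The allocation behaviour described above can be pictured with a small standard-library sketch. This is an illustration under stated assumptions, not spacecadet's code: `ToyAllocator` and `total_threads` are hypothetical names, and the precedence of SPACECADET_MAX_THREADS over os.cpu_count is assumed from the description rather than confirmed.

```python
import os
import threading
import time


def total_threads() -> int:
    """Pool size: the environment variable if set, else os.cpu_count()."""
    override = os.environ.get("SPACECADET_MAX_THREADS")
    if override is not None:
        return int(override)  # assumed precedence; not confirmed by the README
    return os.cpu_count() or 1  # os.cpu_count() may return None


class ToyAllocator:
    """Hypothetical counting semaphore that hands out several slots at once."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.available = total
        self._cond = threading.Condition()

    def acquire(self, n: int) -> None:
        if n > self.total:
            raise ValueError(f"requested {n} slots but the pool only has {self.total}")
        with self._cond:
            # Block until n slots are free, then claim them in one step.
            self._cond.wait_for(lambda: self.available >= n)
            self.available -= n

    def release(self, n: int) -> None:
        with self._cond:
            self.available += n
            self._cond.notify_all()  # wake waiters so they can re-check


# Fixed at 12 to mirror the 12-core example; total_threads() could be used instead.
alloc = ToyAllocator(12)
stats = {"running": 0, "peak": 0}
stats_lock = threading.Lock()


def task() -> None:
    alloc.acquire(4)  # each task claims 4 of the 12 slots
    try:
        with stats_lock:
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
        time.sleep(0.05)  # simulate work
        with stats_lock:
            stats["running"] -= 1
    finally:
        alloc.release(4)


threads = [threading.Thread(target=task) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stats["peak"] <= 3)  # at most 12 // 4 = 3 tasks hold slots at once
```

Six tasks are submitted, but no more than three run at the same time because each one holds 4 of the 12 slots until it finishes.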
    

NOTE: we have designed spacecadet.semaphore.ThreadedSemaphore as a singleton object. This means that, no matter how many times you instantiate the class, the parameters used in the first instantiation of the semaphore will be used until all references to that instance have been deleted:

>>> from spacecadet.semaphore import ThreadedSemaphore
>>> alloc = ThreadedSemaphore(12)
>>> print(alloc)
Allocator: 12 available threads
>>> alloc = ThreadedSemaphore(1)
>>> print(alloc)
Allocator: 12 available threads

Deleting the instance will allow you to change the total available threads:

>>> from spacecadet.semaphore import ThreadedSemaphore
>>> alloc = ThreadedSemaphore(12)
>>> print(alloc)
Allocator: 12 available threads
>>> del alloc
>>> alloc = ThreadedSemaphore(1)
>>> print(alloc)
Allocator: 1 available threads
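
One way to get this "first instantiation wins until every reference is gone" behaviour is a singleton that keeps only a weak reference to its instance. The sketch below is an assumption about the general technique, not a description of how spacecadet implements ThreadedSemaphore; `WeakSingleton` and `ToySemaphore` are hypothetical names. It relies on CPython's reference counting to drop the instance as soon as the last reference is deleted.

```python
import weakref


class WeakSingleton(type):
    """Metaclass that returns the live instance, if one still exists."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._instance_ref = None  # weak reference to the current instance

    def __call__(cls, *args, **kwargs):
        instance = cls._instance_ref() if cls._instance_ref is not None else None
        if instance is None:
            # No live instance: build one and remember it weakly.
            instance = super().__call__(*args, **kwargs)
            cls._instance_ref = weakref.ref(instance)
        # Otherwise the new arguments are ignored and __init__ is not re-run.
        return instance


class ToySemaphore(metaclass=WeakSingleton):
    """Hypothetical stand-in for a singleton thread allocator."""

    def __init__(self, total: int) -> None:
        self.total = total

    def __str__(self) -> str:
        return f"Allocator: {self.total} available threads"


first = ToySemaphore(12)
second = ToySemaphore(1)  # same object as `first`; the 1 is ignored
print(second)  # Allocator: 12 available threads

del first, second  # the last strong references disappear
third = ToySemaphore(1)  # a fresh instance is created
print(third)  # Allocator: 1 available threads
```

Because the class holds only a weak reference, it never keeps the instance alive on its own, which is why deleting every user-held reference resets the configuration.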
