A subprocess-based evaluator that executes programs in isolated subprocesses and measures their performance.

Swarmauri Evaluator Subprocess

SubprocessEvaluator executes programs inside sandboxed subprocesses while enforcing CPU, memory, and file size quotas. It captures stdout/stderr, tracks exit codes, and returns a normalized score plus structured metadata describing each run.

Highlights

Apply CPU timeouts, memory ceilings, file size limits, and process count caps through resource.setrlimit before user code starts.
Automatically choose the appropriate command: launch executables directly or wrap Python and shell scripts with the correct interpreter.
Compare stdout against an expected_output string and annotate mismatches in the result metadata.
Record execution context (command, args, working_dir) alongside collected streams for easy debugging.
Aggregate multiple runs with reason counts, timeout rates, and success rates via aggregate_scores.
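
The first highlight can be illustrated with the standard library alone. The block below is a minimal sketch, not SubprocessEvaluator's actual implementation: the helper names `_lower_soft_limit` and `apply_limits` and the specific limit values are assumptions chosen for the example. It relies on `resource.setrlimit` and `preexec_fn`, so it only runs on POSIX systems.

```python
import resource
import subprocess
import sys


def _lower_soft_limit(limit_id: int, value: int) -> None:
    """Lower the soft limit to `value` without exceeding the current hard limit."""
    _soft, hard = resource.getrlimit(limit_id)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit_id, (value, hard))


def apply_limits() -> None:
    """Hypothetical hook: runs in the child after fork and before exec."""
    _lower_soft_limit(resource.RLIMIT_CPU, 2)              # CPU seconds (illustrative value)
    _lower_soft_limit(resource.RLIMIT_FSIZE, 1024 * 1024)  # max bytes per written file


result = subprocess.run(
    [sys.executable, "-c", "print('limited child ran')"],
    preexec_fn=apply_limits,  # POSIX only
    capture_output=True,
    text=True,
    timeout=10,               # wall-clock guard in the parent
)
print(result.returncode, result.stdout.strip())
```

Memory ceilings and process count caps would follow the same pattern with `resource.RLIMIT_AS` and `resource.RLIMIT_NPROC`; they are left out here because suitable values depend heavily on the host.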

Installation

Pick the tool that matches your workflow:

# pip
pip install swarmauri_evaluator_subprocess

# Poetry
poetry add swarmauri_evaluator_subprocess

# uv
uv add swarmauri_evaluator_subprocess

Quickstart

The example below writes a temporary Python script to disk, wraps it in a small IProgram implementation, and evaluates it inside a subprocess. The evaluator returns 1.0 when the exit code is in success_exit_codes and, if expected_output is provided, stdout matches it.

from pathlib import Path
import tempfile

from swarmauri_evaluator_subprocess import SubprocessEvaluator
from swarmauri_core.programs.IProgram import DiffType, IProgram


class ScriptProgram(IProgram):
    """Minimal IProgram wrapper for a script stored on disk."""

    def __init__(self, path: Path):
        self._path = Path(path)

    # Required IProgram interface methods -------------------------------
    def diff(self, other: IProgram) -> DiffType:  # pragma: no cover - example
        return {}

    def apply_diff(self, diff: DiffType) -> "ScriptProgram":  # pragma: no cover
        return ScriptProgram(self._path)

    def validate(self) -> bool:  # pragma: no cover
        return self._path.exists()

    def clone(self) -> "ScriptProgram":  # pragma: no cover
        return ScriptProgram(self._path)

    # Methods consumed by SubprocessEvaluator ---------------------------
    def get_path(self) -> str:
        return str(self._path)

    def is_executable(self) -> bool:
        return False


def run_example(expected_output: str = "hello from subprocess\n"):
    evaluator = SubprocessEvaluator(timeout=5)

    with tempfile.TemporaryDirectory() as tmpdir:
        script_path = Path(tmpdir) / "echo.py"
        script_path.write_text("print('hello from subprocess')\n", encoding="utf-8")

        program = ScriptProgram(script_path)

        score, metadata = evaluator.evaluate(
            program,
            expected_output=expected_output,
        )

    return score, metadata


def main():
    score, metadata = run_example()
    print("Score:", score)
    print("Stdout:", metadata["stdout"].strip())
    print("Reason:", metadata["reason"])


if __name__ == "__main__":
    main()

Evaluation options

SubprocessEvaluator.evaluate(program, **kwargs) accepts runtime controls in addition to the evaluator's model fields:

Argument	Description
`args`	List of command-line arguments appended to the prepared command.
`input_data`	String provided on stdin; useful for feeding sample input.
`expected_output`	Optional stdout string; mismatches lower the score to `0.7`.
`timeout`	Overrides the evaluator's `timeout` for a single run.
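
The scoring rule described in this README can be summarized in a few lines. This is a sketch under stated assumptions: the function name `score_run` is hypothetical, the `0.0` score for failed or timed-out runs is an assumption (the README only documents `1.0` and `0.7`), and the exact string comparison may differ from what the evaluator does internally.

```python
from typing import Optional, Sequence


def score_run(
    exit_code: Optional[int],
    stdout: str,
    expected_output: Optional[str] = None,
    success_exit_codes: Sequence[int] = (0,),
    timed_out: bool = False,
) -> float:
    """Hypothetical helper mirroring the documented scoring rule."""
    if timed_out or exit_code not in success_exit_codes:
        return 0.0  # assumption: the README does not state the failure score
    if expected_output is not None and stdout != expected_output:
        return 0.7  # documented: an output mismatch lowers the score to 0.7
    return 1.0      # documented: successful exit code and matching (or unchecked) output


print(score_run(0, "hello\n", expected_output="hello\n"))
print(score_run(0, "bye\n", expected_output="hello\n"))
print(score_run(1, ""))
```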

Returned metadata

Each evaluation returns (score, metadata) where metadata always contains:

stdout, stderr, and exit_code from the subprocess.
timed_out flag plus a human-readable reason such as success, timeout, or exit_code_<value>.
command, args, and working_dir to show how the program was launched.
execution_time (seconds) measured by the evaluator wrapper.
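
To make the shape of that metadata concrete, here is a standard-library sketch that produces a dictionary with the same keys. The function `run_and_describe` is hypothetical and is not part of the package; how the evaluator actually builds its reason strings and handles partial output on timeout is assumed, not confirmed.

```python
import os
import subprocess
import sys
import time
from typing import Any, Dict, List, Optional


def run_and_describe(
    command: List[str],
    args: Optional[List[str]] = None,
    working_dir: Optional[str] = None,
    input_data: Optional[str] = None,
    timeout: float = 5.0,
) -> Dict[str, Any]:
    """Hypothetical helper: run a command and return metadata shaped like the list above."""
    args = list(args or [])
    started = time.perf_counter()
    timed_out = False
    try:
        proc = subprocess.run(
            command + args,
            input=input_data,
            cwd=working_dir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        stdout, stderr, exit_code = proc.stdout, proc.stderr, proc.returncode
    except subprocess.TimeoutExpired:
        # Partial output is dropped in this sketch.
        timed_out, stdout, stderr, exit_code = True, "", "", None

    if timed_out:
        reason = "timeout"
    elif exit_code == 0:
        reason = "success"
    else:
        reason = f"exit_code_{exit_code}"

    return {
        "stdout": stdout,
        "stderr": stderr,
        "exit_code": exit_code,
        "timed_out": timed_out,
        "reason": reason,
        "command": command,
        "args": args,
        "working_dir": working_dir or os.getcwd(),
        "execution_time": time.perf_counter() - started,
    }


ok = run_and_describe(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper(), end='')"],
    input_data="ping",
)
failed = run_and_describe(
    [sys.executable, "-c", "import sys; sys.exit(int(sys.argv[1]))"],
    args=["3"],
)
print(ok["reason"], ok["stdout"])
print(failed["reason"])
```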

When aggregating multiple runs, aggregate_scores adds reason_counts, timeout_rate, success_rate, and total_executions to the combined metadata so callers can evaluate fleet-wide behavior.
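
The aggregate fields can be derived from per-run metadata as sketched below. The function name `aggregate_runs`, its signature, the use of the mean as the combined score, and the definition of success as `reason == "success"` are all assumptions made for illustration; the real aggregate_scores may compute these values differently.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def aggregate_runs(
    scores: List[float],
    metadata_list: List[Dict[str, Any]],
) -> Tuple[float, Dict[str, Any]]:
    """Hypothetical helper: combine per-run results into summary statistics."""
    total = len(metadata_list)
    reason_counts = Counter(m.get("reason", "unknown") for m in metadata_list)
    timeouts = sum(1 for m in metadata_list if m.get("timed_out"))
    combined = {
        "reason_counts": dict(reason_counts),
        "timeout_rate": timeouts / total if total else 0.0,
        "success_rate": reason_counts["success"] / total if total else 0.0,
        "total_executions": total,
    }
    # Assumption: the combined score is the arithmetic mean of the run scores.
    mean_score = sum(scores) / len(scores) if scores else 0.0
    return mean_score, combined


runs = [
    {"reason": "success", "timed_out": False},
    {"reason": "success", "timed_out": False},
    {"reason": "timeout", "timed_out": True},
    {"reason": "exit_code_1", "timed_out": False},
]
mean_score, summary = aggregate_runs([1.0, 1.0, 0.0, 0.0], runs)
print(mean_score, summary)
```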

Want to help?

If you want to contribute to swarmauri-sdk, read our contributing guidelines to get started.

Release history

Version	Release date	Notes
0.11.0.dev1	Jun 30, 2026	pre-release
0.3.0.dev48	Mar 23, 2026	pre-release
0.3.0.dev47	Mar 20, 2026	pre-release
0.3.0.dev46	Mar 20, 2026	pre-release
0.3.0.dev45	Mar 20, 2026	pre-release
0.3.0.dev44	Mar 20, 2026	pre-release
0.3.0.dev41	Mar 5, 2026	pre-release
0.3.0.dev38	Feb 23, 2026	pre-release (this version)
0.3.0.dev34	Feb 17, 2026	pre-release
0.3.0.dev33	Feb 17, 2026	pre-release
0.3.0.dev4	Sep 11, 2025	pre-release
0.3.0.dev3	Sep 10, 2025	pre-release
0.3.0.dev2	Sep 10, 2025	pre-release
0.1.2	May 23, 2025	
0.1.2.dev1	May 23, 2025	pre-release
0.1.1	May 23, 2025	
0.1.1.dev3	May 23, 2025	pre-release

Download files

Source Distribution

swarmauri_evaluator_subprocess-0.3.0.dev38.tar.gz (11.3 kB)

Uploaded Feb 23, 2026 Source

Built Distribution

swarmauri_evaluator_subprocess-0.3.0.dev38-py3-none-any.whl (12.7 kB)

Uploaded Feb 23, 2026 Python 3

File details

Details for the file swarmauri_evaluator_subprocess-0.3.0.dev38.tar.gz.

File metadata

Download URL: swarmauri_evaluator_subprocess-0.3.0.dev38.tar.gz
Upload date: Feb 23, 2026
Size: 11.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.4 {"installer":{"name":"uv","version":"0.10.4","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for swarmauri_evaluator_subprocess-0.3.0.dev38.tar.gz
Algorithm	Hash digest
SHA256	`5956c535e21af5dfcadb18dfc0362924be444ba7fd08c948e7cc00b6772fce82`
MD5	`9ad2fbdef5673bf2b0faf89e109c03dd`
BLAKE2b-256	`23adf8fb516c70799c798ebb245fe6d537447287070f5996f9a8264f0f4380b4`

File details

Details for the file swarmauri_evaluator_subprocess-0.3.0.dev38-py3-none-any.whl.

File metadata

Download URL: swarmauri_evaluator_subprocess-0.3.0.dev38-py3-none-any.whl
Upload date: Feb 23, 2026
Size: 12.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.4 {"installer":{"name":"uv","version":"0.10.4","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for swarmauri_evaluator_subprocess-0.3.0.dev38-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5e599278f04a9189fce769f20cef396cf86dc4bc178c1c74f7180d4709fdfff3`
MD5	`b70c3fc9c686c35486d9e07612782884`
BLAKE2b-256	`cd083fc3716c1c0e763fe66f7fe7c10061e1f807d4a862f8a4f393ed83379657`
