A subprocess-based evaluator for executing and measuring program performance in isolated subprocesses
Swarmauri Evaluator Subprocess
SubprocessEvaluator executes programs inside sandboxed subprocesses while
enforcing CPU, memory, and file size quotas. It captures stdout/stderr, tracks
exit codes, and returns a normalized score plus structured metadata describing
each run.
Highlights
- Apply CPU timeouts, memory ceilings, file size limits, and process count caps before user code starts (`resource.setrlimit`).
- Automatically choose the appropriate command: launch executables directly or wrap Python and shell scripts with the correct interpreter.
- Compare stdout against an `expected_output` string and annotate mismatches in the result metadata.
- Record execution context (`command`, `args`, `working_dir`) alongside the collected streams for easier debugging.
- Aggregate multiple runs with reason counts, timeout rates, and success rates via `aggregate_scores`.
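The resource caps in the first highlight can be approximated with the standard library alone. The sketch below is not the package's implementation. It assumes a POSIX system (the `resource` module and `preexec_fn` are unavailable on Windows), and the helper names (`make_preexec`, `run_limited`) and limit values are hypothetical. `preexec_fn` runs in the child after `fork()` and before `exec()`, so the limits are already in place when the program starts; note that Python's documentation warns that `preexec_fn` is not safe to use when the parent has threads.

```python
import resource
import subprocess
import sys
from typing import Callable, Sequence


def _lower_soft_limit(kind: int, value: int) -> None:
    """Set the soft limit for `kind` to `value`, clamped to the existing hard limit."""
    _soft, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, hard))


def make_preexec(cpu_seconds: int, memory_bytes: int, file_bytes: int) -> Callable[[], None]:
    """Build a callable that applies rlimits inside the child process."""

    def apply_limits() -> None:
        _lower_soft_limit(resource.RLIMIT_CPU, cpu_seconds)   # CPU time, in seconds
        _lower_soft_limit(resource.RLIMIT_AS, memory_bytes)   # address space, in bytes
        _lower_soft_limit(resource.RLIMIT_FSIZE, file_bytes)  # largest file the child may write
        # RLIMIT_NPROC could cap the process count the same way (it is per user on Linux).

    return apply_limits


def run_limited(cmd: Sequence[str], timeout: float = 5.0) -> subprocess.CompletedProcess:
    """Run `cmd` under illustrative limits and capture its output as text."""
    return subprocess.run(
        list(cmd),
        capture_output=True,
        text=True,
        timeout=timeout,  # wall-clock limit, enforced by the parent
        preexec_fn=make_preexec(
            cpu_seconds=2,
            memory_bytes=512 * 1024 * 1024,
            file_bytes=1024 * 1024,
        ),
    )


if __name__ == "__main__":
    result = run_limited([sys.executable, "-c", "print('ok')"])
    print(result.returncode, result.stdout.strip())
```

Pairing `RLIMIT_CPU` with a wall-clock `timeout` matters because the CPU limit counts only CPU time: a program that sleeps or blocks on I/O never reaches it.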
Installation
Pick the tool that matches your workflow:
```bash
# pip
pip install swarmauri_evaluator_subprocess

# Poetry
poetry add swarmauri_evaluator_subprocess

# uv
uv add swarmauri_evaluator_subprocess
```
Quickstart
The example below writes a temporary Python script to disk, wraps it in a small
IProgram implementation, and evaluates it inside a subprocess. The evaluator
returns 1.0 when the exit code is in success_exit_codes and, when provided,
stdout matches expected_output.
```python
from pathlib import Path
import tempfile

from swarmauri_evaluator_subprocess import SubprocessEvaluator
from swarmauri_core.programs.IProgram import DiffType, IProgram


class ScriptProgram(IProgram):
    """Minimal IProgram wrapper for a script stored on disk."""

    def __init__(self, path: Path):
        self._path = Path(path)

    # Required IProgram interface methods -------------------------------
    def diff(self, other: IProgram) -> DiffType:  # pragma: no cover - example
        return {}

    def apply_diff(self, diff: DiffType) -> "ScriptProgram":  # pragma: no cover
        return ScriptProgram(self._path)

    def validate(self) -> bool:  # pragma: no cover
        return self._path.exists()

    def clone(self) -> "ScriptProgram":  # pragma: no cover
        return ScriptProgram(self._path)

    # Methods consumed by SubprocessEvaluator ---------------------------
    def get_path(self) -> str:
        return str(self._path)

    def is_executable(self) -> bool:
        # A script, not a standalone executable: the evaluator wraps it with an interpreter.
        return False


def run_example(expected_output: str = "hello from subprocess\n"):
    evaluator = SubprocessEvaluator(timeout=5)
    with tempfile.TemporaryDirectory() as tmpdir:
        script_path = Path(tmpdir) / "echo.py"
        script_path.write_text("print('hello from subprocess')\n", encoding="utf-8")
        program = ScriptProgram(script_path)
        # Evaluate while the temporary directory (and the script) still exists.
        score, metadata = evaluator.evaluate(
            program,
            expected_output=expected_output,
        )
    return score, metadata


def main():
    score, metadata = run_example()
    print("Score:", score)
    print("Stdout:", metadata["stdout"].strip())
    print("Reason:", metadata["reason"])


if __name__ == "__main__":
    main()
```
Evaluation options
SubprocessEvaluator.evaluate(program, **kwargs) accepts runtime controls in
addition to the evaluator's model fields:
| Argument | Description |
|---|---|
| `args` | List of command-line arguments appended to the prepared command. |
| `input_data` | String provided on stdin; useful for feeding sample input. |
| `expected_output` | Optional stdout string; a mismatch lowers the score to 0.7. |
| `timeout` | Overrides the evaluator's timeout for a single run. |
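These options map naturally onto `subprocess.run`. The sketch below shows one way a launcher could be chosen for a program and then combined with `args`, `input_data`, and a per-run `timeout`. It is an illustration, not the evaluator's code: the helper names `prepare_command` and `run_once` and the suffix-to-interpreter mapping (`.py` to the current Python, `.sh` to `sh`) are hypothetical, and the package may select interpreters differently.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional, Sequence


def prepare_command(path: str, is_executable: bool, args: Sequence[str] = ()) -> List[str]:
    """Pick a launcher for `path` and append user arguments (hypothetical mapping)."""
    suffix = Path(path).suffix
    if is_executable:
        base = [path]                  # executables are launched directly
    elif suffix == ".py":
        base = [sys.executable, path]  # Python scripts run under the current interpreter
    elif suffix == ".sh":
        base = ["sh", path]            # shell scripts run under a POSIX shell
    else:
        raise ValueError(f"don't know how to launch {path!r}")
    return base + list(args)


def run_once(
    path: str,
    is_executable: bool,
    args: Sequence[str] = (),
    input_data: Optional[str] = None,
    timeout: float = 5.0,
) -> subprocess.CompletedProcess:
    """Mirror the evaluate() options: args, input_data (stdin), and a per-run timeout."""
    return subprocess.run(
        prepare_command(path, is_executable, args),
        input=input_data,     # written to the child's stdin; None leaves stdin inherited
        capture_output=True,
        text=True,
        timeout=timeout,      # raises subprocess.TimeoutExpired when exceeded
    )
```

For example, `prepare_command("solve.py", False, ["--n", "3"])` yields `[sys.executable, "solve.py", "--n", "3"]`.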
Returned metadata
Each evaluation returns (score, metadata) where metadata always contains:
- `stdout`, `stderr`, and `exit_code` from the subprocess.
- A `timed_out` flag plus a human-readable `reason` such as `success`, `timeout`, or `exit_code_<value>`.
- `command`, `args`, and `working_dir` to show how the program was launched.
- `execution_time` (seconds) measured by the evaluator wrapper.
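To make the metadata shape concrete, here is a standard-library sketch that produces a dictionary with the same keys for a single run. It is not the evaluator's code: the helper name `run_with_metadata`, the use of `None` for `exit_code` after a timeout, and the rule that a run counts as `success` when its exit code is in `success_exit_codes` (defaulting here to `(0,)`) are assumptions for illustration.

```python
import os
import subprocess
import time
from typing import Any, Dict, Optional, Sequence, Union


def _as_text(data: Union[bytes, str, None]) -> str:
    """TimeoutExpired carries partial output as bytes (or None), even with text=True."""
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def run_with_metadata(
    command: Sequence[str],
    args: Sequence[str] = (),
    working_dir: Optional[str] = None,
    timeout: float = 5.0,
    success_exit_codes: Sequence[int] = (0,),
) -> Dict[str, Any]:
    """Run `command + args` once and describe the run with the keys listed above."""
    cwd = working_dir or os.getcwd()
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            list(command) + list(args),
            capture_output=True,
            text=True,
            cwd=cwd,
            timeout=timeout,
        )
        stdout, stderr = proc.stdout, proc.stderr
        exit_code: Optional[int] = proc.returncode
        timed_out = False
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills and reaps the child before re-raising; keep partial output.
        stdout, stderr = _as_text(exc.stdout), _as_text(exc.stderr)
        exit_code = None
        timed_out = True
    elapsed = time.perf_counter() - start

    if timed_out:
        reason = "timeout"
    elif exit_code in success_exit_codes:
        reason = "success"
    else:
        reason = f"exit_code_{exit_code}"

    return {
        "stdout": stdout,
        "stderr": stderr,
        "exit_code": exit_code,
        "timed_out": timed_out,
        "reason": reason,
        "command": list(command),
        "args": list(args),
        "working_dir": cwd,
        "execution_time": elapsed,
    }
```

Keeping `command` and `args` as separate entries, as the metadata list above does, makes it easy to tell the launcher chosen for a program apart from the arguments a caller supplied.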
When aggregating multiple runs, aggregate_scores adds reason_counts,
timeout_rate, success_rate, and total_executions to the combined
metadata so callers can evaluate fleet-wide behavior.
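The aggregate fields can be reproduced from per-run metadata with a `Counter`. This sketch assumes each run's metadata carries the `reason` and `timed_out` keys described above, treats a run as successful when its `reason` is `success`, and reports rates as fractions between 0 and 1. `aggregate_metadata` is a hypothetical name, and the sketch does not attempt to reproduce how `aggregate_scores` combines the scores themselves.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def aggregate_metadata(runs: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize many runs using the aggregate keys named above (sketch)."""
    runs = list(runs)
    total = len(runs)
    reasons = Counter(run["reason"] for run in runs)
    timeouts = sum(1 for run in runs if run["timed_out"])
    return {
        "reason_counts": dict(reasons),
        # Guard against division by zero when no runs were recorded.
        "timeout_rate": timeouts / total if total else 0.0,
        "success_rate": reasons["success"] / total if total else 0.0,
        "total_executions": total,
    }
```

With two successes, one timeout, and one `exit_code_1` run, this reports a `success_rate` of 0.5 and a `timeout_rate` of 0.25 over four executions.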
Want to help?
If you want to contribute to swarmauri-sdk, read up on our guidelines for contributing that will help you get started.
File details
Details for the file swarmauri_evaluator_subprocess-0.3.0.dev46.tar.gz.
File metadata
- Download URL: swarmauri_evaluator_subprocess-0.3.0.dev46.tar.gz
- Upload date:
- Size: 11.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b757effbfd99e58cceafde5470e1b59fd3caa225f6f9047bf318783a21c5f801 |
| MD5 | 270c84ba3853da48028f4e5898f8e06b |
| BLAKE2b-256 | c8c45adb9c3706bce1b012147e3db72c82b883da1649a4bbc6c94a8ebc744141 |
File details
Details for the file swarmauri_evaluator_subprocess-0.3.0.dev46-py3-none-any.whl.
File metadata
- Download URL: swarmauri_evaluator_subprocess-0.3.0.dev46-py3-none-any.whl
- Upload date:
- Size: 12.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e62096f207eed868c7a441aa799c798a79e04fcb833fac255b8466661577ac6 |
| MD5 | 3e79e72c5f4cd331da80848fbd4ac03e |
| BLAKE2b-256 | 931eaca73feb8ad89bd6ab4521e13a68f78d8f1b138cd6f79d610564021c77af |