Run testbeds as isolated pods in a Kubernetes cluster

Moatless Testbeds

Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git patches and run tests or SWE-Bench evaluations.

It was first tested with the Docker images built for SWE-bench, but it supports any Docker image that meets two basic requirements:

Contains a git repository in the /testbeds directory for applying patches
Supports running tests with specific commands (e.g., pytest [path to test file])
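
As a rough illustration of the second requirement, the sketch below builds a test command of the form shown above from a list of test files. The helper name build_test_command and its runner parameter are hypothetical and not part of Moatless Testbeds; the project may assemble its commands differently.

```python
import shlex


def build_test_command(test_files, runner="pytest"):
    """Join a test runner and file paths into one shell-safe command string.

    Hypothetical helper for illustration; not part of the Moatless SDK.
    """
    if not test_files:
        raise ValueError("at least one test file is required")
    # shlex.split lets the runner carry its own flags, e.g. "pytest -q".
    # shlex.join quotes any path that contains spaces or shell metacharacters.
    return shlex.join([*shlex.split(runner), *test_files])


print(build_test_command(["tests/test_forms.py", "tests/test_models.py"]))
```

Quoting the paths matters once a command string is handed to a shell inside the container, which is why the sketch uses shlex.join rather than plain string concatenation.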

Fill out this form if you’re interested in testing the hosted version of Moatless Testbeds.

Getting Started

Install the SDK

pip install moatless-testbeds

Run tests

from testbeds.sdk import TestbedSDK

# Initialize the SDK with your credentials
sdk = TestbedSDK(
    base_url="https://testbeds.moatless.ai",  # Replace with your API URL
    api_key="<API-KEY>"
)

# Create a testbed instance and automatically handle cleanup
with sdk.create_client(instance_id="django__django-11333") as testbed:
    # Define test files to run
    test_files = [
        "tests/test_forms.py",
        "tests/test_models.py"
    ]

    # Illustrative patch only: the index hashes and hunk are placeholders, so substitute a real diff
    patch = """
diff --git a/django/forms/models.py b/django/forms/models.py
index abc123..def456 100644
--- a/django/forms/models.py
+++ b/django/forms/models.py
@@ -245,7 +245,7 @@ class BaseModelForm(BaseForm):
-        if self.instance and not self.instance._state.adding:
+        if self.instance and not self.instance._state.adding and not self._meta.fields:
             self._meta.fields = None
    """

    # Run the tests and get results
    result = testbed.run_tests(
        test_files=test_files,
        patch=patch
    )
    print(f"Test Status: {result.get_summary()}")

Installation

Prerequisites

Docker installed and configured
kubectl configured with access to your Kubernetes cluster
envsubst utility installed

Installation Steps

The easiest way to install is using the provided install script:

# Clone the repository
git clone https://github.com/aorwall/moatless-testbeds.git
cd moatless-testbeds

# Install Testbeds SDK
pip install moatless-testbeds

# Set the Kubernetes namespace if not default
# export KUBERNETES_NAMESPACE=testbeds  # default: testbeds

# Optional: Configure custom container registry and image prefix
# If not set, will use default values for SWE-bench images
# export SWEBENCH_DOCKER_REGISTRY=your-registry  # default: swebench
# export SWEBENCH_IMAGE_PREFIX=your-prefix      # default: sweb.eval.x86_64.

# Optional: Enable direct command execution in testbeds
# Warning: This allows arbitrary command execution and should be used with caution
# export ENABLE_EXEC=true  # default: false

# Run the install script
./scripts/install.sh

The API will be available at http://<EXTERNAL-IP>.

Container Registry Configuration

The testbed images are pulled from a container registry that can be configured using two environment variables:

SWEBENCH_DOCKER_REGISTRY: The base registry URL (default: swebench)
SWEBENCH_IMAGE_PREFIX: The prefix for testbed images (default: sweb.eval.x86_64.)

By default, the configuration is set up to use SWE-bench images. If you want to use your own registry:

export SWEBENCH_DOCKER_REGISTRY=my-registry.azurecr.io
export SWEBENCH_IMAGE_PREFIX=custom.eval.

This will result in testbed images being pulled from: my-registry.azurecr.io/custom.eval.<instance-id>
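
The naming rule above can be sketched in a few lines of Python. This follows only the pattern stated in this section (registry, then a slash, then prefix and instance ID); the function name image_reference is hypothetical, and the install script may apply further transformations, such as tags, that are not described here.

```python
import os

# Defaults as documented in this section.
DEFAULT_REGISTRY = "swebench"
DEFAULT_PREFIX = "sweb.eval.x86_64."


def image_reference(instance_id, env=None):
    """Build <registry>/<prefix><instance-id> from the two variables.

    Hypothetical helper; it mirrors the documented pattern only.
    """
    env = os.environ if env is None else env
    registry = env.get("SWEBENCH_DOCKER_REGISTRY", DEFAULT_REGISTRY)
    prefix = env.get("SWEBENCH_IMAGE_PREFIX", DEFAULT_PREFIX)
    return f"{registry}/{prefix}{instance_id}"


custom = {
    "SWEBENCH_DOCKER_REGISTRY": "my-registry.azurecr.io",
    "SWEBENCH_IMAGE_PREFIX": "custom.eval.",
}
print(image_reference("django__django-11333", env=custom))
```

Passing the environment as a plain mapping keeps the function easy to check without touching the real process environment.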

Run evaluation

The evaluation script allows you to test gold patches and verify that your setup is working correctly.

Prerequisites

Make sure you have the following environment variables set:

TESTBED_API_IP: The IP address of your API service
NAMESPACE: The Kubernetes namespace where the API is deployed (default: testbeds)
TESTBED_API_KEY: Your API key (if API key authentication is enabled)

You can source these from the installation:

source .env.testbed
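
If you want to read the same variables from your own Python code, a minimal sketch could look like the following. The function name load_api_settings and the returned dictionary layout are assumptions for illustration, and the http:// scheme is taken from the API address shown in the installation section. Note that variables loaded with source are only visible to a Python process if the file exports them.

```python
import os


def load_api_settings(env=None):
    """Collect the testbed API settings from environment variables.

    Hypothetical helper; the dictionary layout is an assumption.
    """
    env = os.environ if env is None else env
    ip = env.get("TESTBED_API_IP")
    if not ip:
        raise RuntimeError("TESTBED_API_IP is not set; try `source .env.testbed`")
    return {
        "base_url": f"http://{ip}",
        "namespace": env.get("NAMESPACE", "testbeds"),  # documented default
        "api_key": env.get("TESTBED_API_KEY"),  # None when no key is configured
    }


print(load_api_settings({"TESTBED_API_IP": "203.0.113.10"}))
```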

Running Evaluation

To run an evaluation:

python scripts/run_evaluation.py --instance-id <instance-id>

For example:

python scripts/run_evaluation.py --instance-id django__django-11333

The script will:

Create a new testbed instance
Run the evaluation using the specified instance ID with the gold patch
Output the evaluation results in JSON format
Clean up the testbed instance

A successful run will show "✅ Evaluation completed successfully!" in the logs. Any errors during execution will be logged with detailed information.

Architecture

The solution consists of three core components:

1. Orchestrating API

Deployed as a central service in the Kubernetes cluster
Manages the lifecycle of testbed jobs and pods
Provides endpoints for command execution in testbeds
Handles pod creation and deletion

2. Testbeds

Testbeds are composed of two parts:

Main Testbed Image: Contains the test environment and code
Sidecar Container: Exposes a simple HTTP API with four endpoints:
- Command execution
- File management (save/retrieve)
- Status polling

The command execution flow is straightforward:

Send command via POST /exec
Poll status via GET /exec until completion
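
The two-step flow above is a submit-then-poll loop. The sketch below shows the general shape under stated assumptions: the send and poll callables stand in for POST /exec and GET /exec, and the "status" key with the value "completed" is an invented response shape, since the sidecar's actual payload is not documented here.

```python
import time


def run_command(send, poll, command, interval=1.0, timeout=300.0,
                sleep=time.sleep, clock=time.monotonic):
    """Submit a command, then poll until completion is reported.

    send(command) stands in for POST /exec and poll() for GET /exec.
    Both callables, the "status" key, and the "completed" value are
    assumptions made for this sketch.
    """
    send(command)
    deadline = clock() + timeout
    while True:
        state = poll()
        if state.get("status") == "completed":
            return state
        if clock() >= deadline:
            raise TimeoutError(f"command did not finish within {timeout}s")
        sleep(interval)


# In-memory stand-ins for the two HTTP calls, so the sketch runs offline.
_states = iter([{"status": "running"}, {"status": "completed", "output": "ok"}])
final = run_command(lambda cmd: None, lambda: next(_states),
                    "pytest tests/test_forms.py", interval=0)
print(final["output"])
```

A timeout is included because a polling loop without one can hang indefinitely if the command never finishes or the pod disappears.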

3. SDK

The SDK provides a simple interface to interact with the API. It handles:

Testbed creation and management
Command execution
Test running and evaluation

Test Execution Flow

Start or reset testbed (recommended: new testbed for each test run)
Apply code changes as git patches
Run tests using specified commands
Parse test output into TestResult objects
Generate evaluation reports comparing against FAIL_TO_PASS and PASS_TO_PASS tests
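
The last step can be sketched as a comparison of parsed test outcomes against the two expected lists. This is a simplified illustration, not the project's report code: plain dictionaries replace the TestResult objects, the "PASSED" status string and the report layout are assumptions, and the "resolved" rule follows the usual SWE-bench convention that every FAIL_TO_PASS and PASS_TO_PASS test must pass.

```python
def evaluate(results, fail_to_pass, pass_to_pass):
    """Compare test outcomes against the expected FAIL_TO_PASS / PASS_TO_PASS lists.

    results maps a test name to a status string. A test that is missing
    from results is counted as a failure. Layout and status values are
    assumptions for this sketch.
    """
    def split(names):
        success = [n for n in names if results.get(n) == "PASSED"]
        failure = [n for n in names if results.get(n) != "PASSED"]
        return {"success": success, "failure": failure}

    report = {
        "FAIL_TO_PASS": split(fail_to_pass),
        "PASS_TO_PASS": split(pass_to_pass),
    }
    # Resolved only when no expected test failed or went missing.
    report["resolved"] = not (
        report["FAIL_TO_PASS"]["failure"] or report["PASS_TO_PASS"]["failure"]
    )
    return report


outcomes = {"test_fix": "PASSED", "test_existing": "FAILED"}
print(evaluate(outcomes, ["test_fix"], ["test_existing"]))
```

Here the patch fixes the target test but breaks an existing one, so the report marks the instance as not resolved.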

Project details

Release history

0.0.17 (Mar 17, 2025)
0.0.16 (Feb 27, 2025)
0.0.15 (Jan 30, 2025)
0.0.14 (Jan 18, 2025, this version)
0.0.13 (Jan 18, 2025)
0.0.12 (Jan 17, 2025)
0.0.11 (Jan 14, 2025)
0.0.10 (Nov 18, 2024)
0.0.9 (Nov 5, 2024)
0.0.8 (Nov 4, 2024)
0.0.7 (Nov 3, 2024)
0.0.6 (Nov 3, 2024)
0.0.5 (Oct 31, 2024)
0.0.3 (Oct 30, 2024)
0.0.2 (Oct 30, 2024)
0.0.1 (Oct 29, 2024)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

moatless_testbeds-0.0.14.tar.gz (48.1 kB view details)

Uploaded Jan 18, 2025 Source

Built Distribution

moatless_testbeds-0.0.14-py3-none-any.whl (57.0 kB view details)

Uploaded Jan 18, 2025 Python 3

File details

Details for the file moatless_testbeds-0.0.14.tar.gz.

File metadata

Download URL: moatless_testbeds-0.0.14.tar.gz
Upload date: Jan 18, 2025
Size: 48.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.0.1 CPython/3.13.1 Linux/6.12.9-arch1-1

File hashes

Hashes for moatless_testbeds-0.0.14.tar.gz
Algorithm	Hash digest
SHA256	`f06fc8c7d338871550c0faed0cca8417965bae944bf31cc80eb9a28efcc03641`
MD5	`cfe6242a1e295fa42653468f35e8870d`
BLAKE2b-256	`5ca1cd9da5de4f2a629012eba2f1d9d97796ed2331e7cfc17d34180972ac5a35`

File details

Details for the file moatless_testbeds-0.0.14-py3-none-any.whl.

File metadata

Download URL: moatless_testbeds-0.0.14-py3-none-any.whl
Upload date: Jan 18, 2025
Size: 57.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.0.1 CPython/3.13.1 Linux/6.12.9-arch1-1

File hashes

Hashes for moatless_testbeds-0.0.14-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1c89113525ff44a9e919e4fd30456ff1a2a4c15d41350f67516b74f5cbb0084e`
MD5	`8464657be73d2c39033b0f98c8ddea0b`
BLAKE2b-256	`9051d97fe1a7275844074e0ebe2331a4dedf8c30d93a0f7fc08ba4b7fe4f3916`
