Skip to main content

Vector-based diverse test case selection for Robot Framework and pytest

Project description

robotframework-testselection

Vector-based diverse test case selection for Robot Framework and pytest. Embeds test cases as semantic vectors and selects maximally diverse subsets to reduce test suite execution time while preserving coverage breadth.

How It Works

The system operates as a 3-stage pipeline:

Vectorize              Select                Execute
(.robot files) ──►  (embeddings.npz   ──►  (selected_tests.json
                     test_manifest.json)     + robot --prerunmodifier)
  1. Vectorize — Parses .robot files via the Robot Framework API, converts each test case to a natural-language text representation (name + tags + resolved keyword tree), then encodes with all-MiniLM-L6-v2 (384-dim sentence embeddings).
  2. Select — Loads embedding vectors and applies a diversity-maximizing selection algorithm (default: Farthest Point Sampling) to choose k tests that are as semantically different from each other as possible.
  3. Execute — Runs the selected tests via Robot Framework using a PreRunModifier (for standard tests) and a Listener v3 (for DataDriver-generated tests).

If any stage fails, the pipeline gracefully degrades by running all tests (exit code 2).

Installation

# From PyPI
pip install robotframework-testselection

# With sentence-transformers for vectorization
pip install robotframework-testselection[vectorize]

# With all optional selection algorithms
pip install robotframework-testselection[selection-extras]

# Everything
pip install robotframework-testselection[all]

Development Install (from source)

Requires Python 3.10+ and uv.

uv sync --extra vectorize
uv sync --extra all

The pytest plugin is included in the base install and activates when you pass --diverse-k to pytest.

Dependency Groups

Group Packages Purpose
(base) robotframework, numpy, scikit-learn Core pipeline
vectorize sentence-transformers Embedding model (Stage 1)
selection-extras scikit-learn-extra, dppy, apricot-select k-Medoids, DPP, Facility Location strategies
chromadb chromadb Alternative vector storage
dev pytest, pytest-cov, pytest-benchmark, ruff, mypy Development

Quick Start

Full Pipeline (one command)

testcase-select run \
  --suite tests/robot/ \
  --k 20 \
  --strategy fps \
  --seed 42 \
  --output-dir ./results/

Stage-by-Stage

# Stage 1: Vectorize
testcase-select vectorize \
  --suite tests/robot/ \
  --output ./artifacts/ \
  --model all-MiniLM-L6-v2 \
  --resolve-depth 2

# Stage 2: Select
testcase-select select \
  --artifacts ./artifacts/ \
  --k 20 \
  --strategy fps \
  --seed 42 \
  --output selected_tests.json

# Stage 3: Execute
testcase-select execute \
  --suite tests/robot/ \
  --selection selected_tests.json \
  --output-dir ./results/

With DataDriver CSV Files

testcase-select vectorize \
  --suite tests/robot/ \
  --output ./artifacts/ \
  --datadriver-csv tests/data/login.csv tests/data/search.csv

Passing Robot Framework Options

All arguments after -- are forwarded directly to robot. This lets you combine diversity selection with any Robot Framework option — variables, tag filters, log levels, listeners, etc.

# Pass variables and set log level
testcase-select run --suite tests/ --k 20 \
  -- --variable ENV:staging --variable USER:admin --loglevel DEBUG

# Use Robot's own tag filtering on top of diversity selection
testcase-select execute --suite tests/ --selection sel.json \
  -- --include smoke --exclude manual

# Add metadata and custom output name
testcase-select run --suite tests/ --k 30 \
  -- --metadata Version:2.1 --name "Smoke Regression"

# Set variables file and debug log
testcase-select run --suite tests/ --k 50 \
  -- --variablefile config/env_staging.py --debugfile debug.log

This works with both run and execute subcommands, including during graceful fallback (when selection fails and all tests are run).

pytest Support

The diversity selection algorithm also works with pytest test suites, via a pytest plugin:

# Select 20 most diverse tests
pytest --diverse-k=20 tests/

# With custom strategy and seed
pytest --diverse-k=30 --diverse-strategy=fps_multi --diverse-seed=123 tests/

# Filter by markers before selection
pytest --diverse-k=20 --diverse-include-markers slow integration tests/

# Group parametrized tests (select at group level)
pytest --diverse-k=20 --diverse-group-parametrize tests/

Or via the testcase-select CLI:

# Full pipeline with pytest
testcase-select run --framework pytest --suite tests/ --k 20 --strategy fps

# Stage-by-stage
testcase-select vectorize --framework pytest --suite tests/ --output ./artifacts/
testcase-select select --artifacts ./artifacts/ --k 20
testcase-select execute --framework pytest --suite tests/ --selection selected_tests.json

The pytest plugin is installed automatically and activated by --diverse-k. It uses AST-based text representation with sentence embeddings for test diversity analysis.

Direct Robot Framework Integration

You can also use the components directly with robot:

# PreRunModifier for standard tests
robot --prerunmodifier TestSelection.execution.prerun_modifier.DiversePreRunModifier:selected_tests.json tests/

# Listener v3 for DataDriver tests
robot --listener TestSelection.execution.listener.DiverseDataDriverListener:selected_tests.json tests/

# Both together
robot \
  --prerunmodifier TestSelection.execution.prerun_modifier.DiversePreRunModifier:selected_tests.json \
  --listener TestSelection.execution.listener.DiverseDataDriverListener:selected_tests.json \
  tests/

Environment Variables

Variable Default Description
DIVERSE_K 50 Number of tests to select
DIVERSE_STRATEGY fps Selection algorithm
DIVERSE_SEED 42 Random seed for reproducibility
DIVERSE_OUTPUT (none) Output file path for selection JSON

Selection Strategies

Strategy Name Dependencies Description
FPS fps (base) Farthest Point Sampling. Greedy farthest-first traversal. O(Nkd). 2-approximation guarantee for max-min dispersion. Default.
FPS Multi-Start fps_multi (base) Runs FPS from multiple random starting points, keeps the result with the highest minimum pairwise distance. Mitigates initial-point sensitivity.
k-Medoids kmedoids selection-extras PAM algorithm for medoid-based clustering. Better centroid representativeness.
DPP dpp selection-extras Determinantal Point Process. Probabilistic repulsion-based sampling.
Facility Location facility selection-extras Submodular facility location maximization. Optimizes for both diversity and representativeness.

Tag Filtering

Filter tests by tags before selection:

# Include only smoke and regression tests
testcase-select select \
  --artifacts ./artifacts/ \
  --k 20 \
  --include-tags smoke regression

# Exclude slow tests
testcase-select select \
  --artifacts ./artifacts/ \
  --k 20 \
  --exclude-tags slow manual

# Exclude DataDriver tests
testcase-select select \
  --artifacts ./artifacts/ \
  --k 20 \
  --no-datadriver

Caching

Stage 1 uses content-hash caching. It computes MD5 hashes of all .robot and .csv files and stores them alongside artifacts. On subsequent runs, vectorization is skipped when no source files have changed. Use --force to bypass the cache:

testcase-select vectorize --suite tests/ --output ./artifacts/ --force

Artifacts

The pipeline produces these artifacts:

File Stage Format Contents
embeddings.npz Vectorize NumPy compressed N x 384 float32 matrix
test_manifest.json Vectorize JSON Test metadata (names, tags, suites, IDs)
file_hashes.json Vectorize JSON Source file MD5 hashes for cache invalidation
selected_tests.json Select JSON Selected test list + diversity metrics

Selection Output Format

{
  "strategy": "fps",
  "k": 20,
  "seed": 42,
  "total_tests": 150,
  "filtered_tests": 120,
  "selected": [
    {
      "name": "Login With Valid Credentials",
      "id": "a1b2c3d4...",
      "suite": "tests/login.robot",
      "is_datadriver": false
    }
  ],
  "diversity_metrics": {
    "avg_pairwise_distance": 0.8234,
    "min_pairwise_distance": 0.4512,
    "suite_coverage": 8,
    "suite_total": 10
  }
}

CI/CD Integration

Pre-built configurations are provided in config/:

GitHub Actions

# .github/workflows/diverse-tests.yml
# Copy from config/github-actions.yml

Features:

  • 3-job pipeline with artifact transfer between stages
  • Content-hash caching (actions/cache@v4) to skip vectorization
  • Automatic PR annotations with selection summary
  • workflow_dispatch for manual runs with custom k/strategy

GitLab CI

# .gitlab-ci.yml
# Copy from config/gitlab-ci.yml

Jenkins

// Jenkinsfile
// Copy from config/Jenkinsfile

Project Structure

src/TestSelection/
  shared/              # Shared kernel (types, config)
    types.py           # TestCaseId, Tag, TestCaseRecord, KeywordTree, ...
    config.py          # PipelineConfig, TextBuilderConfig
  parsing/             # Bounded context: Robot Framework parsing
    suite_collector.py # RobotApiAdapter (TestSuite.from_file_system)
    keyword_resolver.py# Recursive keyword tree resolution
    text_builder.py    # Natural language text representation
    datadriver_reader.py # DataDriver CSV reader
  embedding/           # Bounded context: vector embedding
    ports.py           # EmbeddingModel protocol
    embedder.py        # SentenceTransformerAdapter (ACL)
    models.py          # EmbeddingMatrix aggregate, ManifestEntry
  selection/           # Bounded context: diversity selection
    strategy.py        # SelectionStrategy protocol, SelectionResult
    fps.py             # FarthestPointSampling, FPSMultiStart
    kmedoids.py        # KMedoidsSelection (optional)
    dpp.py             # DPPSelection (optional)
    facility.py        # FacilityLocationSelection (optional)
    registry.py        # StrategyRegistry with auto-discovery
    filtering.py       # Tag-based pre-selection filtering
  execution/           # Bounded context: Robot Framework execution
    prerun_modifier.py # DiversePreRunModifier (SuiteVisitor)
    listener.py        # DiverseDataDriverListener (Listener v3)
    runner.py          # ExecutionRunner
  pytest/              # pytest integration
    plugin.py          # pytest plugin (pytest11 entry point)
    collector.py       # Programmatic test collection
    text_builder.py    # AST-based text representation
    runner.py          # pytest execution
  pipeline/            # Orchestration layer
    vectorize.py       # Stage 1 orchestrator
    select.py          # Stage 2 orchestrator
    execute.py         # Stage 3 orchestrator
    cache.py           # Content-hash cache invalidator
    artifacts.py       # Artifact storage and validation
    errors.py          # Domain error hierarchy
  cli.py               # CLI entry point (argparse)

tests/
  fixtures/            # Sample .robot and .csv files
  unit/                # Unit tests
  integration/         # Integration tests
  benchmarks/          # Performance benchmarks

Development

# Install with dev dependencies
uv sync --extra dev

# Run all tests
uv run pytest tests/ -v --benchmark-disable

# Run unit tests only
uv run pytest tests/unit/ -v

# Run integration tests only
uv run pytest tests/integration/ -v

# Run benchmarks
uv run pytest tests/benchmarks/ -v

# Lint
uv run ruff check src/

# Type check
uv run mypy src/

Test Markers

# Skip slow tests (ML model loading)
uv run pytest -m "not slow"

# Only integration tests
uv run pytest -m integration

# Only benchmarks
uv run pytest -m benchmark

Coverage Retention Analysis

Measured on robotframework-doctestlibrary (5,045 statements in the DocTest package). All strategies use all-MiniLM-L6-v2 embeddings (384-dim), seed=42.

pytest (594 tests, 70.1% full coverage)

Strategy Selection Tests Coverage Retention Speedup
(full suite) 100% 594 70.1% baseline 1.0x
fps 50% 297 65.9% 93.9% 3.4x
dpp 50% 297 65.9% 93.9% 3.5x
kmedoids 50% 297 65.8% 93.8% 3.1x
fps_multi 50% 297 65.7% 93.7% 3.0x
facility 50% 297 64.9% 92.6% 2.9x
random 50% 297 64.2% 91.5% 2.3x
facility 20% 119 58.6% 83.6% 7.5x
kmedoids 20% 119 58.3% 83.1% 6.7x
fps 20% 119 56.1% 80.0% 6.7x
fps_multi 20% 119 53.8% 76.8% 8.0x
random 20% 119 53.2% 75.8% 5.4x
dpp 20% 119 47.6% 67.8% 6.0x
fps_multi 10% 60 47.7% 67.9% 18.5x
kmedoids 10% 60 46.9% 66.9% 14.9x
facility 10% 60 46.8% 66.8% 11.5x
random 10% 60 44.3% 63.1% 10.3x
dpp 10% 60 42.9% 61.1% 13.7x
fps 10% 60 40.6% 57.9% 21.4x

Robot Framework (126 tests, 63.5% full coverage)

Strategy Selection Tests Coverage Retention Speedup
(full suite) 100% 126 63.5% baseline 1.0x
fps_multi 50% 63 61.6% 97.0% 1.5x
facility 50% 63 61.6% 96.9% 1.5x
fps 50% 63 61.2% 96.4% 1.6x
kmedoids 50% 63 60.5% 95.3% 1.5x
dpp 50% 63 58.5% 92.0% 1.7x
random 50% 63 57.2% 90.1% 2.3x
facility 20% 26 51.8% 81.5% 4.4x
kmedoids 20% 26 47.5% 74.8% 5.5x
random 20% 26 46.0% 72.5% 5.7x
fps 20% 26 45.3% 71.4% 3.1x
fps_multi 20% 26 45.3% 71.4% 2.9x
dpp 20% 26 43.9% 69.0% 4.6x
kmedoids 10% 13 42.3% 66.6% 11.3x
dpp 10% 13 40.9% 64.3% 10.7x
facility 10% 13 40.2% 63.3% 12.9x
random 10% 13 37.6% 59.2% 11.3x
fps_multi 10% 13 37.5% 59.1% 10.7x
fps 10% 13 33.2% 52.3% 8.3x

Key Findings

  • 50% selection retains 92-97% of coverage across all strategies — the diversity algorithms select tests maximizing behavioral breadth, with minimal redundancy loss
  • Facility location excels at 20% selection — 83.6% retention on pytest, 81.5% on Robot Framework, making it the best choice for aggressive smoke-test suites
  • kmedoids matches facility at 20% on pytest (83.1% retention) and leads at 10% on Robot Framework (66.6%)
  • fps_multi leads at 10% pytest (67.9% retention) where initial-point sensitivity matters most
  • DPP performs well at 50% (93.9% retention, matching FPS) but drops at 20% due to FPS fallback for large k values
  • All diversity strategies outperform random at 50% by 2-7 percentage points of retention
  • At 50% robot selection, fps_multi achieves 97% retention — meaning half the tests contribute almost no incremental coverage

Performance

Benchmarked on 384-dimensional normalized vectors:

N (tests) FPS Time FPS Multi (5 starts)
100 12 ms
500 99 ms
1,000 304 ms 376 ms (3 starts)
5,000 8.0 s

Keyword resolution: ~18 us/depth-1 resolve, ~61 us/depth-3 resolve. Text building for 100 tests: ~458 us.

Architecture Decision Records

ADR Title
ADR-001 3-Stage Pipeline Architecture
ADR-002 Embedding Model and Storage
ADR-003 Selection Algorithm Strategy Pattern
ADR-004 Robot Framework Integration
ADR-005 Project Structure and uv
ADR-006 Text Representation Strategy
ADR-007 CI/CD Integration and Caching
ADR-008 Reliability and Observability

License

Apache 2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

robotframework_testselection-0.3.0.tar.gz (35.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

robotframework_testselection-0.3.0-py3-none-any.whl (51.5 kB view details)

Uploaded Python 3

File details

Details for the file robotframework_testselection-0.3.0.tar.gz.

File metadata

  • Download URL: robotframework_testselection-0.3.0.tar.gz
  • Upload date:
  • Size: 35.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for robotframework_testselection-0.3.0.tar.gz
Algorithm Hash digest
SHA256 e13587bae8f9799472b09c8df8d42cba90bbe2f8a801644e150a9b1418451e73
MD5 429e7f8dc7d7e049fcaafbc549b38e71
BLAKE2b-256 c4a08d5d83408dc72be2293916d601c41677cd024b236ae08800d355b2e0ea3a

See more details on using hashes here.

File details

Details for the file robotframework_testselection-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: robotframework_testselection-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 51.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for robotframework_testselection-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bd642ee65cec433170e039703048443e2a7c5331238d6e031d14e06ae46a8b84
MD5 d05b1b4deb079d5c4ea27924a73cb504
BLAKE2b-256 ff184b5ba5c6d109c85c05f2df3ee7ec6d4cbea2c37bf2d639f6607f9de462d7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page