Robot Framework-based test harness for systematically testing LLMs
robotframework-chat
A Robot Framework-based test harness for systematically testing Large Language Models (LLMs), using LLMs both as the system under test and as automated graders. Test results are archived to SQL and visualized in Apache Superset dashboards.
Quick Start
Prerequisites
- Python 3.11+ and astral-uv for dependency management
- Docker for containerized code execution, LLM testing, and the Superset stack
- Ollama (optional) for local LLM testing
Installation (Linux / macOS)
make install # Install all dependencies
pre-commit install # Install pre-commit hooks
ollama pull llama3 # Pull default LLM model (optional)
Installation (Windows)
The tasks.py script provides a cross-platform alternative to the Makefile.
It requires only Python and uv — no make, bash, or Unix tools needed.
uv run python tasks.py install # Install all dependencies
uv run pre-commit install # Install pre-commit hooks
ollama pull llama3 # Pull default LLM model (optional)
uv run python tasks.py help # List all available targets
Note: Docker-based tests require Docker Desktop for Windows with the WSL 2 backend enabled.
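To illustrate why a Python task runner can stand in for a Makefile on Windows, here is a minimal sketch of a target dispatcher. It is not the project's actual tasks.py: the `TASKS` table, the commands in it, the `tests` path, and the `run_task` helper are all assumptions made for illustration.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Callable, Sequence

# Hypothetical target table: the real tasks.py may use different names and commands.
TASKS: dict[str, list[str]] = {
    "install": ["uv", "sync"],
    "robot-dryrun": ["uv", "run", "robot", "--dryrun", "tests"],
}


def _run_subprocess(cmd: Sequence[str]) -> int:
    # An argument list (no shell=True) avoids bash-specific quoting,
    # so the same call behaves the same way on Windows, Linux, and macOS.
    return subprocess.run(list(cmd)).returncode


def run_task(name: str, runner: Callable[[Sequence[str]], int] = _run_subprocess) -> int:
    """Look up a target by name and run its command, returning the exit code."""
    if name not in TASKS:
        print(f"unknown target: {name}", file=sys.stderr)
        return 2
    return runner(TASKS[name])
```

The `runner` parameter exists only so the dispatcher can be exercised without spawning processes; a command-line entry point would call `run_task(sys.argv[1])` and pass the result to `sys.exit`.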
Running Tests
# Linux / macOS
make robot # Run all Robot Framework test suites
make robot-math # Run math tests
make robot-docker # Run Docker tests
make robot-safety # Run safety tests
# All platforms (including Windows)
uv run python tasks.py robot # Run all suites
uv run python tasks.py robot-math # Run math tests
uv run python tasks.py robot-dryrun # Validate tests (dry run)
uv run python tasks.py check # Lint + typecheck + coverage
Superset Dashboard
# Linux / macOS
cp .env.example .env # Configure environment
make docker-up # Start PostgreSQL + Redis + Superset
make bootstrap # First-time Superset initialization
# Windows — tasks.py copies .env automatically if missing
uv run python tasks.py docker-up
Open http://localhost:8088 to view the dashboard.
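The dashboard reads from the SQL archive that is filled during test runs. As a rough sketch of how a Robot Framework listener could write one row per test, the example below uses the listener API version 3 `end_test` hook. The class name, the `test_results` table, and the use of SQLite in place of the PostgreSQL service are assumptions for illustration; the project's real schema is described in docs/TEST_DATABASE.md.

```python
import sqlite3


class SqlArchiveListener:
    """Hypothetical listener that stores one row per finished test."""

    ROBOT_LISTENER_API_VERSION = 3

    def __init__(self, db_path=":memory:"):
        # SQLite stands in for the real database so the sketch has no external dependencies.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS test_results "
            "(name TEXT, status TEXT, message TEXT)"
        )

    def end_test(self, data, result):
        # Robot Framework calls this after each test; `result` carries the outcome.
        self.conn.execute(
            "INSERT INTO test_results VALUES (?, ?, ?)",
            (result.name, result.status, result.message),
        )
        self.conn.commit()

    def close(self):
        # Called by Robot Framework once the whole run has ended.
        self.conn.close()
```

A listener like this would be enabled with Robot Framework's `--listener` command-line option.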
Example Test
*** Test Cases ***
LLM Can Do Basic Math
    ${answer}=    Ask LLM    What is 2 + 2?
    ${score}    ${reason}=    Grade Answer    What is 2 + 2?    4    ${answer}
    Should Be Equal As Integers    ${score}    1
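Keywords such as Ask LLM and Grade Answer can be backed by Python methods, because Robot Framework matches keyword names to method names while ignoring case, spaces, and underscores. The sketch below is not the library shipped with this package: the `ChatKeywords` class and its injected `client` callable are hypothetical, and the grader uses a plain exact-match comparison where the real harness uses an LLM grader.

```python
from typing import Callable, Tuple


class ChatKeywords:
    """Hypothetical keyword library: `ask_llm` is exposed to tests as `Ask LLM`."""

    def __init__(self, client: Callable[[str], str]):
        # `client` is any callable that takes a prompt and returns the reply text.
        self._client = client

    def ask_llm(self, prompt: str) -> str:
        # Strip whitespace so that trailing newlines do not affect grading.
        return self._client(prompt).strip()

    def grade_answer(self, question: str, expected: str, actual: str) -> Tuple[int, str]:
        # Exact-match stand-in for an LLM grader; `question` is unused here.
        # Returning two values lets a test assign `${score}` and `${reason}` at once.
        score = 1 if actual.strip() == expected.strip() else 0
        reason = "exact match" if score else f"expected {expected!r}, got {actual!r}"
        return score, reason
```

Injecting the client keeps the keyword layer testable without a running model: a stub callable can replace the LLM in unit tests.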
Core Philosophy
- LLMs are software — test them like software
- Determinism before intelligence — structured, machine-verifiable evaluation first
- Constrained grading — scores, categories, pass/fail; no prose from the evaluation layer
- Modular by design — composable pieces; new providers and graders plug in without rewriting core
- Robot Framework as the orchestration layer — readable, keyword-driven tests
- Every test run is archived — listeners always active, results flow to SQL
- CI-native, regression-focused — if it can't run unattended, it's not done
See ai/AGENTS.md for the full philosophy.
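The constrained-grading principle can be sketched as a strict parser for the grader's reply. The JSON shape used here (`score` of 0 or 1 plus a `reason` string) and the `parse_grade` helper are assumptions for illustration, not the project's documented format; the point is only that anything other than the agreed structure is rejected instead of interpreted.

```python
import json


def parse_grade(raw):
    """Accept only {"score": 0 or 1, "reason": <string>}; reject everything else."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"grader reply is not JSON: {raw!r}") from exc
    if not isinstance(data, dict) or set(data) != {"score", "reason"}:
        raise ValueError(f"unexpected grader reply structure: {raw!r}")
    score, reason = data["score"], data["reason"]
    # `type(score) is int` also rules out booleans and floats such as 1.0.
    if type(score) is not int or score not in (0, 1):
        raise ValueError(f"score must be 0 or 1, got {score!r}")
    if not isinstance(reason, str):
        raise ValueError("reason must be a string")
    return score, reason
```

Failing loudly on free-form prose keeps the evaluation layer machine-verifiable: a malformed grader reply surfaces as a test error, not as a silently misread score.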
Documentation
| Document | Description |
|---|---|
| docs/TEST_DATABASE.md | Database schema and usage |
| docs/GITLAB_CI_SETUP.md | CI/CD setup guide |
| docs/GRAFANA_SUPERSET_SETUP.md | Superset visualization stack setup (Grafana deferred to v2+) |
| docs/SUPERSET_EXPORT_GUIDE.md | Superset dashboard export, import, and backup |
Contributing
- Read ai/DEV.md for the development workflow and TDD discipline
- Follow the code style guidelines in ai/AGENTS.md
- Add tests for new features (see ai/CLAUDE.md for grading tiers)
- Run pre-commit run --all-files before committing
File details
Details for the file robotframework_chat-1.0.0.tar.gz.
File metadata
- Download URL: robotframework_chat-1.0.0.tar.gz
- Upload date:
- Size: 703.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bdee97e8912a0fb76a9bed376ecbfd325448d4eaf93a24011335cb87e076a0ec |
| MD5 | c6ef506fdc5de69372c5b1c8e4ab60a9 |
| BLAKE2b-256 | bebd7e7151affd31f3236b1ca811144f363ae220d70b1251f9efc58fadefebdf |
Provenance
The following attestation bundles were made for robotframework_chat-1.0.0.tar.gz:
Publisher: pypi-publish.yml on tkarcheski/robotframework-chat
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: robotframework_chat-1.0.0.tar.gz
- Subject digest: bdee97e8912a0fb76a9bed376ecbfd325448d4eaf93a24011335cb87e076a0ec
- Sigstore transparency entry: 1006866523
- Sigstore integration time:
- Permalink: tkarcheski/robotframework-chat@a6ee65e06e08afba3bed6a04a009912deb86873d
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/tkarcheski
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@a6ee65e06e08afba3bed6a04a009912deb86873d
- Trigger Event: push
File details
Details for the file robotframework_chat-1.0.0-py3-none-any.whl.
File metadata
- Download URL: robotframework_chat-1.0.0-py3-none-any.whl
- Upload date:
- Size: 57.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be5f697faa9e656126ea7162318dc0a0db104623ff11c9841a4cee246acd5833 |
| MD5 | 6d4f62c9ad55ae3124779d525b5992b2 |
| BLAKE2b-256 | 3647c363a902330ab8bd0ece9de586991459f44fec81ec1ba5ffdf26dc942809 |
Provenance
The following attestation bundles were made for robotframework_chat-1.0.0-py3-none-any.whl:
Publisher: pypi-publish.yml on tkarcheski/robotframework-chat
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: robotframework_chat-1.0.0-py3-none-any.whl
- Subject digest: be5f697faa9e656126ea7162318dc0a0db104623ff11c9841a4cee246acd5833
- Sigstore transparency entry: 1006866576
- Sigstore integration time:
- Permalink: tkarcheski/robotframework-chat@a6ee65e06e08afba3bed6a04a009912deb86873d
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/tkarcheski
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@a6ee65e06e08afba3bed6a04a009912deb86873d
- Trigger Event: push