
Robot Framework-based test harness for systematically testing LLMs

Project description

robotframework-chat

A Robot Framework-based test harness for systematically testing Large Language Models (LLMs) using LLMs as both the system under test and as automated graders. Test results are archived to SQL and visualized in Apache Superset dashboards.


Quick Start

Prerequisites

  • Python 3.11+ and astral-uv for dependency management
  • Docker for containerized code execution, LLM testing, and the Superset stack
  • Ollama (optional) for local LLM testing

Installation (Linux / macOS)

make install                # Install all dependencies
pre-commit install          # Install pre-commit hooks
ollama pull llama3          # Pull default LLM model (optional)

Installation (Windows)

The tasks.py script provides a cross-platform alternative to the Makefile. It requires only Python and uv; no make, bash, or other Unix tools are needed.

uv run python tasks.py install      # Install all dependencies
uv run pre-commit install           # Install pre-commit hooks
ollama pull llama3                  # Pull default LLM model (optional)
uv run python tasks.py help         # List all available targets

Note: Docker-based tests require Docker Desktop for Windows with the WSL 2 backend enabled.
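The cross-platform pattern behind tasks.py can be sketched as a small dispatcher that maps target names to subprocess commands. The targets and commands below are illustrative stand-ins, not the project's actual tasks.py:

```python
import subprocess
import sys

# Hypothetical targets; the real tasks.py defines its own set.
TASKS = {
    "install": [["uv", "sync"]],
    "robot": [["uv", "run", "robot", "tests/"]],
}

def run(target: str) -> int:
    """Run every command for a target, stopping on the first failure."""
    if target == "help" or target not in TASKS:
        print("Available targets:", ", ".join(sorted(TASKS)))
        return 0
    for cmd in TASKS[target]:
        code = subprocess.call(cmd)  # behaves the same on Windows and Unix
        if code != 0:
            return code
    return 0

if __name__ == "__main__":
    sys.exit(run(sys.argv[1] if len(sys.argv) > 1 else "help"))
```

Because the dispatcher only uses the standard library and lets subprocess handle platform differences, the same file works unmodified on Windows, Linux, and macOS.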

Running Tests

# Linux / macOS
make robot                  # Run all Robot Framework test suites
make robot-math             # Run math tests
make robot-docker           # Run Docker tests
make robot-safety           # Run safety tests

# All platforms (including Windows)
uv run python tasks.py robot        # Run all suites
uv run python tasks.py robot-math   # Run math tests
uv run python tasks.py robot-dryrun # Validate tests (dry run)
uv run python tasks.py check        # Lint + typecheck + coverage

Superset Dashboard

# Linux / macOS
cp .env.example .env        # Configure environment
make docker-up              # Start PostgreSQL + Redis + Superset
make bootstrap              # First-time Superset initialization

# Windows — tasks.py copies .env automatically if missing
uv run python tasks.py docker-up

Open http://localhost:8088 to view the dashboard.


Example Test

*** Test Cases ***
LLM Can Do Basic Math
    ${answer}=    Ask LLM    What is 2 + 2?
    ${score}    ${reason}=    Grade Answer    What is 2 + 2?    4    ${answer}
    Should Be Equal As Integers    ${score}    1
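The contract behind the Grade Answer keyword (a numeric score plus a short reason, never free prose) can be illustrated with a plain-Python stub. The function name and the exact-match rule below are hypothetical; the real harness delegates grading to an LLM but constrains its output to the same machine-verifiable shape:

```python
def grade_answer(question: str, expected: str, answer: str) -> tuple[int, str]:
    """Constrained grader stub: returns (score, reason) with score in {0, 1}.

    Illustrative only -- a substring check standing in for the
    LLM-based grader, which is held to the same output contract.
    """
    if expected.strip().lower() in answer.strip().lower():
        return 1, "expected answer found in response"
    return 0, "expected answer missing from response"

score, reason = grade_answer("What is 2 + 2?", "4", "The answer is 4.")
```

Keeping the evaluation layer's output this constrained is what lets a test assert on it deterministically, as the Should Be Equal As Integers step above does.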

Core Philosophy

  • LLMs are software — test them like software
  • Determinism before intelligence — structured, machine-verifiable evaluation first
  • Constrained grading — scores, categories, pass/fail; no prose from the evaluation layer
  • Modular by design — composable pieces; new providers and graders plug in without rewriting core
  • Robot Framework as the orchestration layer — readable, keyword-driven tests
  • Every test run is archived — listeners always active, results flow to SQL
  • CI-native, regression-focused — if it can't run unattended, it's not done

See ai/AGENTS.md for the full philosophy.
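The "results flow to SQL" idea can be sketched with the standard-library sqlite3 module. The table and column names below are hypothetical; the project's actual schema is documented in docs/TEST_DATABASE.md:

```python
import sqlite3

# Hypothetical schema for archived results; see docs/TEST_DATABASE.md
# for the project's real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE test_results (
           suite  TEXT NOT NULL,
           test   TEXT NOT NULL,
           model  TEXT NOT NULL,
           score  INTEGER NOT NULL,  -- constrained grade: 0 or 1
           reason TEXT NOT NULL
       )"""
)

# A Robot Framework listener could insert one row per finished test.
conn.execute(
    "INSERT INTO test_results VALUES (?, ?, ?, ?, ?)",
    ("math", "LLM Can Do Basic Math", "llama3", 1, "correct arithmetic"),
)
conn.commit()

rows = conn.execute("SELECT test, score FROM test_results").fetchall()
```

Once every run lands in a table like this, dashboards such as the Superset stack above can chart pass rates per model and suite over time.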


Documentation

Document                         Description
docs/TEST_DATABASE.md            Database schema and usage
docs/GITLAB_CI_SETUP.md          CI/CD setup guide
docs/GRAFANA_SUPERSET_SETUP.md   Superset visualization stack setup (Grafana deferred to v2+)
docs/SUPERSET_EXPORT_GUIDE.md    Superset dashboard export, import, and backup

Contributing

  1. Read ai/DEV.md for the development workflow and TDD discipline
  2. Follow the code style guidelines in ai/AGENTS.md
  3. Add tests for new features (see ai/CLAUDE.md for grading tiers)
  4. Run pre-commit run --all-files before committing



Download files

Download the file for your platform.

Source Distribution

robotframework_chat-1.0.0.tar.gz (703.5 kB)

Built Distribution

robotframework_chat-1.0.0-py3-none-any.whl (57.9 kB)

File details

Details for the file robotframework_chat-1.0.0.tar.gz.

File metadata

  • Download URL: robotframework_chat-1.0.0.tar.gz
  • Size: 703.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for robotframework_chat-1.0.0.tar.gz

Algorithm     Hash digest
SHA256        bdee97e8912a0fb76a9bed376ecbfd325448d4eaf93a24011335cb87e076a0ec
MD5           c6ef506fdc5de69372c5b1c8e4ab60a9
BLAKE2b-256   bebd7e7151affd31f3236b1ca811144f363ae220d70b1251f9efc58fadefebdf

Provenance

The following attestation bundles were made for robotframework_chat-1.0.0.tar.gz:

Publisher: pypi-publish.yml on tkarcheski/robotframework-chat

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file robotframework_chat-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for robotframework_chat-1.0.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        be5f697faa9e656126ea7162318dc0a0db104623ff11c9841a4cee246acd5833
MD5           6d4f62c9ad55ae3124779d525b5992b2
BLAKE2b-256   3647c363a902330ab8bd0ece9de586991459f44fec81ec1ba5ffdf26dc942809

Provenance

The following attestation bundles were made for robotframework_chat-1.0.0-py3-none-any.whl:

Publisher: pypi-publish.yml on tkarcheski/robotframework-chat

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
