Litmus — LLM scenario runner TUI

Litmus 🧪

Terminal UI for running LLM agent scenarios and comparing their performance.

Litmus executes coding tasks across multiple AI agents and models, runs tests against the results, and produces detailed evaluation reports — all from a single TUI.

What it does

  1. Detects agents installed on your system (Claude Code, Codex, Aider, Cursor Agent, KiloCode, OpenCode)
  2. Runs scenarios — each scenario is a coding task with tests and scoring criteria
  3. Evaluates results — an LLM judge scores agent and model performance across 20 criteria each
  4. Generates reports — HTML reports with per-scenario breakdowns, logs, and scores

Supported agents

Agent          Binary     Model listing
Claude Code    claude     Built-in list
Codex          codex      Built-in list
OpenCode       opencode   opencode models
KiloCode       kilocode   kilocode models
Aider          aider      aider --list-models
Cursor Agent   agent      agent models

Litmus auto-detects which agents are available and queries their model lists.
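
As a rough illustration of what detection could look like, the sketch below checks PATH for each binary in the table using the standard library. The AGENT_BINARIES mapping and the detect_agents function are hypothetical names for this example, not Litmus's actual code in agents.py, and model listing is left out.

```python
import shutil
from typing import Callable, Optional

# Binary names come from the table above; the mapping itself is illustrative.
AGENT_BINARIES = {
    "Claude Code": "claude",
    "Codex": "codex",
    "OpenCode": "opencode",
    "KiloCode": "kilocode",
    "Aider": "aider",
    "Cursor Agent": "agent",
}


def detect_agents(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, str]:
    """Return {agent name: resolved path} for every binary found on PATH."""
    found = {}
    for name, binary in AGENT_BINARIES.items():
        path = which(binary)  # None when the binary is not on PATH
        if path is not None:
            found[name] = path
    return found


if __name__ == "__main__":
    for name, path in detect_agents().items():
        print(f"{name}: {path}")
```

The lookup function is a parameter so the logic can be exercised without any agents installed.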

Quick start

Requires Python 3.12+ and uv.

# Run without installing — uv fetches everything automatically
uvx --from git+https://github.com/ivkond/litmus.git litmus

On first launch Litmus will detect installed agents, generate a config, and open the TUI.

Alternative ways to install

# Install as a global tool
uv tool install git+https://github.com/ivkond/litmus.git
litmus

# Or clone for development
git clone https://github.com/ivkond/litmus.git
cd litmus
uv sync
uv run litmus

The package is published on PyPI as litmus-llm, so it can also be run with uvx --from litmus-llm litmus.

TUI workflow

  1. 📋 Models — select agents and models to test
  2. 🧩 Scenarios — pick which coding tasks to run
  3. ▶️ Run — watch execution progress in real time
  4. 📊 Analysis — review LLM-judged scores
  5. 📄 Reports — browse generated HTML reports

How it works

Each scenario lives in template/<id>/ and contains:

template/1-data-structure/
  prompt.txt        # Task description sent to the agent
  task.txt          # Detailed requirements
  scoring.csv       # Evaluation criteria
  project/          # Starter code with tests
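
When writing a new scenario it can help to check that the directory has the four parts listed above. The helper below is a minimal sketch under that assumption; missing_scenario_parts is a hypothetical name, and Litmus may validate scenarios differently or not at all.

```python
from pathlib import Path

# File and directory names taken from the layout shown above.
REQUIRED_FILES = ("prompt.txt", "task.txt", "scoring.csv")
REQUIRED_DIRS = ("project",)


def missing_scenario_parts(scenario_dir: Path) -> list[str]:
    """Return the names of required files/directories that are absent."""
    missing = [n for n in REQUIRED_FILES if not (scenario_dir / n).is_file()]
    missing += [n + "/" for n in REQUIRED_DIRS if not (scenario_dir / n).is_dir()]
    return missing
```

An empty list means the scenario directory has every expected part.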

Execution pipeline per scenario:

uv sync  ->  agent call  ->  pytest  ->  collect logs
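
The pipeline above can be approximated with a few subprocess calls. This is a sketch, not the code in run.py: run_scenario, run_command, and the agent_cmd argument are hypothetical, invoking the tests as uv run pytest is an assumption, and so is the choice to run every step even if an earlier one fails.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# A runner takes (command, working directory) and returns (exit code, output).
Runner = Callable[[Sequence[str], Path], tuple[int, str]]


def run_command(cmd: Sequence[str], cwd: Path) -> tuple[int, str]:
    """Run one command and capture its combined stdout and stderr."""
    # Raises FileNotFoundError if the executable is not on PATH.
    proc = subprocess.run(list(cmd), cwd=cwd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def run_scenario(
    project_dir: Path,
    agent_cmd: Sequence[str],
    run: Runner = run_command,
) -> dict[str, tuple[int, str]]:
    """Run sync -> agent -> pytest in order and collect a log per step."""
    steps = {
        "sync": ["uv", "sync"],
        "agent": list(agent_cmd),  # hypothetical: whatever invokes the agent
        "pytest": ["uv", "run", "pytest"],  # assumed test invocation
    }
    logs = {}
    for name, cmd in steps.items():
        logs[name] = run(cmd, project_dir)
    return logs
```

The returned dictionary plays the role of the "collect logs" step: one exit code and output string per stage, in execution order.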

After all runs complete, an LLM judge evaluates the results using 20 agent criteria (tool efficiency, error recovery, reasoning depth...) and 20 model criteria (code correctness, instruction following, hallucination resistance...).

Configuration

On first launch, Litmus generates a config file with detected agents and their settings. Configure the analysis model (any OpenAI-compatible API) through the TUI settings screen.

Scenario packs

Litmus supports exporting and importing scenario archives (.litmus-pack ZIP files) for sharing test suites between machines or teams.

Project structure

src/litmus/
  __init__.py     # Entry point, PROJECT_ROOT
  app.py          # Textual TUI (screens, widgets)
  agents.py       # Agent registry, detection, model listing
  run.py          # Scenario execution engine
  analysis.py     # LLM-powered evaluation (20+20 criteria)
  report.py       # HTML report generation
  pack/           # Scenario export/import

Tech stack

License

MIT
