Litmus — LLM scenario runner TUI

Litmus 🧪

Terminal UI for running LLM agent scenarios and comparing their performance.

Litmus executes coding tasks across multiple AI agents and models, runs tests against the results, and produces detailed evaluation reports — all from a single TUI.

What it does

  1. Detects agents installed on your system (Claude Code, Codex, Aider, Cursor Agent, KiloCode, OpenCode)
  2. Runs scenarios — each scenario is a coding task with tests and scoring criteria
  3. Evaluates results — an LLM judge scores agent and model performance across 20 criteria each
  4. Generates reports — HTML reports with per-scenario breakdowns, logs, and scores

Supported agents

Agent          Binary     Model listing
Claude Code    claude     Built-in list
Codex          codex      Built-in list
OpenCode       opencode   opencode models
KiloCode       kilocode   kilocode models
Aider          aider      aider --list-models
Cursor Agent   agent      agent models

Litmus auto-detects which agents are available and queries their model lists.
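
To illustrate what detection could look like, here is a minimal sketch that looks up each binary from the table above on PATH. The binary names come from the table; the function name detect_agents and the use of shutil.which are assumptions for illustration, not necessarily how agents.py does it.

```python
import shutil

# Binary names are taken from the "Supported agents" table.
# The lookup strategy below is an assumption, not Litmus's actual code.
AGENT_BINARIES = {
    "Claude Code": "claude",
    "Codex": "codex",
    "OpenCode": "opencode",
    "KiloCode": "kilocode",
    "Aider": "aider",
    "Cursor Agent": "agent",
}


def detect_agents(binaries=AGENT_BINARIES, which=shutil.which):
    """Return {agent name: resolved path} for every binary found on PATH."""
    found = {}
    for name, binary in binaries.items():
        path = which(binary)  # None when the binary is not on PATH
        if path is not None:
            found[name] = path
    return found


if __name__ == "__main__":
    for name, path in detect_agents().items():
        print(f"{name}: {path}")
```

The which parameter is injectable so the lookup can be exercised in tests without any of the agents installed.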

Quick start

Requires Python 3.12+ and uv.

# Run without installing — uv fetches everything automatically
uvx --from git+https://github.com/ivkond/litmus.git litmus

On first launch Litmus will detect installed agents, generate a config, and open the TUI.

Alternative ways to install

# Install as a global tool
uv tool install git+https://github.com/ivkond/litmus.git
litmus

# Or clone for development
git clone https://github.com/ivkond/litmus.git
cd litmus
uv sync
uv run litmus

Once published to PyPI, install will simplify to uvx litmus.

TUI workflow

  1. 📋 Models — select agents and models to test
  2. 🧩 Scenarios — pick which coding tasks to run
  3. ▶️ Run — watch execution progress in real time
  4. 📊 Analysis — review LLM-judged scores
  5. 📄 Reports — browse generated HTML reports

How it works

Each scenario lives in template/<id>/ and contains:

template/1-data-structure/
  prompt.txt        # Task description sent to the agent
  task.txt          # Detailed requirements
  scoring.csv       # Evaluation criteria
  project/          # Starter code with tests
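
As a rough sketch, the layout above can be checked before a run. The file and directory names come from the listing; the helper missing_entries is hypothetical and not part of Litmus.

```python
from pathlib import Path

# Names taken from the scenario layout shown above.
REQUIRED_FILES = ("prompt.txt", "task.txt", "scoring.csv")
REQUIRED_DIRS = ("project",)


def missing_entries(scenario_dir):
    """Return the required files and directories absent from a scenario folder."""
    root = Path(scenario_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # An empty list means the scenario folder has everything the layout names.
    print(missing_entries("template/1-data-structure"))
```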

Execution pipeline per scenario:

uv sync  ->  agent call  ->  pytest  ->  collect logs
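
The pipeline can be pictured as a chain of subprocess calls whose output is saved per step. This is only a sketch under assumptions: run_step and run_scenario are hypothetical names, the agent command is passed in by the caller because each agent's invocation differs, and running the tests via "uv run pytest" is a guess at how the pytest step is launched.

```python
import subprocess
from pathlib import Path


def run_step(name, cmd, cwd, log_dir):
    """Run one command, write its combined output to <log_dir>/<name>.log,
    and return the exit code."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    (log_dir / f"{name}.log").write_text(result.stdout + result.stderr)
    return result.returncode


def run_scenario(project_dir, agent_cmd, log_dir,
                 sync_cmd=("uv", "sync"),
                 test_cmd=("uv", "run", "pytest")):  # assumed test invocation
    """uv sync -> agent call -> pytest, collecting one log file per step."""
    codes = {"sync": run_step("sync", list(sync_cmd), project_dir, log_dir)}
    if codes["sync"] != 0:
        return codes  # no usable environment, so skip the remaining steps
    codes["agent"] = run_step("agent", list(agent_cmd), project_dir, log_dir)
    # Tests run even if the agent exited non-zero, so the failure is recorded.
    codes["pytest"] = run_step("pytest", list(test_cmd), project_dir, log_dir)
    return codes
```

Returning the exit codes rather than raising keeps a failed run available as data for later evaluation.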

After all runs complete, an LLM judge evaluates the results using 20 agent criteria (tool efficiency, error recovery, reasoning depth...) and 20 model criteria (code correctness, instruction following, hallucination resistance...).

Configuration

On first launch, Litmus generates a config file with detected agents and their settings. Configure the analysis model (any OpenAI-compatible API) through the TUI settings screen.

Scenario packs

Litmus supports exporting and importing scenario archives (.litmus-pack ZIP files) for sharing test suites between machines or teams.

Project structure

src/litmus/
  __init__.py     # Entry point, PROJECT_ROOT
  app.py          # Textual TUI (screens, widgets)
  agents.py       # Agent registry, detection, model listing
  run.py          # Scenario execution engine
  analysis.py     # LLM-powered evaluation (20+20 criteria)
  report.py       # HTML report generation
  pack/           # Scenario export/import

License

MIT
