Run a prompt across multiple LLMs and compare outputs side by side in the terminal.

Assayer

Send a prompt to multiple language models in parallel and compare their outputs in the terminal. Useful for evaluating which model handles a given task better, measuring semantic similarity between responses, or running an LLM-as-judge evaluation - without leaving the shell.
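The parallel fan-out described above can be sketched with asyncio. This is an illustrative sketch under stated assumptions, not Assayer's implementation. The names `fan_out`, `call_model`, and `fake_call` are hypothetical, and so are the `[error] ...` placeholder strings. The 30-second default mirrors the documented --timeout default. Each model gets its own timeout, so one slow provider cannot hold up the rest.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical provider interface: (model identifier, prompt) -> completion text.
CallModel = Callable[[str, str], Awaitable[str]]


async def fan_out(prompt: str, models: list[str], call_model: CallModel,
                  timeout: float = 30.0) -> dict[str, str]:
    """Send one prompt to every model concurrently, each with its own timeout."""

    async def one(model: str) -> tuple[str, str]:
        try:
            text = await asyncio.wait_for(call_model(model, prompt), timeout)
        except asyncio.TimeoutError:
            text = f"[error] timed out after {timeout}s"
        except Exception as exc:  # one failing provider should not sink the run
            text = f"[error] {exc}"
        return model, text

    # gather preserves the order of `models`, so the dict keys follow that order.
    return dict(await asyncio.gather(*(one(m) for m in models)))


async def fake_call(model: str, prompt: str) -> str:
    """Stand-in provider for demonstration: the "slow" model never answers in time."""
    if model == "slow":
        await asyncio.sleep(1)
    return f"{model}: {prompt.upper()}"


if __name__ == "__main__":
    print(asyncio.run(fan_out("hi", ["fast", "slow"], fake_call, timeout=0.1)))
```

Each coroutine catches its own timeout or error and returns a result. Because of that, `asyncio.gather` never aborts early and every model gets a row in the comparison.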

Installation

pip install assayer

Similarity scoring requires the optional score extra:

pip install "assayer[score]"

Python 3.11 or newer is required.

Supported Providers

  • OpenAI: GPT-4o, GPT-4.1, and GPT-5 family models, plus o3, o3-mini, and o4-mini.
  • Anthropic: Claude models (Opus 4.7, Sonnet 4.6, Haiku 4.5, and earlier 4.x releases).
  • Google Gemini: Gemini 2.x and 3.x models.
  • Ollama: Local models running on your machine.

Configuration

Assayer looks for API keys in environment variables or a configuration file at ~/.assayer/config.json.

Environment Variables

export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"
export GEMINI_API_KEY="your-key"

Configuration File

{
  "OPENAI_API_KEY": "sk-...",
  "ANTHROPIC_API_KEY": "sk-ant-...",
  "GEMINI_API_KEY": "..."
}

Use assayer models check to verify your configuration.

Quickstart

assayer run "Explain recursion in one sentence." --models gpt-4o,claude-haiku-4-5-20251001

Commands

run

assayer run "prompt" --models gpt-4o,claude-sonnet-4-5
assayer run --prompt-file prompt.txt --models gpt-4o,ollama/llama3.2
assayer run "prompt" --models gpt-4o,claude-sonnet-4-5 --score
assayer run "prompt" --models gpt-4o,claude-sonnet-4-5 --judge gpt-4o --judge-criteria "clarity,brevity"
assayer run "prompt" --models gpt-4o,claude-sonnet-4-5 --output results.json
assayer run "prompt" --models gpt-4o,claude-sonnet-4-5 --output results.csv
assayer run "prompt with {key}" --models gpt-4o --var key=value
Flag               Description
--models           Comma-separated model identifiers (required)
--prompt-file      Path to a .txt file to use instead of an inline prompt
--var KEY=VALUE    Template variable that fills {KEY} in the prompt; repeatable
--system           System prompt applied to all models
--temperature      Sampling temperature
--max-tokens       Maximum output tokens
--score            Show pairwise similarity matrix (requires the score extra)
--judge            Model to use as judge
--judge-criteria   Comma-separated criteria for the judge
--output           Save results to .json or .csv
--timeout          Per-model timeout in seconds (default: 30)
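--var takes KEY=VALUE pairs that fill {name} placeholders in the prompt. The sketch below shows how such substitution might work, assuming placeholders are plain {word} tokens. `parse_vars` and `render` are hypothetical names. Leaving unknown placeholders untouched is a guess at the behavior, not something documented here.

```python
import re

# Matches {word} placeholders such as {topic} or {audience}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def parse_vars(pairs: list[str]) -> dict[str, str]:
    """Turn repeated --var KEY=VALUE arguments into a dict (later values win)."""
    out: dict[str, str] = {}
    for pair in pairs:
        # partition splits on the first "=", so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        out[key] = value
    return out


def render(template: str, variables: dict[str, str]) -> str:
    """Replace {key} placeholders with their values; unknown placeholders are left as-is."""
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), template)


if __name__ == "__main__":
    print(render("Explain {topic} to a {audience}.",
                 parse_vars(["topic=recursion", "audience=beginner"])))
```

A regex substitution is used instead of `str.format`. With `str.format`, literal braces in a prompt, such as JSON or code snippets, would raise errors.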

models

assayer models list               # list all supported model identifiers
assayer models check              # check which API keys are configured
assayer models check ollama       # check if Ollama is running and list local models

config

assayer config set OPENAI_API_KEY sk-...
assayer config show

Keys are saved to ~/.assayer/config.json. Environment variables take precedence over keys stored in this file.

Providers

OpenAI

export OPENAI_API_KEY=sk-...

Supported models: gpt-5.5, gpt-5.5-pro, gpt-5.4, gpt-5.4-pro, gpt-5.4-mini, gpt-5.4-nano, gpt-5.2, gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, o3, o3-mini, o4-mini

Anthropic

export ANTHROPIC_API_KEY=sk-ant-...

Supported models: claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5-20251001, claude-opus-4-6, claude-sonnet-4-5, claude-opus-4-5

Google Gemini

export GEMINI_API_KEY=...

Supported models: gemini-3.1-pro-preview, gemini-3.1-flash-lite, gemini-3-flash-preview, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.0-flash, gemini-2.0-flash-lite

Ollama (local)

No API key needed. Start Ollama and use the ollama/ prefix:

ollama serve
assayer run "prompt" --models ollama/llama4-scout,ollama/llama3.2,ollama/qwen3
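Model identifiers are passed as one comma-separated --models value, and the ollama/ prefix selects the local backend. The sketch below shows how identifiers might be routed to providers. It assumes the prefixes visible in the supported-model lists above: gpt-, o3, o4, claude-, gemini-, and ollama/. `provider_for` and `parse_models` are hypothetical, and the real matching rules may differ.

```python
def provider_for(model_id: str) -> tuple[str, str]:
    """Map a --models entry to (provider, model name) by its prefix."""
    if model_id.startswith("ollama/"):
        # Strip the routing prefix; Ollama itself only knows the bare model name.
        return "ollama", model_id.removeprefix("ollama/")
    if model_id.startswith("claude-"):
        return "anthropic", model_id
    if model_id.startswith("gemini-"):
        return "gemini", model_id
    if model_id.startswith(("gpt-", "o3", "o4")):
        return "openai", model_id
    raise ValueError(f"unknown model identifier: {model_id!r}")


def parse_models(flag: str) -> list[tuple[str, str]]:
    """Split the comma-separated --models value, ignoring stray spaces and empty entries."""
    return [provider_for(m.strip()) for m in flag.split(",") if m.strip()]


if __name__ == "__main__":
    print(parse_models("gpt-4o,claude-sonnet-4-5,ollama/llama3.2"))
```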

Scoring

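--score embeds all outputs using all-MiniLM-L6-v2 (runs locally, no API call; requires the score extra) and displays a pairwise cosine similarity matrix. Values near 1 mean two outputs say nearly the same thing; values near 0 mean they are unrelated. Cosine similarity can in principle be slightly negative.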
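The similarity matrix is cosine similarity computed between every pair of embedding vectors. The sketch below computes it in pure Python on toy vectors. In Assayer the vectors would come from all-MiniLM-L6-v2, for example via the sentence-transformers `SentenceTransformer(...).encode` method. How Assayer actually loads and calls the model is an assumption.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """Pairwise matrix; entry [i][j] compares output i with output j."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real sentence embeddings.
    vecs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for row in similarity_matrix(vecs):
        print(" ".join(f"{x:.2f}" for x in row))
```

The matrix is symmetric, and its diagonal is 1 because every output is identical to itself.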

LLM-as-judge

--judge <model> sends all outputs to the specified model and asks it to pick a winner. Use --judge-criteria to focus the evaluation:

assayer run "Write a sorting algorithm." \
  --models gpt-4o,claude-sonnet-4-5 \
  --judge gpt-4o \
  --judge-criteria "correctness,readability"

If the judge call fails, a warning is printed to stderr and the run continues normally.

Export

--output results.json saves full results as JSON. --output results.csv saves as CSV. The file format is determined by the extension.
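Choosing the export format from the file extension might look like the sketch below. `save_results` is hypothetical. The row fields used with it (model, output) are illustrative only, because Assayer's actual export schema is not documented here.

```python
import csv
import json
from pathlib import Path


def save_results(results: list[dict[str, object]], path: str) -> None:
    """Write results as JSON or CSV, chosen by the file extension."""
    target = Path(path)
    suffix = target.suffix.lower()
    if suffix == ".json":
        target.write_text(json.dumps(results, indent=2), encoding="utf-8")
    elif suffix == ".csv":
        # Union of keys across rows, so rows with missing fields still fit.
        fieldnames = sorted({key for row in results for key in row})
        with target.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(results)
    else:
        raise ValueError(f"unsupported output format: {suffix or '(no extension)'}")
```

For example, `save_results([{"model": "gpt-4o", "output": "..."}], "results.csv")` writes a header row followed by one row per model. The csv module quotes any commas inside the outputs.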

Contributing

Contributions are welcome. See CONTRIBUTING.md for setup instructions, code style, and the PR process.

License

MIT - see LICENSE for details.
