Run a prompt across multiple LLMs and compare outputs side by side in the terminal.

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

assayer

Send a prompt to multiple language models in parallel and compare their outputs in the terminal. Useful for evaluating which model handles a given task better, measuring semantic similarity between responses, or running an LLM-as-judge evaluation — without leaving the shell.

Installation

pip install assayer

Similarity scoring requires the optional score extra:

pip install "assayer[score]"

Python 3.11 or newer is required.

Contributing? See CONTRIBUTING.md for setup, code style, and PR guidelines.

Supported Providers

OpenAI: GPT and o-series models.
Anthropic: Claude Opus, Sonnet, and Haiku models.
Google Gemini: Pro, Flash, and Flash-Lite models.
Ollama: Local models running on your machine.

The Providers section below lists the exact model identifiers for each provider.

Configuration

Assayer looks for API keys in environment variables or a configuration file at ~/.assayer/config.json.

Environment Variables

export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"
export GEMINI_API_KEY="your-key"

Configuration File

{
  "OPENAI_API_KEY": "sk-...",
  "ANTHROPIC_API_KEY": "sk-ant-...",
  "GEMINI_API_KEY": "..."
}

Use assayer models check to verify your configuration.

Quickstart

assayer run "Explain recursion in one sentence." --models gpt-4o,claude-haiku-4-5-20251001
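
Conceptually, a run like the one above fans the same prompt out to every model at once and collects the replies by model name. The following is a minimal sketch of that pattern, not assayer's actual implementation: `call_model` and `run_parallel` are hypothetical names, and `call_model` is a stub that returns a canned string instead of calling a provider API.

```python
from concurrent.futures import ThreadPoolExecutor


def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a provider API call; returns a canned reply."""
    return f"[{model}] reply to: {prompt}"


def run_parallel(prompt: str, models: list[str]) -> dict[str, str]:
    """Send the same prompt to every model concurrently, keyed by model name."""
    # max_workers must be at least 1, even when the model list is empty.
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        # Executor.map preserves input order, so zip pairs each model with its reply.
        replies = pool.map(lambda model: call_model(model, prompt), models)
        return dict(zip(models, replies))


results = run_parallel(
    "Explain recursion in one sentence.",
    ["gpt-4o", "claude-haiku-4-5-20251001"],
)
for model, reply in results.items():
    print(f"{model}: {reply}")
```

Threads suit this workload because each call mostly waits on network I/O.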

Commands

run

assayer run "prompt" --models gpt-4o,claude-sonnet-4-5
assayer run --prompt-file prompt.txt --models gpt-4o,ollama/llama3
assayer run "prompt" --models gpt-4o,claude-sonnet-4-5 --score
assayer run "prompt" --models gpt-4o,claude-sonnet-4-5 --judge gpt-4o --judge-criteria "clarity,brevity"
assayer run "prompt" --models gpt-4o,claude-sonnet-4-5 --output results.json
assayer run "prompt" --models gpt-4o,claude-sonnet-4-5 --output results.csv
assayer run "prompt with {var}" --models gpt-4o --var key=value

Flag	Description
`--models`	Comma-separated model identifiers (required)
`--prompt-file`	Path to a `.txt` file instead of an inline prompt
`--var`	`KEY=VALUE` template variable, repeatable
`--system`	System prompt applied to all models
`--temperature`	Sampling temperature
`--max-tokens`	Maximum output tokens
`--score`	Show pairwise similarity matrix
`--judge`	Model to use as judge
`--judge-criteria`	Comma-separated criteria for the judge
`--output`	Save results to `.json` or `.csv`
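
To illustrate how repeated `--var KEY=VALUE` flags could fill `{var}` placeholders in a prompt, here is a small sketch. How assayer performs the substitution internally is not documented here, so `parse_vars` and `render` are hypothetical helpers, and leaving unknown placeholders untouched is an assumption.

```python
import re


def parse_vars(pairs: list[str]) -> dict[str, str]:
    """Turn ["KEY=VALUE", ...] into a dict, splitting on the first "=" only."""
    variables: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        variables[key] = value
    return variables


def render(template: str, variables: dict[str, str]) -> str:
    """Replace each {key} placeholder in one pass; unknown keys stay as-is."""
    return re.sub(
        r"\{(\w+)\}",
        lambda match: variables.get(match.group(1), match.group(0)),
        template,
    )


values = parse_vars(["text=hello", "lang=French"])
print(render("Translate {text} into {lang}.", values))
```

Substituting in a single pass means a value that itself contains braces is not expanded a second time.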

models

assayer models list               # list all supported model identifiers
assayer models check              # check which API keys are configured
assayer models check ollama       # check if Ollama is running and list local models

config

assayer config set OPENAI_API_KEY sk-...
assayer config show

Keys are saved to ~/.assayer/config.json. When a key is set both in the environment and in that file, the environment variable takes precedence.

Providers

OpenAI

export OPENAI_API_KEY=sk-...

Supported models: gpt-5.5, gpt-5.5-pro, gpt-5.4, gpt-5.4-pro, gpt-5.4-mini, gpt-5.4-nano, gpt-5.2, gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, o3, o3-mini, o4-mini

Anthropic

export ANTHROPIC_API_KEY=sk-ant-...

Supported models: claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5-20251001, claude-opus-4-6, claude-sonnet-4-5, claude-opus-4-5

Google Gemini

export GEMINI_API_KEY=...

Supported models: gemini-3.1-pro-preview, gemini-3.1-flash-lite, gemini-3-flash-preview, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.0-flash, gemini-2.0-flash-lite

Ollama (local)

No API key needed. Start Ollama and use the ollama/ prefix:

ollama serve
assayer run "prompt" --models ollama/llama4-scout,ollama/llama3.2,ollama/qwen3
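
One way to picture how a `--models` value mixes providers is to split it on commas and route each identifier by its prefix. The sketch below is an assumption about the routing, not assayer's documented behavior: `split_model_id` and `parse_models` are hypothetical names, and only the `ollama/` prefix rule comes from this README.

```python
def split_model_id(identifier: str) -> tuple[str, str]:
    """Map a model identifier to a (provider, model_name) pair."""
    if identifier.startswith("ollama/"):
        # Documented rule: the ollama/ prefix selects a local model.
        return "ollama", identifier.removeprefix("ollama/")
    # Assumption: hosted providers are inferred from the model name prefix.
    if identifier.startswith("claude-"):
        return "anthropic", identifier
    if identifier.startswith("gemini-"):
        return "gemini", identifier
    if identifier.startswith(("gpt-", "o3", "o4")):
        return "openai", identifier
    raise ValueError(f"unknown model identifier: {identifier!r}")


def parse_models(arg: str) -> list[tuple[str, str]]:
    """Split a comma-separated --models value, ignoring surrounding whitespace."""
    return [split_model_id(part.strip()) for part in arg.split(",") if part.strip()]


for provider, name in parse_models("ollama/llama3.2,gpt-4o,claude-sonnet-4-5"):
    print(f"{provider:10} {name}")
```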

Scoring

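--score embeds all outputs using all-MiniLM-L6-v2 (runs locally, no API call; requires the score extra) and displays a pairwise cosine similarity matrix. Values typically fall between 0 (unrelated) and 1 (identical meaning). Cosine similarity can in principle be slightly negative for very dissimilar texts.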
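
To make the matrix concrete, here is a standard-library sketch of pairwise cosine similarity. It uses made-up three-dimensional vectors in place of real all-MiniLM-L6-v2 embeddings (which have 384 dimensions), and `cosine` and `similarity_matrix` are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """Compare every vector with every other vector, including itself."""
    return [[cosine(a, b) for b in vectors] for a in vectors]


# Toy embeddings: the first two point in nearly the same direction.
embeddings = {
    "gpt-4o": [1.0, 0.0, 1.0],
    "claude-sonnet-4-5": [1.0, 0.0, 0.9],
    "ollama/llama3": [0.0, 1.0, 0.0],
}
matrix = similarity_matrix(list(embeddings.values()))
for name, row in zip(embeddings, matrix):
    print(f"{name:20}", "  ".join(f"{value:.2f}" for value in row))
```

The diagonal is always 1 because each output is compared with itself, and the matrix is symmetric.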

LLM-as-judge

--judge <model> sends all outputs to the specified model and asks it to pick a winner. Use --judge-criteria to focus the evaluation:

assayer run "Write a sorting algorithm." \
  --models gpt-4o,claude-sonnet-4-5 \
  --judge gpt-4o \
  --judge-criteria "correctness,readability"

If the judge call fails, a warning is printed to stderr and the run continues normally.

Export

--output results.json saves full results as JSON. --output results.csv saves as CSV. The file format is determined by the extension.
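
Choosing the format from the file extension can be sketched as below. The `save_results` helper and the `model` and `output` column names are hypothetical; the actual fields assayer writes are not documented here.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_results(rows: list[dict[str, str]], path: str) -> None:
    """Write rows as JSON or CSV depending on the file extension."""
    target = Path(path)
    suffix = target.suffix.lower()
    if suffix == ".json":
        target.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    elif suffix == ".csv":
        # newline="" lets the csv module control line endings itself.
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]) if rows else [])
            writer.writeheader()
            writer.writerows(rows)
    else:
        raise ValueError(f"unsupported output format: {suffix or path!r}")


rows = [{"model": "gpt-4o", "output": "A function that calls itself."}]
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "results.json"
    csv_path = Path(tmp) / "results.csv"
    save_results(rows, str(json_path))
    save_results(rows, str(csv_path))
    json_text = json_path.read_text(encoding="utf-8")
    csv_text = csv_path.read_text(encoding="utf-8")

print(csv_text)
```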

Release history

1.0.0

May 18, 2026

This version

0.1.1

May 18, 2026

0.1.0

May 17, 2026

Download files

Source Distribution

assayer-0.1.1.tar.gz (17.3 kB)

Uploaded May 18, 2026 Source

Built Distribution

assayer-0.1.1-py3-none-any.whl (16.5 kB)

Uploaded May 18, 2026 Python 3

File details

Details for the file assayer-0.1.1.tar.gz.

File metadata

Download URL: assayer-0.1.1.tar.gz
Upload date: May 18, 2026
Size: 17.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for assayer-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`6a3b76a39a0a0a7be8e16efce589b1259b2786c2fd1d9b3540f16df24f24da13`
MD5	`0db8e836201bff370a6bae4d3a839f7e`
BLAKE2b-256	`7a3e85f7dc91a8fb27bdbce1bf4ccd0f95d9486b99ea6fa28050219ea5f31db3`

File details

Details for the file assayer-0.1.1-py3-none-any.whl.

File metadata

Download URL: assayer-0.1.1-py3-none-any.whl
Upload date: May 18, 2026
Size: 16.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for assayer-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`464c8a619304b2707624932cbdb996df74c297eac8090d360be9b8c23e326c36`
MD5	`7e5a35ecafde5bb5148e721298f060bc`
BLAKE2b-256	`d3fa33aa0c99a108c73a555053a045e0e6729257ba57856e22a9253db96d6162`
