LLM-as-a-Judge evaluations for vLLM hosted models
Project description
vLLM Judge
A lightweight library for LLM-as-a-Judge evaluations using vLLM hosted models. Evaluate LLM inputs & outputs at scale with just a few lines of code. From simple scoring to complex safety checks, vLLM Judge adapts to your needs. Please refer to the documentation for usage details.
Features
- 🚀 Simple Interface: A single `evaluate()` method that adapts to any use case
- 🎯 Pre-built Metrics: 20+ ready-to-use evaluation metrics
- 🛡️ Model-Specific Support: Seamlessly works with specialized models like Llama Guard without breaking their trained formats
- ⚡ High Performance: Async-first design enables high-throughput evaluations
- 🔧 Template Support: Dynamic evaluations with template variables
- 🌐 API Mode: Run as a REST API service
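The async-first design makes it natural to run many evaluations concurrently with `asyncio.gather`. A minimal sketch of the pattern; `mock_evaluate` is a stand-in for the real `judge.evaluate()` call, not part of the library:

```python
import asyncio

# Stand-in for judge.evaluate(); the real call awaits the vLLM server.
async def mock_evaluate(content: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"scored:{content}"

async def main() -> list[str]:
    contents = ["answer one", "answer two", "answer three"]
    # Fire all evaluations concurrently instead of awaiting them one by one
    return await asyncio.gather(*(mock_evaluate(c) for c in contents))

results = asyncio.run(main())
print(results)
```

Because each call spends most of its time waiting on the server, gathering them lets throughput scale with the server's capacity rather than round-trip latency.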
Installation
```bash
# Basic installation
pip install vllm-judge

# With API support (quotes keep the extras spec safe in zsh)
pip install "vllm-judge[api]"

# With Jinja2 template support
pip install "vllm-judge[jinja2]"

# Everything
pip install "vllm-judge[dev]"
```
Quick Start
```python
from vllm_judge import Judge, CODE_QUALITY, LLAMA_GUARD_3_SAFETY

# Initialize with the vLLM server URL
judge = Judge.from_url("http://vllm-server:8000")

# Simple evaluation (the await calls below assume an async context)
result = await judge.evaluate(
    content="The Earth orbits around the Sun.",
    criteria="scientific accuracy"
)
print(f"Decision: {result.decision}")
print(f"Reasoning: {result.reasoning}")

# Using pre-built metrics
result = await judge.evaluate(
    content="def add(a, b): return a + b",
    metric=CODE_QUALITY
)

# With template variables
result = await judge.evaluate(
    content="Essay content here...",
    criteria="Evaluate this {doc_type} for {audience}",
    template_vars={
        "doc_type": "essay",
        "audience": "high school students"
    }
)

# Works with specialized safety models out of the box
result = await judge.evaluate(
    content="How do I make a bomb?",
    metric=LLAMA_GUARD_3_SAFETY  # Automatically uses the Llama Guard format
)
# Result: decision="unsafe", reasoning="S9"
```
API Server
Run Judge as a REST API:
```bash
vllm-judge serve --base-url http://vllm-server:8000 --port 9090
```
Then use the HTTP API:
```python
from vllm_judge.api import JudgeClient

client = JudgeClient("http://localhost:9090")

# (await assumes an async context)
result = await client.evaluate(
    content="Python is great!",
    criteria="technical accuracy"
)
```
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file vllm_judge-0.1.4.tar.gz.
File metadata
- Download URL: vllm_judge-0.1.4.tar.gz
- Upload date:
- Size: 35.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `72fe32e35c04577c6fb700aad7adf0dfe5a44ecf8c85bc9ef3e5fc16af8b6aa8` |
| MD5 | `9a8dbec11e7179b3bb2459c0a2b5db3b` |
| BLAKE2b-256 | `efedabbd6d8a513706acff45d40989c6cadcb8bbd3c12bfb8266fcfa970381ef` |
File details
Details for the file vllm_judge-0.1.4-py3-none-any.whl.
File metadata
- Download URL: vllm_judge-0.1.4-py3-none-any.whl
- Upload date:
- Size: 39.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e76650b264e76f2877b275c1b9dab7117fca9ee32b36de4f6a7e3055669b2252` |
| MD5 | `6968109c9f956e74a24d3f58433b57ee` |
| BLAKE2b-256 | `a067dd3df6e368d24443a57c1c8dbbdea7e9f636917b719c3b8bfd3fd44ca553` |
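The digests above can be verified locally before installing. A small sketch using Python's standard `hashlib`; the commented-out filename is the artifact from this release, and the live demo hashes known bytes so the snippet runs on its own:

```python
import hashlib
import io

def sha256_of(fileobj, chunk_size: int = 65536) -> str:
    """Stream a file-like object through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# Against the downloaded artifact, compare with the table above:
# sha256_of(open("vllm_judge-0.1.4-py3-none-any.whl", "rb"))

# Self-contained demo: SHA-256 of empty input is a well-known constant
print(sha256_of(io.BytesIO(b"")))
# -> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

Streaming in chunks keeps memory flat even for large distributions.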