OpenJudge: A Next-Generation Evaluation System for AI Model Assessment
Project description
Holistic Evaluation, Quality Rewards: Driving Application Excellence
News
- 2025-10-20 - Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling - We released a new paper on learning generalizable reward criteria for robust modeling.
- 2025-10-17 - Taming the Judge: Deconflicting AI Feedback for Stable Reinforcement Learning - We introduced techniques to align judge feedback and improve RL stability.
- 2025-07-09 - Released OpenJudge v0.1.0 on PyPI
Evaluation and reward signals are the cornerstones of application excellence. Holistic evaluation enables the systematic analysis of shortcomings to drive rapid iteration, while high-quality rewards provide the essential foundation for advanced optimization and fine-tuning. Open-Judge unifies reward signals and evaluation metrics into one Grader interface—with pre-built graders, flexible customization, and seamless framework integration.
Key Features
- Systematic & Quality-Assured Grader Library: Access N+ production-ready graders organized in a comprehensive taxonomy and rigorously validated for reliable performance.
  - Multi-Scenario Coverage: Extensive support for diverse domains, including agent, text, code, math, and multimodal tasks, via specialized graders.
  - Holistic Agent Evaluation: Beyond final outcomes, we assess the entire lifecycle, including trajectories and specific components (Memory, Reflection, Tool Use).
  - Quality Assurance: Built for reliability. Every grader ships with benchmark datasets and pytest integration for immediate quality validation.
- Flexible Grader Building Methods: Choose the build method that fits your requirements:
  - Customization: Easily extend or modify pre-defined graders to fit your specific needs.
  - Data-Driven Rubrics: Have a few examples but no clear rules? Use our tools to automatically generate white-box evaluation criteria (rubrics) from your data.
  - Trainable Judge Models: For high-scale scenarios, train dedicated judge models as graders. We support SFT, Bradley-Terry models, and reinforcement learning workflows.
- Easy Integration: Seamlessly connect with mainstream evaluation platforms (e.g., LangSmith, LangFuse) and training frameworks (e.g., VERL) using our comprehensive tutorials and flexible APIs.
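The unified Grader interface boils down to one shape: a grader takes evaluation inputs (such as a query and a response) and returns a score with a reason. The sketch below illustrates that shape with a self-contained toy grader; `KeywordGrader` and this `GradeResult` dataclass are hypothetical stand-ins for illustration, not the actual open_judge API (real graders are model-backed).

```python
import asyncio
from dataclasses import dataclass


@dataclass
class GradeResult:
    """Hypothetical result type mirroring the score/reason pair a grader returns."""
    score: int
    reason: str


class KeywordGrader:
    """Toy stand-in for a grader: scores a response by keyword coverage (0-5)."""

    def __init__(self, keywords):
        self.keywords = [k.lower() for k in keywords]

    async def aevaluate(self, query: str, response: str) -> GradeResult:
        # A real grader would await a judge-model call here.
        hits = [k for k in self.keywords if k in response.lower()]
        score = round(5 * len(hits) / len(self.keywords)) if self.keywords else 0
        return GradeResult(score=score, reason=f"matched keywords: {hits}")


result = asyncio.run(
    KeywordGrader(["data", "learn"]).aevaluate(
        query="What is machine learning?",
        response="Machine learning lets computers learn from data.",
    )
)
print(result.score)  # 5 (both keywords matched)
```

Because every grader exposes the same call shape, the same result can serve as an evaluation metric offline or as a reward signal during training.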
Installation
```
pip install open_judge
```
More installation methods can be found here.
Quickstart
```python
import asyncio

from open_judge.models import OpenAIChatModel
from open_judge.graders.common.relevance import RelevanceGrader


async def main():
    # Step 1: create the model client
    model = OpenAIChatModel(model="qwen3-32b")

    # Step 2: choose and initialize the appropriate grader
    grader = RelevanceGrader(model=model)

    # Step 3: prepare the data
    data = {
        "query": "What is machine learning?",
        "response": "Machine learning is a subset of AI that enables computers to learn from data.",
    }

    # Step 4: evaluate using the data
    result = await grader.aevaluate(**data)
    print(f"Score: {result.score}")  # Score: 5
    print(f"Reason: {result.reason}")


asyncio.run(main())
```
The complete quickstart can be found here.
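Since `aevaluate` is async, a batch of items can be graded concurrently with `asyncio.gather`. The snippet below sketches that pattern with a `StubGrader` stand-in so it runs standalone; in a real run you would pass an actual grader such as `RelevanceGrader` instead.

```python
import asyncio


class StubGrader:
    """Hypothetical stand-in with the same async aevaluate(query, response) shape."""

    async def aevaluate(self, query: str, response: str):
        await asyncio.sleep(0)  # a real grader awaits a model call here
        return {"score": 1 if response else 0, "reason": "non-empty response"}


async def grade_batch(grader, items):
    # Fan out one aevaluate call per item and await them all concurrently.
    return await asyncio.gather(
        *(grader.aevaluate(**item) for item in items)
    )


items = [
    {"query": "What is ML?", "response": "Learning from data."},
    {"query": "What is RL?", "response": "Learning from rewards."},
]
results = asyncio.run(grade_batch(StubGrader(), items))
print(len(results))  # 2
```

`asyncio.gather` preserves input order, so `results[i]` corresponds to `items[i]`, which makes it easy to join scores back onto a dataset.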
Integrations
| Integration | Documentation |
|---|---|
| LangSmith | LangSmith |
| LangFuse | LangFuse |
| Arize Phoenix | Arize Phoenix |
Contributing
We welcome contributions from the community!
- Raise and comment on Issues.
- Open a PR - Whether you're fixing bugs, adding new features, improving documentation, or sharing ideas, your contributions help make Open-Judge better for everyone. See Contributing for more details.
Citation
If you use Open-Judge in your research, please cite:
```bibtex
@software{
  title  = {OpenJudge: XXXX},
  author = {The Open-Judge Team},
  url    = {https://github.com/modelscope/Open-Judge},
  month  = {07},
  year   = {2025}
}
```
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file py_openjudge-0.1.7.tar.gz.
File metadata
- Download URL: py_openjudge-0.1.7.tar.gz
- Upload date:
- Size: 276.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 958a19f2af630ede91e82d729b3ea20ad7ba088a7987532222974733d75af812 |
| MD5 | 24fb79cb1b7116533a385a6c4404fe53 |
| BLAKE2b-256 | 9a0c08e62db8b9a99e80223d1c0f061bbf9666a862cf7552f0fc95fd39b00be2 |
File details
Details for the file py_openjudge-0.1.7-py3-none-any.whl.
File metadata
- Download URL: py_openjudge-0.1.7-py3-none-any.whl
- Upload date:
- Size: 433.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 42a8fa08a1ce68cace47bb3f4161eb573fefddfd8a795e2247790e6a5672b13a |
| MD5 | ef9b536b15c78b8fc5a3abecab335b66 |
| BLAKE2b-256 | 93e9dfd6889e022df6960d7c872b2300e0dc0104ae4cf7b1d1cfa98a7569bd0a |