OpenJudge: A Unified Framework for Holistic Evaluation and Quality Reward
Holistic Evaluation, Quality Rewards: Driving Application Excellence
🌟 If you find OpenJudge helpful, please give us a Star! 🌟
OpenJudge is a unified framework designed to drive application excellence through Holistic Evaluation and Quality Rewards.
💡 Evaluation and reward signals are the cornerstones of application excellence. Holistic evaluation enables the systematic analysis of shortcomings to drive rapid iteration, while high-quality rewards provide the essential foundation for advanced optimization and fine-tuning.
OpenJudge unifies evaluation metrics and reward signals into a single, standardized Grader interface, offering pre-built graders, flexible customization, and seamless framework integration.
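To make the unified-interface idea concrete, here is a minimal, self-contained sketch. This is an illustration of the concept, not OpenJudge's actual class hierarchy: only the async `aevaluate(**data)` call and the `score`/`reason` result fields mirror the real library (see the Quickstart below); `GraderResult` and `LengthGrader` are hypothetical names.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class GraderResult:
    score: float  # numeric signal, usable as a reward during training
    reason: str   # human-readable explanation, usable in evaluation reports


class Grader(ABC):
    """One interface serves both evaluation metrics and reward signals."""

    @abstractmethod
    async def aevaluate(self, **data) -> GraderResult: ...


class LengthGrader(Grader):
    """Toy grader: rewards responses that are neither empty nor rambling."""

    async def aevaluate(self, *, query: str, response: str) -> GraderResult:
        n = len(response.split())
        score = 5.0 if 5 <= n <= 100 else 1.0
        return GraderResult(score=score, reason=f"response has {n} words")


result = asyncio.run(LengthGrader().aevaluate(
    query="What is machine learning?",
    response="Machine learning lets computers learn patterns from data.",
))
print(result.score)  # 5.0
```

Because every grader returns the same result shape, the same object can feed an evaluation dashboard or an RL reward function without adaptation.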
✨ Key Features
📦 Systematic & Quality-Assured Grader Library
Access 50+ production-ready graders featuring a comprehensive taxonomy, rigorously validated for reliable performance.
| Category | Focus |
|---|---|
| 🎯 General | Semantic quality, functional correctness, structural compliance |
| 🤖 Agent | Agent lifecycle, tool calling, memory, plan feasibility, trajectory quality |
| 🖼️ Multimodal | Image-text coherence, visual generation quality, image helpfulness |
- 🌐 Multi-Scenario Coverage: Extensive support for diverse domains including Agent, text, code, math, and multimodal tasks. → Explore Supported Scenarios
- 🔄 Holistic Agent Evaluation: Beyond final outcomes, we assess the entire lifecycle—including trajectories, Memory, Reflection, and Tool Use. → Agent Lifecycle Evaluation
- ✅ Quality Assurance: Every grader comes with benchmark datasets and pytest integration for validation. → View Benchmark Datasets
🛠️ Flexible Grader Building Methods
Choose the build method that fits your requirements:
- Customization: Easily extend or modify pre-defined graders to fit your specific needs. 👉 Custom Grader Development Guide
- Data-Driven Rubrics: Have a few examples but no clear rules? Use our tools to automatically generate white-box evaluation criteria (rubrics) from your data. 👉 Automatic Rubric Generation Tutorial
- Training Judge Models (Coming Soon 🚀): For high-scale and specialized scenarios, we are developing the capability to train dedicated Judge models. Support for SFT, Bradley-Terry models, and Reinforcement Learning workflows is on the way to help you build high-performance, domain-specific graders.
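The data-driven rubric idea above can be sketched in miniature without any model calls: derive simple white-box criteria from a handful of good examples, then score new responses against them. Everything below (`build_rubric`, `rubric_score`) is a toy illustration of the concept, not the OpenJudge API.

```python
from collections import Counter


def build_rubric(good_examples: list[str], top_k: int = 5) -> list[str]:
    """Derive a naive white-box rubric: terms that recur across good answers."""
    counts = Counter(
        word.strip(".,").lower()
        for text in good_examples
        for word in text.split()
        if len(word) > 4  # skip stopword-sized tokens
    )
    return [word for word, _ in counts.most_common(top_k)]


def rubric_score(response: str, rubric: list[str]) -> float:
    """Fraction of rubric criteria the response satisfies, scaled to 0-5."""
    hits = sum(term in response.lower() for term in rubric)
    return 5.0 * hits / len(rubric)


rubric = build_rubric([
    "Machine learning enables computers to learn patterns from data.",
    "Machine learning models improve with more training data.",
])
print(rubric_score("Machine learning finds patterns in data.", rubric))
```

A real rubric generator would extract semantic criteria with an LLM rather than keyword overlap, but the workflow is the same: examples in, inspectable criteria out, then deterministic scoring against those criteria.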
🔌 Easy Integration (🚧 Coming Soon)
We're actively building seamless connectors for mainstream observability platforms and training frameworks. Stay tuned! → See Integrations
News
- 2025-12-26 - Released OpenJudge v0.2.0 on PyPI - Major Update! This release adds robust support for diverse evaluation scenarios on top of reward construction. By unifying reward and evaluation signals, OpenJudge v0.2.0 provides a more holistic approach to optimizing application performance. → For v0.1.x Users
- 2025-10-20 - Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling - We released a new paper on learning generalizable reward criteria for robust modeling.
- 2025-10-17 - Taming the Judge: Deconflicting AI Feedback for Stable Reinforcement Learning - We introduced techniques to align judge feedback and improve RL stability.
- 2025-07-09 - Released OpenJudge v0.1.0 on PyPI
📥 Installation
```shell
pip install py-openjudge
```
💡 More installation methods can be found in the Quickstart Guide.
🚀 Quickstart
```python
import asyncio

from openjudge.models import OpenAIChatModel
from openjudge.graders.common.relevance import RelevanceGrader


async def main():
    # 1️⃣ Create model client
    model = OpenAIChatModel(model="qwen3-32b")

    # 2️⃣ Initialize grader
    grader = RelevanceGrader(model=model)

    # 3️⃣ Prepare data
    data = {
        "query": "What is machine learning?",
        "response": "Machine learning is a subset of AI that enables computers to learn from data.",
    }

    # 4️⃣ Evaluate
    result = await grader.aevaluate(**data)
    print(f"Score: {result.score}")  # Score: 5
    print(f"Reason: {result.reason}")


if __name__ == "__main__":
    asyncio.run(main())
```
📚 Complete Quickstart can be found in the Quickstart Guide.
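Building on the quickstart, a common next step is fanning several graders out over the same sample. The sketch below uses `asyncio.gather` for that; the stub grader functions are assumptions standing in for model-backed graders such as `RelevanceGrader`, not OpenJudge APIs.

```python
import asyncio


# Stub graders standing in for model-backed OpenJudge graders; each
# mirrors the async aevaluate(**data) calling convention from the quickstart.
async def relevance_grader(**data) -> float:
    await asyncio.sleep(0.01)  # simulate an LLM round trip
    topic = data["query"].split()[-1].rstrip("?")
    return 5.0 if topic in data["response"].lower() else 1.0


async def brevity_grader(**data) -> float:
    await asyncio.sleep(0.01)
    return 5.0 if len(data["response"].split()) <= 40 else 2.0


async def grade_all(sample: dict) -> dict[str, float]:
    names = ["relevance", "brevity"]
    results = await asyncio.gather(
        relevance_grader(**sample),
        brevity_grader(**sample),
    )
    return dict(zip(names, results))


sample = {
    "query": "What is machine learning?",
    "response": "Machine learning is a subset of AI that enables computers to learn from data.",
}
scores = asyncio.run(grade_all(sample))
print(scores)  # {'relevance': 5.0, 'brevity': 5.0}
```

Because graders are async, concurrency comes for free: N model-backed graders overlap their LLM round trips instead of running serially.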
🔗 Integrations
Seamlessly connect OpenJudge with mainstream observability and training platforms, with more integrations on the way:
| Category | Status | Platforms |
|---|---|---|
| Observability | 🟡 In Progress | LangSmith, LangFuse, Arize Phoenix |
| Training | 🔵 Planned | verl, Trinity-RFT |
💬 Have a framework you'd like us to prioritize? Open an Issue!
🤝 Contributing
We love your input! We want to make contributing to OpenJudge as easy and transparent as possible.
🎨 Adding New Graders — Have domain-specific evaluation logic? Share it with the community!
🐛 Reporting Bugs — Found a glitch? Help us fix it by opening an issue
📝 Improving Docs — Clearer explanations or better examples are always welcome
💡 Proposing Features — Have ideas for new integrations? Let's discuss!
📖 See full Contributing Guidelines for coding standards and PR process.
📦 For v0.1.x Users
The package was renamed from `rm-gallery` to `py-openjudge`. The legacy version is still available via `pip install rm-gallery`, and its source code is preserved in the `v0.1.6` branch.
📄 Citation
If you use OpenJudge in your research, please cite:
```bibtex
@software{openjudge2025,
  title  = {OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards},
  author = {{The OpenJudge Team}},
  url    = {https://github.com/modelscope/OpenJudge},
  month  = {07},
  year   = {2025}
}
```
Made with ❤️ by the OpenJudge Team
File details
Details for the file py_openjudge-0.1.8.tar.gz (source distribution, 283.0 kB, uploaded via uv/0.7.9, Trusted Publishing: No).

| Algorithm | Hash digest |
|---|---|
| SHA256 | 48b1bb8359018270a13cccf17bac398f85e870954803292fb6559cbbe5cf1079 |
| MD5 | d867a70128d02bd4d67472d36c27a391 |
| BLAKE2b-256 | 016531c54ce89fc56cab095bf85826c1d94a3c1685f1df2103a11e9de8fa9abe |
File details
Details for the file py_openjudge-0.1.8-py3-none-any.whl (built distribution, Python 3, 439.0 kB, uploaded via uv/0.7.9, Trusted Publishing: No).

| Algorithm | Hash digest |
|---|---|
| SHA256 | 496b8745bb637889b596343c79048b8d9b5a460d88f99c201e91e5eb4ee6b2e5 |
| MD5 | ab32e4fb95976f571de54aa782a202e4 |
| BLAKE2b-256 | a3b73586d113af3c052d6684c73730c70f098270ec1c63e225bbef99af749268 |