OpenJudge: A Next-Generation Evaluation System for AI Model Assessment

Holistic Evaluation, Quality Rewards: Driving Application Excellence

Evaluation and reward signals are the cornerstones of application excellence. Holistic evaluation enables the systematic analysis of shortcomings to drive rapid iteration, while high-quality rewards provide the essential foundation for advanced optimization and fine-tuning. Open-Judge unifies reward signals and evaluation metrics into one Grader interface—with pre-built graders, flexible customization, and seamless framework integration.
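
To illustrate what a single Grader interface serving both evaluation and reward could look like, here is a minimal self-contained sketch. It does not use Open-Judge itself: `GraderResult`, `Grader`, `KeywordOverlapGrader`, and `grade` are hypothetical names chosen for illustration, and only the `aevaluate` method name and the `score` and `reason` fields are borrowed from the Quickstart further down.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass
class GraderResult:
    """Hypothetical result type: a numeric score plus a human-readable reason."""
    score: float
    reason: str


class Grader(Protocol):
    """Hypothetical interface: anything exposing an async aevaluate method."""

    async def aevaluate(self, query: str, response: str) -> GraderResult: ...


class KeywordOverlapGrader:
    """Toy rule-based grader: fraction of query words that appear in the response."""

    async def aevaluate(self, query: str, response: str) -> GraderResult:
        query_words = {w.strip("?.,!").lower() for w in query.split()}
        query_words.discard("")
        response_words = {w.strip("?.,!").lower() for w in response.split()}
        overlap = query_words & response_words
        score = len(overlap) / len(query_words) if query_words else 0.0
        return GraderResult(score=score, reason=f"matched words: {sorted(overlap)}")


async def grade(grader: Grader, query: str, response: str) -> GraderResult:
    # The same call can feed an evaluation report or serve as a reward signal.
    return await grader.aevaluate(query=query, response=response)


if __name__ == "__main__":
    result = asyncio.run(
        grade(
            KeywordOverlapGrader(),
            "What is machine learning?",
            "Machine learning is a subset of AI.",
        )
    )
    print(result.score, result.reason)
```

Any object with a matching `aevaluate` method satisfies the protocol, so rule-based and model-based graders can be swapped without changing the calling code.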

Key Features

  • Systematic & Quality-Assured Grader Library: Access N+ production-ready graders featuring a comprehensive taxonomy, rigorously validated for reliable performance.

    • Multi-Scenario Coverage: Extensive support for diverse domains including Agent, text, code, math, and multimodal tasks via specialized graders.
    • Holistic Agent Evaluation: Beyond final outcomes, we assess the entire lifecycle—including trajectories and specific components (Memory, Reflection, Tool Use).
    • Quality Assurance: Built for reliability. Every grader comes with benchmark datasets and pytest integration for immediate quality validation.
  • Flexible Grader Building Methods: Choose the build method that fits your requirements:

    • Customization: Easily extend or modify pre-defined graders to fit your specific needs.
    • Data-Driven Rubrics: Have a few examples but no clear rules? Use our tools to automatically generate white-box evaluation criteria (Rubrics) based on your data.
    • Trainable Judge Models: For high-scale scenarios, train dedicated Judge models as Graders. We support SFT, Bradley-Terry models, and Reinforcement Learning workflows.
  • Easy Integration: Seamlessly connect with mainstream evaluation platforms (e.g., LangSmith, LangFuse) and training frameworks (e.g., VERL) using our comprehensive tutorials and flexible APIs.
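
For readers unfamiliar with the Bradley-Terry models mentioned in the list above: in the standard formulation, a judge model assigns each response a scalar reward, and the probability that one response is preferred over another is the sigmoid of the reward difference. The sketch below computes that probability and the corresponding pairwise loss in plain Python. It illustrates the general technique only; it is not Open-Judge's training code, and the function names are hypothetical.

```python
import math


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response beats the rejected one."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))


def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the observed preference."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))


if __name__ == "__main__":
    # Equal rewards mean the judge has no preference, so the probability is 0.5.
    print(preference_probability(1.0, 1.0))
    # A larger margin in favour of the chosen response gives a smaller loss.
    print(bradley_terry_loss(2.0, 0.0) < bradley_terry_loss(0.5, 0.0))
```

Minimizing this loss over many (chosen, rejected) pairs pushes the judge to score preferred responses higher than rejected ones.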

Installation

pip install open_judge

More installation methods can be found here.

Quickstart

import asyncio

from open_judge.models import OpenAIChatModel
from open_judge.graders.common.relevance import RelevanceGrader


async def main():
    # Step 1: Create the model client
    model = OpenAIChatModel(model="qwen3-32b")

    # Step 2: Choose and initialize a suitable grader
    grader = RelevanceGrader(model=model)

    # Step 3: Prepare the data
    data = {
        "query": "What is machine learning?",
        "response": "Machine learning is a subset of AI that enables computers to learn from data.",
    }

    # Step 4: Evaluate the data
    result = await grader.aevaluate(**data)

    print(f"Score: {result.score}")  # e.g. Score: 5
    print(f"Reason: {result.reason}")


# `await` is only valid inside a coroutine, so run the steps through asyncio.run
asyncio.run(main())

The complete Quickstart can be found here.
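
Because `aevaluate` is awaited in the Quickstart, several samples can in principle be graded concurrently with `asyncio.gather`. The sketch below shows the pattern with a stand-in grader so that it runs without a model endpoint. `StubGrader`, `StubResult`, and `evaluate_batch` are hypothetical and only mimic the `aevaluate` call and the `score` and `reason` fields used above; whether Open-Judge ships its own batch API is not covered here.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResult:
    score: int
    reason: str


class StubGrader:
    """Hypothetical stand-in for a real grader: 5 for non-empty responses, 1 otherwise."""

    async def aevaluate(self, query: str, response: str) -> StubResult:
        await asyncio.sleep(0)  # yield control, as a real network call would
        if response.strip():
            return StubResult(score=5, reason="response is non-empty")
        return StubResult(score=1, reason="response is empty")


async def evaluate_batch(grader, samples):
    # Schedule one aevaluate call per sample; gather returns results in input order.
    return await asyncio.gather(*(grader.aevaluate(**s) for s in samples))


samples = [
    {"query": "What is machine learning?", "response": "A subset of AI that learns from data."},
    {"query": "What is overfitting?", "response": ""},
]

if __name__ == "__main__":
    results = asyncio.run(evaluate_batch(StubGrader(), samples))
    for sample, result in zip(samples, results):
        print(sample["query"], result.score, result.reason)
```

With a real model-backed grader, the number of concurrent requests may need to be capped (for example with `asyncio.Semaphore`) to stay within provider rate limits.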

Integrations

Integration      Documentation
LangSmith        LangSmith
LangFuse         LangFuse
Arize Phoenix    Arize Phoenix

Contributing

We welcome contributions from the community!

  1. Raise and comment on Issues.
  2. Open a PR - Whether you're fixing bugs, adding new features, improving documentation, or sharing ideas, your contributions help make Open-Judge better for everyone. See Contributing for more details.

Citation

If you use Open-Judge in your research, please cite:

@software{
title = {OpenJudge: XXXX},
author = {The Open-Judge Team},
url = {https://github.com/modelscope/Open-Judge},
month = {07},
year = {2025}
}
