OpenJudge: A Unified Framework for Holistic Evaluation and Quality Reward

OpenJudge Logo

Holistic Evaluation, Quality Rewards: Driving Application Excellence

🌟 If you find OpenJudge helpful, please give us a Star! 🌟

Python 3.10+ PyPI Documentation

📖 Documentation | 🤝 Contributing | 中文


OpenJudge is a unified framework designed to drive application excellence through Holistic Evaluation and Quality Rewards.

💡 Evaluation and reward signals are the cornerstones of application excellence. Holistic evaluation enables the systematic analysis of shortcomings to drive rapid iteration, while high-quality rewards provide the essential foundation for advanced optimization and fine-tuning.

OpenJudge unifies evaluation metrics and reward signals into a single, standardized Grader interface, offering pre-built graders, flexible customization, and seamless framework integration.
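The core idea of a single interface serving both evaluation and reward can be illustrated with a minimal, self-contained sketch. The class and field names below are illustrative assumptions, not the actual OpenJudge API:

```python
import asyncio
from dataclasses import dataclass

# Illustrative sketch of a unified grader interface.
# Names here (Grader, GradeResult) are assumptions, not OpenJudge's API.

@dataclass
class GradeResult:
    score: float   # usable as an eval metric or as a reward signal
    reason: str    # human-readable justification

class Grader:
    async def aevaluate(self, **data) -> GradeResult:
        raise NotImplementedError

class LengthGrader(Grader):
    """Toy grader: rewards concise responses."""
    async def aevaluate(self, *, query: str, response: str) -> GradeResult:
        score = 5.0 if len(response) <= 200 else 2.0
        return GradeResult(score=score, reason=f"{len(response)} chars")

result = asyncio.run(
    LengthGrader().aevaluate(query="hi", response="short answer")
)
print(result.score)  # 5.0
```

Because every grader returns the same result shape, the same object can back an offline evaluation report or an online reward loop.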


✨ Key Features

📦 Systematic & Quality-Assured Grader Library

Access 50+ production-ready graders featuring a comprehensive taxonomy, rigorously validated for reliable performance.

🎯 General

Focus: Semantic quality, functional correctness, structural compliance

Key Graders:

  • Relevance - Semantic relevance scoring
  • Similarity - Text similarity measurement
  • Syntax Check - Code syntax validation
  • JSON Match - Structure compliance

🤖 Agent

Focus: Agent lifecycle, tool calling, memory, plan feasibility, trajectory quality

Key Graders:

  • Tool Selection - Tool choice accuracy
  • Memory - Context preservation
  • Plan - Strategy feasibility
  • Trajectory - Path optimization

🖼️ Multimodal

Focus: Image-text coherence, visual generation quality, image helpfulness

Key Graders:

  • Image Coherence - Visual-text alignment
  • Text-to-Image - Generation quality
  • Image Helpfulness - Image contribution

  • 🌐 Multi-Scenario Coverage: Extensive support for diverse domains, including Agent, text, code, math, and multimodal tasks. → Explore Supported Scenarios
  • 🔄 Holistic Agent Evaluation: Beyond final outcomes, we assess the entire lifecycle, including trajectories, memory, reflection, and tool use. → Agent Lifecycle Evaluation
  • ✅ Quality Assurance: Every grader ships with benchmark datasets and pytest integration for validation. → View Benchmark Datasets

🛠️ Flexible Grader Building Methods

Choose the build method that fits your requirements:

  • Customization: Easily extend or modify pre-defined graders to fit your specific needs. 👉 Custom Grader Development Guide
  • Data-Driven Rubrics: Have a few examples but no clear rules? Use our tools to automatically generate white-box evaluation criteria (rubrics) from your data. 👉 Automatic Rubric Generation Tutorial
  • Training Judge Models (Coming Soon 🚀): For high-scale and specialized scenarios, we are developing the capability to train dedicated Judge models. Support for SFT, Bradley-Terry models, and reinforcement learning workflows is on the way to help you build high-performance, domain-specific graders.
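To make the "white-box rubric" idea concrete, here is a toy, self-contained sketch of rubric-style grading with explicit, inspectable criteria. This is not OpenJudge's rubric format; the shape of the rubric and the keyword-matching logic are assumptions for illustration only:

```python
import asyncio

# Toy illustration of a white-box rubric: explicit, inspectable criteria.
# NOT OpenJudge's rubric format; structure and scoring are assumptions.

RUBRIC = [
    # (criterion, keyword that must appear, weight)
    ("defines the term", "subset", 2.0),
    ("mentions data",    "data",   1.0),
    ("mentions AI",      "AI",     1.0),
]

async def rubric_grade(response: str) -> tuple[float, list[str]]:
    """Return a normalized score plus per-criterion reasons."""
    hits, reasons = 0.0, []
    total = sum(weight for _, _, weight in RUBRIC)
    for criterion, keyword, weight in RUBRIC:
        if keyword.lower() in response.lower():
            hits += weight
            reasons.append(f"met: {criterion}")
        else:
            reasons.append(f"missed: {criterion}")
    return hits / total, reasons

score, reasons = asyncio.run(
    rubric_grade("Machine learning is a subset of AI that learns from data.")
)
print(round(score, 2))  # 1.0
```

The appeal of rubric-based grading is auditability: each criterion contributes a visible, weighted piece of the final score, so failures can be traced to a specific rule rather than an opaque judgment.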

🔌 Easy Integration (🚧 Coming Soon)

We're actively building seamless connectors for mainstream observability platforms and training frameworks. Stay tuned! → See Integrations



📥 Installation

pip install py-openjudge

💡 More installation methods can be found in the Quickstart Guide.


🚀 Quickstart

import asyncio
from openjudge.models import OpenAIChatModel
from openjudge.graders.common.relevance import RelevanceGrader

async def main():
    # 1️⃣ Create model client
    model = OpenAIChatModel(model="qwen3-32b")

    # 2️⃣ Initialize grader
    grader = RelevanceGrader(model=model)

    # 3️⃣ Prepare data
    data = {
        "query": "What is machine learning?",
        "response": "Machine learning is a subset of AI that enables computers to learn from data.",
    }

    # 4️⃣ Evaluate
    result = await grader.aevaluate(**data)

    print(f"Score: {result.score}")   # Score: 5
    print(f"Reason: {result.reason}")

if __name__ == "__main__":
    asyncio.run(main())

📚 Complete Quickstart can be found in the Quickstart Guide.
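Since graders expose an async `aevaluate`, scoring many items concurrently is a standard `asyncio.gather` pattern. The stub class below stands in for a real grader (such as `RelevanceGrader`) so the sketch runs on its own without API access; its scoring logic is a placeholder:

```python
import asyncio

# Stub grader standing in for a real one (e.g. RelevanceGrader) so this
# concurrency sketch is runnable without model/API access.
class StubGrader:
    async def aevaluate(self, *, query: str, response: str) -> dict:
        await asyncio.sleep(0)  # where the real model call would await
        return {"score": len(response) % 5 + 1, "reason": "stub"}

async def grade_batch(grader, items):
    # Launch one evaluation per item; await them all concurrently.
    return await asyncio.gather(
        *(grader.aevaluate(**item) for item in items)
    )

items = [
    {"query": "q1", "response": "short"},
    {"query": "q2", "response": "a somewhat longer response"},
]
results = asyncio.run(grade_batch(StubGrader(), items))
print([r["score"] for r in results])  # [1, 2]
```

Swapping the stub for a real grader keeps the same pattern: because the evaluations only await I/O, a batch completes in roughly the time of the slowest single call rather than the sum of all calls.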


🔗 Integrations

Seamlessly connect OpenJudge with mainstream observability and training platforms, with more integrations on the way:

Category Status Platforms
Observability 🟡 In Progress LangSmith, LangFuse, Arize Phoenix
Training 🔵 Planned verl, Trinity-RFT

💬 Have a framework you'd like us to prioritize? Open an Issue!


🤝 Contributing

We love your input! We want to make contributing to OpenJudge as easy and transparent as possible.

🎨 Adding New Graders — Have domain-specific evaluation logic? Share it with the community!
🐛 Reporting Bugs — Found a glitch? Help us fix it by opening an issue
📝 Improving Docs — Clearer explanations or better examples are always welcome
💡 Proposing Features — Have ideas for new integrations? Let's discuss!

📖 See full Contributing Guidelines for coding standards and PR process.


📦 For v0.1.x Users

Package renamed from rm-gallery to py-openjudge. The legacy version is still available via pip install rm-gallery, and its source code is preserved in the v0.1.6 branch.


📄 Citation

If you use OpenJudge in your research, please cite:

@software{openjudge,
  title  = {OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards},
  author = {The OpenJudge Team},
  url    = {https://github.com/modelscope/OpenJudge},
  month  = {07},
  year   = {2025}
}

Made with ❤️ by the OpenJudge Team

⭐ Star Us · 🐛 Report Bug · 💡 Request Feature
