A unified hub for reward models in AI alignment

RewardHub

RewardHub is an end-to-end library for annotating data with state-of-the-art (SoTA) reward models, critic functions, and related processes. It is designed to help generate preference training data and to define acceptance criteria for agentic or inference-time scaling systems, such as Best-of-N sampling or beam search.
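As a rough illustration of the Best-of-N use case, the sketch below picks the highest-scoring candidate given one reward score per candidate. The helper name `best_of_n` and the sample scores are hypothetical; in practice the scores would come from a reward model loaded through RewardHub.

```python
from typing import Sequence


def best_of_n(candidates: Sequence[str], scores: Sequence[float]) -> str:
    """Return the candidate with the highest reward score (hypothetical helper)."""
    if not candidates or len(candidates) != len(scores):
        raise ValueError("candidates and scores must be non-empty and the same length")
    # Index of the maximum score; ties resolve to the earliest candidate.
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return candidates[best_index]


# Illustrative values only, not output from a real reward model.
candidates = ["The answer is 5.", "The answer is 4.", "I am not sure."]
scores = [0.12, 0.93, 0.30]
best = best_of_n(candidates, scores)
```

Selecting by the maximum is the simplest acceptance rule; a threshold on the score could be used instead when no candidate should be accepted below a given quality.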

Getting Started

Installation

Basic Installation

For all functionality including HuggingFace, VLLM, and OpenAI backends:

git clone https://github.com/Red-Hat-AI-Innovation-Team/reward_hub.git
cd reward_hub
pip install -e .

PRM Installation (Qwen-PRM Support)

If you need to use Qwen Process Reward Models (e.g., Qwen/Qwen2.5-Math-PRM-7B), install with the prm extra:

pip install -e ".[prm]"

Development Installation

For development with additional tools (pytest, ruff, pre-commit):

pip install -e ".[dev]"

Usage Examples

RewardHub supports multiple types of reward models and serving methods. Here are the main ways to use the library:

Process Reward Models (PRM)

PRMs evaluate responses by analyzing the reasoning process:

from reward_hub import AutoRM

# Load a math-focused PRM using HuggingFace backend
model = AutoRM.load("Qwen/Qwen2.5-Math-PRM-7B", load_method="hf")

# Example conversation
messages = [
    [
        {"role": "user", "content": "What is 2+2?"},
        {"role": "assistant", "content": "Let me solve this step by step:\n1) 2 + 2 = 4\nTherefore, 4"}
    ]
]

# Get scores with full PRM results
results = model.score(messages, return_full_prm_result=True)
# Or just get the scores
scores = model.score(messages, return_full_prm_result=False)
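A PRM typically yields one score per reasoning step, and downstream code often collapses these into a single number. The sketch below shows three common aggregation choices (minimum, product, last step). It assumes the per-step scores are available as a plain list of floats; the structure actually returned with `return_full_prm_result=True` is not described here, so `aggregate_step_scores` is a hypothetical helper rather than part of RewardHub.

```python
import math
from typing import Sequence


def aggregate_step_scores(step_scores: Sequence[float], method: str = "min") -> float:
    """Collapse per-step PRM scores into one score (hypothetical helper)."""
    if not step_scores:
        raise ValueError("step_scores must not be empty")
    if method == "min":
        # The weakest step bounds the quality of the whole solution.
        return min(step_scores)
    if method == "prod":
        # Treats step scores as independent probabilities of correctness.
        return math.prod(step_scores)
    if method == "last":
        # Uses only the score assigned to the final step.
        return step_scores[-1]
    raise ValueError(f"unknown aggregation method: {method}")


# Illustrative step scores, not real model output.
step_scores = [0.9, 0.5, 0.8]
min_score = aggregate_step_scores(step_scores, "min")
prod_score = aggregate_step_scores(step_scores, "prod")
last_score = aggregate_step_scores(step_scores, "last")
```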

Outcome Reward Models (ORM)

ORMs focus on evaluating the final response quality:

from reward_hub import AutoRM

# Load an ORM using HuggingFace backend
model = AutoRM.load("internlm/internlm2-7b-reward", load_method="hf")

scores = model.score([
    [
        {"role": "user", "content": "What is 2+2?"},
        {"role": "assistant", "content": "The answer is 4."}
    ]
])
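To turn scores like these into preference training data, one option is to pair the highest- and lowest-scoring responses to the same prompt. The sketch below assumes the scores form a flat list of floats aligned with the responses; `build_preference_pair` and the `chosen`/`rejected` keys are illustrative naming, not a RewardHub API.

```python
from typing import Dict, Sequence


def build_preference_pair(
    prompt: str, responses: Sequence[str], scores: Sequence[float]
) -> Dict[str, str]:
    """Pair the best and worst response for one prompt (hypothetical helper)."""
    if len(responses) < 2 or len(responses) != len(scores):
        raise ValueError("need at least two responses, each with a score")
    # Sort by score, highest first; the sort is stable, so ties keep input order.
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    return {"prompt": prompt, "chosen": ranked[0][1], "rejected": ranked[-1][1]}


# Illustrative scores only, not output from a real reward model.
pair = build_preference_pair(
    "What is 2+2?",
    ["The answer is 4.", "The answer is 5.", "2+2 equals 4."],
    [0.8, -1.2, 0.6],
)
```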

DrSow Reward Model

DrSow uses density ratios between strong and weak models to evaluate responses.

Launch the strong and weak models first:

bash scripts/launch_drsow.sh Qwen/Qwen2.5-32B-instruct Qwen/Qwen2.5-32B

Then, you can launch client reward servers to access the DrSow reward model:

from reward_hub import AutoRM
from reward_hub.drsow import DrSowConfig

drsow_config = DrSowConfig(
    strong_model_name="Qwen/Qwen2.5-32B-instruct",
    strong_port=8305,
    weak_model_name="Qwen/Qwen2.5-32B",
    weak_port=8306
)

model = AutoRM.load("drsow", load_method="openai", drsow_config=drsow_config)

# Get scores for responses
scores = model.score([
    [
        {"role": "user", "content": "What is 2+2?"},
        {"role": "assistant", "content": "The answer is 4."}
    ]
])
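The idea of a density-ratio reward can be sketched with per-token log-probabilities: the log of the ratio between the strong and weak model likelihoods equals the difference of their summed token log-probabilities. The code below is a minimal sketch under that assumption; whether RewardHub applies length normalization or other adjustments is not stated here, and `log_density_ratio` is a hypothetical helper.

```python
from typing import Sequence


def log_density_ratio(
    strong_logprobs: Sequence[float], weak_logprobs: Sequence[float]
) -> float:
    """log(p_strong(y|x) / p_weak(y|x)) from per-token log-probabilities."""
    if len(strong_logprobs) != len(weak_logprobs):
        raise ValueError("both models must score the same response tokens")
    # Summing token log-probs gives the sequence log-likelihood under each model.
    return sum(strong_logprobs) - sum(weak_logprobs)


# Illustrative log-probabilities for a three-token response, not real model output.
strong = [-0.1, -0.2, -0.3]
weak = [-0.5, -0.9, -0.6]
reward = log_density_ratio(strong, weak)
```

A positive value means the strong model finds the response more likely than the weak model does, which is the signal used to prefer one response over another.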

LLM-as-a-Judge

Use LLMs to evaluate conversation quality with customizable criteria:

from reward_hub.llm_judge import create_pointwise_judge, create_groupwise_judge, CriterionRegistry
from reward_hub.llm_judge.prompts import Criterion

# Pointwise: Score individual conversations (0-10)
judge = create_pointwise_judge(model="gpt-4o-mini", criterion="overall_quality", api_key="sk-...")
score = judge.score([{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}])

# Groupwise: Rank and select top N responses
# `conversations` is assumed to be a list of conversations, each in the message format used above
judge = create_groupwise_judge(model="gpt-4o-mini", criterion="multi_step_tool_judge", api_key="sk-...")
scores = judge.score(conversations, top_n=2)  # Returns [1.0, 0.0, 1.0] for top-2

# Custom criteria
CriterionRegistry.register(Criterion(
    name="code_quality",
    content="Evaluate: readability, best practices, error handling, efficiency"
))
judge = create_pointwise_judge(model="gpt-4o-mini", criterion="code_quality", api_key="sk-...")

Built-in criteria: overall_quality, multi_step_tool_judge

Supported providers: OpenAI, Anthropic, Google, Azure (via LiteLLM)
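The groupwise result in the example, `[1.0, 0.0, 1.0]` for `top_n=2`, can be read as a binary mask over the input conversations. The sketch below builds such a mask from a ranking. It assumes the judge produces an ordered list of conversation indices, best first, which is a guess at the mechanism; `top_n_mask` is a hypothetical helper.

```python
from typing import List, Sequence


def top_n_mask(ranked_indices: Sequence[int], num_items: int, top_n: int) -> List[float]:
    """Mark the top-N ranked items with 1.0 and the rest with 0.0."""
    if top_n < 0:
        raise ValueError("top_n must be non-negative")
    mask = [0.0] * num_items
    # Only the first `top_n` entries of the ranking are selected.
    for index in ranked_indices[:top_n]:
        mask[index] = 1.0
    return mask


# Ranking "conversation 2 is best, then 0, then 1" with the top 2 selected.
mask = top_n_mask([2, 0, 1], num_items=3, top_n=2)
```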

Supported Backends

RewardHub supports multiple serving backends:

HuggingFace (load_method="hf"): Direct local model loading
VLLM (load_method="vllm"): Optimized local serving
OpenAI API (load_method="openai"): Remote API access

Supported Models

We support various reward models including:

Model	Type	HuggingFace	VLLM	OpenAI
`Qwen/Qwen2.5-Math-PRM-7B`	PRM	✓	✓	✗
`internlm/internlm2-7b-reward`	ORM	✓	✗	✗
`RLHFlow/Llama3.1-8B-PRM-Deepseek-Data`	PRM	✓	✗	✗
`RLHFlow/ArmoRM-Llama3-8B-v0.1`	ORM	✗	✗	✗
`drsow`	ORM	✗	✗	✓
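The table can be transcribed into a small lookup that checks a model and backend combination before attempting to load it. The mapping below mirrors the rows above; `SUPPORTED_BACKENDS` and `is_supported` are illustrative names and are not part of RewardHub.

```python
from typing import Dict, Set

# Backends per model, transcribed from the table above ("hf", "vllm", "openai").
SUPPORTED_BACKENDS: Dict[str, Set[str]] = {
    "Qwen/Qwen2.5-Math-PRM-7B": {"hf", "vllm"},
    "internlm/internlm2-7b-reward": {"hf"},
    "RLHFlow/Llama3.1-8B-PRM-Deepseek-Data": {"hf"},
    "RLHFlow/ArmoRM-Llama3-8B-v0.1": set(),
    "drsow": {"openai"},
}


def is_supported(model_name: str, load_method: str) -> bool:
    """Return True if the table lists `load_method` for `model_name`."""
    return load_method in SUPPORTED_BACKENDS.get(model_name, set())


qwen_on_vllm = is_supported("Qwen/Qwen2.5-Math-PRM-7B", "vllm")
drsow_on_hf = is_supported("drsow", "hf")
```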

Research

RewardHub serves as the official implementation of the paper:
Dr. SoW: Density Ratio of Strong-over-weak LLMs for Reducing the Cost of Human Annotation in Preference Tuning

The paper introduces CDR, a novel approach to generating high-quality preference annotations using density ratios tailored to domain-specific needs.

Project details

Maintainers

gx-ai-architect meyceoz

Release history

Version	Release date	Notes
0.1.10	Apr 8, 2026	This version
0.1.9	Dec 1, 2025	
0.1.8	Nov 25, 2025	
0.1.7	Oct 22, 2025	
0.1.6	Oct 21, 2025	
0.1.5	Oct 17, 2025	
0.1.4	Oct 17, 2025	
0.1.3	Oct 17, 2025	
0.1.2	Oct 1, 2025	
0.1.1	May 15, 2025	
0.1.0	Apr 29, 2025	
0.1.0a1	Apr 28, 2025	Pre-release
0.0.0	Apr 28, 2025	Yanked (reason: test release)

Download files

Source Distribution

reward_hub-0.1.10.tar.gz (44.4 kB)

Uploaded Apr 8, 2026 (Source)

Built Distribution

reward_hub-0.1.10-py2.py3-none-any.whl (31.6 kB)

Uploaded Apr 8, 2026 (Python 2, Python 3)

File details

Details for the file reward_hub-0.1.10.tar.gz.

File metadata

Download URL: reward_hub-0.1.10.tar.gz
Upload date: Apr 8, 2026
Size: 44.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for reward_hub-0.1.10.tar.gz
Algorithm	Hash digest
SHA256	`9b301a14c9e3231a2b71932b16b0c8afaa0acdf123cd1b071b8441489f94d189`
MD5	`ff430ae9a70998b60ea630a5192422af`
BLAKE2b-256	`79a022c6294ab2d641dc7370af69c317011755883b954051a43fc77953a5c43e`
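A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The sketch below is one way to do this; the helper names are hypothetical, and the commented usage assumes the archive was saved under its published file name in the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the archive was downloaded to the current directory):
# matches_expected(
#     "reward_hub-0.1.10.tar.gz",
#     "9b301a14c9e3231a2b71932b16b0c8afaa0acdf123cd1b071b8441489f94d189",
# )
```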

Provenance

The following attestation bundles were made for reward_hub-0.1.10.tar.gz:

Publisher: pypi.yaml on Red-Hat-AI-Innovation-Team/reward_hub

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: reward_hub-0.1.10.tar.gz
- Subject digest: 9b301a14c9e3231a2b71932b16b0c8afaa0acdf123cd1b071b8441489f94d189
- Sigstore transparency entry: 1254350416
- Sigstore integration time: Apr 8, 2026
Source repository:
- Permalink: Red-Hat-AI-Innovation-Team/reward_hub@2d815ee9067603190a095c9cde8b09d0578656fd
- Branch / Tag: refs/tags/v0.1.10
- Owner: https://github.com/Red-Hat-AI-Innovation-Team
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yaml@2d815ee9067603190a095c9cde8b09d0578656fd
- Trigger Event: release

File details

Details for the file reward_hub-0.1.10-py2.py3-none-any.whl.

File metadata

Download URL: reward_hub-0.1.10-py2.py3-none-any.whl
Upload date: Apr 8, 2026
Size: 31.6 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for reward_hub-0.1.10-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`89dd435d3269167013135a97decfa88983a2288f46c1586ea709bc4ff166102e`
MD5	`c8e93fe4de6630ba7489471b3170306e`
BLAKE2b-256	`712f620762b67cab78e0f01fdad15405f0e9c49a3c711f1a18411761f8355438`

Provenance

The following attestation bundles were made for reward_hub-0.1.10-py2.py3-none-any.whl:

Publisher: pypi.yaml on Red-Hat-AI-Innovation-Team/reward_hub

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: reward_hub-0.1.10-py2.py3-none-any.whl
- Subject digest: 89dd435d3269167013135a97decfa88983a2288f46c1586ea709bc4ff166102e
- Sigstore transparency entry: 1254350504
- Sigstore integration time: Apr 8, 2026
Source repository:
- Permalink: Red-Hat-AI-Innovation-Team/reward_hub@2d815ee9067603190a095c9cde8b09d0578656fd
- Branch / Tag: refs/tags/v0.1.10
- Owner: https://github.com/Red-Hat-AI-Innovation-Team
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yaml@2d815ee9067603190a095c9cde8b09d0578656fd
- Trigger Event: release

reward-hub 0.1.10
