AI alignment and reward balance analysis for reinforcement learning systems


RewardGuard

Trust Your AI Training

RewardGuard is an AI alignment and safety tooling company focused on reinforcement learning systems. We provide reward auditing libraries that help developers detect reward hacking, misalignment, and training degradation early in the training process.

🎯 What is RewardGuard?

RewardGuard analyzes your RL training logs and checks whether your reward functions are balanced and aligned with your intended goals. It detects when agents find unintended ways to maximize rewards (reward hacking) and provides actionable insights to help you fix them.

Key Features

Reward Distribution Analysis - Understand how rewards are distributed across different sources
Imbalance Detection - Automatically detect when reward components are misaligned
Training Diagnostics - Monitor trends and catch training issues early
Actionable Recommendations - Get clear suggestions on how to fix imbalances
Auto-Adjustment (Premium) - Automatically rebalance rewards during training

📦 Two Versions

🟢 Free Version

What it does:

Analyzes reward distributions
Detects imbalances and dominance patterns
Provides warnings and recommendations
Generates detailed reports

What it doesn't do:

Does NOT modify training behavior
Analysis and insights are read-only

Installation:

pip install rewardguard

🔒 Premium Version (Private)

Everything in Free, PLUS:

Automatic reward rebalancing
Live monitoring during training
Guardrails against reward hacking
Continuous alignment enforcement
Production-safe controls

Installation:

pip install rewardguard-premium --index-url <private-registry-url>
# Requires authentication token

🚀 Quick Start

Free Version Example

from rewardguard import RewardGuard

# Initialize
guard = RewardGuard(tolerance=5.0)

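# Parse your training logs
# raw_log_text is the full log content as one string; the path below is a placeholder
with open("training.log", encoding="utf-8") as log_file:
    raw_log_text = log_file.read()
episodes = guard.parse_logs(raw_log_text)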

# Define expected distribution
expected = {
    "reward_a": 60.0,  # Want 60% from component A
    "reward_b": 40.0   # Want 40% from component B
}

# Analyze balance
result = guard.analyze_balance(episodes, expected)

# Print report
guard.print_analysis_report(result)

Output:

REWARDGUARD ANALYSIS REPORT
============================================================
📊 General Statistics:
   Episodes analyzed: 50
   Reward sources found: reward_a, reward_b

📈 Reward Distribution (%):
   Source          Real       Expected   Diff       Status
   --------------- ---------- ---------- ---------- ------------
   reward_a        75.2       60.0       +15.2      ⚠️  imbalanced
   reward_b        24.8       40.0       -15.2      ⚠️  imbalanced

🎯 Recommended Reward Weights (multipliers):
   reward_a: 0.82x (ADJUST)
   reward_b: 1.54x (ADJUST)

🔧 Summary of Actions Needed:
   • reward_a: Decrease weight by ~15.2%
   • reward_b: Increase weight by ~15.2%
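The Real, Expected, and Diff columns above come down to simple arithmetic. The sketch below is not RewardGuard's implementation: the function name `compare_distribution`, the raw reward totals, and the reading of `tolerance` as an allowed deviation in percentage points are all assumptions made for illustration. It also prints a naive expected/actual ratio, which need not match the multipliers shown in the report.

```python
def compare_distribution(totals, expected, tolerance=5.0):
    """Compare actual reward shares (in %) against expected shares (in %).

    Hypothetical helper for illustration; not part of the rewardguard API.
    """
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no reward recorded")
    rows = {}
    for source, target in expected.items():
        actual = 100.0 * totals.get(source, 0.0) / grand_total
        diff = actual - target
        rows[source] = {
            "actual": round(actual, 1),
            "expected": target,
            "diff": round(diff, 1),
            # Assumption: a source is imbalanced when it deviates by more
            # than `tolerance` percentage points from its target share.
            "status": "balanced" if abs(diff) <= tolerance else "imbalanced",
            # Naive multiplier that would move the share toward the target.
            "naive_ratio": round(target / actual, 2) if actual else None,
        }
    return rows


# Made-up totals chosen so the shares match the report above (75.2% / 24.8%).
totals = {"reward_a": 752.0, "reward_b": 248.0}
expected = {"reward_a": 60.0, "reward_b": 40.0}
for source, row in compare_distribution(totals, expected).items():
    print(source, row)
```

With these inputs both sources deviate by 15.2 percentage points, which exceeds the tolerance of 5.0 used in the Quick Start example.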

Premium Version Example

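import random

from rewardguard import AutoBalanceSystem


def some_calculation():
    """Placeholder for your environment's raw reward signal (illustrative only)."""
    return random.random()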

# Initialize with auto-tuning enabled
balance = AutoBalanceSystem(auto_tune=True)

# Define components
component_a = balance.define("component_a", initial=10.0)
component_b = balance.define("component_b", initial=5.0)

# Set expected distribution
balance.set_expected_distribution({
    "component_a": 60,
    "component_b": 40
})

# During training loop
for episode in range(100):
    # Your agent trains and collects rewards
    episode_rewards = {
        "component_a": component_a.current_value * some_calculation(),
        "component_b": component_b.current_value * some_calculation()
    }
    
    # Log performance - RewardGuard auto-adjusts every 10 episodes
    balance.log_performance({
        "rewards": episode_rewards,
        "outcome": "success",
        "steps": 100,
        "score": sum(episode_rewards.values())
    })

# Get final adjusted values
final_values = balance.get_current_values()
print(f"Auto-adjusted values: {final_values}")

📖 Use Cases

1. Game AI

Ensure your game AI learns to play properly, not exploit bugs:

Detect when agents farm easy points instead of completing objectives
Balance combat vs exploration rewards
Prevent exploit-based strategies
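One simple way to spot point farming is to check whether a single reward component's share keeps growing over training. The sketch below is an assumption about how such a check could look, not a description of RewardGuard internals; the function name `dominance_drift` and the component names `easy_points` and `objective` are hypothetical.

```python
def dominance_drift(episodes, source, window=10):
    """Change in `source`'s reward share, in percentage points, between
    the first `window` and the last `window` episodes."""

    def share(chunk):
        total = sum(sum(episode.values()) for episode in chunk)
        part = sum(episode.get(source, 0.0) for episode in chunk)
        return 100.0 * part / total if total else 0.0

    if len(episodes) < 2 * window:
        raise ValueError("need at least 2 * window episodes")
    return share(episodes[-window:]) - share(episodes[:window])


# Synthetic data: the agent earns more and more from an easy source
# while the objective reward stays flat.
episodes = [{"easy_points": 1.0 + 0.2 * i, "objective": 5.0} for i in range(40)]
drift = dominance_drift(episodes, "easy_points")
print(f"easy_points share changed by {drift:+.1f} percentage points")
```

A large positive drift for a low-value component is a hint worth investigating, not proof of an exploit.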

2. Robotics

Keep robots aligned with safety and task completion:

Balance speed vs safety rewards
Ensure proper task prioritization
Detect reward shortcuts

3. Recommendation Systems

Align recommendation rewards with business goals:

Balance engagement vs revenue
Prevent clickbait optimization
Ensure long-term user satisfaction

4. General RL Research

Debug and optimize any RL training:

Understand reward dynamics
Catch training issues early
Validate reward function design

🏗️ How It Works

Free Version (Analysis Only)

Parse Logs - Extracts reward data from training logs
Aggregate - Calculates actual reward distribution
Compare - Compares against your expected distribution
Recommend - Suggests specific weight adjustments

Key Principle: It tells you what's wrong; you fix it manually.
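The Parse Logs and Aggregate steps can be sketched as follows. The log format RewardGuard accepts is not documented on this page, so the `key=value` line format, the regular expression, and the function names `parse_episodes` and `aggregate_shares` are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical log line: "episode=3 reward_a=7.5 reward_b=2.5"
PAIR = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?)")


def parse_episodes(raw_log_text):
    """Turn each log line into a dict of reward components."""
    episodes = []
    for line in raw_log_text.splitlines():
        pairs = {key: float(value) for key, value in PAIR.findall(line)}
        pairs.pop("episode", None)  # keep only the reward components
        if pairs:
            episodes.append(pairs)
    return episodes


def aggregate_shares(episodes):
    """Sum each component over all episodes and convert to percentages."""
    totals = defaultdict(float)
    for episode in episodes:
        for source, value in episode.items():
            totals[source] += value
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {source: 100.0 * value / grand_total for source, value in totals.items()}


sample_log = """\
episode=1 reward_a=8.0 reward_b=2.0
episode=2 reward_a=7.0 reward_b=3.0
"""
shares = aggregate_shares(parse_episodes(sample_log))
print(shares)
```

The resulting shares are what the Compare step would check against the expected distribution.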

Premium Version (Auto-Fix)

All Free features, PLUS:
Monitor - Tracks performance over time
Detect - Identifies imbalances automatically
Adjust - Modifies reward weights in real-time
Learn - Continuously tunes based on results

Key Principle: Fixes problems for you automatically.
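To make the Monitor, Detect, and Adjust steps concrete, here is a minimal sketch of a periodic rebalancing loop. It is not the Premium implementation, which is proprietary and not shown on this page; the class name `WindowedRebalancer`, the damped multiplicative update, and the `interval` and `step` parameters are all assumptions.

```python
class WindowedRebalancer:
    """Nudge reward weights toward target shares every `interval` episodes.

    Hypothetical illustration; not part of any rewardguard package.
    """

    def __init__(self, weights, expected, interval=10, step=0.5):
        self.weights = dict(weights)
        self.expected = dict(expected)  # target shares in percent
        self.interval = interval
        self.step = step  # 0 = never adjust, 1 = jump straight to the naive ratio
        self._window = []

    def log(self, episode_rewards):
        """Record one episode; rebalance once the window is full."""
        self._window.append(dict(episode_rewards))
        if len(self._window) >= self.interval:
            self._rebalance()
            self._window.clear()

    def _rebalance(self):
        totals = {
            name: sum(episode.get(name, 0.0) for episode in self._window)
            for name in self.weights
        }
        grand_total = sum(totals.values())
        if grand_total <= 0:
            return
        for name, total in totals.items():
            actual = 100.0 * total / grand_total
            if actual > 0:
                ratio = self.expected[name] / actual
                # Damped update: move only part of the way toward the target.
                self.weights[name] *= 1.0 + self.step * (ratio - 1.0)


balancer = WindowedRebalancer(
    {"component_a": 10.0, "component_b": 5.0},
    {"component_a": 60.0, "component_b": 40.0},
)
for _ in range(100):
    # Stand-in for training: every component emits a raw signal of 1.0,
    # so each episode's reward equals the current weight.
    balancer.log({name: weight * 1.0 for name, weight in balancer.weights.items()})
print(balancer.weights)
```

Damping the update trades convergence speed for stability. In this toy setup the gap to the target share halves at every rebalance, whereas a real training run would respond to weight changes far less predictably.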

🎓 Philosophy

We believe AI should be:

Transparent - You should understand what your AI is learning
Aligned - Reward functions should incentivize intended behaviors
Safe - Training should be monitored for unintended outcomes

RewardGuard helps ensure your models learn what you intend, not just how to maximize scores.

💰 Pricing

Feature	Free	Premium
Reward analysis	✅	✅
Imbalance detection	✅	✅
Recommendations	✅	✅
Auto-adjustment	❌	✅
Live monitoring	❌	✅
Unlimited training steps	❌	✅
Priority support	❌	✅
Price	$0/month	$99/month

📚 Documentation

Website: [Coming Soon]
API Docs: [Coming Soon]
Tutorials: [Coming Soon]
Docs: https://docs.rewardguard.dev

🤝 Support

Community (Free): [Discord/Forum Link]
Email (Premium): support@rewardguard.dev
Chat (Premium): Available in dashboard

📄 License

Free Version: MIT License
Premium Version: Proprietary

🚧 Roadmap

Support for more log formats
Built-in visualization dashboard
Integration with popular RL frameworks (Stable-Baselines3, RLlib)
Cloud-based monitoring
Team collaboration features
Custom alerting rules

⚡ Quick Links

Get Started Free
Upgrade to Premium
View Examples
Read the Docs

Release history

1.0.4 - Apr 16, 2026
1.0.3 - Apr 14, 2026 (this version)
1.0.2 - Apr 8, 2026
1.0.1 - Apr 6, 2026
1.0.0 - Apr 6, 2026

Download files

Source Distribution

rewardguard-1.0.3.tar.gz (18.3 kB)

Uploaded Apr 14, 2026 Source

Built Distribution

rewardguard-1.0.3-py3-none-any.whl (16.6 kB)

Uploaded Apr 14, 2026 Python 3

File details

Details for the file rewardguard-1.0.3.tar.gz.

File metadata

Download URL: rewardguard-1.0.3.tar.gz
Upload date: Apr 14, 2026
Size: 18.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for rewardguard-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`c520ef14b70a00f7f85449341e9258c240f5b51ac6300b0b99e965c5c0f65b2a`
MD5	`0326ee420d24982e3cdb41762af4f0ef`
BLAKE2b-256	`17c41957ed0ef8c441080e9b932474ec126be53f0e66a124ceabc36c65e67bbd`
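A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. This is a generic sketch; the helper names `sha256_of` and `matches_published_hash` are illustrative and not part of rewardguard.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == published_hex.strip().lower()
```

For example, `matches_published_hash("rewardguard-1.0.3.tar.gz", "c520ef14b70a00f7f85449341e9258c240f5b51ac6300b0b99e965c5c0f65b2a")` should return `True` for an unmodified copy of the source distribution listed above.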

File details

Details for the file rewardguard-1.0.3-py3-none-any.whl.

File metadata

Download URL: rewardguard-1.0.3-py3-none-any.whl
Upload date: Apr 14, 2026
Size: 16.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for rewardguard-1.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ae64991ff16040134276d7da941a2e5a01ac75ecb23f2f3fc5075be7812afa35`
MD5	`78be09996c9c6ec6feaf5e007601b15d`
BLAKE2b-256	`48ae70badb6a6d9cffa9809a3fc6807caa1a3fe3b4514bf9ba091774ecce7d5a`

rewardguard 1.0.3
