Auto-tuning reward system for production RL applications (Premium)

RewardGuard Premium

Automatic Reward Alignment for Production RL — with Statistical Detection and Auto-Correction

RewardGuard Premium is the paid tier of RewardGuard. The package installs freely from PyPI, but you must sign in to your RewardGuard account to activate it.


Installation

pip install rewardguard-premium

On first use you will be prompted to sign in:

RewardGuard Premium — Sign in to your account
  Visit https://rewardguard.ai to create an account

  Email: you@example.com
  Password: ••••••••
  Signed in successfully!

Your session is saved to ~/.rewardguard/session.json and refreshed automatically. You only sign in once per machine.

Requires an active RewardGuard Premium subscription. Subscribe at: https://rewardguard.ai/premium


Sign-in CLI

rewardguard-premium login    # sign in (or switch accounts)
rewardguard-premium logout   # clear saved session
rewardguard-premium status   # show who you are signed in as

In CI or other automated environments, set environment variables instead of using the interactive prompt:

export REWARDGUARD_EMAIL=you@example.com
export REWARDGUARD_PASSWORD=yourpassword
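In a CI job it can help to fail fast when a credential is missing, rather than letting the run stall at a sign-in prompt. The following is a minimal sketch of such a pre-flight check. It only reads the two variable names documented above; the helper name `missing_credentials` is hypothetical and not part of the RewardGuard API.

```python
import os

# The two variables documented for non-interactive sign-in.
REQUIRED_VARS = ("REWARDGUARD_EMAIL", "REWARDGUARD_PASSWORD")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; not part of rewardguard_premium.
    """
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        raise SystemExit("Missing credentials: " + ", ".join(missing))
```

Running this as the first step of a pipeline surfaces a misconfigured secret before any training time is spent.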

Quick Start

from rewardguard_premium import AutoMonitor

monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    baseline_steps=300,   # warm-up before detection activates
    auto_correct=True,    # adjust weights automatically when flagged
)

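# num_episodes, max_steps, env, and action come from your own training loop.
for episode in range(num_episodes):
    for step in range(max_steps):
        r_task, r_safety = env.step(action)

        # step() returns None during warm-up, then an AlignmentSnapshot
        snapshot = monitor.step({"task": r_task, "safety": r_safety})

        if snapshot is not None and snapshot.flag == "critical":
            # Apply auto-corrected weights back to the environment
            env.set_reward_weights(monitor.weights)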

monitor.print_report()

What AutoMonitor Does

Phase                     What happens
Warm-up (baseline_steps)  Learns the normal ratio for your environment. Returns None from step().
Detection                 Computes per-component z-scores against the learned baseline.
Alignment score           0 (misaligned) → 1 (fully aligned), sigmoid-mapped from the max z-score.
Flagging                  ok (score > 0.75) / warning (> 0.5) / critical (≤ 0.5)
Auto-correction           Adjusts per-component weight multipliers when flagged.
Drift velocity            Linear regression slope over recent scores; distinguishes trends from spikes.
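To make the detection, scoring, and flagging phases concrete, here is a rough sketch of how they could fit together. The flag thresholds (0.75 and 0.5) come from the table above. Everything else is an assumption for illustration: the exact sigmoid RewardGuard uses is not documented here, so this version simply centers a logistic curve on `z_threshold`, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def z_scores(baseline, current):
    """Per-component z-score of the current ratio against baseline samples."""
    scores = {}
    for name, samples in baseline.items():
        mu, sigma = mean(samples), stdev(samples)
        # Guard against a zero-variance baseline.
        scores[name] = 0.0 if sigma == 0 else (current[name] - mu) / sigma
    return scores


def alignment_score(z, z_threshold=2.5, steepness=1.0):
    """Map the largest |z| into (0, 1).

    Assumed mapping: a logistic curve that equals 0.5 when the worst
    z-score sits exactly at z_threshold. The real mapping may differ.
    """
    worst = max(abs(v) for v in z.values())
    return 1.0 / (1.0 + math.exp(steepness * (worst - z_threshold)))


def flag_for(score):
    """Thresholds taken from the table above."""
    if score > 0.75:
        return "ok"
    if score > 0.5:
        return "warning"
    return "critical"


# Baseline ratios (in percent) collected during warm-up.
baseline = {"task": [69.0, 70.0, 71.0, 70.0], "safety": [31.0, 30.0, 29.0, 30.0]}

on_target = z_scores(baseline, {"task": 70.0, "safety": 30.0})
drifted = z_scores(baseline, {"task": 73.0, "safety": 27.0})

print(flag_for(alignment_score(on_target)))  # on-baseline ratios
print(flag_for(alignment_score(drifted)))    # ratios several sigmas away
```

Under this assumed mapping, a component sitting exactly at `z_threshold` scores 0.5, which is the boundary of the critical band.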

AlignmentSnapshot

Every call to step() after warm-up returns an AlignmentSnapshot:

snapshot.alignment_score      # float 0–1
snapshot.flag                 # "ok" / "warning" / "critical"
snapshot.z_scores             # {"task": -0.4, "safety": +2.8}
snapshot.drift_velocity       # negative = worsening trend
snapshot.corrections_applied  # {"safety": 1.24}  — weights changed this step
snapshot.component_ratios     # {"task": 68.3, "safety": 31.7}
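The drift_velocity field is described as a linear regression slope over recent alignment scores. A minimal sketch of that idea, using an ordinary least-squares slope against the step index, is shown below. The function name and the choice of the step index as the x-axis are assumptions; the library's actual computation may differ.

```python
def drift_velocity(scores):
    """Least-squares slope of scores against their index (0, 1, 2, ...).

    Hypothetical helper for illustration; negative means scores are falling.
    """
    n = len(scores)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(scores))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


steady_decline = [0.9, 0.8, 0.7]            # a sustained downward trend
single_spike = [0.9, 0.9, 0.3, 0.9, 0.9]    # one bad step, then recovery

print(drift_velocity(steady_decline))
print(drift_velocity(single_spike))
```

The steady decline yields a slope of about -0.1 per step, while the symmetric spike yields a slope of about zero, which is why a slope can separate a trend from a one-off outlier.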

Framework Integrations

Weights & Biases

from rewardguard_premium import AutoMonitor, make_wandb_callback
import wandb

wandb.init(project="my-rl-run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_wandb_callback()],
)

TensorBoard

from torch.utils.tensorboard import SummaryWriter
from rewardguard_premium import AutoMonitor, make_tensorboard_callback

writer = SummaryWriter("runs/my_run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_tensorboard_callback(writer)],
)

Stable-Baselines3

from stable_baselines3 import PPO
from rewardguard_premium import AutoMonitor, make_sb3_callback

monitor = AutoMonitor(expected={"task": 0.7, "safety": 0.3})
model = PPO("MlpPolicy", env)
model.learn(total_timesteps=500_000, callback=make_sb3_callback(monitor))

The SB3 callback reads reward components from the environment's info dict, so your environment's step() must include a "reward_components" entry in the info it returns.
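If your environment does not already expose its reward breakdown, a thin wrapper can add it. The sketch below is an illustration under several assumptions: that step() returns the Gymnasium-style 5-tuple, and that the value under "reward_components" is a dict of component name to float, mirroring what monitor.step() accepts. The expected value format is not specified above, so check it against the callback's documentation. The class names and the `split_reward` function are hypothetical.

```python
class RewardComponentsWrapper:
    """Adds a "reward_components" entry to the info dict of a wrapped env."""

    def __init__(self, env, split_reward):
        self.env = env
        # Hypothetical user-supplied function: (obs, reward, info) -> dict
        self.split_reward = split_reward

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        info = dict(info)  # copy so the wrapped env's dict is not mutated
        info["reward_components"] = self.split_reward(obs, reward, info)
        return obs, reward, terminated, truncated, info


class _DemoEnv:
    """Stand-in environment used only to exercise the wrapper."""

    def reset(self, **kwargs):
        return 0, {}

    def step(self, action):
        return 0, 1.0, False, False, {"task_part": 0.7, "safety_part": 0.3}


def split(obs, reward, info):
    return {"task": info["task_part"], "safety": info["safety_part"]}


env = RewardComponentsWrapper(_DemoEnv(), split)
_, _, _, _, info = env.step(action=0)
print(info["reward_components"])
```

For real training you would subclass gymnasium.Wrapper instead of writing a plain class, so that spaces and metadata are forwarded to SB3.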


Save / Load State

# Save after training
monitor.save("run_42_state.json")

# Resume later
monitor = AutoMonitor.load("run_42_state.json")

Export Data

monitor.to_json("results.json")   # full state
monitor.to_csv("snapshots.csv")   # one row per detection-phase step

AutoMonitor Parameters

Parameter        Default   Description
expected         required  Target distribution, e.g. {"task": 0.7, "safety": 0.3}
baseline_steps   300       Warm-up steps before detection activates
z_threshold      2.5       Z-score at which a component is flagged
auto_correct     True      Automatically adjust weights when flagged
correction_rate  0.2       Fraction of required correction applied per step
tolerance        5.0       Percentage-point tolerance for the free-tier check() method
window           200       Rolling window size for ratio computation
drift_window     30        Snapshots used to compute drift velocity
callbacks        []        List of callables invoked with each AlignmentSnapshot
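The correction_rate parameter is described as the fraction of the required correction applied per step. One plausible reading, sketched below, is that each flagged step moves a weight multiplier part of the way toward the value that would fully restore the target ratio. This is an assumption about the update rule, not the documented implementation, and the function name is hypothetical.

```python
def apply_correction(current, required, correction_rate=0.2):
    """Move `current` a fraction of the way toward `required`.

    Assumed update rule for illustration only.
    """
    return current + correction_rate * (required - current)


# Starting from a multiplier of 1.0 with a required value of 2.0,
# each step closes 20% of the remaining gap.
weight = 1.0
for step in range(3):
    weight = apply_correction(weight, required=2.0)
    print(round(weight, 3))
```

Under this reading, a smaller correction_rate makes corrections gentler and slower to converge, while a rate of 1.0 would jump to the required value in a single step.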

Free vs Premium

Feature Free (rewardguard) Premium
Live in-loop monitoring
Log-file analysis
Imbalance detection
Weight recommendations
Baseline learning
Z-score detection
Alignment score (0–1)
Drift velocity
Auto-correction
WandB / TensorBoard / SB3
Save / load state

License

Proprietary — requires an active RewardGuard Premium subscription. © 2026 RewardGuard | https://rewardguard.ai
