
Auto-tuning reward system for production RL applications (Premium)


RewardGuard Premium

Automatic Reward Alignment for Production RL — with Statistical Detection and Auto-Correction

RewardGuard Premium is the paid tier of RewardGuard. The package itself installs freely from PyPI; sign in to your RewardGuard account to activate it.


Installation

pip install rewardguard-premium

On first use you will be prompted to sign in:

RewardGuard Premium — Sign in to your account
  Visit https://rewardguard.dev to create an account

  Email: you@example.com
  Password: ••••••••
  Signed in successfully!

Your session is saved to ~/.rewardguard/session.json and refreshed automatically. You only sign in once per machine.

Requires an active RewardGuard Premium subscription. Subscribe at: https://rewardguard.dev/premium


Sign-in CLI

rewardguard-premium login    # sign in (or switch accounts)
rewardguard-premium logout   # clear saved session
rewardguard-premium status   # show who you are signed in as

CI / automated environments — use env vars instead of the interactive prompt:

export REWARDGUARD_EMAIL=you@example.com
export REWARDGUARD_PASSWORD=yourpassword
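
In an automated job it can help to check for these variables up front, so a missing credential fails the job with a clear message instead of reaching the interactive prompt. The following is a minimal sketch: the helper name missing_credentials is hypothetical and not part of the package, and only the two variable names come from the lines above.

```python
import os

# Variable names as documented above; everything else here is illustrative.
REQUIRED_VARS = ("REWARDGUARD_EMAIL", "REWARDGUARD_PASSWORD")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


# Example with an explicit mapping instead of the real environment.
example = missing_credentials({"REWARDGUARD_EMAIL": "you@example.com"})
print(example)  # ['REWARDGUARD_PASSWORD']
```

A CI script could call missing_credentials() with no arguments and exit early if the returned list is non-empty.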

Quick Start

from rewardguard_premium import AutoMonitor

monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    baseline_steps=300,   # warm-up before detection activates
    auto_correct=True,    # adjust weights automatically when flagged
)

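# env, action, num_episodes and max_steps are placeholders for your own training setup.
for episode in range(num_episodes):
    for step in range(max_steps):
        # This example environment returns one reward per component.
        r_task, r_safety = env.step(action)

        snapshot = monitor.step({"task": r_task, "safety": r_safety})

        # step() returns None during warm-up, so check before reading the flag.
        if snapshot is not None and snapshot.flag == "critical":
            # Apply auto-corrected weights back to the environment
            env.set_reward_weights(monitor.weights)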

monitor.print_report()

What AutoMonitor Does

| Phase | What happens |
| --- | --- |
| Warm-up (baseline_steps) | Learns the normal ratio for your environment. Returns None from step(). |
| Detection | Computes per-component z-scores against the learned baseline. |
| Alignment score | 0 (misaligned) → 1 (fully aligned), sigmoid-mapped from the max z-score. |
| Flagging | ok (score > 0.75) / warning (> 0.5) / critical (≤ 0.5) |
| Auto-correction | Adjusts per-component weight multipliers when flagged. |
| Drift velocity | Linear regression slope over recent scores; distinguishes trends from spikes. |
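
The detection, scoring and flagging phases can be illustrated with a short sketch. This is not the package's implementation: the exact sigmoid is not documented here, so the logistic curve centred on z_threshold and the steepness parameter are assumptions, as are the function names. Only the flag thresholds (0.75 and 0.5) are taken from the table above.

```python
import math
from statistics import mean, pstdev


def z_scores(baseline, current):
    """Per-component z-score of the current ratio against baseline samples."""
    scores = {}
    for name, samples in baseline.items():
        mu = mean(samples)
        sigma = pstdev(samples) or 1e-8  # avoid dividing by zero on a flat baseline
        scores[name] = (current[name] - mu) / sigma
    return scores


def alignment_score(scores, z_threshold=2.5, steepness=1.0):
    """Map the largest |z| into (0, 1). ASSUMED form: logistic centred on z_threshold."""
    worst = max(abs(z) for z in scores.values())
    return 1.0 / (1.0 + math.exp(steepness * (worst - z_threshold)))


def flag(score):
    """Flag thresholds as listed in the table above."""
    if score > 0.75:
        return "ok"
    if score > 0.5:
        return "warning"
    return "critical"


# Illustrative numbers only: component ratios in percent.
baseline = {"task": [69.0, 70.0, 71.0], "safety": [31.0, 30.0, 29.0]}
current = {"task": 70.5, "safety": 29.5}
scores = z_scores(baseline, current)
print(scores, flag(alignment_score(scores)))
```

Under this assumed mapping, a component sitting exactly at z_threshold yields a score of 0.5, which falls in the critical band.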

AlignmentSnapshot

Every call to step() after warm-up returns an AlignmentSnapshot:

snapshot.alignment_score      # float 0–1
snapshot.flag                 # "ok" / "warning" / "critical"
snapshot.z_scores             # {"task": -0.4, "safety": +2.8}
snapshot.drift_velocity       # negative = worsening trend
snapshot.corrections_applied  # {"safety": 1.24}  — weights changed this step
snapshot.component_ratios     # {"task": 68.3, "safety": 31.7}
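
To make the drift_velocity field more concrete, here is a sketch of a least-squares slope over recent alignment scores. It is an illustration of the general technique, not the package's code; the function name and the use of the snapshot index as the x-axis are assumptions.

```python
def drift_velocity(scores, drift_window=30):
    """Least-squares slope of the most recent alignment scores, per snapshot."""
    recent = list(scores)[-drift_window:]
    n = len(recent)
    if n < 2:
        return 0.0  # a slope needs at least two points
    x_mean = (n - 1) / 2
    y_mean = sum(recent) / n
    cov = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(recent))
    var = sum((i - x_mean) ** 2 for i in range(n))
    return cov / var


steady_decline = [0.9, 0.8, 0.7, 0.6, 0.5]
single_spike = [0.9, 0.9, 0.5, 0.9, 0.9]
print(drift_velocity(steady_decline))  # close to -0.1: a worsening trend
print(drift_velocity(single_spike))    # close to 0: a spike, not a trend
```

The two inputs show why a slope is useful here: a sustained decline produces a clearly negative value, while a symmetric one-step dip leaves the slope near zero.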

Framework Integrations

Weights & Biases

from rewardguard_premium import AutoMonitor, make_wandb_callback
import wandb

wandb.init(project="my-rl-run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_wandb_callback()],
)

TensorBoard

from torch.utils.tensorboard import SummaryWriter
from rewardguard_premium import AutoMonitor, make_tensorboard_callback

writer = SummaryWriter("runs/my_run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_tensorboard_callback(writer)],
)

Stable-Baselines3

from stable_baselines3 import PPO
from rewardguard_premium import AutoMonitor, make_sb3_callback

monitor = AutoMonitor(expected={"task": 0.7, "safety": 0.3})
model = PPO("MlpPolicy", env)
model.learn(total_timesteps=500_000, callback=make_sb3_callback(monitor))

For the SB3 callback to work, your environment must include "reward_components" in the info dict it returns from step().
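
One way to satisfy this requirement without editing the environment is a thin wrapper around step(). The sketch below rests on assumptions: the value under "reward_components" is taken to be a dict of component name to float (the text above only says the key must be present), the wrapped environment is assumed to follow the Gymnasium five-tuple step() convention, and RewardComponentsWrapper, component_fn and ToyEnv are hypothetical names. With a real Gymnasium environment you would likely subclass gymnasium.Wrapper instead of this duck-typed class.

```python
class RewardComponentsWrapper:
    """Copies per-component rewards into info["reward_components"]."""

    def __init__(self, env, component_fn):
        self.env = env
        self.component_fn = component_fn  # hypothetical: (obs, info) -> dict of floats

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        info = dict(info)  # copy so the wrapped env's dict is left untouched
        info["reward_components"] = self.component_fn(obs, info)
        return obs, reward, terminated, truncated, info


class ToyEnv:
    """Stand-in environment used only to exercise the wrapper."""

    def reset(self, seed=None):
        return 0.0, {}

    def step(self, action):
        task, safety = 1.0, -0.25
        return 0.0, task + safety, False, False, {"task": task, "safety": safety}


env = RewardComponentsWrapper(
    ToyEnv(),
    component_fn=lambda obs, info: {"task": info["task"], "safety": info["safety"]},
)
obs, reward, terminated, truncated, info = env.step(action=0)
print(info["reward_components"])  # {'task': 1.0, 'safety': -0.25}
```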


Save / Load State

# Save after training
monitor.save("run_42_state.json")

# Resume later
monitor = AutoMonitor.load("run_42_state.json")

Export Data

monitor.to_json("results.json")   # full state
monitor.to_csv("snapshots.csv")   # one row per detection-phase step

AutoMonitor Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| expected | required | Target distribution, e.g. {"task": 0.7, "safety": 0.3} |
| baseline_steps | 300 | Warm-up steps before detection activates |
| z_threshold | 2.5 | Z-score at which a component is flagged |
| auto_correct | True | Automatically adjust weights when flagged |
| correction_rate | 0.2 | Fraction of required correction applied per step |
| tolerance | 5.0 | Percentage-point tolerance for the free-tier check() method |
| window | 200 | Rolling window size for ratio computation |
| drift_window | 30 | Snapshots used to compute drift velocity |
| callbacks | [] | List of callables invoked with each AlignmentSnapshot |
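
The correction_rate description ("fraction of required correction applied per step") can be read as a damped update. The sketch below shows one such reading and is an assumption throughout: the function name is hypothetical, and the choice of target multiplier (current weight scaled by expected share over observed share) is not documented above. Shares are given as fractions here so that they match the units of expected.

```python
def corrected_weights(weights, expected, observed, correction_rate=0.2):
    """Move each weight multiplier a fraction of the way towards its target.

    ASSUMPTION: target = current weight * expected share / observed share.
    """
    updated = {}
    for name, weight in weights.items():
        target = weight * expected[name] / max(observed[name], 1e-8)
        updated[name] = weight + correction_rate * (target - weight)
    return updated


weights = {"task": 1.0, "safety": 1.0}
expected = {"task": 0.7, "safety": 0.3}
observed = {"task": 0.8, "safety": 0.2}  # safety is under-represented
new_weights = corrected_weights(weights, expected, observed)
print(new_weights)  # task nudged below 1.0, safety nudged above 1.0
```

Applying only a fraction of the correction on each step trades speed for stability: a rate of 1.0 would jump straight to the target, while smaller values approach it gradually and are less likely to overshoot on noisy ratios.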

Free vs Premium

Feature Free (rewardguard) Premium
Live in-loop monitoring
Log-file analysis
Imbalance detection
Weight recommendations
Baseline learning
Z-score detection
Alignment score (0–1)
Drift velocity
Auto-correction
WandB / TensorBoard / SB3
Save / load state

License

Proprietary — requires an active RewardGuard Premium subscription. © 2026 RewardGuard | https://rewardguard.dev
