Auto-tuning reward system for production RL applications (Premium)

RewardGuard Premium

Automatic Reward Alignment for Production RL — with Statistical Detection and Auto-Correction

RewardGuard Premium is the paid tier of RewardGuard. The package itself installs freely from PyPI; sign in to your RewardGuard account to activate it.


Installation

pip install rewardguard-premium

On first use you will be prompted to sign in:

RewardGuard Premium — Sign in to your account
  Visit https://rewardguard.ai to create an account

  Email: you@example.com
  Password: ••••••••
  Signed in successfully!

Your session is saved to ~/.rewardguard/session.json and refreshed automatically. You only sign in once per machine.

Requires an active RewardGuard Premium subscription. Subscribe at: https://rewardguard.ai/premium


Sign-in CLI

rewardguard-premium login    # sign in (or switch accounts)
rewardguard-premium logout   # clear saved session
rewardguard-premium status   # show who you are signed in as

In CI and other automated environments, set these environment variables instead of using the interactive prompt:

export REWARDGUARD_EMAIL=you@example.com
export REWARDGUARD_PASSWORD=yourpassword
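
In a CI job it can help to fail fast when the credentials are missing rather than stall on the interactive prompt. The sketch below only checks that the two variables named above are set. The helper name require_credentials is hypothetical and not part of rewardguard-premium.

```python
import os
from typing import Mapping


def require_credentials(environ: Mapping[str, str] = os.environ) -> None:
    """Raise early if the sign-in variables are absent or empty.

    Hypothetical helper for illustration; not part of rewardguard-premium.
    """
    missing = [
        name
        for name in ("REWARDGUARD_EMAIL", "REWARDGUARD_PASSWORD")
        if not environ.get(name)
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling require_credentials() at the top of a training script surfaces a misconfigured pipeline before any training starts.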

Quick Start

from rewardguard_premium import AutoMonitor

# Placeholders: `env` and `action` come from your own training setup.
# `env.step()` is assumed to return the two reward components directly.
num_episodes = 100
max_steps = 1_000

monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    baseline_steps=300,   # warm-up before detection activates
    auto_correct=True,    # adjust weights automatically when flagged
)

for episode in range(num_episodes):
    for step in range(max_steps):
        r_task, r_safety = env.step(action)

        # step() returns None during warm-up, then an AlignmentSnapshot
        snapshot = monitor.step({"task": r_task, "safety": r_safety})

        if snapshot is not None and snapshot.flag == "critical":
            # Apply auto-corrected weights back to the environment
            env.set_reward_weights(monitor.weights)

monitor.print_report()
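
The quick start relies on an environment whose step() returns two reward components and which exposes set_reward_weights(); neither is part of the standard Gym API. A minimal stand-in for trying the loop might look like the sketch below. The class ToyTwoRewardEnv is hypothetical, and it assumes monitor.weights is a dict keyed by component name, which the text above does not confirm.

```python
import random


class ToyTwoRewardEnv:
    """Stand-in environment; hypothetical, for illustration only."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        # ASSUMPTION: weights are per-component multipliers keyed by name.
        self.weights = {"task": 1.0, "safety": 1.0}

    def set_reward_weights(self, weights: dict) -> None:
        # Store the multipliers; step() applies them to the raw rewards.
        self.weights = dict(weights)

    def step(self, action: float) -> tuple:
        # Raw components in roughly a 70/30 split, plus a little noise.
        raw_task = 0.7 + self._rng.uniform(-0.05, 0.05)
        raw_safety = 0.3 + self._rng.uniform(-0.05, 0.05)
        return (
            self.weights["task"] * raw_task,
            self.weights["safety"] * raw_safety,
        )
```

With env = ToyTwoRewardEnv() and any placeholder action, the loop above has everything it needs apart from the monitor itself.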

What AutoMonitor Does

Phase                      What happens
Warm-up (baseline_steps)   Learns the normal reward ratio for your environment. step() returns None during this phase.
Detection                  Computes per-component z-scores against the learned baseline.
Alignment score            Ranges from 0 (misaligned) to 1 (fully aligned); sigmoid-mapped from the maximum z-score.
Flagging                   ok (score > 0.75) / warning (0.5 < score ≤ 0.75) / critical (score ≤ 0.5)
Auto-correction            Adjusts per-component weight multipliers when flagged.
Drift velocity             Linear-regression slope over recent scores; distinguishes trends from spikes.
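
The detection, scoring, and flagging rows above can be sketched in a few lines. This is not the library's implementation: the flag thresholds come from the table, but the exact sigmoid (here centred on z_threshold with a steepness of 1.0) and the function names are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def z_scores(baseline: dict, current: dict) -> dict:
    """Per-component z-score of the current ratio against baseline samples."""
    out = {}
    for name, samples in baseline.items():
        sigma = stdev(samples) or 1e-9  # guard against a flat baseline
        out[name] = (current[name] - mean(samples)) / sigma
    return out


def alignment_score(z: dict, z_threshold: float = 2.5, steepness: float = 1.0) -> float:
    """Map the largest |z| into (0, 1).

    ASSUMPTION: a logistic curve centred on z_threshold; the real mapping
    used by AutoMonitor is not documented here.
    """
    worst = max(abs(v) for v in z.values())
    return 1.0 / (1.0 + math.exp(steepness * (worst - z_threshold)))


def flag(score: float) -> str:
    """Thresholds taken from the Flagging row of the table."""
    if score > 0.75:
        return "ok"
    if score > 0.5:
        return "warning"
    return "critical"
```

Under this assumed mapping, a component sitting exactly at the z-score threshold yields a score of 0.5 and is therefore flagged critical.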

AlignmentSnapshot

Every call to step() after warm-up returns an AlignmentSnapshot:

snapshot.alignment_score      # float 0–1
snapshot.flag                 # "ok" / "warning" / "critical"
snapshot.z_scores             # {"task": -0.4, "safety": +2.8}
snapshot.drift_velocity       # negative = worsening trend
snapshot.corrections_applied  # {"safety": 1.24}; weights changed this step
snapshot.component_ratios     # {"task": 68.3, "safety": 31.7}; percentages
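
Since drift_velocity is described as a linear-regression slope over recent alignment scores, its sign convention can be illustrated with an ordinary least-squares fit. The function below is a sketch of that idea under the assumption that scores are evenly spaced in time; it is not the library's code.

```python
def drift_velocity(scores: list) -> float:
    """Least-squares slope of scores against their index (0, 1, 2, ...)."""
    n = len(scores)
    if n < 2:
        return 0.0  # a slope needs at least two points
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    cov = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(scores))
    var = sum((i - x_mean) ** 2 for i in range(n))
    return cov / var
```

A steady decline such as [1.0, 0.9, 0.8, 0.7] gives a negative slope, while a one-step dip in the middle of an otherwise flat window leaves the slope at zero, which is how a slope separates a trend from a spike.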

Framework Integrations

Weights & Biases

from rewardguard_premium import AutoMonitor, make_wandb_callback
import wandb

wandb.init(project="my-rl-run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_wandb_callback()],
)

TensorBoard

from torch.utils.tensorboard import SummaryWriter
from rewardguard_premium import AutoMonitor, make_tensorboard_callback

writer = SummaryWriter("runs/my_run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_tensorboard_callback(writer)],
)
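
Both integrations above go through the callbacks argument, which the parameter table describes as a list of callables invoked with each AlignmentSnapshot. A custom callback can therefore be an ordinary function. The sketch below logs any non-ok snapshot; the function name log_non_ok and the logger name are hypothetical, and only snapshot fields listed under AlignmentSnapshot are used.

```python
import logging

logger = logging.getLogger("rewardguard.alerts")


def log_non_ok(snapshot) -> None:
    """Log any snapshot whose flag is not "ok"."""
    if snapshot.flag != "ok":
        logger.warning(
            "alignment %s: score=%.2f z=%s",
            snapshot.flag,
            snapshot.alignment_score,
            snapshot.z_scores,
        )
```

It would be registered the same way as the bundled callbacks, for example callbacks=[log_non_ok].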

Stable-Baselines3

from stable_baselines3 import PPO
from rewardguard_premium import AutoMonitor, make_sb3_callback

# `env` is your own training environment (see the requirement below).
monitor = AutoMonitor(expected={"task": 0.7, "safety": 0.3})
model = PPO("MlpPolicy", env)
model.learn(total_timesteps=500_000, callback=make_sb3_callback(monitor))

The SB3 callback reads per-component rewards from the environment's info dict, so your environment must include a "reward_components" entry there.
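
One way to add that entry to an existing environment is a thin wrapper around step(). The sketch below is duck-typed so that it runs without extra dependencies; with Stable-Baselines3 you would more likely subclass gymnasium.Wrapper and override step() the same way. The class name, the component_fn parameter, and the assumption that "reward_components" holds a dict keyed like expected are all illustrative and not confirmed by the text above.

```python
class RewardComponentsWrapper:
    """Adds "reward_components" to the info dict returned by step().

    Hypothetical helper; follows the Gymnasium five-tuple step() convention.
    """

    def __init__(self, env, component_fn) -> None:
        self.env = env
        # component_fn(obs, action, reward) -> {"task": float, "safety": float}
        self.component_fn = component_fn

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        info = dict(info)  # copy so the wrapped env's dict is not mutated
        # ASSUMPTION: the callback expects a dict keyed by component name.
        info["reward_components"] = self.component_fn(obs, action, reward)
        return obs, reward, terminated, truncated, info
```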


Save / Load State

# Save after training
monitor.save("run_42_state.json")

# Resume later
monitor = AutoMonitor.load("run_42_state.json")
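
For long runs that checkpoint repeatedly, it may be worth guarding against a crash mid-write. The sketch below saves to a temporary path and then swaps it into place. It assumes monitor.save(path) writes a single file synchronously, which the text above does not state; the helper name save_atomically is hypothetical.

```python
import os


def save_atomically(monitor, path: str) -> None:
    """Write state to a temp file, then replace the target in one step."""
    tmp_path = path + ".tmp"
    monitor.save(tmp_path)      # ASSUMPTION: writes exactly this one file
    os.replace(tmp_path, path)  # replaces any previous checkpoint at `path`
```

If the process dies during monitor.save(), the previous checkpoint at path is left untouched.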

Export Data

monitor.to_json("results.json")   # full state
monitor.to_csv("snapshots.csv")   # one row per detection-phase step
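
The exported CSV can be inspected with the standard library alone. The column layout is not documented above, so the sketch below takes the column name as a parameter and reports the available columns if it is missing; the default "flag" is an assumption based on the AlignmentSnapshot field of the same name.

```python
import csv
from collections import Counter


def count_column_values(csv_path: str, column: str = "flag") -> Counter:
    """Tally the values found in one column of an exported snapshots CSV."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"{column!r} not found; columns are {reader.fieldnames}")
        return Counter(row[column] for row in reader)
```

Calling count_column_values("snapshots.csv") would then give a quick tally of how many detection-phase steps fell under each flag, provided such a column exists.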

AutoMonitor Parameters

Parameter         Default    Description
expected          required   Target distribution, e.g. {"task": 0.7, "safety": 0.3}
baseline_steps    300        Warm-up steps before detection activates
z_threshold       2.5        Z-score at which a component is flagged
auto_correct      True       Automatically adjust weights when flagged
correction_rate   0.2        Fraction of required correction applied per step
tolerance         5.0        Percentage-point tolerance for the free-tier check() method
window            200        Rolling window size for ratio computation
drift_window      30         Snapshots used to compute drift velocity
callbacks         []         List of callables invoked with each AlignmentSnapshot
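
To see what a correction_rate of 0.2 implies, consider one plausible reading: each step moves a weight 20% of the remaining distance toward the value that would fully correct the imbalance. The sketch below illustrates that reading only; the function names are hypothetical, and the library's actual update rule may differ.

```python
def apply_correction(weight: float, target: float, correction_rate: float = 0.2) -> float:
    """Move weight a fixed fraction of the remaining gap toward target."""
    return weight + correction_rate * (target - weight)


def correction_path(weight: float, target: float,
                    correction_rate: float = 0.2, steps: int = 5) -> list:
    """Return the weight before and after each of `steps` corrections."""
    path = [weight]
    for _ in range(steps):
        weight = apply_correction(weight, target, correction_rate)
        path.append(weight)
    return path
```

Under this reading the remaining gap shrinks by a factor of (1 - correction_rate) per step, so a lower rate trades slower recovery for smoother weight changes.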

Free vs Premium

Feature Free (rewardguard) Premium
Live in-loop monitoring
Log-file analysis
Imbalance detection
Weight recommendations
Baseline learning
Z-score detection
Alignment score (0–1)
Drift velocity
Auto-correction
WandB / TensorBoard / SB3
Save / load state

License

Proprietary — requires an active RewardGuard Premium subscription. © 2026 RewardGuard | https://rewardguard.ai
