Auto-tuning reward system for production RL applications (Premium)

RewardGuard Premium

Automatic Reward Alignment for Production RL — with Statistical Detection and Auto-Correction

RewardGuard Premium is the paid tier of RewardGuard. The package installs freely from PyPI; to activate it, sign in to your RewardGuard account.


Installation

pip install rewardguard-premium

On first use you will be prompted to sign in:

RewardGuard Premium — Sign in to your account
  Visit https://rewardguard.ai to create an account

  Email: you@example.com
  Password: ••••••••
  Signed in successfully!

Your session is saved to ~/.rewardguard/session.json and refreshed automatically. You only sign in once per machine.

Requires an active RewardGuard Premium subscription. Subscribe at: https://rewardguard.ai/premium


Sign-in CLI

rewardguard-premium login    # sign in (or switch accounts)
rewardguard-premium logout   # clear saved session
rewardguard-premium status   # show who you are signed in as

In CI and other automated environments, set environment variables instead of using the interactive prompt:

export REWARDGUARD_EMAIL=you@example.com
export REWARDGUARD_PASSWORD=yourpassword
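
In a CI job it can help to fail fast when either variable is missing, rather than stalling on the interactive prompt. The following is a minimal pre-flight sketch: only the two variable names come from the text above, and the helper `missing_credentials` is a hypothetical name that is not part of the package.

```python
import os

# Variable names as documented above; everything else is illustrative.
REQUIRED_VARS = ("REWARDGUARD_EMAIL", "REWARDGUARD_PASSWORD")


def missing_credentials(environ=None):
    """Return the required variables that are unset or empty.

    `environ` defaults to the real process environment; passing a plain
    dict makes the check easy to exercise in tests.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_credentials({"REWARDGUARD_EMAIL": "you@example.com"}))
```

A CI step could call `missing_credentials()` before training starts and abort with a clear message if the returned list is not empty.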

Quick Start

from rewardguard_premium import AutoMonitor

monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    baseline_steps=300,   # warm-up before detection activates
    auto_correct=True,    # adjust weights automatically when flagged
)

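# env, action, num_episodes, and max_steps are placeholders for your own training loop.
for episode in range(num_episodes):
    for step in range(max_steps):
        r_task, r_safety = env.step(action)

        snapshot = monitor.step({"task": r_task, "safety": r_safety})

        # step() returns None during warm-up, so test for it explicitly.
        if snapshot is not None and snapshot.flag == "critical":
            # Apply auto-corrected weights back to the environment
            env.set_reward_weights(monitor.weights)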

monitor.print_report()

What AutoMonitor Does

| Phase | What happens |
|---|---|
| Warm-up (baseline_steps) | Learns the normal ratio for your environment. Returns None from step(). |
| Detection | Computes per-component z-scores against the learned baseline. |
| Alignment score | 0 (misaligned) → 1 (fully aligned), sigmoid-mapped from the max z-score. |
| Flagging | ok (score > 0.75) / warning (> 0.5) / critical (≤ 0.5) |
| Auto-correction | Adjusts per-component weight multipliers when flagged. |
| Drift velocity | Linear regression slope over recent scores; distinguishes trends from spikes. |
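
To make the score and flag rows concrete, here is a rough sketch of one way a maximum z-score could be mapped through a sigmoid and then bucketed. The flag thresholds are the documented ones. The logistic form, the `steepness` parameter, and centring the curve on `z_threshold` are assumptions made for illustration, not the package's actual formula.

```python
import math


def alignment_score(z_scores, z_threshold=2.5, steepness=1.0):
    """Map the largest |z| to a score in (0, 1).

    ASSUMPTION: a logistic curve centred on z_threshold, so the score is
    exactly 0.5 when the worst component sits at the threshold. The real
    mapping used by AutoMonitor is not documented here.
    """
    max_z = max(abs(z) for z in z_scores.values())
    return 1.0 / (1.0 + math.exp(steepness * (max_z - z_threshold)))


def flag_for(score):
    """Documented thresholds: ok > 0.75, warning > 0.5, critical <= 0.5."""
    if score > 0.75:
        return "ok"
    if score > 0.5:
        return "warning"
    return "critical"


z = {"task": -0.4, "safety": 2.8}
score = alignment_score(z)
print(round(score, 3), flag_for(score))
```

Under this assumed mapping, a component whose z-score exceeds `z_threshold` pushes the score below 0.5 and therefore into the critical band.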

AlignmentSnapshot

Every call to step() after warm-up returns an AlignmentSnapshot:

snapshot.alignment_score      # float 0–1
snapshot.flag                 # "ok" / "warning" / "critical"
snapshot.z_scores             # {"task": -0.4, "safety": +2.8}
snapshot.drift_velocity       # negative = worsening trend
snapshot.corrections_applied  # {"safety": 1.24}  — weights changed this step
snapshot.component_ratios     # {"task": 68.3, "safety": 31.7}
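
The drift_velocity field is described as a linear regression slope over recent alignment scores. As a minimal sketch of that idea (a plain least-squares slope against the step index, not necessarily the exact computation the package performs), the hypothetical function below shows why a sustained decline yields a clearly negative value while a single symmetric spike averages out to roughly zero.

```python
def drift_velocity(scores):
    """Ordinary least-squares slope of scores against their index 0, 1, 2, ..."""
    n = len(scores)
    if n < 2:
        return 0.0  # a slope is undefined for fewer than two points
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    cov = sum((i - mean_x) * (s - mean_y) for i, s in enumerate(scores))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


steady_decline = [0.90, 0.85, 0.80, 0.75, 0.70]
single_spike = [0.90, 0.90, 0.40, 0.90, 0.90]

print(drift_velocity(steady_decline))  # negative: worsening trend
print(drift_velocity(single_spike))    # close to zero: isolated spike
```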

Framework Integrations

Weights & Biases

from rewardguard_premium import AutoMonitor, make_wandb_callback
import wandb

wandb.init(project="my-rl-run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_wandb_callback()],
)

TensorBoard

from torch.utils.tensorboard import SummaryWriter
from rewardguard_premium import AutoMonitor, make_tensorboard_callback

writer = SummaryWriter("runs/my_run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_tensorboard_callback(writer)],
)
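
The callbacks list is not limited to the built-in integrations: it accepts callables that are invoked with each AlignmentSnapshot. Below is a minimal sketch of a custom callback. `make_alert_callback` is a hypothetical helper, and a `SimpleNamespace` stands in for a real snapshot so the example runs without the package installed. It assumes the snapshot is passed as the only positional argument.

```python
from types import SimpleNamespace


def make_alert_callback(sink, min_flag="warning"):
    """Build a callback that records a short message for flagged snapshots.

    `sink` is any list-like object with append(); `min_flag` is the least
    severe flag that should be recorded.
    """
    severity = {"ok": 0, "warning": 1, "critical": 2}

    def callback(snapshot):
        if severity[snapshot.flag] >= severity[min_flag]:
            sink.append(f"{snapshot.flag}: alignment={snapshot.alignment_score:.2f}")

    return callback


# Stand-in for an AlignmentSnapshot (only the two fields used above).
fake_snapshot = SimpleNamespace(flag="critical", alignment_score=0.41)

alerts = []
on_snapshot = make_alert_callback(alerts)
on_snapshot(fake_snapshot)
print(alerts)
```

With the real package, such a callable would be passed alongside the built-in ones, for example `callbacks=[make_wandb_callback(), make_alert_callback(alerts)]`.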

Stable-Baselines3

from stable_baselines3 import PPO
from rewardguard_premium import AutoMonitor, make_sb3_callback

monitor = AutoMonitor(expected={"task": 0.7, "safety": 0.3})
model = PPO("MlpPolicy", env)
model.learn(total_timesteps=500_000, callback=make_sb3_callback(monitor))

For the SB3 callback to work, your environment must include a "reward_components" entry in the info dict it returns from step().
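
One way to satisfy this requirement without touching the environment itself is a thin wrapper around step(). The sketch below is duck-typed so it runs without extra dependencies; in a real project the same step() override would normally live in a `gymnasium.Wrapper` subclass. It assumes that "reward_components" should be a dict mapping component names (matching the keys of `expected`) to floats, and that the wrapped environment returns the Gymnasium-style 5-tuple. `RewardComponentsWrapper`, `ToyEnv`, and `split_components` are hypothetical names.

```python
class RewardComponentsWrapper:
    """Adds info["reward_components"] to every step result.

    `component_fn(obs, reward, info)` must return a dict such as
    {"task": 0.7, "safety": 0.3}; that shape is an assumption.
    """

    def __init__(self, env, component_fn):
        self.env = env
        self.component_fn = component_fn

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        info = dict(info)  # copy so the wrapped env's dict is not mutated
        info["reward_components"] = self.component_fn(obs, reward, info)
        return obs, reward, terminated, truncated, info


class ToyEnv:
    """Tiny stand-in environment that reports its reward parts in info."""

    def reset(self, **kwargs):
        return 0, {}

    def step(self, action):
        info = {"task_reward": 0.7, "safety_reward": 0.3}
        return 0, 1.0, False, False, info


def split_components(obs, reward, info):
    return {"task": info["task_reward"], "safety": info["safety_reward"]}


env = RewardComponentsWrapper(ToyEnv(), split_components)
_, _, _, _, info = env.step(action=0)
print(info["reward_components"])
```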


Save / Load State

# Save after training
monitor.save("run_42_state.json")

# Resume later
monitor = AutoMonitor.load("run_42_state.json")

Export Data

monitor.to_json("results.json")   # full state
monitor.to_csv("snapshots.csv")   # one row per detection-phase step

AutoMonitor Parameters

| Parameter | Default | Description |
|---|---|---|
| expected | required | Target distribution, e.g. {"task": 0.7, "safety": 0.3} |
| baseline_steps | 300 | Warm-up steps before detection activates |
| z_threshold | 2.5 | Z-score at which a component is flagged |
| auto_correct | True | Automatically adjust weights when flagged |
| correction_rate | 0.2 | Fraction of required correction applied per step |
| tolerance | 5.0 | Percentage-point tolerance for the free-tier check() method |
| window | 200 | Rolling window size for ratio computation |
| drift_window | 30 | Snapshots used to compute drift velocity |
| callbacks | [] | List of callables invoked with each AlignmentSnapshot |
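
The correction_rate parameter is easiest to read as a damping factor: each step closes only a fraction of the gap between the current weight multiplier and the one that would restore the expected ratio. The sketch below illustrates that reading under stated assumptions. The function name is hypothetical, and the "required" multiplier is a first-order estimate (expected over observed share, ignoring renormalisation across components), not the package's documented update rule.

```python
def corrected_multiplier(current, expected_ratio, observed_ratio, correction_rate=0.2):
    """Move `current` part of the way toward the estimated required multiplier.

    ASSUMPTION: the required multiplier is current * expected / observed,
    which ignores how the other components' shares shift in response.
    """
    if observed_ratio <= 0:
        return current  # nothing sensible to scale toward
    required = current * expected_ratio / observed_ratio
    return current + correction_rate * (required - current)


# Safety should contribute 30% of the reward but is observed at 20%.
print(corrected_multiplier(1.0, expected_ratio=0.30, observed_ratio=0.20))
```

A small correction_rate trades slower recovery for less oscillation; a value of 1.0 would jump straight to the estimated required multiplier in a single step.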

Free vs Premium

Feature Free (rewardguard) Premium
Live in-loop monitoring
Log-file analysis
Imbalance detection
Weight recommendations
Baseline learning
Z-score detection
Alignment score (0–1)
Drift velocity
Auto-correction
WandB / TensorBoard / SB3
Save / load state

License

Proprietary — requires an active RewardGuard Premium subscription. © 2026 RewardGuard | https://rewardguard.ai
