
A Python library for solving finite Markov Decision Processes using value iteration


upgrade-policy-optimizer

upgrade-policy-optimizer is a Python library for solving sequential decision-making problems under uncertainty.

Given states, actions, transition probabilities, and costs, it computes the optimal policy that minimizes the expected total cost of reaching a goal.

It models problems as finite Markov Decision Processes (MDPs) and solves them using the Bellman optimality equation (value iteration).

🚀 JSON-Based Solver - No Programming Required!

Define your problem in JSON and get the optimal strategy:

python3 -m upo.cli my_problem.json

See docs/JSON_GUIDE.md for format details.

Or from Python:

from upo import solve_from_json
result = solve_from_json("my_problem.json")  # General-purpose format

📖 New to MDPs? Start Here!

👉 Practical Guide: When and How to Use This Tool

This guide explains:

  • ✅ How to know if this tool can solve your problem
  • ✅ What questions it answers (with real examples)
  • ✅ Simple explanations of deployment, manufacturing, and investment examples
  • ✅ How to recognize patterns in your own problems
  • ✅ Common misconceptions and key lessons

Perfect for: Business users, product managers, engineers new to optimization

📊 Ideal for Data-Driven Teams

Have metrics? This tool is perfect for you!

If you track success rates, costs, or failure rates → you're ready to optimize!

  • ✅ Use real data for accurate results
  • ✅ Quantify exact savings (dollars/hours)
  • ✅ Validate predictions against actual outcomes
  • ✅ Make data-backed decisions your team can trust

📚 Documentation Map

Choose your path based on your needs:

🎯 "Will this solve my problem?"

→ Practical Guide - Real examples, simple explanations, pattern recognition

📋 "I need to configure my problem"

→ JSON Configuration Guide - Complete format reference

💻 "I want to use the Python API"

→ API Reference - Programmatic usage examples

🎓 "I want to understand the math"

→ Algorithm Walkthrough - Value iteration explained

🔧 "How was this implemented?"

→ Implementation Notes - Design decisions

✅ "How do I know it works?"

→ Verification Tests - Algorithm verification


Why This Exists

Many real-world processes are more than simple probability calculations.

They:

  • unfold over multiple steps
  • have random outcomes (success/failure)
  • impose penalties on failure (loss of progress)
  • offer paid risk-reduction options (insurance / safety mechanisms / retries)

This project answers:

What should I do at each step to minimize long-run expected cost?

What the Library Does

Given:

  • A set of discrete states $S$
  • A target terminal state $T$
  • Available actions per state $A(s)$
  • Action costs
  • Transition probabilities (or success/failure probabilities)
  • Setback rules (how failures move you backward)

The Library Computes:

  • $V(s)$: Expected minimum cost to reach the target from state $s$
  • $\pi(s)$: Optimal action/policy at each state
  • Comparative "worth it?" metrics across actions

Core Idea: Bellman Optimality

The Bellman optimality principle is:

The optimal decision now is the one that minimizes
immediate cost + expected optimal future cost.

Standard Form (MDP)

$$V(s) = \min_{a} \left[ C(s,a) + \sum_{s'} P(s'|s,a) \, V(s') \right]$$

Where:

  • $V(s)$ = best expected cost starting at state $s$
  • $C(s,a)$ = immediate cost for action $a$ at state $s$
  • $P(s'|s,a)$ = probability of transitioning to state $s'$
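As a minimal, library-independent sketch (the dictionary encoding and names here are invented for illustration, not the library's internals), one synchronous Bellman backup over this equation looks like:

```python
# Value iteration via repeated Bellman backups on a tiny two-action MDP.
# Illustrative sketch only - not the library's implementation.

def bellman_backup(V, costs, transitions, terminal):
    """V(s) <- min_a [ C(s,a) + sum_{s'} P(s'|s,a) * V(s') ]"""
    return {
        s: 0.0 if s in terminal else min(
            costs[s][a] + sum(p * V[s2] for s2, p in dist.items())
            for a, dist in transitions[s].items()
        )
        for s in V
    }

# One state, two actions: "cheap" (cost 1.0, 50% success) vs "safe" (cost 1.5, 90% success).
transitions = {"s": {"cheap": {"goal": 0.5, "s": 0.5},
                     "safe":  {"goal": 0.9, "s": 0.1}}}
costs = {"s": {"cheap": 1.0, "safe": 1.5}}
V = {"s": 0.0, "goal": 0.0}

for _ in range(500):  # iterate to (near) convergence
    V = bellman_backup(V, costs, transitions, terminal={"goal"})

print(round(V["s"], 4))  # 1.6667: "safe" wins (1.5/0.9) despite its higher cost
```

Note that the greedy per-attempt choice ("cheap" is less money per try) loses here: counting expected retries, "safe" costs 1.5/0.9 ≈ 1.67 versus 1.0/0.5 = 2.0.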

Simplified Form for Success/Failure Processes

Many real systems (and our default templates) have only two outcomes:

  • Success
  • Failure (setback)

So the equation becomes:

$$Q(s,a) = C(s,a) + P(s,a) \cdot V(s_{\text{success}}) + (1 - P(s,a)) \cdot V(s_{\text{fail}})$$

$$V(s) = \min_{a} Q(s,a)$$

This compact equation is powerful: it naturally accounts for deep failure cascades (e.g. repeated failures pushing you far backward).
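The cascade effect can be seen in a small self-contained sketch (a made-up example, not the library's API): a three-step process where each attempt costs 1, succeeds with probability 0.8, and a failure drops you back one step (or keeps you at the start).

```python
# Success/failure value iteration on a 4-state ladder: 0 -> 1 -> 2 -> 3 (goal).
# V(s) = C + p * V(s+1) + (1 - p) * V(setback(s)), with setback(0) = 0.
p, cost, goal = 0.8, 1.0, 3
V = [0.0] * (goal + 1)

for _ in range(1000):  # iterate the fixed-point equation to convergence
    V = [cost + p * V[s + 1] + (1 - p) * V[max(s - 1, 0)]
         for s in range(goal)] + [0.0]

print([round(v, 3) for v in V])  # [4.453, 3.203, 1.641, 0.0]
```

Without setbacks (failure merely repeats the current step), three steps would cost 3 × 1/0.8 = 3.75 in expectation; the one-step setback raises the expected cost from the start to about 4.45.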

Outputs and Visualizations

The library can generate:

1) Optimal Policy per State

Which action to take at each step.

2) Expected Cost Curves (Planned)

🚧 Work in Progress: This feature is not yet implemented.

Plots of:

  • Expected cost using each action
  • Optimal expected cost curve

3) ROI / Savings Graphs (Planned)

🚧 Work in Progress: This feature is not yet implemented.

Shows how much each risk-reduction option saves vs baseline.

These visuals make it easy to explain why a particular action is optimal.

Installation

👉 For detailed installation and quick start instructions, see QUICKSTART.md

Quick Install

# From source
git clone https://github.com/eonof/upgrade-policy-optimizer
cd upgrade-policy-optimizer
pip install -e ".[dev]"  # With dev dependencies
# or
pip install -e .  # Basic install

Quick Start

👉 For complete installation instructions, examples, and tutorials, see QUICKSTART.md

Quick example:

from upo import MDP, solve_mdp_value_iteration

mdp = MDP.from_dict(
    states=["start", "goal"],
    terminal_states=["goal"],
    transitions_dict={
        "start": {"try": {"goal": 0.7, "start": 0.3}}
    },
    costs_dict={"start": {"try": 1.0}}
)

result = solve_mdp_value_iteration(mdp)
print(f"Expected cost: {result.get_value('start'):.2f}")
print(f"Optimal action: {result.get_policy('start')}")
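For this one-state problem the solver's answer can be checked by hand: the value satisfies V = 1.0 + 0.7·0 + 0.3·V, so V = 1.0/0.7 ≈ 1.43, and the optimal (only) action is "try". A quick library-free check of that fixed point:

```python
# Fixed-point check for the quick-start MDP: V = 1.0 + 0.3 * V  =>  V = 1/0.7
V = 0.0
for _ in range(100):
    V = 1.0 + 0.3 * V
print(f"{V:.2f}")  # 1.43
```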

Or use JSON configuration:

# If package is installed:
python3 -m upo.cli examples/configs/manufacturing_process.json

# If not installed, use PYTHONPATH:
PYTHONPATH=src python3 -m upo.cli examples/configs/manufacturing_process.json

Configuration Model (Conceptual)

A configuration defines:

  • State range / terminal state
  • Actions (including probability modifiers and/or setback modifiers)
  • Per-state costs
  • Base success probabilities

Typical actions represent "risk tools", such as:

  • Increasing success chance
  • Preventing setback on failure
  • Reducing penalty severity
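The actual schema is documented in docs/JSON_GUIDE.md. Purely as an illustration of the bullets above (every field name below is invented and does not reflect the real format), a configuration might carry information like:

```json
{
  "states": ["step0", "step1", "done"],
  "terminal_state": "done",
  "base_success_probabilities": {"step0": 0.8, "step1": 0.6},
  "setback_on_failure": "previous_state",
  "actions": {
    "standard": {"cost": 1.0},
    "insured": {"cost": 1.5, "probability_modifier": 0.2, "prevents_setback": true}
  }
}
```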

Potential Applications

This framework can model any multi-step process with probabilistic outcomes and setback penalties:

Software & DevOps

  • CI/CD pipelines with rollback costs on failed deployments
  • Data migration with partial completion and rollback penalties
  • Distributed transaction retries with backoff strategies

Manufacturing & Operations

  • Quality control loops where failed items require rework
  • Multi-stage assembly where defects force partial disassembly
  • Equipment calibration with precision levels and recalibration costs

Finance & Risk Management

  • Investment strategies with transaction costs and stop-loss triggers
  • Insurance optimization (when to buy coverage vs. accept risk)
  • Resource allocation under uncertainty with reallocation penalties

Each is the same underlying math: an MDP solved via Bellman optimality.

👉 For detailed examples with real-world scenarios and explanations, see PRACTICAL_GUIDE.md

Project Origin

This project was inspired by sequential decision-making problems where:

  • Each step has a cost and success probability
  • Failures can set you back (loss of progress, rework required)
  • Multiple strategies are available with different cost/risk trade-offs
  • The optimal decision is not always obvious - sometimes "cheap" steps need protection if failure risks expensive recovery

The key insight is that local optimization (choosing the best option at each step) doesn't always yield the globally optimal strategy. You need to consider the full path to the goal, including the cost of recovery from failures.

This library provides a general-purpose solver for any such stochastic sequential decision problem.

License

This project is licensed under the MIT License - see the LICENSE file for details.

API Reference

Core Classes

MDP

Represents a finite Markov Decision Process.

Creation Methods:

  • MDP(...): Direct construction with numpy arrays
  • MDP.from_dict(...): Convenient dictionary-based construction with labels

Key Methods:

  • is_terminal(state): Check if a state is terminal
  • get_actions(state): Get available actions for a state
  • get_cost(state, action): Get immediate cost for an action
  • get_transition_probs(state, action): Get transition probability distribution

MDPResult

Result from solving an MDP.

Attributes:

  • V: Value function (expected cost-to-go from each state)
  • policy: Optimal action for each state
  • Q: Q-function (state-action values)
  • iterations: Number of iterations performed
  • converged: Whether the algorithm converged
  • residual: Final maximum value difference

Key Methods:

  • get_value(state): Get optimal value for a state (supports labels)
  • get_policy(state): Get optimal action for a state (supports labels)
  • get_q_value(state, action): Get Q-value for a state-action pair

solve_mdp_value_iteration

Solve an MDP using value iteration.

solve_mdp_value_iteration(
    mdp: MDP,
    tol: float = 1e-9,
    max_iter: int = 100000,
    initial_v: Optional[np.ndarray] = None,
    validate: bool = True
) -> MDPResult

Parameters:

  • mdp: The MDP to solve
  • tol: Convergence tolerance (max absolute value change)
  • max_iter: Maximum iterations
  • initial_v: Optional initial value function
  • validate: Whether to validate MDP structure before solving

Returns:

  • MDPResult with optimal value function, policy, and convergence info

Command-Line Interface

The CLI provides a simple way to solve MDPs from JSON configuration files without writing Python code.

Basic Usage

# If package is installed:
python3 -m upo.cli examples/configs/manufacturing_process.json

# If not installed, use PYTHONPATH:
PYTHONPATH=src python3 -m upo.cli examples/configs/manufacturing_process.json

Command-Line Options

  • --tol <float>: Convergence tolerance (default: 1e-9)

    python3 -m upo.cli problem.json --tol 1e-12
    
  • --max-iter <int>: Maximum iterations (default: 100000)

    python3 -m upo.cli problem.json --max-iter 50000
    
  • --no-validate: Skip MDP validation (faster but potentially unsafe)

    python3 -m upo.cli problem.json --no-validate
    
  • --verbose / -v: Show detailed output including Q-values

    python3 -m upo.cli problem.json --verbose
    
  • --version: Show version information

    python3 -m upo.cli --version
    

Output

The CLI displays:

  • Optimal policy (action to take at each state)
  • Expected costs from each state
  • Convergence information (iterations, residual)
  • Q-values (with --verbose flag)

Project Structure

upgrade-policy-optimizer/
├── src/upo/             # Core library
│   ├── __init__.py      # Package exports
│   ├── mdp.py           # MDP data structures
│   ├── solver.py        # Value iteration solver
│   ├── result.py        # Result container
│   ├── validate.py      # Validation utilities
│   ├── json_solver.py   # JSON-based solver
│   └── cli.py           # Command-line interface
├── tests/               # Test suite
│   ├── test_mdp.py
│   ├── test_solver.py
│   ├── test_validate.py
│   └── test_sanity_checks.py
├── examples/            # Usage examples
│   └── configs/         # Example MDP configurations
├── docs/                # Documentation
├── pyproject.toml       # Package configuration
└── README.md            # This file

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Quick checklist:

  • Add tests for new features
  • Run pytest to verify tests pass
  • Format code with black
  • Check types with mypy
  • Update documentation as needed
