upgrade-policy-optimizer

upgrade-policy-optimizer is a Python library for solving sequential decision-making problems under uncertainty.

Given states, actions, probabilities, and costs, it computes the optimal policy that minimizes the expected total cost of reaching a goal.

It models problems as finite Markov Decision Processes (MDPs) and solves them using the Bellman optimality equation (value iteration).

🚀 JSON-Based Solver - No Programming Required!

Define your problem in JSON and get the optimal strategy:

python3 -m upo.cli my_problem.json

See docs/JSON_GUIDE.md for format details.

Or from Python:

from upo import solve_from_json
result = solve_from_json("my_problem.json")  # General-purpose format
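
As a rough illustration only (the field names below are hypothetical, mirroring the Python from_dict constructor shown later; the actual schema is defined in docs/JSON_GUIDE.md), a config could look like:

{
  "states": ["start", "goal"],
  "terminal_states": ["goal"],
  "transitions": {"start": {"try": {"goal": 0.7, "start": 0.3}}},
  "costs": {"start": {"try": 1.0}}
}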

📖 New to MDPs? Start Here!

👉 Practical Guide: When and How to Use This Tool

This guide explains:

  • ✅ How to know if this tool can solve your problem
  • ✅ What questions it answers (with real examples)
  • ✅ Simple explanations of deployment, manufacturing, and investment examples
  • ✅ How to recognize patterns in your own problems
  • ✅ Common misconceptions and key lessons

Perfect for: Business users, product managers, engineers new to optimization

📊 Ideal for Data-Driven Teams

Have metrics? This tool is perfect for you!

If you track success rates, costs, or failure rates → you're ready to optimize!

  • ✅ Use real data for accurate results
  • ✅ Quantify expected savings (dollars/hours)
  • ✅ Validate predictions against actual outcomes
  • ✅ Make data-backed decisions your team can trust

📚 Documentation

Need                          Document
Will this solve my problem?   Practical Guide
Configure my problem (JSON)   JSON Guide
Python API                    API Reference (below)
Understand the math           Walkthrough
Implementation details        Implementation Notes
Algorithm verification        Verification Tests

Full index: docs/DOCUMENTATION_INDEX.md


Why This Exists

Many real processes are more than simple "probability calculations".

They:

  • unfold over multiple steps
  • produce random outcomes (success/failure)
  • impose penalties on failure (loss of progress)
  • allow paid risk-reduction options (insurance / safety mechanisms / retries)

This project answers:

What should I do at each step to minimize long-run expected cost?

What the Library Does

Given:

  • A set of discrete states $S$
  • A target terminal state $T$
  • Available actions per state $A(s)$
  • Action costs
  • Transition probabilities (or success/failure probabilities)
  • Setback rules (how failures move you backward)

The Library Computes:

  • $V(s)$: Expected minimum cost to reach the target from state $s$
  • $\pi(s)$: Optimal action/policy at each state
  • Comparative "worth it?" metrics across actions

Core Idea: Bellman Optimality

The Bellman optimality principle is:

The optimal decision now is the one that minimizes
immediate cost + expected optimal future cost.

Standard Form (MDP)

$$V(s) = \min_{a} \left[ C(s,a) + \sum_{s'} P(s'|s,a) \, V(s') \right]$$

Where:

  • $V(s)$ = best expected cost starting at state $s$
  • $C(s,a)$ = immediate cost for action $a$ at state $s$
  • $P(s'|s,a)$ = probability of transitioning to state $s'$ when taking action $a$ in state $s$

Simplified Form for Success/Failure Processes

Many real systems (and our default templates) have only two outcomes:

  • Success
  • Failure (setback)

So, writing $P(s,a)$ for the success probability of action $a$ in state $s$, the equation becomes:

$$Q(s,a) = C(s,a) + P(s,a) \cdot V(s_{\text{success}}) + (1 - P(s,a)) \cdot V(s_{\text{fail}})$$

$$V(s) = \min_{a} Q(s,a)$$

This compact equation is powerful: it naturally accounts for deep failure cascades (e.g. repeated failures pushing you far backward).
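
To make the recursion concrete, here is a minimal standalone sketch of value iteration for the success/failure form. It is independent of the library's own solver, and the states, costs, and probabilities are invented for illustration:

# Standalone value-iteration sketch for the success/failure Bellman form.
# States 0..3 progress toward the goal (state 3); failure is a one-step setback.
GOAL = 3
# Each action maps to (cost C(s,a), success probability P(s,a)).
ACTIONS = {s: {"cheap": (1.0, 0.6), "safe": (2.0, 0.9)} for s in range(GOAL)}

V = {s: 0.0 for s in range(GOAL + 1)}  # V(goal) stays 0
for _ in range(100_000):
    delta = 0.0
    for s in range(GOAL):
        # Q(s,a) = C(s,a) + P(s,a) * V(success) + (1 - P(s,a)) * V(fail)
        best = min(
            cost + p * V[s + 1] + (1 - p) * V[max(s - 1, 0)]
            for cost, p in ACTIONS[s].values()
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-9:  # converged
        break

print({s: round(v, 3) for s, v in V.items()})

Varying the costs and probabilities shows exactly when the more expensive action pays for itself; the library's solver performs the same computation with validation and labeled states.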

Outputs and Visualizations

The library can generate:

1) Optimal Policy per State

Which action to take at each step.

2) Expected Cost Curves (Planned)

🚧 Work in Progress: This feature is not yet implemented.

Plots of:

  • Expected cost using each action
  • Optimal expected cost curve

3) ROI / Savings Graphs (Planned)

🚧 Work in Progress: This feature is not yet implemented.

Shows how much each risk-reduction option saves vs baseline.

These visuals make it easy to explain why a particular action is optimal.

Installation and Quick Start

Install: pip install upgrade-policy-optimizer (from PyPI).
First run and tutorial: see QUICKSTART.md (canonical install options, a runnable example, and your first MDP).

Minimal example:

from upo import MDP, solve_mdp_value_iteration

mdp = MDP.from_dict(
    states=["start", "goal"],
    terminal_states=["goal"],
    transitions_dict={"start": {"try": {"goal": 0.7, "start": 0.3}}},
    costs_dict={"start": {"try": 1.0}}
)
result = solve_mdp_value_iteration(mdp)
print(result.get_value("start"), result.get_policy("start"))
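
For this toy problem the result can be checked by hand: $V(\text{goal}) = 0$ and $V(\text{start}) = 1.0 + 0.7 \cdot V(\text{goal}) + 0.3 \cdot V(\text{start})$, so $V(\text{start}) = 1.0 / 0.7 \approx 1.43$, with "try" as the (only) policy.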

Or from JSON: python3 -m upo.cli my_problem.json (see JSON Guide).


Configuration Model (Conceptual)

A configuration defines:

  • State range / terminal state
  • Actions (including probability modifiers and/or setback modifiers)
  • Per-state costs
  • Base success probabilities

Typical actions represent "risk tools" (see the sketch after this list), such as:

  • Increasing success chance
  • Preventing setback on failure
  • Reducing penalty severity
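
As a sketch of how such tools look in practice (reusing the from_dict constructor from the quick-start above; states, actions, and numbers are invented), a risk tool is simply an extra action with a different cost/probability/setback profile:

from upo import MDP, solve_mdp_value_iteration

# Two ways to attempt the final step from "mid": a cheap try that falls
# back to "start" on failure, and a pricier insured try that stays at
# "mid" on failure. All names and numbers are invented for illustration.
mdp = MDP.from_dict(
    states=["start", "mid", "goal"],
    terminal_states=["goal"],
    transitions_dict={
        "start": {"try": {"mid": 0.5, "start": 0.5}},
        "mid": {
            "try": {"goal": 0.5, "start": 0.5},        # setback to start
            "insured_try": {"goal": 0.5, "mid": 0.5},  # setback prevented
        },
    },
    costs_dict={
        "start": {"try": 3.0},
        "mid": {"try": 1.0, "insured_try": 2.0},
    },
)
result = solve_mdp_value_iteration(mdp)
print(result.get_policy("mid"))

With these numbers the insured action should win at "mid": a setback forces the expensive "start" step to be repeated, so paying 1.0 extra to prevent it is worth it.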

Potential Applications

This framework can model any multi-step process with probabilistic outcomes and setback penalties:

Software & DevOps

  • CI/CD pipelines with rollback costs on failed deployments
  • Data migration with partial completion and rollback penalties
  • Distributed transaction retries with backoff strategies

Manufacturing & Operations

  • Quality control loops where failed items require rework
  • Multi-stage assembly where defects force partial disassembly
  • Equipment calibration with precision levels and recalibration costs

Finance & Risk Management

  • Investment strategies with transaction costs and stop-loss triggers
  • Insurance optimization (when to buy coverage vs. accept risk)
  • Resource allocation under uncertainty with reallocation penalties

Each is the same underlying math: an MDP solved via Bellman optimality.

👉 For detailed examples with real-world scenarios and explanations, see PRACTICAL_GUIDE.md

Project Origin

This project was inspired by sequential decision-making problems where:

  • Each step has a cost and success probability
  • Failures can set you back (loss of progress, rework required)
  • Multiple strategies are available with different cost/risk trade-offs
  • The optimal decision is not always obvious - sometimes "cheap" steps need protection if failure risks expensive recovery

The key insight is that local optimization (choosing the best option at each step) doesn't always yield the globally optimal strategy. You need to consider the full path to the goal, including the cost of recovery from failures.

This library provides a general-purpose solver for any such stochastic sequential decision problem.

License

This project is licensed under the MIT License - see the LICENSE file for details.

API Reference

Core Classes

MDP

Represents a finite Markov Decision Process.

Creation Methods:

  • MDP(...): Direct construction with numpy arrays
  • MDP.from_dict(...): Convenient dictionary-based construction with labels

Key Methods (illustrated below):

  • is_terminal(state): Check if a state is terminal
  • get_actions(state): Get available actions for a state
  • get_cost(state, action): Get immediate cost for an action
  • get_transition_probs(state, action): Get transition probability distribution
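
A quick look at the accessors, reusing the mdp from the quick-start example (return values in the comments are indicative):

mdp.is_terminal("goal")                   # terminal check, by label
mdp.get_actions("start")                  # actions available at "start"
mdp.get_cost("start", "try")              # immediate cost, here 1.0
mdp.get_transition_probs("start", "try")  # distribution over next states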

MDPResult

Result from solving an MDP.

Attributes:

  • V: Value function (expected cost-to-go from each state)
  • policy: Optimal action for each state
  • Q: Q-function (state-action values)
  • iterations: Number of iterations performed
  • converged: Whether the algorithm converged
  • residual: Final maximum absolute change in the value function between iterations

Key Methods (illustrated below):

  • get_value(state): Get optimal value for a state (supports labels)
  • get_policy(state): Get optimal action for a state (supports labels)
  • get_q_value(state, action): Get Q-value for a state-action pair
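
Typical inspection of a solve, again with the quick-start mdp:

result = solve_mdp_value_iteration(mdp)
print(result.converged, result.iterations, result.residual)  # convergence info
print(result.get_value("start"))           # expected cost-to-go from "start"
print(result.get_policy("start"))          # optimal action at "start"
print(result.get_q_value("start", "try"))  # state-action value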

solve_mdp_value_iteration

Solve an MDP using value iteration.

solve_mdp_value_iteration(
    mdp: MDP,
    tol: float = 1e-9,
    max_iter: int = 100000,
    initial_v: Optional[np.ndarray] = None,
    validate: bool = True
) -> MDPResult

Parameters:

  • mdp: The MDP to solve
  • tol: Convergence tolerance (max absolute value change)
  • max_iter: Maximum iterations
  • initial_v: Optional initial value function
  • validate: Whether to validate MDP structure before solving

Returns:

  • MDPResult with optimal value function, policy, and convergence info
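
One use of the initial_v parameter is warm-starting: seeding a tighter re-solve with a previous solution. A sketch, assuming result.V is the ndarray the solver accepts:

# Warm start: reuse a coarse solution as the starting point for a tighter one.
coarse = solve_mdp_value_iteration(mdp, tol=1e-3)
fine = solve_mdp_value_iteration(mdp, tol=1e-12, initial_v=coarse.V)
print(coarse.iterations, fine.iterations)  # warm start should need fewer sweeps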

Command-Line Interface

The CLI provides a simple way to solve MDPs from JSON configuration files without writing Python code.

Basic Usage

# If package is installed (either form works):
python3 -m upo.cli examples/configs/manufacturing_process.json
upo-solve examples/configs/manufacturing_process.json

# If not installed, use PYTHONPATH:
PYTHONPATH=src python3 -m upo.cli examples/configs/manufacturing_process.json

Command-Line Options

  • --tol <float>: Convergence tolerance (default: 1e-9)

    python3 -m upo.cli problem.json --tol 1e-12
    
  • --max-iter <int>: Maximum iterations (default: 100000)

    python3 -m upo.cli problem.json --max-iter 50000
    
  • --no-validate: Skip MDP validation (faster but potentially unsafe)

    python3 -m upo.cli problem.json --no-validate
    
  • --verbose / -v: Show detailed output including Q-values

    python3 -m upo.cli problem.json --verbose
    
  • --version: Show version information

    python3 -m upo.cli --version
    

Output

The CLI displays:

  • Optimal policy (action to take at each state)
  • Expected costs from each state
  • Convergence information (iterations, residual)
  • Q-values (with --verbose flag)

Project Structure

upgrade-policy-optimizer/
├── src/upo/              # Core library
│   ├── __init__.py      # Package exports
│   ├── mdp.py           # MDP data structures
│   ├── solver.py        # Value iteration solver
│   ├── result.py        # Result container
│   ├── validate.py      # Validation utilities
│   ├── json_solver.py   # JSON-based solver
│   └── cli.py           # Command-line interface
├── tests/               # Test suite
│   ├── test_mdp.py
│   ├── test_solver.py
│   ├── test_validate.py
│   └── test_sanity_checks.py
├── examples/            # Usage examples
│   └── configs/         # Example MDP configurations
├── docs/                # Documentation
├── pyproject.toml       # Package configuration
└── README.md            # This file

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Quick checklist:

  • Add tests for new features
  • Run pytest to verify tests pass
  • Format code with black
  • Check types with mypy
  • Update documentation as needed
