A Python library for solving finite Markov Decision Processes using value iteration
upgrade-policy-optimizer
upgrade-policy-optimizer is a Python library for solving sequential decision-making problems under uncertainty.
Given states, actions, probabilities, and costs, compute the optimal policy that minimizes expected total cost to reach a goal.
It models problems as finite Markov Decision Processes (MDPs) and solves them using the Bellman optimality equation (value iteration).
JSON-Based Solver - No Programming Required!
Define your problem in JSON and get the optimal strategy:
python3 -m upo.cli my_problem.json
See docs/JSON_GUIDE.md for format details.
Or from Python:
from upo import solve_from_json
result = solve_from_json("my_problem.json") # General-purpose format
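Assuming `solve_from_json` returns the same `MDPResult` described in the API Reference below (the state label `"start"` here is a placeholder for a state defined in your own file), you can read values and policies straight off the result:

```python
from upo import solve_from_json

result = solve_from_json("my_problem.json")
# "start" is a placeholder; use a state label defined in your JSON file.
print(result.get_value("start"))   # expected cost-to-go from that state
print(result.get_policy("start"))  # optimal action at that state
```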
New to MDPs? Start Here!
Practical Guide: When and How to Use This Tool
This guide explains:
- How to know if this tool can solve your problem
- What questions it answers (with real examples)
- Simple explanations of deployment, manufacturing, and investment examples
- How to recognize patterns in your own problems
- Common misconceptions and key lessons
Perfect for: Business users, product managers, engineers new to optimization
Ideal for Data-Driven Teams
Have metrics? This tool is perfect for you!
If you track success rates, costs, or failure rates, you're ready to optimize!
- Use real data for accurate results
- Quantify exact savings (dollars/hours)
- Validate predictions against actual outcomes
- Make data-backed decisions your team can trust
Documentation
| Need | Document |
|---|---|
| Will this solve my problem? | Practical Guide |
| Configure my problem (JSON) | JSON Guide |
| Python API | API Reference (below) |
| Understand the math | Walkthrough |
| Implementation details | Implementation Notes |
| Algorithm verification | Verification Tests |
Full index: docs/DOCUMENTATION_INDEX.md
Why This Exists
Many real processes are more than simple probability calculations. They:
- unfold over multiple steps
- have random outcomes (success/failure)
- impose penalties on failure (loss of progress)
- offer paid risk-reduction options (insurance, safety mechanisms, retries)
This project answers:
What should I do at each step to minimize long-run expected cost?
What the Library Does
Given:
- A set of discrete states $S$
- A target terminal state $T$
- Available actions per state $A(s)$
- Action costs
- Transition probabilities (or success/failure probabilities)
- Setback rules (how failures move you backward)
The Library Computes:
- $V(s)$: Expected minimum cost to reach the target from state $s$
- $\pi(s)$: Optimal action/policy at each state
- Comparative "worth it?" metrics across actions
Core Idea: Bellman Optimality
The Bellman optimality principle is:
The optimal decision now is the one that minimizes
immediate cost + expected optimal future cost.
Standard Form (MDP)
$$V(s) = \min_{a} \left[ C(s,a) + \sum_{s'} P(s'|s,a) \, V(s') \right]$$
Where:
- $V(s)$ = best expected cost starting at state $s$
- $C(s,a)$ = immediate cost for action $a$ at state $s$
- $P(s'|s,a)$ = probability of transitioning to state $s'$
Simplified Form for Success/Failure Processes
Many real systems (and our default templates) have only two outcomes:
- Success
- Failure (setback)
So the equation becomes:
$$Q(s,a) = C(s,a) + P(s,a) \cdot V(s_{\text{success}}) + (1 - P(s,a)) \cdot V(s_{\text{fail}})$$
$$V(s) = \min_{a} Q(s,a)$$
This compact equation is powerful: it naturally accounts for deep failure cascades (e.g. repeated failures pushing you far backward).
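As a quick worked example, take a single non-terminal state with one action costing $C = 1$ that reaches the goal (where $V = 0$) with probability $P = 0.7$ and otherwise leaves you in the same state. Then
$$V(s) = 1 + 0.7 \cdot 0 + 0.3 \cdot V(s) \quad\Rightarrow\quad V(s) = \frac{1}{0.7} \approx 1.43$$
which is the familiar expected-number-of-attempts result, and exactly the minimal example in the Quick Start below.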
Outputs and Visualizations
The library can generate:
1) Optimal Policy per State
Which action to take at each step.
2) Expected Cost Curves (Planned)
Work in Progress: This feature is not yet implemented.
Plots of:
- Expected cost using each action
- Optimal expected cost curve
3) ROI / Savings Graphs (Planned)
Work in Progress: This feature is not yet implemented.
Shows how much each risk-reduction option saves vs baseline.
These visuals make it easy to explain why a particular action is optimal.
Installation and Quick Start
Install: pip install upgrade-policy-optimizer (from PyPI).
First run and tutorial: See QUICKSTART.md for canonical install options, a runnable example, and your first MDP.
Minimal example:
from upo import MDP, solve_mdp_value_iteration

# One non-terminal state: "try" costs 1.0 and reaches "goal" with
# probability 0.7; otherwise you stay in "start" and try again.
mdp = MDP.from_dict(
    states=["start", "goal"],
    terminal_states=["goal"],
    transitions_dict={"start": {"try": {"goal": 0.7, "start": 0.3}}},
    costs_dict={"start": {"try": 1.0}},
)
result = solve_mdp_value_iteration(mdp)
print(result.get_value("start"), result.get_policy("start"))  # ~1.4286 "try"
Or from JSON: python3 -m upo.cli my_problem.json (see JSON Guide).
Configuration Model (Conceptual)
A configuration defines:
- State range / terminal state
- Actions (including probability modifiers and/or setback modifiers)
- Per-state costs
- Base success probabilities
Typical actions represent "risk tools", such as:
- Increasing success chance
- Preventing setback on failure
- Reducing penalty severity
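As a minimal sketch of such a configuration (the state names, probabilities, and costs below are invented for illustration, using the same `MDP.from_dict` constructor shown in the Quick Start), a "prevent setback on failure" action might look like this:

```python
from upo import MDP, solve_mdp_value_iteration

# Illustrative numbers only: at "mid", the cheap "try" risks a setback to
# "start", while the pricier "insured" action holds position on failure.
mdp = MDP.from_dict(
    states=["start", "mid", "goal"],
    terminal_states=["goal"],
    transitions_dict={
        "start": {"try": {"mid": 0.8, "start": 0.2}},
        "mid": {
            "try":     {"goal": 0.6, "start": 0.4},
            "insured": {"goal": 0.6, "mid": 0.4},
        },
    },
    costs_dict={
        "start": {"try": 1.0},
        "mid": {"try": 1.0, "insured": 2.0},
    },
)
result = solve_mdp_value_iteration(mdp)
print(result.get_policy("mid"))  # whether paying for protection is optimal here
```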
Potential Applications
This framework can model any multi-step process with probabilistic outcomes and setback penalties:
Software & DevOps
- CI/CD pipelines with rollback costs on failed deployments
- Data migration with partial completion and rollback penalties
- Distributed transaction retries with backoff strategies
Manufacturing & Operations
- Quality control loops where failed items require rework
- Multi-stage assembly where defects force partial disassembly
- Equipment calibration with precision levels and recalibration costs
Finance & Risk Management
- Investment strategies with transaction costs and stop-loss triggers
- Insurance optimization (when to buy coverage vs. accept risk)
- Resource allocation under uncertainty with reallocation penalties
Each is the same underlying math: an MDP solved via Bellman optimality.
For detailed examples with real-world scenarios and explanations, see PRACTICAL_GUIDE.md
Project Origin
This project was inspired by sequential decision-making problems where:
- Each step has a cost and success probability
- Failures can set you back (loss of progress, rework required)
- Multiple strategies are available with different cost/risk trade-offs
- The optimal decision is not always obvious - sometimes "cheap" steps need protection if failure risks expensive recovery
The key insight is that local optimization (choosing the best option at each step) doesn't always yield the globally optimal strategy. You need to consider the full path to the goal, including the cost of recovery from failures.
This library provides a general-purpose solver for any such stochastic sequential decision problem.
License
This project is licensed under the MIT License - see the LICENSE file for details.
API Reference
Core Classes
MDP
Represents a finite Markov Decision Process.
Creation Methods:
- `MDP(...)`: Direct construction with numpy arrays
- `MDP.from_dict(...)`: Convenient dictionary-based construction with labels
Key Methods:
- `is_terminal(state)`: Check if a state is terminal
- `get_actions(state)`: Get available actions for a state
- `get_cost(state, action)`: Get immediate cost for an action
- `get_transition_probs(state, action)`: Get transition probability distribution
MDPResult
Result from solving an MDP.
Attributes:
- `V`: Value function (expected cost-to-go from each state)
- `policy`: Optimal action for each state
- `Q`: Q-function (state-action values)
- `iterations`: Number of iterations performed
- `converged`: Whether the algorithm converged
- `residual`: Final maximum value difference
Key Methods:
- `get_value(state)`: Get optimal value for a state (supports labels)
- `get_policy(state)`: Get optimal action for a state (supports labels)
- `get_q_value(state, action)`: Get Q-value for a state-action pair
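For example, continuing from the Quick Start's `result` (the values shown come from that example):

```python
print(result.get_value("start"))           # ~1.4286: expected cost to reach "goal"
print(result.get_policy("start"))          # "try": the optimal action at "start"
print(result.get_q_value("start", "try"))  # Q-value; equals V(start), as "try" is the only action
```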
solve_mdp_value_iteration
Solve an MDP using value iteration.
solve_mdp_value_iteration(
    mdp: MDP,
    tol: float = 1e-9,
    max_iter: int = 100000,
    initial_v: Optional[np.ndarray] = None,
    validate: bool = True
) -> MDPResult
Parameters:
- `mdp`: The MDP to solve
- `tol`: Convergence tolerance (max absolute value change)
- `max_iter`: Maximum iterations
- `initial_v`: Optional initial value function
- `validate`: Whether to validate MDP structure before solving
Returns:
`MDPResult` with optimal value function, policy, and convergence info
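A minimal sketch of checking convergence, assuming `mdp` is built as in the Quick Start; the attributes read below are the documented `MDPResult` fields:

```python
result = solve_mdp_value_iteration(mdp, tol=1e-12, max_iter=200000)
if not result.converged:
    # residual is the final maximum value change between sweeps
    print(f"no convergence after {result.iterations} iterations (residual={result.residual})")
```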
Command-Line Interface
The CLI provides a simple way to solve MDPs from JSON configuration files without writing Python code.
Basic Usage
# If package is installed (either form works):
python3 -m upo.cli examples/configs/manufacturing_process.json
upo-solve examples/configs/manufacturing_process.json
# If not installed, use PYTHONPATH:
PYTHONPATH=src python3 -m upo.cli examples/configs/manufacturing_process.json
Command-Line Options
- `--tol <float>`: Convergence tolerance (default: 1e-9)
  python3 -m upo.cli problem.json --tol 1e-12
- `--max-iter <int>`: Maximum iterations (default: 100000)
  python3 -m upo.cli problem.json --max-iter 50000
- `--no-validate`: Skip MDP validation (faster but potentially unsafe)
  python3 -m upo.cli problem.json --no-validate
- `--verbose`/`-v`: Show detailed output including Q-values
  python3 -m upo.cli problem.json --verbose
- `--version`: Show version information
  python3 -m upo.cli --version
Output
The CLI displays:
- Optimal policy (action to take at each state)
- Expected costs from each state
- Convergence information (iterations, residual)
- Q-values (with the `--verbose` flag)
Project Structure
upgrade-policy-optimizer/
├── src/upo/                  # Core library
│   ├── __init__.py           # Package exports
│   ├── mdp.py                # MDP data structures
│   ├── solver.py             # Value iteration solver
│   ├── result.py             # Result container
│   ├── validate.py           # Validation utilities
│   ├── json_solver.py        # JSON-based solver
│   └── cli.py                # Command-line interface
├── tests/                    # Test suite
│   ├── test_mdp.py
│   ├── test_solver.py
│   ├── test_validate.py
│   └── test_sanity_checks.py
├── examples/                 # Usage examples
│   └── configs/              # Example MDP configurations
├── docs/                     # Documentation
├── pyproject.toml            # Package configuration
└── README.md                 # This file
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Quick checklist:
- Add tests for new features
- Run `pytest` to verify tests pass
- Format code with `black`
- Check types with `mypy`
- Update documentation as needed