upgrade-policy-optimizer

upgrade-policy-optimizer is a Python library for solving sequential decision-making problems under uncertainty.

Given states, actions, probabilities, and costs, it computes the optimal policy that minimizes the expected total cost of reaching a goal.

It models problems as finite Markov Decision Processes (MDPs) and solves them using the Bellman optimality equation (value iteration).

🚀 JSON-Based Solver - No Programming Required!

Define your problem in JSON and get the optimal strategy:

python3 -m upo.cli my_problem.json

See docs/JSON_GUIDE.md for format details.

Or from Python:

from upo import solve_from_json
result = solve_from_json("my_problem.json")  # General-purpose format
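
As a rough illustration only (the field names below are hypothetical, mirroring the Python from_dict constructor shown later; the actual schema is defined in docs/JSON_GUIDE.md), a config could look like:

{
  "states": ["start", "goal"],
  "terminal_states": ["goal"],
  "transitions": {"start": {"try": {"goal": 0.7, "start": 0.3}}},
  "costs": {"start": {"try": 1.0}}
}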

📖 New to MDPs? Start Here!

👉 Practical Guide: When and How to Use This Tool

This guide explains:

  • ✅ How to know if this tool can solve your problem
  • ✅ What questions it answers (with real examples)
  • ✅ Simple explanations of deployment, manufacturing, and investment examples
  • ✅ How to recognize patterns in your own problems
  • ✅ Common misconceptions and key lessons

Perfect for: Business users, product managers, engineers new to optimization

📊 Ideal for Data-Driven Teams

Have metrics? This tool is perfect for you!

If you track success rates, costs, or failure rates → you're ready to optimize!

  • ✅ Use real data for accurate results
  • ✅ Quantify expected savings (dollars/hours)
  • ✅ Validate predictions against actual outcomes
  • ✅ Make data-backed decisions your team can trust

📚 Documentation

Need                          Document
Will this solve my problem?   Practical Guide
Configure my problem (JSON)   JSON Guide
Python API                    API Reference (below)
Understand the math           Walkthrough
Implementation details        Implementation Notes
Algorithm verification        Verification Tests

Full index: docs/DOCUMENTATION_INDEX.md


Why This Exists

Many real processes are more than simple "probability calculations".

They:

  • unfold over multiple steps
  • produce random outcomes (success/failure)
  • impose penalties on failure (loss of progress)
  • allow paid risk-reduction options (insurance / safety mechanisms / retries)

This project answers:

What should I do at each step to minimize long-run expected cost?

What the Library Does

Given:

  • A set of discrete states $S$
  • A target terminal state $T$
  • Available actions per state $A(s)$
  • Action costs
  • Transition probabilities (or success/failure probabilities)
  • Setback rules (how failures move you backward)

The Library Computes:

  • $V(s)$: Expected minimum cost to reach the target from state $s$
  • $\pi(s)$: Optimal action/policy at each state
  • Comparative "worth it?" metrics across actions

Core Idea: Bellman Optimality

The Bellman optimality principle is:

The optimal decision now is the one that minimizes
immediate cost + expected optimal future cost.

Standard Form (MDP)

$$V(s) = \min_{a} \left[ C(s,a) + \sum_{s'} P(s'|s,a) \, V(s') \right]$$

Where:

  • $V(s)$ = best expected cost starting at state $s$
  • $C(s,a)$ = immediate cost for action $a$ at state $s$
  • $P(s'|s,a)$ = probability of transitioning to state $s'$ when taking action $a$ in state $s$

Simplified Form for Success/Failure Processes

Many real systems (and our default templates) have only two outcomes:

  • Success
  • Failure (setback)

So, writing $P(s,a)$ for the success probability of action $a$ in state $s$, the equation becomes:

$$Q(s,a) = C(s,a) + P(s,a) \cdot V(s_{\text{success}}) + (1 - P(s,a)) \cdot V(s_{\text{fail}})$$

$$V(s) = \min_{a} Q(s,a)$$

This compact equation is powerful: it naturally accounts for deep failure cascades (e.g. repeated failures pushing you far backward).
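
To make the recursion concrete, here is a minimal standalone sketch of value iteration for the success/failure form. It is independent of the library's own solver, and the states, costs, and probabilities are invented for illustration:

# Standalone value-iteration sketch for the success/failure Bellman form.
# States 0..3 progress toward the goal (state 3); failure is a one-step setback.
GOAL = 3
# Each action maps to (cost C(s,a), success probability P(s,a)).
ACTIONS = {s: {"cheap": (1.0, 0.6), "safe": (2.0, 0.9)} for s in range(GOAL)}

V = {s: 0.0 for s in range(GOAL + 1)}  # V(goal) stays 0
for _ in range(100_000):
    delta = 0.0
    for s in range(GOAL):
        # Q(s,a) = C(s,a) + P(s,a) * V(success) + (1 - P(s,a)) * V(fail)
        best = min(
            cost + p * V[s + 1] + (1 - p) * V[max(s - 1, 0)]
            for cost, p in ACTIONS[s].values()
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-9:  # converged
        break

print({s: round(v, 3) for s, v in V.items()})

Varying the costs and probabilities shows exactly when the more expensive action pays for itself; the library's solver performs the same computation with validation and labeled states.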

Outputs and Visualizations

The library can generate:

1) Optimal Policy per State

Which action to take at each step.

2) Expected Cost Curves (Planned)

🚧 Work in Progress: This feature is not yet implemented.

Plots of:

  • Expected cost using each action
  • Optimal expected cost curve

3) ROI / Savings Graphs (Planned)

🚧 Work in Progress: This feature is not yet implemented.

Shows how much each risk-reduction option saves vs baseline.

These visuals make it easy to explain why a particular action is optimal.

Installation and Quick Start

Install: pip install upgrade-policy-optimizer (from PyPI).
First run and tutorial: see QUICKSTART.md (canonical install options, a runnable example, and your first MDP).

Minimal example:

from upo import MDP, solve_mdp_value_iteration

mdp = MDP.from_dict(
    states=["start", "goal"],
    terminal_states=["goal"],
    transitions_dict={"start": {"try": {"goal": 0.7, "start": 0.3}}},
    costs_dict={"start": {"try": 1.0}}
)
result = solve_mdp_value_iteration(mdp)
print(result.get_value("start"), result.get_policy("start"))
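
For this toy problem the result can be checked by hand: $V(\text{goal}) = 0$ and $V(\text{start}) = 1.0 + 0.7 \cdot V(\text{goal}) + 0.3 \cdot V(\text{start})$, so $V(\text{start}) = 1.0 / 0.7 \approx 1.43$, with "try" as the (only) policy.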

Or from JSON: python3 -m upo.cli my_problem.json (see JSON Guide).


Configuration Model (Conceptual)

A configuration defines:

  • State range / terminal state
  • Actions (including probability modifiers and/or setback modifiers)
  • Per-state costs
  • Base success probabilities

Typical actions represent "risk tools" (see the sketch after this list), such as:

  • Increasing success chance
  • Preventing setback on failure
  • Reducing penalty severity
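
As a sketch of how such tools look in practice (reusing the from_dict constructor from the quick-start above; states, actions, and numbers are invented), a risk tool is simply an extra action with a different cost/probability/setback profile:

from upo import MDP, solve_mdp_value_iteration

# Two ways to attempt the final step from "mid": a cheap try that falls
# back to "start" on failure, and a pricier insured try that stays at
# "mid" on failure. All names and numbers are invented for illustration.
mdp = MDP.from_dict(
    states=["start", "mid", "goal"],
    terminal_states=["goal"],
    transitions_dict={
        "start": {"try": {"mid": 0.5, "start": 0.5}},
        "mid": {
            "try": {"goal": 0.5, "start": 0.5},        # setback to start
            "insured_try": {"goal": 0.5, "mid": 0.5},  # setback prevented
        },
    },
    costs_dict={
        "start": {"try": 3.0},
        "mid": {"try": 1.0, "insured_try": 2.0},
    },
)
result = solve_mdp_value_iteration(mdp)
print(result.get_policy("mid"))

With these numbers the insured action should win at "mid": a setback forces the expensive "start" step to be repeated, so paying 1.0 extra to prevent it is worth it.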

Potential Applications

This framework can model any multi-step process with probabilistic outcomes and setback penalties:

Software & DevOps

  • CI/CD pipelines with rollback costs on failed deployments
  • Data migration with partial completion and rollback penalties
  • Distributed transaction retries with backoff strategies

Manufacturing & Operations

  • Quality control loops where failed items require rework
  • Multi-stage assembly where defects force partial disassembly
  • Equipment calibration with precision levels and recalibration costs

Finance & Risk Management

  • Investment strategies with transaction costs and stop-loss triggers
  • Insurance optimization (when to buy coverage vs. accept risk)
  • Resource allocation under uncertainty with reallocation penalties

Each is the same underlying math: an MDP solved via Bellman optimality.

👉 For detailed examples with real-world scenarios and explanations, see PRACTICAL_GUIDE.md

Project Origin

This project was inspired by sequential decision-making problems where:

  • Each step has a cost and success probability
  • Failures can set you back (loss of progress, rework required)
  • Multiple strategies are available with different cost/risk trade-offs
  • The optimal decision is not always obvious - sometimes "cheap" steps need protection if failure risks expensive recovery

The key insight is that local optimization (choosing the best option at each step) doesn't always yield the globally optimal strategy. You need to consider the full path to the goal, including the cost of recovery from failures.

This library provides a general-purpose solver for any such stochastic sequential decision problem.

License

This project is licensed under the MIT License - see the LICENSE file for details.

API Reference

Core Classes

MDP

Represents a finite Markov Decision Process.

Creation Methods:

  • MDP(...): Direct construction with numpy arrays
  • MDP.from_dict(...): Convenient dictionary-based construction with labels

Key Methods (illustrated below):

  • is_terminal(state): Check if a state is terminal
  • get_actions(state): Get available actions for a state
  • get_cost(state, action): Get immediate cost for an action
  • get_transition_probs(state, action): Get transition probability distribution
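
A quick look at the accessors, reusing the mdp from the quick-start example (return values in the comments are indicative):

mdp.is_terminal("goal")                   # terminal check, by label
mdp.get_actions("start")                  # actions available at "start"
mdp.get_cost("start", "try")              # immediate cost, here 1.0
mdp.get_transition_probs("start", "try")  # distribution over next states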

MDPResult

Result from solving an MDP.

Attributes:

  • V: Value function (expected cost-to-go from each state)
  • policy: Optimal action for each state
  • Q: Q-function (state-action values)
  • iterations: Number of iterations performed
  • converged: Whether the algorithm converged
  • residual: Final maximum absolute change in the value function between iterations

Key Methods (illustrated below):

  • get_value(state): Get optimal value for a state (supports labels)
  • get_policy(state): Get optimal action for a state (supports labels)
  • get_q_value(state, action): Get Q-value for a state-action pair
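
Typical inspection of a solve, again with the quick-start mdp:

result = solve_mdp_value_iteration(mdp)
print(result.converged, result.iterations, result.residual)  # convergence info
print(result.get_value("start"))           # expected cost-to-go from "start"
print(result.get_policy("start"))          # optimal action at "start"
print(result.get_q_value("start", "try"))  # state-action value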

solve_mdp_value_iteration

Solve an MDP using value iteration.

solve_mdp_value_iteration(
    mdp: MDP,
    tol: float = 1e-9,
    max_iter: int = 100000,
    initial_v: Optional[np.ndarray] = None,
    validate: bool = True
) -> MDPResult

Parameters:

  • mdp: The MDP to solve
  • tol: Convergence tolerance (max absolute value change)
  • max_iter: Maximum iterations
  • initial_v: Optional initial value function
  • validate: Whether to validate MDP structure before solving

Returns:

  • MDPResult with optimal value function, policy, and convergence info
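
One use of the initial_v parameter is warm-starting: seeding a tighter re-solve with a previous solution. A sketch, assuming result.V is the ndarray the solver accepts:

# Warm start: reuse a coarse solution as the starting point for a tighter one.
coarse = solve_mdp_value_iteration(mdp, tol=1e-3)
fine = solve_mdp_value_iteration(mdp, tol=1e-12, initial_v=coarse.V)
print(coarse.iterations, fine.iterations)  # warm start should need fewer sweeps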

Command-Line Interface

The CLI provides a simple way to solve MDPs from JSON configuration files without writing Python code.

Basic Usage

# If package is installed (either form works):
python3 -m upo.cli examples/configs/manufacturing_process.json
upo-solve examples/configs/manufacturing_process.json

# If not installed, use PYTHONPATH:
PYTHONPATH=src python3 -m upo.cli examples/configs/manufacturing_process.json

Command-Line Options

  • --tol <float>: Convergence tolerance (default: 1e-9)

    python3 -m upo.cli problem.json --tol 1e-12
    
  • --max-iter <int>: Maximum iterations (default: 100000)

    python3 -m upo.cli problem.json --max-iter 50000
    
  • --no-validate: Skip MDP validation (faster but potentially unsafe)

    python3 -m upo.cli problem.json --no-validate
    
  • --verbose / -v: Show detailed output including Q-values

    python3 -m upo.cli problem.json --verbose
    
  • --version: Show version information

    python3 -m upo.cli --version
    

Output

The CLI displays:

  • Optimal policy (action to take at each state)
  • Expected costs from each state
  • Convergence information (iterations, residual)
  • Q-values (with --verbose flag)

Project Structure

upgrade-policy-optimizer/
├── src/upo/              # Core library
│   ├── __init__.py      # Package exports
│   ├── mdp.py           # MDP data structures
│   ├── solver.py        # Value iteration solver
│   ├── result.py        # Result container
│   ├── validate.py      # Validation utilities
│   ├── json_solver.py   # JSON-based solver
│   └── cli.py           # Command-line interface
├── tests/               # Test suite
│   ├── test_mdp.py
│   ├── test_solver.py
│   ├── test_validate.py
│   └── test_sanity_checks.py
├── examples/            # Usage examples
│   └── configs/         # Example MDP configurations
├── docs/                # Documentation
├── pyproject.toml       # Package configuration
└── README.md            # This file

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Quick checklist:

  • Add tests for new features
  • Run pytest to verify tests pass
  • Format code with black
  • Check types with mypy
  • Update documentation as needed
