A Python package for creating, analyzing, and visualizing decision trees with expected value calculations
DTrees: Decision Tree Analyzer
A Python package for creating, analyzing, and visualizing decision trees.
If you have questions or ideas for improving the library, reach out to me at:
Forks are welcome, but please reach out first to discuss your changes.
Install
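To install the library, run the command below. The leading ! is notebook syntax (for example in Jupyter); omit it in a regular terminal.
!pip install dtrees-analyzer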
For more information about versions, changes, etc., visit the project page on PyPI.
Usage
The decision tree itself takes a single parameter, and only if you want to use a utility function:
- utility_function: Optional utility function. It should accept a float value as input.
For each node you need to add the following parameters:
- node_id: The ID used to connect and identify nodes
- node_name: The name of the node to describe it, shown in the graphs
- value: The value of the node, applies only to terminal nodes
For each edge you need to add the following parameters:
- from_node: The starting node
- to_node: The node it should be connected to
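- probability: The probability of following this edge. It applies only to edges that leave a chance node:
  - Chance node connected to a chance node
  - Chance node connected to a terminal node
  - Chance node connected to a decision node (for example, the edge into the "Gas found" decision node in the land investment example)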
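Since the edges leaving a chance node describe mutually exclusive outcomes, their probabilities are expected to add up to 1. The following is a minimal standalone sketch of such a check in plain Python. It does not use the library, and both the edge-tuple format and the helper name `check_chance_probabilities` are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def check_chance_probabilities(edges, tolerance=1e-9):
    """Return {node_id: total} for nodes whose outgoing probabilities do not sum to 1.

    `edges` is a list of (from_node, to_node, probability) tuples. Edges
    without a probability (those leaving decision nodes) use None and are
    ignored by the check.
    """
    totals = defaultdict(float)
    for from_node, _to_node, probability in edges:
        if probability is not None:
            totals[from_node] += probability
    return {
        node: total
        for node, total in totals.items()
        if not math.isclose(total, 1.0, abs_tol=tolerance)
    }


# Hypothetical edge list: "D" is a decision node, "B" is a chance node.
edges = [
    ("D", "B", None),
    ("D", "NB", None),
    ("B", "PI", 0.6),
    ("B", "PD", 0.4),
]
problems = check_chance_probabilities(edges)
print(problems)  # an empty dict means every chance node sums to 1
```

Running a check like this before building the tree makes typos in the probabilities easy to spot.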
To describe a decision tree with the library, follow these steps:
- Defining the decision tree
If you want to use a utility function, provide it when you create the decision tree.
dt = DecisionTree()
# def any_utility_function(x: float):
#     return <Some utility function>
# dt = DecisionTree(utility_function=any_utility_function)
- Adding decision nodes: These nodes have no probabilities attached to their options because they represent decisions we have to make. When the expected value (EV) is calculated from the child nodes, a decision node always takes the child with the highest expected value (or the highest utility value if a utility function is provided).
dt.add_decision_node("I", "Decision")
- Adding chance nodes: These are intermediate nodes that lead to other nodes with a certain probability.
dt.add_chance_node("B", "Buy TSLA stocks")
- Adding terminal nodes: These are the final nodes. Each one holds the known value of its branch.
dt.add_terminal_node("PI", "The price increases", 1_000)
- Adding edges between nodes: These are the connections between nodes. Edges that leave a chance node carry a probability; edges that leave a decision node do not.
# Connecting nodes with probability [Chance node -> Chance, Decision, or Terminal node]
dt.add_edge("P", "PE", 0.3)
# Connecting nodes without probability [Decision node -> Any]
dt.add_edge("I", "B")
Example
from dtree import DecisionTree
# Define decision tree
dt = DecisionTree()
# Build decision tree
dt.add_decision_node("D", "Decision")
dt.add_chance_node("B", "Buy TSLA stocks")
dt.add_terminal_node("NB", "Don't buy TSLA stocks", 0)
dt.add_edge("D", "B")
dt.add_edge("D", "NB")
dt.add_terminal_node("PI", "The price increases", 1_000)
dt.add_terminal_node("PD", "The price decreases", -2_000)
dt.add_edge("B", "PI", 0.6)
dt.add_edge("B", "PD", 0.4)
dt.save_mermaid_graph("./images/example.png")
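To see what the numbers in this small example imply, the tree can be folded back by hand: a chance node takes the probability-weighted sum of its children, and a decision node takes the best child. The sketch below is a plain-Python illustration of that rule and is not the library's implementation; the `nodes` and `children` dictionaries and the `roll_back` function are hypothetical names introduced here.

```python
def roll_back(node_id, nodes, children):
    """Return the expected value of `node_id` by folding the tree back."""
    kind, value = nodes[node_id]
    if kind == "terminal":
        return value
    child_values = [
        (roll_back(child_id, nodes, children), probability)
        for child_id, probability in children[node_id]
    ]
    if kind == "chance":
        # Probability-weighted sum of the children.
        return sum(ev * probability for ev, probability in child_values)
    # Decision node: pick the child with the highest expected value.
    return max(ev for ev, _probability in child_values)


# Same structure as the example above, written as plain dictionaries.
nodes = {
    "D": ("decision", None),
    "B": ("chance", None),
    "NB": ("terminal", 0),
    "PI": ("terminal", 1_000),
    "PD": ("terminal", -2_000),
}
children = {
    "D": [("B", None), ("NB", None)],
    "B": [("PI", 0.6), ("PD", 0.4)],
}

ev_buy = roll_back("B", nodes, children)
ev_root = roll_back("D", nodes, children)
print(round(ev_buy, 2), ev_root)
```

With these numbers, buying has an expected value of about -200, so the decision node takes the value 0 of not buying.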
Comprehensive Example: Land Investment Decision
Newox is considering whether or not to drill on its own land in search of natural gas. If the company decides to drill, the cost is $40,000. If gas is found, Newox has two options: it can either sell the land to West Gas for $200,000 or develop the site itself. If no gas is found, there are no additional costs or revenues beyond the initial drilling cost.
The other option is to skip drilling entirely and sell the land as-is for $22,000.
At current natural gas prices, a producing well would be worth $150,000 on the open market. However, there's a chance gas prices could double, in which case the well would be worth $300,000.
Company engineers estimate a 30% chance of finding gas. Meanwhile, the company's economist believes there's a 60% chance that gas prices will double.
What decision should Newox make to maximize its expected profits?
This example demonstrates a more complex decision tree for a land investment scenario, both with and without a utility function.
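Before building the tree with the library, the expected-profit arithmetic of the problem statement can be worked out directly. The following is a sketch in plain Python that uses only the figures given above; the variable names are chosen here for illustration.

```python
DRILL_COST = 40_000

# Developing the site: 40% normal prices, 60% doubled prices, net of drilling.
develop = 0.4 * (150_000 - DRILL_COST) + 0.6 * (300_000 - DRILL_COST)

# Selling the land to West Gas after finding gas, net of drilling.
sell_to_west_gas = 200_000 - DRILL_COST

# If gas is found, Newox picks the better of the two options.
gas_found = max(develop, sell_to_west_gas)

# Drilling: 30% chance of gas, 70% chance of losing the drilling cost.
drill = 0.3 * gas_found + 0.7 * (-DRILL_COST)

# Alternative: sell the land as-is without drilling.
sell_as_is = 22_000

best = max(drill, sell_as_is)
print(round(develop), round(gas_found), round(drill), round(best))
```

Under expected value alone, developing the site (about 200,000) beats selling to West Gas (160,000), and drilling (about 32,000) beats selling the land as-is (22,000).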
Without Utility Function
from dtree import DecisionTree
# Create decision tree
dt = DecisionTree()
# Add nodes
dt.add_decision_node("I", "Decision")
dt.add_terminal_node("S", "Sell land", 22_000)
dt.add_chance_node("D", "Drill land")
dt.add_edge("I", "S")
dt.add_edge("I", "D")
dt.add_decision_node("G", "Gas found")
dt.add_terminal_node("NG", "No gas found", -40_000)
dt.add_edge("D", "G", 0.3)
dt.add_edge("D", "NG", 0.7)
dt.add_terminal_node("GS", "Sell land to West Gas", 200_000-40_000)
dt.add_chance_node("GD", "Develop the site")
dt.add_edge("G", "GD")
dt.add_edge("G", "GS")
dt.add_terminal_node("NM", "Normal market conditions", 150_000-40_000)
dt.add_terminal_node("GM", "Good market conditions", 300_000-40_000)
dt.add_edge("GD", "NM", 0.4)
dt.add_edge("GD", "GM", 0.6)
# Create graph
dt.save_mermaid_graph("./images/case_without_utility_func.png")
With Utility Function
The utility function is:
$u(x) = \sqrt[3]{x}$
import numpy as np
from dtree import DecisionTree
# Utility function
def utility(x):
    return np.cbrt(x)
# Create decision tree
dt = DecisionTree(utility_function=utility)
# Add nodes
dt.add_decision_node("I", "Decision")
dt.add_terminal_node("S", "Sell land", 22_000)
dt.add_chance_node("D", "Drill land")
dt.add_edge("I", "S")
dt.add_edge("I", "D")
dt.add_decision_node("G", "Gas found")
dt.add_terminal_node("NG", "No gas found", -40_000)
dt.add_edge("D", "G", 0.3)
dt.add_edge("D", "NG", 0.7)
dt.add_terminal_node("GS", "Sell land to West Gas", 200_000-40_000)
dt.add_chance_node("GD", "Develop the site")
dt.add_edge("G", "GD")
dt.add_edge("G", "GS")
dt.add_terminal_node("NM", "Normal market conditions", 150_000-40_000)
dt.add_terminal_node("GM", "Good market conditions", 300_000-40_000)
dt.add_edge("GD", "NM", 0.4)
dt.add_edge("GD", "GM", 0.6)
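One detail worth noting about this utility function is that the tree contains a negative value ("No gas found"). In Python, raising a negative number to the power 1/3 yields a complex number, which is presumably why the example uses np.cbrt. The sketch below shows a standard-library alternative that keeps the sign; `cube_root_utility` is a hypothetical name and not part of the library.

```python
import math


def cube_root_utility(x: float) -> float:
    """Real cube root of x that preserves the sign of x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


# The naive power operator returns a complex number for negative inputs.
naive = (-40_000) ** (1 / 3)
print(type(naive).__name__)

# The sign-preserving version stays real.
safe = cube_root_utility(-40_000)
print(round(safe, 2))
```

A function like this could be passed as utility_function in place of the NumPy version, assuming the library only requires a callable that takes a float.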
To analyze the decision tree, you can:
Create a Mermaid graph using save_mermaid_graph.
The thicker lines mark the optimal path, which shows the best choice at each decision node given the quantified information.
dt.save_mermaid_graph("./images/case_with_utility_func.png")
Create a Markdown representation of the Mermaid graph using save_mermaid_diagram, so that you can customize the graph with services such as Mermaid.live.
dt.save_mermaid_diagram("./images/case_with_utility_func.md")
# Output
# graph LR
# classDef decision fill:#4e79a7,stroke:#2c5f85,stroke-width:3px,color:#ffffff,font-weight:bold,font-size:12px
# classDef chance fill:#f28e2c,stroke:#d4751a,stroke-width:3px,color:#ffffff,font-weight:bold,font-size:12px
# classDef terminal fill:#59a14f,stroke:#3f7a37,stroke-width:3px,color:#ffffff,font-weight:bold,font-size:12px
# I["<b>Decision</b><br/>U: 31.75<br/>EV: 32,000.00"]
# class I decision
# S["<b>Sell land</b><br/>U: 28.02<br/>EV: 22,000.00"]
# class S terminal
# D(["<b>Drill land</b><br/>U: 31.75<br/>EV: 32,000.00"])
# class D chance
# G["<b>Gas found</b><br/>U: 58.48<br/>EV: 200,000.00"]
# class G decision
# NG["<b>No gas found</b><br/>U: -34.20<br/>EV: -40,000.00"]
# class NG terminal
# GS["<b>Sell land to West Gas</b><br/>U: 54.29<br/>EV: 160,000.00"]
# class GS terminal
# GD(["<b>Develop the site</b><br/>U: 58.48<br/>EV: 200,000.00"])
# class GD chance
# NM["<b>Normal market conditions</b><br/>U: 47.91<br/>EV: 110,000.00"]
# class NM terminal
# GM["<b>Good market conditions</b><br/>U: 63.83<br/>EV: 260,000.00"]
# class GM terminal
# I ==> S
# I ==> D
# D ==>|<b>30.0%</b>| G
# D ==>|<b>70.0%</b>| NG
# G ==> GD
# G ==> GS
# GD ==>|<b>40.0%</b>| NM
# GD ==>|<b>60.0%</b>| GM
# linkStyle default stroke:#666,stroke-width:2px
# %%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#ffffff', 'primaryTextColor':'#333333', 'primaryBorderColor':'#dddddd', 'lineColor':'#666666'}}}%%
# linkStyle 1 stroke:#e15759,stroke-width:5px;
# linkStyle 2 stroke:#e15759,stroke-width:5px;
# linkStyle 4 stroke:#e15759,stroke-width:5px;
# linkStyle 7 stroke:#e15759,stroke-width:5px;
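Mermaid numbers links in the order they are defined, starting at 0, so each linkStyle index in the output above can be mapped back to an edge. The sketch below does that mapping with a small regular expression. The edge lines and indices are copied from the output, while `parse_edge` is a hypothetical helper and not part of the library.

```python
import re

# Edge lines copied from the diagram output, in definition order.
edge_lines = [
    "I ==> S",
    "I ==> D",
    "D ==>|<b>30.0%</b>| G",
    "D ==>|<b>70.0%</b>| NG",
    "G ==> GD",
    "G ==> GS",
    "GD ==>|<b>40.0%</b>| NM",
    "GD ==>|<b>60.0%</b>| GM",
]
# Indices that receive the thick red linkStyle in the output.
highlighted_indices = [1, 2, 4, 7]

# Source node, thick arrow, optional |label|, target node.
edge_pattern = re.compile(r"^(\w+) ==>(?:\|.*?\|)? (\w+)$")


def parse_edge(line):
    """Return (from_node, to_node) for one Mermaid edge line."""
    match = edge_pattern.match(line)
    if match is None:
        raise ValueError(f"Not an edge line: {line!r}")
    return match.group(1), match.group(2)


edges = [parse_edge(line) for line in edge_lines]
highlighted_edges = [edges[index] for index in highlighted_indices]
print(highlighted_edges)
# [('I', 'D'), ('D', 'G'), ('G', 'GD'), ('GD', 'GM')]
```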
The calculate_expected_values method returns the expected value and utility value of every node as a dictionary.
# Calculate the values for every node
dt.calculate_expected_values()
# Output
# {
# 'I': {'expected_value': 32000.0, 'utility_value': 31.74802103936399},
# 'S': {'expected_value': 22000, 'utility_value': 28.02039330655387},
# 'D': {'expected_value': 32000.0, 'utility_value': 31.74802103936399},
# 'G': {'expected_value': 200000.0, 'utility_value': 58.480354764257314},
# 'NG': {'expected_value': -40000, 'utility_value': -34.19951893353394},
# 'GS': {'expected_value': 160000, 'utility_value': 54.28835233189813},
# 'GD': {'expected_value': 200000.0, 'utility_value': 58.480354764257314},
# 'NM': {'expected_value': 110000, 'utility_value': 47.91419857062784},
# 'GM': {'expected_value': 260000, 'utility_value': 63.82504298859907}
# }
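The utility values in this output look consistent with applying the cube-root utility to each node's expected value. The sketch below checks that observation against the reported numbers only. It is not a statement about how the library computes them internally, and the `cube_root` helper is a name introduced here.

```python
import math

# (expected_value, utility_value) pairs copied from the output above.
reported = {
    "I": (32000.0, 31.74802103936399),
    "S": (22000, 28.02039330655387),
    "D": (32000.0, 31.74802103936399),
    "G": (200000.0, 58.480354764257314),
    "NG": (-40000, -34.19951893353394),
    "GS": (160000, 54.28835233189813),
    "GD": (200000.0, 58.480354764257314),
    "NM": (110000, 47.91419857062784),
    "GM": (260000, 63.82504298859907),
}


def cube_root(x):
    """Real cube root that preserves the sign of x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


# Largest difference between cube_root(expected_value) and the reported utility.
max_gap = max(
    abs(cube_root(expected_value) - utility_value)
    for expected_value, utility_value in reported.values()
)
print(max_gap < 1e-6)
```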
The get_optimal_path method returns the optimal path from a given starting node.
dt.get_optimal_path("I")
# Output
# ['I', 'D', 'G', 'GD', 'NM']
The get_children method returns the child nodes of the node ID you provide, each paired with the probability of its edge.
dt.get_children("GD")
# Output
# [
# ('NM', 0.4),
# ('GM', 0.6)
# ]