A Python package for creating, analyzing, and visualizing decision trees with expected value calculations

DTrees: Decision Tree Analyzer

A Python package for creating, analyzing, and visualizing decision trees.

In case of questions or ideas to improve the library, reach out to me at:

Forks are welcome, but reach out to me to discuss the updates.

Install

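To install the library, run the following command (the leading ! is only needed inside a notebook cell such as Jupyter; drop it in a regular shell):

!pip install dtrees-analyzer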

For more information about versions, changes, etc., visit the project page on PyPI.
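
To confirm which version ended up in your environment, a small check like the one below may help. It is only a sketch built on the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of this package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "dtrees-analyzer"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found is not None else "dtrees-analyzer is not installed")
```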

Usage

The decision tree itself takes only one parameter, and only if you want to use a utility function:

  • utility_function: Utility function, should take a float value as input
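
As an illustration of what such a function could look like, here is a sketch of an exponential (risk-averse) utility. The function name, the RISK_TOLERANCE constant and its value are assumptions made for this example; the only requirement taken from the text above is that the callable accepts a float.

```python
import math

# Hypothetical risk tolerance; choose a value that fits your own problem.
RISK_TOLERANCE = 50_000.0


def exponential_utility(x: float) -> float:
    """Concave, increasing utility with u(0) = 0 (risk-averse)."""
    return 1.0 - math.exp(-x / RISK_TOLERANCE)


# A larger payoff always has a larger utility ...
print(exponential_utility(22_000.0) < exponential_utility(32_000.0))
# ... but a sure 50,000 is preferred to a 50/50 gamble between 0 and 100,000.
gamble = 0.5 * exponential_utility(0.0) + 0.5 * exponential_utility(100_000.0)
print(exponential_utility(50_000.0) > gamble)
```

A callable like this would be passed as utility_function when the tree is defined.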

For each node you need to add the following parameters:

  • node_id: The ID used to connect and identify nodes
  • node_name: The name of the node to describe it, shown in the graphs
  • value: The value of the node, applies only to terminal nodes

For each edge you need to add the following parameters:

  • from_node: The starting node
  • to_node: The node it should be connected to
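  • probability: The probability of following this edge, applies only to edges that leave a chance node:
    • Chance node is connected to Chance node
    • Chance node is connected to Terminal node
    • Chance node is connected to Decision node (as in the edge from "D" to "G" in the comprehensive example)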
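
The probabilities on the edges that leave one chance node would normally add up to 1. Whether the library checks this for you is not stated here, so the sketch below shows one way to check it yourself before building the tree. The edge list layout and both helper names are hypothetical.

```python
import math
from collections import defaultdict

# (from_node, to_node, probability) for edges that leave chance nodes
edges = [
    ("B", "PI", 0.6),
    ("B", "PD", 0.4),
]


def probability_totals(edge_list):
    """Sum the outgoing probabilities of every from_node."""
    totals = defaultdict(float)
    for from_node, _to_node, probability in edge_list:
        totals[from_node] += probability
    return dict(totals)


def invalid_nodes(edge_list):
    """Return the from_node IDs whose probabilities do not sum to 1."""
    return [
        node
        for node, total in probability_totals(edge_list).items()
        if not math.isclose(total, 1.0)
    ]


print(invalid_nodes(edges))
```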

To describe a decision tree with the library, follow these steps:

  1. Define the decision tree
    If you want to use a utility function, provide it when you define the decision tree.
dt = DecisionTree()

# def any_utility_function(x: float):
#     return <Some utility function>
# dt = DecisionTree(utility_function=any_utility_function)
  2. Add decision nodes
    We don't know the probabilities of the options at these nodes, because they are decisions we have to make. When the expected values (EV) are calculated from the child nodes, a decision node always takes the child with the highest expected value (or utility value if a function is provided).
dt.add_decision_node("I", "Decision")
  3. Add chance nodes
    These are intermediate nodes that lead to other nodes with a certain probability.
dt.add_chance_node("B", "Buy TSLA stocks")
  4. Add terminal nodes
    These are the final nodes; each one holds the known value of its branch.
dt.add_terminal_node("PI", "The price increases", 1_000)
  5. Add edges between nodes
    These are the connections between nodes. Edges from a chance node to another chance node or to a terminal node carry a probability.
# Connecting nodes with probability [Chance node -> Chance node & Chance node -> Terminal node]
dt.add_edge("P", "PE", 0.3)

# Connecting nodes without probability [Decision node -> Any]
dt.add_edge("I", "B")
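
The calculation described for decision and chance nodes can be sketched in plain Python without the library. This is an illustration of the general rollback idea, not the package's actual implementation: the rollback function and the dictionary layout are hypothetical, and the numbers are borrowed from the basic TSLA example in this README.

```python
def rollback(node_id, nodes, children):
    """Return the expected value of node_id by working back from the leaves."""
    kind, value = nodes[node_id]
    if kind == "terminal":
        return value
    child_values = [
        (rollback(child, nodes, children), probability)
        for child, probability in children[node_id]
    ]
    if kind == "decision":
        # A decision node takes the best available option.
        return max(v for v, _ in child_values)
    # A chance node takes the probability-weighted sum of its children.
    return sum(v * p for v, p in child_values)


# Hypothetical plain-dict layout, not the library's internal representation.
nodes = {
    "D": ("decision", None),
    "B": ("chance", None),
    "NB": ("terminal", 0.0),
    "PI": ("terminal", 1_000.0),
    "PD": ("terminal", -2_000.0),
}
children = {
    "D": [("B", None), ("NB", None)],
    "B": [("PI", 0.6), ("PD", 0.4)],
}

print(rollback("B", nodes, children))  # buying: 0.6 * 1000 + 0.4 * (-2000)
print(rollback("D", nodes, children))  # best of buying and not buying
```

With these numbers, buying has a negative expected value, so the decision node settles on not buying.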

Example

from dtree import DecisionTree

# Define decision tree
dt = DecisionTree()

# Build decision tree
dt.add_decision_node("D", "Decision")
dt.add_chance_node("B", "Buy TSLA stocks")
dt.add_terminal_node("NB", "Don't buy TSLA stocks", 0)
dt.add_edge("D", "B")
dt.add_edge("D", "NB")

dt.add_terminal_node("PI", "The price increases", 1_000)
dt.add_terminal_node("PD", "The price decreases", -2_000)
dt.add_edge("B", "PI", 0.6)
dt.add_edge("B", "PD", 0.4)

dt.save_mermaid_graph("./images/example.png")

Figure: Example decision tree

Comprehensive Example: Land Investment Decision

Newox is considering whether or not to drill on its own land in search of natural gas. If the company decides to drill, the cost is $40,000. If gas is found, Newox has two options: it can either sell the land to West Gas for $200,000 or develop the site itself. If no gas is found, there are no additional costs or revenues beyond the initial drilling cost.

The other option is to skip drilling entirely and sell the land as-is for $22,000.

At current natural gas prices, a producing well would be worth $150,000 on the open market. However, there's a chance gas prices could double, in which case the well would be worth $300,000.

Company engineers estimate a 30% chance of finding gas. Meanwhile, the company's economist believes there's a 60% chance that gas prices will double.

What decision should Newox make to maximize its expected profits?

This example demonstrates a more complex decision tree for a land investment scenario, both with and without a utility function.

Without Utility Function

from dtree import DecisionTree

# Create decision tree
dt = DecisionTree()

# Add nodes
dt.add_decision_node("I", "Decision")
dt.add_terminal_node("S", "Sell land", 22_000)
dt.add_chance_node("D", "Drill land")
dt.add_edge("I", "S")
dt.add_edge("I", "D")

dt.add_decision_node("G", "Gas found")
dt.add_terminal_node("NG", "No gas found", -40_000)
dt.add_edge("D", "G", 0.3)
dt.add_edge("D", "NG", 0.7)

dt.add_terminal_node("GS", "Sell land to West Gas", 200_000-40_000)
dt.add_chance_node("GD", "Develop the site")
dt.add_edge("G", "GD")
dt.add_edge("G", "GS")

dt.add_terminal_node("NM", "Normal market conditions", 150_000-40_000)
dt.add_terminal_node("GM", "Good market conditions", 300_000-40_000)
dt.add_edge("GD", "NM", 0.4)
dt.add_edge("GD", "GM", 0.6)

# Create graph
dt.save_mermaid_graph("./images/case_without_utility_func.png")

Figure: Case without utility function
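
For readers who want to check the tree by hand, the expected values can be worked back from the leaves using only the figures stated in the problem (payoffs net of the $40,000 drilling cost). The EV(...) notation with node IDs is introduced here for convenience.

```latex
\begin{align*}
EV(GD) &= 0.4 \cdot 110{,}000 + 0.6 \cdot 260{,}000 = 200{,}000 \\
EV(G)  &= \max\{EV(GD),\ EV(GS)\} = \max\{200{,}000,\ 160{,}000\} = 200{,}000 \\
EV(D)  &= 0.3 \cdot 200{,}000 + 0.7 \cdot (-40{,}000) = 32{,}000 \\
EV(I)  &= \max\{EV(S),\ EV(D)\} = \max\{22{,}000,\ 32{,}000\} = 32{,}000
\end{align*}
```

On expected profit alone, drilling (32,000) beats selling the land as-is (22,000), and if gas is found, developing the site (200,000) beats selling to West Gas (160,000).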

With Utility Function

The utility function is:

$u(x) = \sqrt[3]{x}$

import numpy as np
from dtree import DecisionTree

# Utility function
def utility(x):
    return np.cbrt(x)

# Create decision tree
dt = DecisionTree(utility_function=utility)

# Add nodes
dt.add_decision_node("I", "Decision")
dt.add_terminal_node("S", "Sell land", 22_000)
dt.add_chance_node("D", "Drill land")
dt.add_edge("I", "S")
dt.add_edge("I", "D")

dt.add_decision_node("G", "Gas found")
dt.add_terminal_node("NG", "No gas found", -40_000)
dt.add_edge("D", "G", 0.3)
dt.add_edge("D", "NG", 0.7)

dt.add_terminal_node("GS", "Sell land to West Gas", 200_000-40_000)
dt.add_chance_node("GD", "Develop the site")
dt.add_edge("G", "GD")
dt.add_edge("G", "GS")

dt.add_terminal_node("NM", "Normal market conditions", 150_000-40_000)
dt.add_terminal_node("GM", "Good market conditions", 300_000-40_000)
dt.add_edge("GD", "NM", 0.4)
dt.add_edge("GD", "GM", 0.6)

To analyze the decision tree you can:

Create a mermaid graph using save_mermaid_graph.
The thicker line shows the optimal path, which tells you the best option at each decision node given the quantified information.

dt.save_mermaid_graph("./images/case_with_utility_func.png")

Figure: Case with utility function

Create a Markdown representation of the Mermaid graph using save_mermaid_diagram, so that you can customize the graph with services such as Mermaid.live.

dt.save_mermaid_diagram("./images/case_with_utility_func.md")

# Output
# graph LR
#     classDef decision fill:#4e79a7,stroke:#2c5f85,stroke-width:3px,color:#ffffff,font-weight:bold,font-size:12px
#     classDef chance fill:#f28e2c,stroke:#d4751a,stroke-width:3px,color:#ffffff,font-weight:bold,font-size:12px
#     classDef terminal fill:#59a14f,stroke:#3f7a37,stroke-width:3px,color:#ffffff,font-weight:bold,font-size:12px
#     I["<b>Decision</b><br/>U: 31.75<br/>EV: 32,000.00"]
#     class I decision
#     S["<b>Sell land</b><br/>U: 28.02<br/>EV: 22,000.00"]
#     class S terminal
#     D(["<b>Drill land</b><br/>U: 31.75<br/>EV: 32,000.00"])
#     class D chance
#     G["<b>Gas found</b><br/>U: 58.48<br/>EV: 200,000.00"]
#     class G decision
#     NG["<b>No gas found</b><br/>U: -34.20<br/>EV: -40,000.00"]
#     class NG terminal
#     GS["<b>Sell land to West Gas</b><br/>U: 54.29<br/>EV: 160,000.00"]
#     class GS terminal
#     GD(["<b>Develop the site</b><br/>U: 58.48<br/>EV: 200,000.00"])
#     class GD chance
#     NM["<b>Normal market conditions</b><br/>U: 47.91<br/>EV: 110,000.00"]
#     class NM terminal
#     GM["<b>Good market conditions</b><br/>U: 63.83<br/>EV: 260,000.00"]
#     class GM terminal
#     I ==> S
#     I ==> D
#     D ==>|<b>30.0%</b>| G
#     D ==>|<b>70.0%</b>| NG
#     G ==> GD
#     G ==> GS
#     GD ==>|<b>40.0%</b>| NM
#     GD ==>|<b>60.0%</b>| GM
#     linkStyle default stroke:#666,stroke-width:2px
#     %%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#ffffff', 'primaryTextColor':'#333333', 'primaryBorderColor':'#dddddd', 'lineColor':'#666666'}}}%%
#     linkStyle 1 stroke:#e15759,stroke-width:5px;
#     linkStyle 2 stroke:#e15759,stroke-width:5px;
#     linkStyle 4 stroke:#e15759,stroke-width:5px;
#     linkStyle 7 stroke:#e15759,stroke-width:5px;
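
If you edit the diagram by hand, note that Mermaid numbers links from 0 in the order they are defined, which is why the thick edges above are addressed as linkStyle 1, 2, 4 and 7. The sketch below rebuilds those lines from an edge list; the helper link_style_lines is hypothetical and independent of the library.

```python
# Edges in the order they appear in the diagram above; Mermaid indexes links from 0.
edges = [
    ("I", "S"), ("I", "D"), ("D", "G"), ("D", "NG"),
    ("G", "GD"), ("G", "GS"), ("GD", "NM"), ("GD", "GM"),
]

# Edges that are highlighted in the output above.
highlighted = {("I", "D"), ("D", "G"), ("G", "GD"), ("GD", "GM")}


def link_style_lines(edge_list, chosen, style="stroke:#e15759,stroke-width:5px"):
    """Build one Mermaid linkStyle line per chosen edge, indexed by position."""
    return [
        f"linkStyle {index} {style};"
        for index, edge in enumerate(edge_list)
        if edge in chosen
    ]


for line in link_style_lines(edges, highlighted):
    print(line)
```

Changing the style argument (for example the stroke colour) is then enough to restyle every highlighted edge consistently.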

The calculate_expected_values method allows you to get all the values as a dictionary.

# Calculate the expected and utility values for every node
dt.calculate_expected_values()

# Output
# {
#     'I': {'expected_value': 32000.0, 'utility_value': 31.74802103936399},
#     'S': {'expected_value': 22000, 'utility_value': 28.02039330655387},
#     'D': {'expected_value': 32000.0, 'utility_value': 31.74802103936399},
#     'G': {'expected_value': 200000.0, 'utility_value': 58.480354764257314},
#     'NG': {'expected_value': -40000, 'utility_value': -34.19951893353394},
#     'GS': {'expected_value': 160000, 'utility_value': 54.28835233189813},
#     'GD': {'expected_value': 200000.0, 'utility_value': 58.480354764257314},
#     'NM': {'expected_value': 110000, 'utility_value': 47.91419857062784},
#     'GM': {'expected_value': 260000, 'utility_value': 63.82504298859907}
# }
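
In the output above, each utility_value is the cube root of that node's expected_value, that is, the utility function applied to the expected value. The sketch below re-checks a few of the reported pairs with the standard library only; cube_root is a hypothetical stand-in for np.cbrt, and the numbers are copied from the output.

```python
import math

# (expected_value, utility_value) pairs copied from the output above
reported = {
    "I": (32_000.0, 31.74802103936399),
    "S": (22_000, 28.02039330655387),
    "G": (200_000.0, 58.480354764257314),
    "NG": (-40_000, -34.19951893353394),
    "GM": (260_000, 63.82504298859907),
}


def cube_root(x: float) -> float:
    """Real cube root that also works for negative inputs."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


for node_id, (expected_value, utility_value) in reported.items():
    matches = math.isclose(cube_root(expected_value), utility_value, rel_tol=1e-9)
    print(node_id, matches)
```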

The get_optimal_path method returns the optimal path from a given starting node.

dt.get_optimal_path("I")

# Output
# ['I', 'D', 'G', 'GD', 'NM']

The get_children method returns the child nodes of the node ID you provide, together with the probability of each edge.

dt.get_children("GD")

# Output
# [
#     ('NM', 0.4), 
#     ('GM', 0.6)
# ]
