TornadoPy

A Python library for tornado chart generation and analysis. TornadoPy provides tools for processing Excel-based tornado data and generating professional tornado charts for uncertainty analysis.

Features

  • TornadoProcessor: Process Excel files containing tornado analysis data

    • Parse multi-sheet Excel files with complex headers
    • Extract and compute statistics (p90p10, mean, median, minmax, percentiles)
    • Filter data by properties and dynamic fields
    • Named filter presets for reusable filter combinations
    • Base and reference case extraction with caching
    • Default multiplier support for consistent unit conversion
    • Case selection with weighted criteria
    • Batch processing for multiple parameters
    • Optimized for performance with native numpy operations
    • Comprehensive docstrings and organized code structure
  • tornado_plot: Generate professional tornado charts

    • Customizable colors, fonts, and styling
    • Support for p90/p10 ranges with automatic label placement
    • Reference case lines
    • Custom parameter ordering
    • Export to various image formats
  • distribution_plot: Generate distribution histograms with cumulative curves

    • Automatic bin sizing on round numbers
    • Cumulative distribution curve showing % of cases above value
    • P90/P50/P10 percentile markers and subtitle
    • Optional reference case line
    • Multiple color schemes available
    • Export to various image formats

Installation

Install from PyPI:

pip install tornadopy

Quick Start

Processing Tornado Data

from tornadopy import TornadoProcessor

# Load Excel file with tornado data
# Optional: Set default multiplier and base case sheet
processor = TornadoProcessor(
    "tornado_data.xlsb",
    multiplier=1e-6,  # Default multiplier for all operations
    base_case="Reference_Case"  # Sheet containing base/reference cases
)

# Get available parameters
parameters = processor.parameters()
print(f"Parameters: {parameters}")

# Get properties for a parameter
properties = processor.properties(parameter="Parameter1")
print(f"Properties: {properties}")

# Compute statistics
result = processor.compute(
    stats="p90p10",
    parameter="Parameter1",
    filters={"property": "npv"},
    multiplier=1e-6  # Convert to millions (or use default if set)
)
print(f"P90/P10: {result['p90p10']}")

Generating Tornado Charts

from tornadopy import TornadoProcessor, tornado_plot

# Get tornado data
processor = TornadoProcessor("tornado_data.xlsb")
tornado_data = processor.get_tornado_data(
    parameters="all",
    filters={"property": "npv"},
    multiplier=1e-6
)

# Convert to sections format for plotting
sections = []
for param, data in tornado_data.items():
    sections.append({
        "parameter": param,
        "minmax": [data["p10"], data["p90"]],
        "p90p10": [data["p10"], data["p90"]]
    })

# Generate tornado chart
fig, ax, saved = tornado_plot(
    sections=sections,
    title="NPV Tornado Chart",
    subtitle="Base case = 100.0 MM USD",
    base=100.0,
    unit="MM USD",
    outfile="tornado_chart.png"
)

Generating Distribution Charts

from tornadopy import TornadoProcessor, distribution_plot

# Get distribution data
processor = TornadoProcessor("tornado_data.xlsb")
distribution = processor.distribution(
    parameter="Parameter1",
    filters={"property": "npv"},
    multiplier=1e-6
)

# Generate distribution chart
fig, ax, saved = distribution_plot(
    distribution,
    title="NPV Distribution",
    unit="MM USD",
    color="blue",
    reference_case=100.0,
    outfile="npv_distribution.png"
)
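
The Features list describes the cumulative curve as showing the percentage of cases above each value. The following is a minimal sketch of that calculation in plain Python, independent of TornadoPy. The function name `exceedance_curve` and the sample values are hypothetical, and the library's own binning and plotting may differ.

```python
from bisect import bisect_right


def exceedance_curve(values):
    """Return (value, percent of cases strictly above value) pairs.

    Hypothetical helper for illustration; not part of TornadoPy.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    curve = []
    for v in sorted(set(ordered)):
        # bisect_right gives the count of cases <= v, so the rest are above v
        above = n - bisect_right(ordered, v)
        curve.append((v, 100.0 * above / n))
    return curve


# Illustrative sample values only
curve = exceedance_curve([10.0, 12.0, 12.0, 15.0, 20.0])
for value, pct_above in curve:
    print(f"{value:6.1f}  {pct_above:5.1f}% of cases above")
```

Under this reading, the smallest value has the highest percentage and the largest value has 0%, which is why the curve falls from left to right.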

Advanced Usage

Multi-Zone Analysis with Batch Processing

Process multiple parameters at once with zone filtering and custom options:

from tornadopy import TornadoProcessor, tornado_plot

processor = TornadoProcessor("reservoir_data.xlsb")

# Compute statistics for all parameters with zone filtering
results = processor.compute_batch(
    stats=["minmax", "p90p10"],
    parameters="all",
    filters={
        "zones": ["Zone A - Reservoir", "Zone B - Reservoir"],
        "property": "STOIIP"
    },
    multiplier=1e-3,  # Convert to thousands
    options={
        "p90p10_threshold": 150,  # Minimum cases required
        "skip": ["sources"]  # Skip source tracking for cleaner output
    }
)

# Convert results to tornado plot format
sections = []
for result in results:
    if "p90p10" in result and "errors" not in result:
        p10, p90 = result["p90p10"]
        sections.append({
            "parameter": result["parameter"],
            "minmax": result.get("minmax", [p10, p90]),
            "p90p10": [p10, p90]
        })

# Generate tornado chart
fig, ax, saved = tornado_plot(
    sections,
    title="STOIIP Tornado - Multi-Zone Analysis",
    base=14.5,  # Base case value
    reference_case=14.2,  # Reference case line
    unit="MM m³",
    outfile="stoiip_tornado.svg"
)
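
The conversion loop above can be wrapped in a small reusable helper. This is a sketch that assumes each batch result is a dict shaped like the ones used in the example ("parameter", "p90p10", optional "minmax" and "errors"); the name `results_to_sections` and the sample values are hypothetical.

```python
def results_to_sections(results):
    """Convert batch results into the sections list used for plotting.

    Hypothetical helper; assumes the result dict shape shown in the
    example above.
    """
    sections = []
    for result in results:
        # Skip parameters that failed or produced no p90p10 statistic
        if "errors" in result or "p90p10" not in result:
            continue
        low, high = result["p90p10"]
        sections.append({
            "parameter": result["parameter"],
            # Fall back to the p90p10 range when minmax was not computed
            "minmax": list(result.get("minmax", [low, high])),
            "p90p10": [low, high],
        })
    return sections


# Illustrative results only, not real output
example_results = [
    {"parameter": "Porosity", "p90p10": [12.0, 17.5], "minmax": [10.0, 19.0]},
    {"parameter": "NTG", "p90p10": [13.0, 16.0]},
    {"parameter": "Area", "errors": ["too few cases"]},
]
example_sections = results_to_sections(example_results)
print(example_sections)
```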

Distribution Plot with Custom Gridlines

Create distribution charts with percentile markers and custom grid settings:

from tornadopy import TornadoProcessor, distribution_plot

processor = TornadoProcessor("reservoir_data.xlsb")

# Get distribution data for specific zones
distribution = processor.distribution(
    parameter="Uncertainty_Analysis",
    filters={
        "zones": ["Zone A - Reservoir", "Zone B - Reservoir"],
        "property": "STOIIP"
    },
    multiplier=1e-3  # Convert to thousands
)

# Generate distribution chart with custom settings
fig, ax, saved = distribution_plot(
    data=distribution,
    title="STOIIP Distribution - Uncertainty Analysis",
    unit="MM m³",
    color="blue",
    reference_case=14.5,
    target_bins=20,
    settings={
        "show_percentile_markers": True,  # Show P90/P50/P10 markers
        "marker_size": 8,
        "show_minor_grid": True,
        # Custom gridline intervals
        "x_major_interval": 5,   # Major x-gridlines every 5 units
        "x_minor_interval": 1,   # Minor x-gridlines every 1 unit
        "y_major_interval": 50,  # Major y-gridlines every 50 frequency
        "y_minor_interval": 10,  # Minor y-gridlines every 10 frequency
    },
    outfile="stoiip_distribution.svg"
)
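
To illustrate what round-number bin sizing around `target_bins` can look like, here is a minimal sketch that picks a bin width of the form 1, 2, or 5 times a power of ten. It is an assumption about the general technique, not TornadoPy's actual algorithm; `nice_bin_width` and `nice_bin_edges` are hypothetical names.

```python
import math


def nice_bin_width(lo, hi, target_bins=20):
    """Pick a width of 1, 2, 5, or 10 x 10^k closest to the raw bin width."""
    if hi <= lo or target_bins < 1:
        raise ValueError("need hi > lo and target_bins >= 1")
    raw = (hi - lo) / target_bins
    base = 10.0 ** math.floor(math.log10(raw))
    candidates = [m * base for m in (1, 2, 5, 10)]
    return min(candidates, key=lambda width: abs(width - raw))


def nice_bin_edges(lo, hi, target_bins=20):
    """Return bin edges aligned to multiples of the chosen width."""
    width = nice_bin_width(lo, hi, target_bins)
    start = math.floor(lo / width) * width
    count = math.ceil((hi - start) / width)
    return [start + i * width for i in range(count + 1)]


edges = nice_bin_edges(3.2, 27.8, target_bins=20)
print(f"{len(edges) - 1} bins from {edges[0]} to {edges[-1]}")
```

The resulting bin count is only close to `target_bins`, because the width is rounded and the first edge is aligned downward to a multiple of it.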

Working with Multiple Properties

Analyze multiple properties simultaneously:

# Compute statistics for multiple properties
result = processor.compute(
    stats=["p90p10", "mean", "median"],
    parameter="Reservoir_Model",
    filters={
        "zones": ["Main_Reservoir"],
        "property": ["STOIIP", "GIIP"]  # Multiple properties
    },
    multiplier=1e-6  # Convert to millions
)

# Access results by property
stoiip_p90, stoiip_p10 = result["p90p10"][0]  # First property (STOIIP)
giip_p90, giip_p10 = result["p90p10"][1]      # Second property (GIIP)

print(f"STOIIP P90/P10: {stoiip_p90:.2f} / {stoiip_p10:.2f} MM m³")
print(f"GIIP P90/P10: {giip_p90:.2f} / {giip_p10:.2f} bcm")

Case Selection with Weighted Criteria

Find specific cases that match target percentiles:

# Find closest cases to p90/p10 with custom weights
result = processor.compute(
    stats="p90p10",
    parameter="Reservoir_Model",
    filters={
        "zones": ["Main_Reservoir"],
        "property": "STOIIP"
    },
    multiplier=1e-6,
    case_selection=True,  # Enable case selection
    selection_criteria={
        "weights": {"STOIIP": 0.6, "GIIP": 0.4}  # Weighted criteria
    }
)

# Access closest cases
for case in result["closest_cases"]:
    print(f"Case {case['case']}: index={case['idx']}, STOIIP={case['STOIIP']:.2f}")
    print(f"  Properties: {case['properties']}")
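
One way to think about weighted case selection is as a weighted distance between each case and a set of per-property targets. The sketch below shows that idea in plain Python. The scoring rule, the function name `closest_case`, and the sample values are all assumptions for illustration; TornadoPy's own selection logic may differ.

```python
def closest_case(cases, targets, weights):
    """Return (index, score) of the case nearest to the targets.

    Score is the weighted mean of relative distances per property.
    Assumes every target is non-zero. Hypothetical helper.
    """
    total_weight = sum(weights.values())

    def score(case):
        return sum(
            weight * abs(case[prop] - targets[prop]) / abs(targets[prop])
            for prop, weight in weights.items()
        ) / total_weight

    best_idx = min(range(len(cases)), key=lambda i: score(cases[i]))
    return best_idx, score(cases[best_idx])


# Illustrative values only
cases = [
    {"STOIIP": 12.0, "GIIP": 30.0},
    {"STOIIP": 14.0, "GIIP": 36.0},
    {"STOIIP": 15.0, "GIIP": 33.0},
]
targets = {"STOIIP": 14.5, "GIIP": 33.0}
weights = {"STOIIP": 0.6, "GIIP": 0.4}

idx, distance = closest_case(cases, targets, weights)
print(f"Closest case index: {idx} (score {distance:.4f})")
```

Relative distances keep properties with different magnitudes comparable, so the weights express importance rather than compensating for units.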

Skipping Specific Parameters

Exclude certain parameters from batch processing:

# Process all parameters except specific ones
results = processor.compute_batch(
    stats="p90p10",
    parameters="all",
    filters={"property": "STOIIP"},
    multiplier=1e-3,
    options={
        "skip_parameters": ["Reference_Case", "Full_Uncertainty"],  # Skip these
        "skip": ["sources", "errors"]  # Skip these fields in output
    }
)

Custom Tornado Chart Styling

Full control over chart appearance:

# Custom styling for professional reports
settings = {
    "figsize": (12, 8),
    "dpi": 200,
    "pos_dark": "#1E88E5",  # Blue for positive
    "neg_dark": "#D32F2F",  # Red for negative
    "show_values": ["min", "max", "p10", "p90"],
    "show_percentage_diff": True,
}

fig, ax, saved = tornado_plot(
    sections=sections,
    title="Reservoir Volume Sensitivity Analysis",
    subtitle="Base Case: 100 MM m³",
    base=100.0,
    reference_case=95.0,
    unit="MM m³",
    preferred_order=["Porosity", "NTG", "Area"],  # Custom parameter order
    settings=settings,
    outfile="sensitivity_analysis.png"
)
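
Tornado charts are conventionally read with the widest bar at the top. If a list for `preferred_order` is needed, it can be derived from the sections themselves. The helper below is a sketch under that assumption; `order_by_swing` is a hypothetical name, the sample values are illustrative, and whether `tornado_plot` already sorts bars by default is not covered here.

```python
def order_by_swing(sections):
    """Return parameter names sorted by p90p10 range width, widest first."""
    def swing(section):
        low, high = section["p90p10"]
        return abs(high - low)

    ranked = sorted(sections, key=swing, reverse=True)
    return [section["parameter"] for section in ranked]


# Illustrative sections only
example_sections = [
    {"parameter": "NTG", "p90p10": [95.0, 104.0]},
    {"parameter": "Porosity", "p90p10": [88.0, 115.0]},
    {"parameter": "Area", "p90p10": [92.0, 109.0]},
]
preferred_order = order_by_swing(example_sections)
print(preferred_order)
```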

Common Workflows

Complete Reservoir Uncertainty Analysis

End-to-end workflow for reservoir analysis with tornado and distribution charts:

from tornadopy import TornadoProcessor, tornado_plot, distribution_plot
import matplotlib.pyplot as plt

# Load data
processor = TornadoProcessor("reservoir_uncertainty.xlsb")

# Define common filters
zones = ["Main Reservoir - SST1", "Main Reservoir - SST2"]
multiplier = 1e-3  # Convert to thousands

# 1. Generate STOIIP Tornado Chart
stoiip_results = processor.compute_batch(
    stats=["minmax", "p90p10"],
    parameters="all",
    filters={
        "zones": zones,
        "property": "STOIIP"
    },
    multiplier=multiplier,
    options={
        "p90p10_threshold": 150,
        "skip_parameters": ["Reference_Case", "Full_Uncertainty"]
    }
)

# Convert to tornado format
sections = []
for result in stoiip_results:
    if "p90p10" in result and "errors" not in result:
        p10, p90 = result["p90p10"]
        min_val, max_val = result.get("minmax", [p10, p90])
        sections.append({
            "parameter": result["parameter"],
            "minmax": [min_val, max_val],
            "p90p10": [p10, p90]
        })

# Create tornado chart
fig1, ax1, saved1 = tornado_plot(
    sections,
    title="STOIIP Sensitivity Analysis",
    base=14.5,
    reference_case=14.2,
    unit="MM m³",
    outfile="stoiip_tornado.svg"
)

# 2. Generate Distribution Chart
distribution = processor.distribution(
    parameter="Full_Uncertainty",
    filters={
        "zones": zones,
        "property": "STOIIP"
    },
    multiplier=multiplier
)

fig2, ax2, saved2 = distribution_plot(
    data=distribution,
    title="STOIIP Distribution - Full Uncertainty",
    unit="MM m³",
    color="blue",
    reference_case=14.5,
    settings={
        "show_percentile_markers": True,
        "x_major_interval": 5,
        "x_minor_interval": 1,
    },
    outfile="stoiip_distribution.svg"
)

# Show both charts
plt.show()

print(f"Charts saved: {saved1}, {saved2}")

Comparing Multiple Scenarios

Compare different reservoir scenarios with one distribution chart per scenario:

from tornadopy import TornadoProcessor, distribution_plot
import matplotlib.pyplot as plt

processor = TornadoProcessor("scenarios.xlsb")

# Define scenarios
scenarios = [
    {"name": "Base Case", "param": "Base_Case", "color": "blue"},
    {"name": "Optimistic", "param": "Optimistic", "color": "green"},
    {"name": "Pessimistic", "param": "Pessimistic", "color": "red"},
]

# distribution_plot returns its own figure on each call (no argument for
# drawing into existing axes is documented), so save one chart per scenario
saved_files = []
for scenario in scenarios:
    dist = processor.distribution(
        parameter=scenario["param"],
        filters={"property": "NPV"},
        multiplier=1e-6
    )

    fig, ax, saved = distribution_plot(
        data=dist,
        title=f"{scenario['name']} Scenario",
        unit="MM USD",
        color=scenario["color"],
        target_bins=15,
        outfile=f"{scenario['param'].lower()}_distribution.png"
    )
    saved_files.append(saved)

# Display all scenario figures for visual comparison
plt.show()

print(f"Charts saved: {saved_files}")

Tips and Best Practices

Filter Management (NEW)

Store and reuse filter presets for consistent analysis:

from tornadopy import TornadoProcessor

processor = TornadoProcessor("reservoir_data.xlsb")

# Store commonly used filter combinations
processor.set_filter('main_zones', {
    'zones': ['Main Reservoir - SST1', 'Main Reservoir - SST2'],
    'property': 'STOIIP'
})

processor.set_filter('north_area', {
    'zones': ['North Zone A', 'North Zone B'],
})

# List all stored filters
print(f"Available filters: {processor.list_filters()}")

# Use a stored filter by name (filters accepts a preset name or a dict)
result = processor.compute(
    stats="p90p10",
    parameter="Uncertainty_Analysis",
    filters="main_zones",  # Reference filter by name
    multiplier=1e-3
)

# Retrieve filter for inspection
main_zones_filter = processor.get_filter('main_zones')
print(f"Filter contents: {main_zones_filter}")

# Can still use dict filters as before
result = processor.compute(
    stats="mean",
    parameter="Porosity",
    filters={'zones': ['Zone A'], 'property': 'STOIIP'},
    multiplier=1e-3
)
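
The behaviour described above, where `filters` may be either a preset name or a dict, can be pictured as a small registry with a resolve step. The class below is a standalone sketch that mirrors the documented method names (`set_filter`, `get_filter`, `list_filters`); the class itself and its `resolve` method are hypothetical and not TornadoPy's implementation.

```python
class FilterPresets:
    """Minimal stand-in for a named filter registry (illustration only)."""

    def __init__(self):
        self._presets = {}

    def set_filter(self, name, filters):
        # Store a copy so later edits to the caller's dict do not leak in
        self._presets[name] = dict(filters)

    def get_filter(self, name):
        return dict(self._presets[name])

    def list_filters(self):
        return sorted(self._presets)

    def resolve(self, filters):
        """Accept None, a preset name, or a dict and return a dict."""
        if filters is None:
            return {}
        if isinstance(filters, str):
            if filters not in self._presets:
                raise KeyError(f"Unknown filter preset: {filters!r}")
            return self.get_filter(filters)
        return dict(filters)


presets = FilterPresets()
presets.set_filter("main_zones", {
    "zones": ["Main Reservoir - SST1", "Main Reservoir - SST2"],
    "property": "STOIIP",
})
print(presets.resolve("main_zones"))
print(presets.resolve({"property": "GIIP"}))
```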

Base and Reference Case Extraction (NEW)

Extract base and reference case values when initializing with a base case sheet:

from tornadopy import TornadoProcessor

# Initialize with base case parameter
processor = TornadoProcessor(
    "reservoir_data.xlsb",
    multiplier=1e-3,
    base_case="Reference_Case"  # Sheet containing base (idx 0) and reference (idx 1)
)

# Access cached base case values (extracted at initialization)
base_values = processor.base_case_values
print(f"Base case STOIIP: {base_values.get('STOIIP', 'N/A')}")

# Access cached reference case values
ref_values = processor.reference_case_values
print(f"Reference case STOIIP: {ref_values.get('STOIIP', 'N/A')}")

# Extract with custom filters and multiplier at runtime
base_case_custom = processor.base_case(
    parameter="Reference_Case",
    filters={'zones': ['Main Reservoir']},
    multiplier=1e-6  # Different multiplier than default
)

ref_case_custom = processor.ref_case(
    parameter="Reference_Case",
    filters={'zones': ['Main Reservoir']},
    multiplier=1e-6
)

# Use in tornado plot (tornado_sections is a list of section dicts,
# built as shown in "Generating Tornado Charts")
from tornadopy import tornado_plot

base_stoiip = base_values.get('STOIIP', 14.5)
ref_stoiip = ref_values.get('STOIIP', 14.2)

fig, ax, saved = tornado_plot(
    sections=tornado_sections,
    title="STOIIP Tornado Analysis",
    base=base_stoiip,
    reference_case=ref_stoiip,
    unit="MM m³",
    outfile="tornado.png"
)

Working with Filters

Zone Filtering:

# Single zone
filters = {"zones": "Main Reservoir", "property": "STOIIP"}

# Multiple zones (will sum values across zones)
filters = {"zones": ["Zone A", "Zone B"], "property": "STOIIP"}

Property Filtering:

# Single property
filters = {"property": "STOIIP"}

# Multiple properties (returns separate results for each)
filters = {"property": ["STOIIP", "GIIP"]}
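
As noted in the comment above, selecting several zones sums values across them. The sketch below shows one plausible reading of that, a per-case sum over the selected zones, using plain lists. The function name, the data layout, and the sample values are assumptions for illustration.

```python
def sum_across_zones(values_by_zone, zones):
    """Sum per-case values over the selected zones.

    Assumes every zone holds one value per case, in the same case order.
    Hypothetical helper; not part of TornadoPy.
    """
    selected = [values_by_zone[zone] for zone in zones]
    return [sum(case_values) for case_values in zip(*selected)]


# Illustrative values only: three cases per zone
values_by_zone = {
    "Zone A": [10.0, 11.0, 12.0],
    "Zone B": [4.0, 5.0, 6.0],
    "Zone C": [1.0, 1.0, 1.0],
}
combined = sum_across_zones(values_by_zone, ["Zone A", "Zone B"])
print(combined)
```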

Using Multipliers

Convert units easily with the multiplier parameter:

# Convert to thousands (mcm → MM m³)
multiplier = 1e-3

# Convert to millions (m³ → MM m³)
multiplier = 1e-6

# Convert to billions (m³ → bcm)
multiplier = 1e-9
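
As a worked example of the arithmetic, the multiplier is simply applied to each raw value. The sketch below uses plain Python with illustrative numbers; the `UNIT_MULTIPLIERS` table and the `scale` helper are hypothetical names, not part of TornadoPy.

```python
# Hypothetical lookup table matching the conversions listed above
UNIT_MULTIPLIERS = {
    "mcm_to_MM_m3": 1e-3,
    "m3_to_MM_m3": 1e-6,
    "m3_to_bcm": 1e-9,
}


def scale(values, multiplier):
    """Apply a unit multiplier to every value."""
    return [value * multiplier for value in values]


# Illustrative raw values in thousand cubic metres (mcm)
raw_mcm = [14_500.0, 14_200.0]
scaled = scale(raw_mcm, UNIT_MULTIPLIERS["mcm_to_MM_m3"])
print([f"{value:.1f} MM m³" for value in scaled])
```

Formatting the result for display, as in the last line, avoids showing floating-point rounding noise from the multiplication.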

Skipping Parameters

Exclude specific parameters from batch processing:

options = {
    "skip_parameters": ["Reference_Case", "Full_Uncertainty"],  # Skip these parameters
    "skip": ["sources", "errors"]  # Skip these fields in results
}

Handling Errors

# Leave "errors" out of options["skip"] so failures stay visible in the results
results = processor.compute_batch(
    stats="p90p10",
    parameters="all",
    filters={"property": "STOIIP"}
)

# Check for errors in results
for result in results:
    if "errors" in result:
        print(f"Parameter {result['parameter']} had errors: {result['errors']}")
    elif "p90p10" in result:
        print(f"Parameter {result['parameter']}: P90/P10 = {result['p90p10']}")
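
For reporting, it can help to split a batch into successes and failures in one pass. The helper below is a sketch that assumes the result dict shape used in the loop above; `partition_results` is a hypothetical name and the sample results are illustrative. Unlike the loop above, it treats an empty "errors" list as a success.

```python
def partition_results(results):
    """Split batch results into (ok, failed) dicts keyed by parameter."""
    ok, failed = {}, {}
    for result in results:
        name = result["parameter"]
        if result.get("errors"):
            failed[name] = result["errors"]
        elif "p90p10" in result:
            ok[name] = result["p90p10"]
    return ok, failed


# Illustrative results only
sample_results = [
    {"parameter": "Porosity", "p90p10": [12.0, 17.5]},
    {"parameter": "Area", "errors": ["too few cases"]},
    {"parameter": "NTG", "p90p10": [13.0, 16.0], "errors": []},
]
ok, failed = partition_results(sample_results)
print(f"{len(ok)} succeeded, {len(failed)} failed: {sorted(failed)}")
```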

Performance Tips

  1. Use batch processing for multiple parameters:

    # Good: Single call for all parameters
    results = processor.compute_batch(stats="p90p10", parameters="all", ...)
    
    # Avoid: Multiple calls
    for param in parameters:
        result = processor.compute(stats="p90p10", parameter=param, ...)
    
  2. Skip unnecessary data:

    options = {
        "skip": ["sources", "errors"],  # Reduces memory usage
    }
    
  3. Set appropriate thresholds:

    options = {
        "p90p10_threshold": 150,  # Require minimum cases for reliable statistics
    }
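
To make the threshold idea concrete, the sketch below returns percentiles only when enough cases are available. It is written in plain Python under two assumptions that are not confirmed by this document: linear interpolation between sorted values, and the exceedance convention in which P90 is the low value (90% of cases lie above it). The function names are hypothetical.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (0 <= q <= 100) of an ascending list."""
    if not sorted_values:
        raise ValueError("no values")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def p90p10_if_enough(values, threshold=150):
    """Return (p90, p10) or None when there are fewer cases than threshold.

    Assumption: exceedance convention, so P90 is the 10th percentile of
    the ascending values and P10 is the 90th.
    """
    if len(values) < threshold:
        return None
    ordered = sorted(values)
    return percentile(ordered, 10), percentile(ordered, 90)


# Illustrative values only: 201 evenly spaced cases
values = [float(i) for i in range(1, 202)]
print(p90p10_if_enough(values))        # enough cases
print(p90p10_if_enough(values[:100]))  # below the threshold
```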
    

Excel File Format

TornadoPy expects Excel files with the following structure:

[Info rows - optional metadata]
Header Row 1    | Dynamic Field 1 | Dynamic Field 1 | ...
Header Row 2    | Value A         | Value B         | ...
Case            | Property 1      | Property 2      | ...
1               | 123.45          | 67.89           | ...
2               | 234.56          | 78.90           | ...
...

  • Multiple header rows are supported and will be combined
  • The "Case" row marks the start of data
  • Dynamic fields in column A define metadata columns
  • Property names are extracted from the last header row
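
The layout rules above can be sketched on a list of rows in plain Python. This is one reading of the format, not TornadoPy's parser: it assumes column A of each header row names a dynamic field, the remaining cells give that field's value per column, and info rows have no cells beyond column A. The function name and sample rows are hypothetical.

```python
def parse_layout(rows):
    """Return (columns, data_rows) for a sheet given as a list of rows."""
    case_idx = next(
        (i for i, row in enumerate(rows) if row and row[0] == "Case"), None
    )
    if case_idx is None:
        raise ValueError('no "Case" row found')

    # Header rows are those above "Case" with content beyond column A
    header_rows = [
        row for row in rows[:case_idx]
        if any(cell not in (None, "") for cell in row[1:])
    ]

    # Property names come from the "Case" row, the last header row
    columns = []
    for offset, prop in enumerate(rows[case_idx][1:], start=1):
        column = {"property": prop}
        for header in header_rows:
            column[header[0]] = header[offset]
        columns.append(column)

    return columns, rows[case_idx + 1:]


# Illustrative sheet contents only
sample_rows = [
    ["Info: illustrative metadata", None, None],
    ["zones", "Zone A", "Zone B"],
    ["Case", "STOIIP", "STOIIP"],
    [1, 123.45, 67.89],
    [2, 234.56, 78.90],
]
columns, data_rows = parse_layout(sample_rows)
print(columns)
print(data_rows)
```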

API Reference

TornadoProcessor

Initialization

TornadoProcessor(
    filepath: str,
    multiplier: float = 1.0,
    base_case: str = None
)

Parameters:

  • filepath: Path to Excel file (.xlsx, .xlsb, etc.)
  • multiplier: Default multiplier to apply to all operations (default: 1.0)
  • base_case: Name of sheet containing base/reference case data (optional)

Core Methods

Information Access:

  • parameters(): Get list of available parameters (sheet names)
  • properties(parameter=None): Get available properties for a parameter
  • unique(field, parameter=None): Get unique values for a dynamic field
  • info(parameter=None): Get metadata for a parameter
  • case(index, parameter=None): Get data for a specific case

Statistics:

  • compute(stats, parameter=None, filters=None, multiplier=None, options=None, case_selection=False, selection_criteria=None): Compute statistics
  • compute_batch(stats, parameters, filters=None, multiplier=None, options=None, case_selection=False, selection_criteria=None): Batch compute for multiple parameters
  • distribution(parameter=None, filters=None, multiplier=None, options=None): Get distribution data
  • get_tornado_data(parameters, filters=None, multiplier=None, options=None): Get tornado chart formatted data

Filter Management (NEW):

  • set_filter(name, filters): Store a named filter preset
  • get_filter(name): Retrieve a stored filter preset
  • list_filters(): List all stored filter names

Base/Reference Case (NEW):

  • base_case(parameter=None, filters=None, multiplier=None): Extract base case values (index 0)
  • ref_case(parameter=None, filters=None, multiplier=None): Extract reference case values (index 1)
  • base_case_values: Property containing cached base case values (dict)
  • reference_case_values: Property containing cached reference case values (dict)

Legacy Methods (Deprecated but still supported)

For backwards compatibility, the following methods still work but are deprecated:

  • get_parameters() → use parameters()
  • get_properties() → use properties()
  • get_unique() → use unique()
  • get_distribution() → use distribution()
  • get_info() → use info()
  • get_case() → use case()

tornado_plot

Parameters

  • sections: List of section dictionaries with parameter data
  • title: Chart title
  • subtitle: Chart subtitle
  • outfile: Output file path
  • base: Base case value
  • reference_case: Reference case line value
  • unit: Unit label
  • preferred_order: List of parameter names for custom ordering
  • settings: Dictionary of visual settings

Returns

  • fig: Matplotlib figure object
  • ax: Matplotlib axes object
  • saved: Path to saved file (if outfile specified)

distribution_plot

Parameters

  • data: Array-like data (numpy array, list, or the output of TornadoProcessor.distribution())
  • title: Chart title (default "Distribution")
  • unit: Unit label for x-axis and subtitle
  • outfile: Output file path (if specified, saves the figure)
  • target_bins: Target number of bins for histogram (default 20)
  • color: Color scheme - "red", "blue", "green", "orange", "purple", "fuchsia", "yellow"
  • reference_case: Optional reference case value to plot as vertical line
  • settings: Dictionary of visual settings to override defaults

Settings Options

Common settings for customizing distribution plots:

settings = {
    # Layout
    "figsize": (10, 6),
    "dpi": 160,

    # Percentile markers
    "show_percentile_markers": True,  # Show P90/P50/P10 on cumulative curve
    "marker_size": 8,

    # Grid customization
    "show_minor_grid": True,
    "x_major_interval": 5,   # Major x-gridlines every 5 units
    "x_minor_interval": 1,   # Minor x-gridlines every 1 unit
    "y_major_interval": 50,  # Major y-gridlines every 50 frequency
    "y_minor_interval": 10,  # Minor y-gridlines every 10 frequency

    # Font sizes
    "title_fontsize": 15,
    "subtitle_fontsize": 11,
    "label_fontsize": 10,
}
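
Because `settings` overrides defaults, only the keys that differ need to be passed. The sketch below shows the usual dict-merge pattern behind such behaviour. The default values are copied from the listing above as placeholders and are not confirmed library defaults; `merge_settings` is a hypothetical name.

```python
# Placeholder defaults taken from the listing above (not confirmed defaults)
DEFAULT_SETTINGS = {
    "figsize": (10, 6),
    "dpi": 160,
    "show_percentile_markers": True,
    "marker_size": 8,
}


def merge_settings(overrides=None):
    """Return defaults updated with any user overrides."""
    merged = dict(DEFAULT_SETTINGS)
    merged.update(overrides or {})
    return merged


effective = merge_settings({"dpi": 200, "marker_size": 10})
print(effective)
```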

Returns

  • fig: Matplotlib figure object
  • ax: Matplotlib axes object (primary)
  • saved: Path to saved file (if outfile specified)

Requirements

  • Python >= 3.9
  • numpy >= 1.20.0
  • polars >= 0.18.0
  • fastexcel >= 0.9.0
  • matplotlib >= 3.5.0

License

MIT License - see LICENSE file for details

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions, please open an issue on GitHub.
