A package for selecting ensemble members using entropy theory

En-EMS | Entropy-based Ensemble Members Selection

en-ems is a Python library for selecting a set of mutually exclusive, collectively exhaustive (MECE) ensemble members.

The library implements the approach presented by Darbandsari and Coulibaly (2020) as a step that precedes the merging of a set of ensemble forecasts.

The en-ems package is built on top of the pyitlib package, which implements fundamental information theory methods.
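To give a feel for the kind of information-theoretic quantities involved, here is a minimal sketch in plain Python that computes Shannon entropy, joint entropy and total correlation for already-discretized series. It is not the code used by en-ems or pyitlib: the function names (`discretize`, `entropy`, `joint_entropy`, `total_correlation`) and the equal-width binning are assumptions made for illustration only.

```python
import math
from collections import Counter


def discretize(series, n_bins, lo, hi):
    """Map each value to an equal-width bin index in [0, n_bins - 1].

    Illustrative helper (an assumption, not part of en-ems); requires hi > lo.
    """
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in series]


def entropy(symbols):
    """Shannon entropy, in bits, of a sequence of hashable symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def joint_entropy(columns):
    """Joint entropy of several equally long discretized series."""
    # Each time step becomes one tuple holding the bin of every series.
    return entropy(list(zip(*columns)))


def total_correlation(columns):
    """Sum of the marginal entropies minus the joint entropy."""
    return sum(entropy(c) for c in columns) - joint_entropy(columns)


# Toy binned series: a and b are identical, c is independent of a.
a = [0, 0, 1, 1]
b = [0, 0, 1, 1]
c = [0, 1, 0, 1]

redundant = total_correlation([a, b])    # identical series share all information
independent = total_correlation([a, c])  # independent series share none
```

In this toy case the identical pair has a total correlation of one bit, while the independent pair has none. Intuitively, a member that only repeats information already carried by others adds redundancy without adding joint entropy.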

Installing

The library can be installed with pip:

pip install en-ems

The package is listed on the Python Package Index (PyPI) as en-ems.

Using

Suppose you have a file named example.csv with the following content:

Date,       Memb_A, Memb_B, ...,  Memb_Z, Obsv
2020/05/15, 1.12,   1.05,   ...,  0.5,    1.01
2020/05/16, 1.15,   1.12,   ...,  0.9,    1.10
2020/05/17, 1.13,   1.32,   ...,  1.1,    1.29
...         ...     ...     ...,  ...,    ...
2020/11/30, 1.22,   0.95,   ...,  0.3,    0.87

Each column whose name starts with "Memb_" holds the realizations of one ensemble member over the time interval, and "Obsv" holds the observed values for the same time interval.
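The layout above boils down to one list of values per member plus one list of observations. As a rough illustration of that shape, the following standard-library sketch parses a three-member excerpt of the table (the elided "..." columns and rows are dropped). The helper name `split_members_and_observations` is hypothetical and not part of en-ems. Note that the sample pads each field with spaces after the comma, so the reader is asked to skip them; without that, column names would carry leading spaces.

```python
import csv
import io

# Excerpt of the table above, keeping only the columns that are spelled out.
SAMPLE = """Date,       Memb_A, Memb_B, Memb_Z, Obsv
2020/05/15, 1.12,   1.05,   0.5,    1.01
2020/05/16, 1.15,   1.12,   0.9,    1.10
2020/05/17, 1.13,   1.32,   1.1,    1.29
"""


def split_members_and_observations(text):
    """Return ({member_id: [values]}, [observed values]) from CSV text.

    Hypothetical helper for illustration; column naming follows the
    "Memb_" / "Obsv" convention described in this README.
    """
    # skipinitialspace drops the padding that follows each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    members, observed = {}, []
    for row in reader:
        for name, value in row.items():
            if name.startswith("Memb_"):
                members.setdefault(name, []).append(float(value))
            elif name == "Obsv":
                observed.append(float(value))
    return members, observed


members, observed = split_members_and_observations(SAMPLE)
```

Here `members` maps each member id to its series and `observed` holds the "Obsv" column, with "Date" left out.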

If your objective is to select a MECE set taking the observations into account, this can be done with the default parameters as follows:

import pandas as pd
import enems

# read file
data_ensemble = pd.read_csv("example.csv").to_dict('list')
data_obsv = data_ensemble["Obsv"]
del data_ensemble["Obsv"], data_ensemble["Date"]

# perform selection
selection_log = enems.select_ensemble_members(data_ensemble, data_obsv)

The variable selection_log will be a dictionary containing a log of the total correlation, the joint entropy and (if observations were given) the transinformation of the given and selected datasets. It also contains the ids of the selected ensemble members.
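As a rough sketch of how such a log might be condensed into a few headline numbers, the helper below reads only the keys used by the plotting example later in this README ("history", "total_correlation", "joint_entropy", "original_ensemble_joint_entropy", "selected_members"). The function name `summarize_selection_log` is hypothetical, the mock log is hand-made with illustrative numbers, and the sketch assumes the history lists are ordered from the first to the last selection step.

```python
def summarize_selection_log(log):
    """Condense a selection log into a small summary dictionary.

    Hypothetical helper; key names are inferred from the example in this
    README, not from the library's reference documentation.
    """
    history = log["history"]
    return {
        "n_selected": len(log["selected_members"]),
        "n_steps": len(history["total_correlation"]),
        # Assumes the last entry corresponds to the last selection step.
        "last_total_correlation": history["total_correlation"][-1],
        "last_joint_entropy": history["joint_entropy"][-1],
        "full_set_joint_entropy": log["original_ensemble_joint_entropy"],
    }


# Hand-made stand-in for a real log; the numbers are purely illustrative.
mock_log = {
    "history": {
        "total_correlation": [120.0, 110.5, 98.2],
        "joint_entropy": [6.80, 6.79, 6.75],
    },
    "original_ensemble_joint_entropy": 6.80,
    "selected_members": {"Memb_A", "Memb_C"},
}

summary = summarize_selection_log(mock_log)
```

Comparing the last joint entropy with that of the full set gives a quick indication of how much information the selected subset retains.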

Example

Mock data for a dataset with 75 supposed ensemble members and without observation records can be obtained with the function enems.load_data_75().

Here is a full example showing how to access the mock data, select a MECE subset and visualize the results using matplotlib:

import matplotlib.pyplot as plt
import enems

if __name__ == "__main__":

    # ## LOAD DATA ################################################################################################### #

    test_data_df = enems.load_data_75()
    test_data = test_data_df.to_dict("list")

    # ## SELECT MECE SUBSET ########################################################################################## #

    selection_log = enems.select_ensemble_members(test_data, None, n_bins=10, bin_by="equal_intervals", 
                                                  beta_threshold=0.95, n_processes=1, verbose=False)

    # ## PLOT FUNCTIONS ############################################################################################## #

    def plot_ensemble_members(all_series: dict, selected_series: set, plot_title: str, output_file_path: str) -> None:
        _, axs = plt.subplots(1, 1, figsize=(7, 2.5))
        axs.set_xlabel("Time")
        axs.set_ylabel("Value")
        axs.set_title(plot_title)
        axs.set_xlim(0, 143)
        axs.set_ylim(0, 5)
        for series_id in selected_series:
            axs.plot(all_series[series_id], color="#999999", zorder=3, alpha=0.33)
        plt.tight_layout()
        plt.savefig(output_file_path)
        plt.close()
        return None

    def plot_log(n_total_members: int, log: dict, output_file_path: str) -> None:
        _, axss = plt.subplots(1, 2, figsize=(7.0, 2.5))
        x_values = [n_total_members - i - 1 for i in range(len(log["history"]["total_correlation"]))]
        axss[0].set_xlabel("Time")
        axss[0].set_ylabel("Total correlation")
        axss[0].plot(x_values, log["history"]["total_correlation"], color="#7777FF", zorder=3)
        axss[0].set_ylim(70, 140)
        axss[0].set_xlim(x_values[0], x_values[-1])
        axss[1].set_xlabel("Time")
        axss[1].set_ylabel("Joint entropy")
        axss[1].axhline(log["original_ensemble_joint_entropy"], color="#FF7777", zorder=3, label="Full set")
        axss[1].plot(x_values, log["history"]["joint_entropy"], color="#7777FF", zorder=3, label="Selected set")
        axss[1].set_ylim(6.3, 6.9)
        axss[1].set_xlim(x_values[0], x_values[-1])
        axss[1].legend()
        plt.tight_layout()
        plt.savefig(output_file_path)
        plt.close()
        return None

    # ## FUNCTIONS CALL ############################################################################################## #

    plot_log(len(test_data.keys()), selection_log, "test/log.svg")

    plot_ensemble_members(test_data, set(test_data.keys()),
                          "All members (%d)" % len(test_data.keys()),
                          "test/ensemble_all.svg")

    plot_ensemble_members(test_data, selection_log["selected_members"],
                          "Selected members (%d)" % len(selection_log["selected_members"]),
                          "test/ensemble_selected.svg")

Running this script produces the following plots:

log.svg

ensemble_all.svg

ensemble_selected.svg

Further documentation

Further information about the library can be found in the docs folder of the Git repository of this project.

Users can find the complete theoretical explanation and assessment of the method in the original work of Darbandsari and Coulibaly (2020).
