TEPImport

Utility for quickly downloading and loading the Tennessee Eastman Process data set.

Small Data Set

The data set is downloaded from the University of Illinois Large Scale Systems Research Laboratory. A copy of the license is included as readme.txt in the zip package downloaded from the site.

Large Data Set

The data set is downloaded from the Harvard Dataverse site. By downloading this data, you agree to the terms set out on the Harvard Dataverse site.

The terms can be found at the following link: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/6C3JR1

Quickstart

Install the package using pip.

pip3 install tepimport

Import the data sets into your code.

import tepimport
tep = tepimport.import_sets()

Output the data sets.

for name, train, test in tep:
    print(name)
    print(train)
    print(test)

Downloading the Data

You will be automatically prompted to download the data when trying to import the files into your code. Alternatively, you can run the download module with the following command.

python3 download.py [-h] [--url URL] [--path PATH] [--target TARGET] [--name NAME] [--use-local] [--cleanup] [--no-extract]

optional arguments:
-h, --help       show this help message and exit
--url URL        custom url to download the data from
--path PATH      the path to download the zip file to
--target TARGET  the target path to extract the zip file to
--name NAME      the name of the zip file
--use-local      extract a local copy of the zip file
--cleanup        delete the zip file after extracting it
--no-extract     download the zip without extracting it
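The options above can be mirrored in a small argparse parser. This is only a sketch reconstructed from the help text, not the actual source of download.py: the function name build_parser is hypothetical, and the real defaults for --url, --path, --target, and --name are not stated in this document, so they are left as None here.

```python
import argparse


def build_parser():
    """Hypothetical parser rebuilt from the help text (not the real download.py)."""
    parser = argparse.ArgumentParser(prog="download.py")
    # Value options: real defaults are unknown, so argparse's None is kept.
    parser.add_argument("--url", help="custom url to download the data from")
    parser.add_argument("--path", help="the path to download the zip file to")
    parser.add_argument("--target", help="the target path to extract the zip file to")
    parser.add_argument("--name", help="the name of the zip file")
    # Flags: False unless given on the command line.
    parser.add_argument("--use-local", action="store_true",
                        help="extract a local copy of the zip file")
    parser.add_argument("--cleanup", action="store_true",
                        help="delete the zip file after extracting it")
    parser.add_argument("--no-extract", action="store_true",
                        help="download the zip without extracting it")
    return parser


# Equivalent to: python3 download.py --path data --cleanup
args = build_parser().parse_args(["--path", "data", "--cleanup"])
print(args.path, args.cleanup, args.no_extract)
```

Note that argparse exposes dashed flags with underscores, so --use-local and --no-extract become args.use_local and args.no_extract.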

Module Functions

The tepimport.py file provides the utilities for importing the data sets into your code.

data_exists_check() -> None

    Check that the data set is present in the defined folder path and prompt the user to download the data set if it can't be found.

set_folder_path(path: str) -> None

    Change the path in which to look for the data sets.

import_data_set(file_name: str) -> np.ndarray

    Import a data set file as a NumPy array.

import_sets(sets_to_import: tuple, check_data_exists: bool, skip_training: bool, skip_test: bool) -> list

Takes a sequence of integers in the range 0-21 (or a single integer) and returns a list of tuples
of the form (set name, training set, test set).

Parameters
----------
sets_to_import: iterable or int
    An iterable object containing integers in the range [0, 21], or a single integer in that range, indicating the data sets to be imported. By default all data sets will be imported.

check_data_exists: bool
    Checks that the data sets exist before attempting to import them and prompts the user to download the data sets if they aren't found. Set to True by default.

skip_training: bool
    If true, don't import the training sets. Set to False by default.

skip_test: bool
    If true, don't import the test sets. Set to False by default.
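Because sets_to_import accepts either an iterable or a single integer, the argument has to be normalized before use. The following is a minimal sketch of one way to do that, not the library's actual implementation; the helper name normalize_set_ids and the use of None to mean "all sets" are assumptions made for illustration.

```python
def normalize_set_ids(sets_to_import=None):
    """Hypothetical helper: return a tuple of valid data set indices.

    None (assumed here to mean "the default") selects all 22 sets, 0 to 21.
    """
    if sets_to_import is None:
        return tuple(range(22))
    # Wrap a single integer so the rest of the code sees an iterable.
    if isinstance(sets_to_import, int):
        sets_to_import = (sets_to_import,)
    ids = tuple(sets_to_import)
    for set_id in ids:
        if not isinstance(set_id, int) or not 0 <= set_id <= 21:
            raise ValueError(f"data set index out of range [0, 21]: {set_id!r}")
    return ids


print(normalize_set_ids(4))
print(normalize_set_ids([0, 4, 5, 10]))
print(len(normalize_set_ids()))
```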

import_tep_sets(lagged_samples: int) -> tuple

Imports the normal operation training set and 4 of the commonly used test sets [IDV(0), IDV(4), IDV(5), and IDV(10)] with only the first 22 measured variables and first 11 manipulated variables. By default, 2 lagged copies are added to the data sets.
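The variable selection described above amounts to keeping 33 columns per sample. The sketch below assumes the usual column layout of the Tennessee Eastman files, with 41 measured variables followed by 11 manipulated variables (52 columns in total); that layout, the constant names, and the helper select_variables are assumptions for illustration and are not taken from the tepimport source. Plain lists are used so the example has no dependencies.

```python
# Assumed layout: columns 0-40 are measured variables, 41-51 manipulated ones.
N_MEASURED = 41
N_MANIPULATED = 11

# Keep the first 22 measured variables and the first 11 manipulated variables.
KEEP = list(range(22)) + list(range(N_MEASURED, N_MEASURED + N_MANIPULATED))


def select_variables(samples):
    """Hypothetical helper: keep only the columns listed in KEEP."""
    return [[row[j] for j in KEEP] for row in samples]


# One fake sample whose values equal their column index, for easy checking.
sample = list(range(N_MEASURED + N_MANIPULATED))
reduced = select_variables([sample])
print(len(reduced[0]))
```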

add_lagged_samples(data: np.ndarray, lagged_samples: int) -> np.ndarray

Takes a matrix X = [x(1), x(2), ..., x(n)] of n samples, where each sample x(i) = [x_1(i), x_2(i), ..., x_m(i)]^T contains m variables, and returns a new matrix X* = [x*(1), x*(2), ..., x*(n - d)] of n - d samples, where each sample x*(i) = [x_1(i + d), ..., x_m(i + d), x_1(i + d - 1), ..., x_m(i + d - 1), ..., x_1(i), ..., x_m(i)]^T contains m(d + 1) variables and d is the number of lagged samples.
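The lag construction can be illustrated on a tiny example. This is a dependency-free sketch that uses lists of samples rather than NumPy arrays, so it is not the library's add_lagged_samples; the name lagged_rows is hypothetical. Each output sample holds the newest of d + 1 consecutive samples first, followed by progressively older ones.

```python
def lagged_rows(samples, d):
    """Hypothetical helper: stack each sample with its d predecessors.

    samples is a list of n samples with m values each; the result has
    n - d samples with m * (d + 1) values each.
    """
    if d < 0 or d >= len(samples):
        raise ValueError("d must satisfy 0 <= d < number of samples")
    stacked = []
    for i in range(len(samples) - d):
        row = []
        # Newest sample x(i + d) first, then x(i + d - 1), down to x(i).
        for lag in range(d + 1):
            row.extend(samples[i + d - lag])
        stacked.append(row)
    return stacked


# n = 4 samples, m = 2 variables, d = 2 lagged samples.
X = [[1, 10], [2, 20], [3, 30], [4, 40]]
X_star = lagged_rows(X, 2)
print(X_star)  # [[3, 30, 2, 20, 1, 10], [4, 40, 3, 30, 2, 20]]
```

With n = 4, m = 2, and d = 2, the result has n - d = 2 samples of m(d + 1) = 6 variables each.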
