Measurements of electric power consumption in one household

Project description

This package provides a built-in, preprocessed dataset that can be imported directly. It lets any developer load the dataset and pass it straight to models for either multivariate or univariate time-series forecasting. The data consists of measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.

Installation

pip install EnergyData

electricpower_package

  • EnergyData
    • __init__.py
    • electricpower.py
    • data
      • householdpower.csv
  • test
    • __init__.py
    • test.py
  • MANIFEST.md
  • DESCREIPTION.rst
  • setup.py
  • tox.ini
  • README.md

How to use the package

import EnergyData as ed

Then we call load_data() to get the built-in, preprocessed data. When called, it prompts you to choose univariate or multivariate framing ('U' or 'M'):

# load_data() returns the preprocessed train and test splits
X_train, X_test, Y_train, Y_test = ed.load_data()

load_data() is built from the following functions:

1- train_test_split(): takes the data and returns train_data and test_data

def train_test_split(data_frame, test_size=0.3):
    """
    :param data_frame: the whole DataFrame to split
    :param test_size: fraction of rows assigned to the test set (default 30%)
    :return: two DataFrames, one for training and one for testing
    """
    train_size = 1 - test_size
    # Split chronologically: the first train_size fraction of rows is training data.
    end_idx = int(data_frame.shape[0] * train_size)

    train = data_frame.iloc[:end_idx, :]
    test = data_frame.iloc[end_idx:, :]

    return train, test
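
A minimal sketch on a hypothetical toy DataFrame. Note the split is chronological, not shuffled, as is appropriate for time series:

import pandas as pd

df = pd.DataFrame({'a': range(10), 'b': range(10)})  # hypothetical toy data
train, test = train_test_split(df, test_size=0.3)
print(train.shape, test.shape)  # (7, 2) (3, 2) -- first 70% of rows train, last 30% test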

2- scale_data(): takes train_data and test_data and scales them

from sklearn.preprocessing import MinMaxScaler

def scale_data(train, test):
    # Fit the scaler on the training set only, then apply it to both sets.
    scaler = MinMaxScaler().fit(train)
    return scaler.transform(train), scaler.transform(test), scaler
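
Fitting the scaler on the training set only avoids leaking test-set statistics into training. A minimal sketch with hypothetical arrays:

import numpy as np

train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])  # hypothetical data
test = np.array([[2.0, 15.0]])
train_scaled, test_scaled, scaler = scale_data(train, test)
print(train_scaled.min(), train_scaled.max())  # 0.0 1.0
print(test_scaled)  # [[0.2 0.25]] -- scaled with the training set's min/max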

3- univariate_splitter(): takes the data and returns arrays of input features and output values


import numpy as np

def univariate_splitter(data_frame):
    """
    :param data_frame: 2-D array whose first column is the target
    :return: two arrays, one for the input features and one for the output
    """
    input_features = []
    output_feature = []

    len_df = data_frame.shape[0]

    for i in range(len_df):
        end_idx = i + 1

        if end_idx > len_df - 1:
            break

        # One timestep of features (columns 1+) predicts the next value of column 0.
        input_x, output_y = data_frame[i:end_idx, 1:], data_frame[end_idx: end_idx + 1, 0]

        input_features.append(input_x)
        output_feature.append(output_y)

    # The mean over axis 1 collapses each length-1 output window to a scalar.
    return np.array(input_features), np.mean(np.array(output_feature), axis=1)
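
A minimal sketch on a hypothetical scaled array (column 0 is the target, columns 1+ are the features):

import numpy as np

data = np.arange(20, dtype=float).reshape(5, 4)  # hypothetical 5 timesteps x 4 columns
X, y = univariate_splitter(data)
print(X.shape, y.shape)  # (4, 1, 3) (4,) -- each single timestep predicts the next target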

4- multivariate_splitter(): takes the data and returns arrays of input features and output values

def multivariate_splitter(df, input_size=21, output_size=7):
    """
    :param df: 2-D array whose first column is the target
    :param input_size: how many samples go into each input window
    :param output_size: how many values are predicted for each output window
    :return: two arrays, one for the input features and one for the output
    """
    input_features = []
    output_feature = []

    len_df = df.shape[0]

    for i in range(len_df):
        end_idx = i + input_size

        if end_idx > len_df - output_size:
            break

        # A window of input_size timesteps predicts the next output_size targets.
        input_x, output_y = df[i:end_idx, 1:], df[end_idx: end_idx + output_size, 0]

        input_features.append(input_x)
        output_feature.append(output_y)

    return np.array(input_features), np.array(output_feature)
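
A minimal sketch with a hypothetical array and smaller windows than the defaults:

import numpy as np

data = np.arange(40, dtype=float).reshape(10, 4)  # hypothetical 10 timesteps x 4 columns
X, y = multivariate_splitter(data, input_size=3, output_size=2)
print(X.shape, y.shape)  # (6, 3, 3) (6, 2) -- 3-step windows predict the next 2 targets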

First we read the data with pkg_resources, then we use all of these functions inside load_data(), so a single call returns the data already split, scaled, and converted:

NOTE: the __name__ variable stores the module name.

import pkg_resources
import pandas as pd

def load_data():
    # Read the CSV bundled with the package via pkg_resources.
    stream = pkg_resources.resource_stream(__name__, 'data/householdpower.csv')
    data_frame = pd.read_csv(stream, encoding='latin-1',
                             parse_dates=['date_time'], index_col='date_time')
    # Energy not measured by the three sub-meters (watt-hours per minute).
    data_frame['sub_metering_remaining'] = (data_frame.Global_active_power * 1000 / 60) - \
        (data_frame.Sub_metering_1 + data_frame.Sub_metering_2 + data_frame.Sub_metering_3)
    # Downsample the one-minute readings to daily totals.
    data_frame = data_frame.resample('D').sum()
    X_train, X_test = train_test_split(data_frame=data_frame)
    X_train, X_test, scaler = scale_data(train=X_train, test=X_test)
    choosing = input('Univariate or Multivariate (U or M)? ')
    if choosing == 'U':
        X_train, Y_train = univariate_splitter(X_train)
        X_test, Y_test = univariate_splitter(X_test)
    elif choosing == 'M':
        X_train, Y_train = multivariate_splitter(X_train)
        X_test, Y_test = multivariate_splitter(X_test)
    else:
        raise ValueError("please answer 'U' or 'M'")
    return X_train, X_test, Y_train, Y_test
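
A minimal sketch of passing the returned arrays to a forecasting model. Keras/TensorFlow is an assumption here, not a dependency of this package:

# Assumes TensorFlow/Keras is installed separately.
import EnergyData as ed
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

X_train, X_test, Y_train, Y_test = ed.load_data()  # answer 'M' at the prompt

model = Sequential([
    LSTM(32, input_shape=X_train.shape[1:]),  # windows of (input_size, n_features)
    Dense(Y_train.shape[1]),                  # predict output_size future values
])
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, Y_train, epochs=5, validation_data=(X_test, Y_test))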
    

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

EnergyData-0.0.2.tar.gz (18.8 MB)

Uploaded Source

Built Distribution

EnergyData-0.0.2-py3-none-any.whl (19.7 MB)

Uploaded Python 3

File details

Details for the file EnergyData-0.0.2.tar.gz.

File metadata

  • Download URL: EnergyData-0.0.2.tar.gz
  • Upload date:
  • Size: 18.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.4

File hashes

Hashes for EnergyData-0.0.2.tar.gz

Algorithm    Hash digest
SHA256       589d38506ac812819384db9196603d8a1825867caa0aa39780315e7f4596c2d4
MD5          fc74f7cc8b330a8e3737b50742cc240e
BLAKE2b-256  f350dcac28bab5ac97a67d046d5c1af225e4262610f1a98720fb4e4a34ea8a47


File details

Details for the file EnergyData-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: EnergyData-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 19.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.4

File hashes

Hashes for EnergyData-0.0.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       7ab69ce49b4a64a48cee27d386d6b08d0f83b83f20be5b43e1900fba9a6f378d
MD5          d9a59139a99bb560e9b4a00749e9d4e7
BLAKE2b-256  3a2aaaf4a95c9690abfd89d57caa609c0c9dae9e17d35cdc58e5f4c649ee3c17

