Measurements of electric power consumption in one household
Project description
This package ships a built-in, preprocessed dataset that can be loaded with a single import.
It lets a developer pass the data straight to a model for either multivariate or univariate
time-series forecasting. The dataset contains measurements of electric power consumption in
one household, sampled once per minute over a period of almost 4 years. Different electrical
quantities and some sub-metering values are available.
Installation
pip install EnergyData
electricpower_package
- EnergyData
  - __init__.py
  - electricpower.py
  - data
    - householdpower.csv
- test
  - __init__.py
  - test.py
- MANIFEST.md
- DESCREIPTION.rst
- setup.py
- tox.ini
- README.md
How to use the package
    import EnergyData as ed
    # call load_data() to get the built-in, preprocessed data:
    X_train, X_test, Y_train, Y_test = ed.load_data()
load_data() relies on the following helper functions:
1- train_test_split(): takes the data and returns train_data and test_data
    def train_test_split(data_frame, test_size=0.3):
        """
        :param data_frame: the whole dataframe to split
        :param test_size: fraction of rows used for the test set, 30% by default
        :return: two sets after splitting the data, one for training and the other for testing
        """
        train_size = 1 - test_size
        # Chronological split: the first rows train, the last rows test (no shuffling).
        end_idx = int(data_frame.shape[0] * train_size)
        train = data_frame.iloc[:end_idx, :]
        test = data_frame.iloc[end_idx:, :]
        return train, test
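To show that the split slices by position rather than sampling at random, here is a minimal sketch on a toy frame. The column name `load` and the eight daily rows are made up for the example, and the function body is a condensed version of the one above.

```python
import pandas as pd

def train_test_split(data_frame, test_size=0.3):
    # Keep the first (1 - test_size) share of rows for training, the rest for testing.
    end_idx = int(data_frame.shape[0] * (1 - test_size))
    return data_frame.iloc[:end_idx, :], data_frame.iloc[end_idx:, :]

# Hypothetical toy data: eight consecutive days with a single made-up column.
index = pd.date_range("2007-01-01", periods=8, freq="D")
toy = pd.DataFrame({"load": range(8)}, index=index)

train, test = train_test_split(toy, test_size=0.25)

print(len(train), len(test))
# Every training timestamp precedes every test timestamp.
print(train.index.max() < test.index.min())
```

Because the rows stay in time order, the model is always evaluated on days that come after the ones it was trained on, which is usually what a forecasting setup needs.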
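2- scale_data(): takes train_data and test_data and scales both
    from sklearn.preprocessing import MinMaxScaler

    def scale_data(train, test):
        # Fit on the training set only, then apply the same scaling to both sets.
        scaler = MinMaxScaler().fit(train)
        return scaler.transform(train), scaler.transform(test), scaler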
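Fitting the scaler on the training set alone keeps information from the test period out of the preprocessing step. The sketch below reproduces the min-max formula, (x - min) / (max - min) per column, in plain NumPy instead of calling scikit-learn; the helper name `fit_min_max` and the toy values are assumptions for illustration only.

```python
import numpy as np

def fit_min_max(train):
    # Per-column minimum and range, learned from the training rows only.
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range

# Made-up values: three training rows and one test row, two columns each.
train = np.array([[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]])
test = np.array([[40.0, 2.0]])

col_min, col_range = fit_min_max(train)
train_scaled = (train - col_min) / col_range
test_scaled = (test - col_min) / col_range

print(train_scaled.min(axis=0), train_scaled.max(axis=0))
print(test_scaled)
```

Scaled test values can fall outside [0, 1] when the test period contains readings beyond the training range, as the first test column does here.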
3- univariate_splitter(): takes the data and returns arrays of input features and output feature
    import numpy as np

    def univariate_splitter(data_frame):
        """
        :param data_frame: 2-D array of scaled data; column 0 is the target
        :return: two arrays, one for features and the other for output
        """
        input_features = []
        output_feature = []
        len_df = data_frame.shape[0]
        for i in range(len_df):
            end_idx = i + 1
            if end_idx > len_df - 1:
                break
            # Features: the remaining columns at step i; target: column 0 at step i + 1.
            input_x, output_y = data_frame[i:end_idx, 1:], data_frame[end_idx: end_idx + 1, 0]
            input_features.append(input_x)
            output_feature.append(output_y)
        return np.array(input_features), np.mean(np.array(output_feature), axis=1)
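The following sketch runs the same one-step-ahead pairing on a tiny array to make the resulting shapes visible. The 5 x 3 array is made-up data, and `one_step_pairs` is a hypothetical condensed rewrite of the loop above, not part of the package.

```python
import numpy as np

def one_step_pairs(data):
    n_pairs = data.shape[0] - 1
    # Pair the non-target columns at step i with the target (column 0) at step i + 1.
    features = [data[i:i + 1, 1:] for i in range(n_pairs)]
    targets = [data[i + 1, 0] for i in range(n_pairs)]
    return np.array(features), np.array(targets)

# Made-up data: 5 time steps, 3 columns, column 0 acting as the target.
data = np.arange(15, dtype=float).reshape(5, 3)
X, y = one_step_pairs(data)

print(X.shape, y.shape)
print(X[0], y[0])
```

Five rows give four pairs, since the last row has no following step to predict. Each input keeps a leading time axis of length 1, so `X` is three-dimensional.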
4- multivariate_splitter(): takes the data and returns arrays of input features and output feature:
    import numpy as np

    def multivariate_splitter(df, input_size=21, output_size=7):
        """
        :param df: 2-D array of scaled data; column 0 is the target
        :param input_size: number of time steps in each input window
        :param output_size: number of target values predicted for each window
        :return: two arrays, one for features and the other for output
        """
        input_features = []
        output_feature = []
        len_df = df.shape[0]
        for i in range(len_df):
            end_idx = i + input_size
            if end_idx > len_df - output_size:
                break
            # Window of input_size steps, followed by the next output_size target values.
            input_x, output_y = df[i:end_idx, 1:], df[end_idx: end_idx + output_size, 0]
            input_features.append(input_x)
            output_feature.append(output_y)
        return np.array(input_features), np.array(output_feature)
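As a check on the window arithmetic: with n rows, the loop above yields n - input_size - output_size + 1 samples. The sketch below confirms this on made-up data; the 40 x 3 array is an assumption for illustration, and the function is a condensed copy of the one above.

```python
import numpy as np

def multivariate_splitter(df, input_size=21, output_size=7):
    inputs, outputs = [], []
    n_rows = df.shape[0]
    for i in range(n_rows):
        end_idx = i + input_size
        if end_idx > n_rows - output_size:
            break
        inputs.append(df[i:end_idx, 1:])                      # non-target columns
        outputs.append(df[end_idx:end_idx + output_size, 0])  # target column
    return np.array(inputs), np.array(outputs)

# Made-up data: 40 time steps, 3 columns, column 0 acting as the target.
data = np.arange(120, dtype=float).reshape(40, 3)
X, y = multivariate_splitter(data)

expected_samples = data.shape[0] - 21 - 7 + 1
print(X.shape, y.shape, expected_samples)
```

With the default sizes, each sample uses 21 consecutive rows as input and the following 7 target values as output, so any array shorter than 28 rows produces no samples at all.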
First we read the data with pkg_resources, then we call all of these functions inside load_data(), so a single call returns the data split, scaled, and converted:
NOTE: the __name__ variable stores the module name
    import pandas as pd
    import pkg_resources

    def load_data():
        # Resource names use '/' as the separator on every platform.
        stream = pkg_resources.resource_stream(__name__, 'data/householdpower.csv')
        data_frame = pd.read_csv(stream, encoding='latin-1', parse_dates=['date_time'], index_col='date_time')
        # Energy per minute that is not captured by the three sub-meters.
        data_frame['sub_metering_remaining'] = (data_frame.Global_active_power * 1000 / 60) - (data_frame.Sub_metering_1 + data_frame.Sub_metering_2 + data_frame.Sub_metering_3)
        # Aggregate the one-minute readings into daily totals.
        data_frame = data_frame.resample('D').sum()
        X_train, X_test = train_test_split(data_frame=data_frame)
        X_train, X_test, scaler = scale_data(train=X_train, test=X_test)
        choosing = input('Univariate or Multivariate (U or M)? ')
        if choosing == 'U':
            X_train, Y_train = univariate_splitter(X_train)
            X_test, Y_test = univariate_splitter(X_test)
        elif choosing == 'M':
            X_train, Y_train = multivariate_splitter(X_train)
            X_test, Y_test = multivariate_splitter(X_test)
        else:
            raise ValueError("expected 'U' or 'M'")
        return X_train, X_test, Y_train, Y_test
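The derived column and the daily aggregation can be tried on synthetic data without the bundled CSV. The sketch assumes the columns follow the conventions of the UCI "Individual household electric power consumption" dataset, where Global_active_power is a minute-averaged value in kilowatts and the sub-meter readings are in watt-hours; the constant readings below are invented, and only the column names come from the code above.

```python
import numpy as np
import pandas as pd

# Two days of invented one-minute readings with constant values.
index = pd.date_range("2007-01-01", periods=2 * 24 * 60, freq="min")
frame = pd.DataFrame(
    {
        "Global_active_power": 1.2,  # kW, assumed unit
        "Sub_metering_1": 1.0,       # Wh, assumed unit
        "Sub_metering_2": 2.0,
        "Sub_metering_3": 3.0,
    },
    index=index,
)

# kW * 1000 / 60 gives watt-hours per minute; subtract what the sub-meters saw.
frame["sub_metering_remaining"] = (
    frame["Global_active_power"] * 1000 / 60
    - (frame["Sub_metering_1"] + frame["Sub_metering_2"] + frame["Sub_metering_3"])
)

# One row per calendar day, each column summed over that day's 1440 minutes.
daily = frame.resample("D").sum()

print(daily.shape)
print(daily["sub_metering_remaining"].round(3).tolist())
```

Here each minute leaves 20 - 6 = 14 Wh unaccounted for, so each daily total is 14 times 1440 minutes.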
Download files
File details
Details for the file EnergyData-0.0.2.tar.gz.
File metadata
- Download URL: EnergyData-0.0.2.tar.gz
- Upload date:
- Size: 18.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 589d38506ac812819384db9196603d8a1825867caa0aa39780315e7f4596c2d4
MD5 | fc74f7cc8b330a8e3737b50742cc240e
BLAKE2b-256 | f350dcac28bab5ac97a67d046d5c1af225e4262610f1a98720fb4e4a34ea8a47
File details
Details for the file EnergyData-0.0.2-py3-none-any.whl.
File metadata
- Download URL: EnergyData-0.0.2-py3-none-any.whl
- Upload date:
- Size: 19.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7ab69ce49b4a64a48cee27d386d6b08d0f83b83f20be5b43e1900fba9a6f378d
MD5 | d9a59139a99bb560e9b4a00749e9d4e7
BLAKE2b-256 | 3a2aaaf4a95c9690abfd89d57caa609c0c9dae9e17d35cdc58e5f4c649ee3c17