Library/framework for making predictions.
Project description
mydatapreprocessing
Load data from a web link or a local file (json, csv, excel file, parquet, h5...), consolidate it (resample data, clean NaN values, do string embedding), derive new features from existing columns and apply preprocessing such as standardization or smoothing. To see how a function works, check its docstring. Working examples with printed results are also in tests - visual.py.
Installation
Python >=3.6 (Python 2 is not supported).
Install with
pip install mydatapreprocessing
Some libraries are needed only for certain data inputs, so not every user will use them.
To be sure you have all libraries, download requirements_advanced.txt and install the advanced requirements with
pip install -r requirements_advanced.txt
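As a quick check after installation, the sketch below reports which optional modules cannot be imported. It uses only the standard library. The helper name `missing_modules` and the module list are hypothetical; substitute the top-level module names of the packages actually listed in requirements_advanced.txt.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Hypothetical list: replace with modules from requirements_advanced.txt.
optional = ["json", "surely_not_installed_module_xyz"]
print(missing_modules(optional))  # ['surely_not_installed_module_xyz']
```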
Examples
import mydatapreprocessing as mdp
Load data. You can use
- Python formats (numpy.ndarray, pd.DataFrame, list, tuple, dict)
- local files
- web URLs
You can load several sources at once by passing them in a list.
The syntax is always the same.
data = mdp.load_data.load_data(
"https://blockchain.info/unconfirmed-transactions?format=json",
request_datatype_suffix=".json",
data_orientation="index",
predicted_table="txs",
)
# data2 = mdp.load_data.load_data(["PATH_TO_FILE.csv", "PATH_TO_FILE2.csv"])
If you want to use the data in machine learning models, you will probably want to remove NaN values, convert string columns to numeric where possible, encode them or keep only numeric data, and resample.
data_consolidated = mdp.preprocessing.data_consolidation(
data, predicted_column="weight", remove_nans_threshold=0.9, remove_nans_or_replace="interpolate"
)
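To make the idea behind the NaN options concrete, here is a minimal plain-Python sketch, not the library's implementation. It assumes that the threshold is the largest share of NaN values a column may have before it is dropped, and that "interpolate" means linear interpolation between known neighbours. How mydatapreprocessing interprets these parameters is documented in its docstrings. The function names are hypothetical.

```python
import math


def nan_fraction(column):
    """Share of values in a column that are NaN."""
    return sum(math.isnan(v) for v in column) / len(column)


def interpolate_nans(column):
    """Fill NaNs linearly between the nearest known neighbours.

    Leading and trailing NaNs take the nearest known value.
    """
    known = [i for i, v in enumerate(column) if not math.isnan(v)]
    if not known:
        return list(column)
    filled = list(column)
    for i, v in enumerate(column):
        if not math.isnan(v):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = column[right]
        elif right is None:
            filled[i] = column[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = column[left] + weight * (column[right] - column[left])
    return filled


def consolidate(columns, threshold=0.9):
    """Drop columns whose NaN share exceeds threshold, interpolate the rest."""
    return {
        name: interpolate_nans(values)
        for name, values in columns.items()
        if nan_fraction(values) <= threshold
    }


nan = float("nan")
table = {"weight": [1.0, nan, 3.0, nan], "empty": [nan, nan, nan, nan]}
print(consolidate(table))  # {'weight': [1.0, 2.0, 3.0, 3.0]}
```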
Functions in feature_engineering and preprocessing expect data in the form (n_samples, n_features).
n_samples is usually much larger than n_features, so the data are transformed into this form in data_consolidation if necessary.
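The shape convention can be illustrated with a short sketch. It assumes a simple heuristic, namely that a table with fewer rows than columns is stored the wrong way round and should be transposed. This is an assumption made for illustration; the library's actual rule may differ. The function name is hypothetical.

```python
def to_samples_by_features(rows):
    """Return rows shaped (n_samples, n_features).

    Transposes when there are fewer rows than columns (assumed heuristic).
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if n_rows < n_cols:
        return [list(column) for column in zip(*rows)]
    return [list(row) for row in rows]


wide = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]  # 2 features stored as rows
tall = to_samples_by_features(wide)
print(len(tall), len(tall[0]))  # 5 2
```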
Extend the original data with derived columns
data_extended = mdp.feature_engineering.add_derived_columns(data_consolidated, differences=True, rolling_means=32)
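For readers unfamiliar with these derived features, the following plain-Python sketch shows what first differences and a rolling mean compute on a single column. It is an illustration only: how add_derived_columns aligns the shorter derived columns with the original rows is not shown here, and the function names are hypothetical.

```python
def differences(values):
    """First-order differences: values[i] - values[i - 1]."""
    return [b - a for a, b in zip(values, values[1:])]


def rolling_mean(values, window):
    """Mean of every full run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


series = [1.0, 2.0, 4.0, 7.0, 11.0]
print(differences(series))      # [1.0, 2.0, 3.0, 4.0]
print(rolling_mean(series, 2))  # [1.5, 3.0, 5.5, 9.0]
```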
preprocess_data returns the preprocessed data, but also the last undifferenced value and the scaler for inverse transformation, so unpack the extra values with _
data_preprocessed, _, _ = mdp.preprocessing.preprocess_data(
data_extended,
remove_outliers=True,
smoothit=False,
correlation_threshold=False,
data_transform=False,
standardizeit="standardize",
)
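The reason the last undifferenced value and the scaler are returned is that both are needed to map model outputs back to the original scale. A minimal sketch of that round trip in plain Python follows. It assumes standardization with the population standard deviation and first-order differencing; the function names are hypothetical and this is not the library's code.

```python
import statistics


def standardize(values):
    """Scale to zero mean and unit variance; also return (mean, std)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0  # avoid division by zero
    return [(v - mean) / std for v in values], (mean, std)


def inverse_standardize(scaled, params):
    """Undo standardize using the stored (mean, std)."""
    mean, std = params
    return [v * std + mean for v in scaled]


def difference(values):
    """Return first differences and the last original (undifferenced) value."""
    return [b - a for a, b in zip(values, values[1:])], values[-1]


def inverse_difference(predicted_diffs, last_value):
    """Rebuild levels from predicted differences, starting at last_value."""
    levels = []
    current = last_value
    for diff in predicted_diffs:
        current += diff
        levels.append(current)
    return levels


series = [2.0, 4.0, 6.0, 8.0]
scaled, params = standardize(series)
restored = inverse_standardize(scaled, params)

diffs, last_value = difference(series)
print(inverse_difference([2.0, 2.0], last_value))  # [10.0, 12.0]
```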
Create model inputs with
seqs, Y, x_input, test_inputs = mdp.create_model_inputs.make_sequences(
data_extended.values, predicts=7, repeatit=3, n_steps_in=6, n_steps_out=1, constant=1
)
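The core idea of turning a series into model inputs is a sliding window: each input holds n_steps_in consecutive values and each target holds the n_steps_out values that follow. The sketch below shows only that idea for a one-dimensional series. It does not reproduce make_sequences itself, nor its predicts, repeatit and constant parameters or its x_input and test_inputs outputs. The function name `make_windows` is hypothetical.

```python
def make_windows(series, n_steps_in, n_steps_out=1):
    """Split a 1-D series into (input window, following target) pairs."""
    X, y = [], []
    last_start = len(series) - n_steps_in - n_steps_out
    for start in range(last_start + 1):
        split = start + n_steps_in
        X.append(series[start:split])
        y.append(series[split:split + n_steps_out])
    return X, y


series = [1, 2, 3, 4, 5, 6]
X, y = make_windows(series, n_steps_in=3, n_steps_out=1)
print(X)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(y)  # [[4], [5], [6]]
```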
Source Distribution
Hashes for mydatapreprocessing-2.0.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | a19dadcd20f1d4a59cc2516f3151231b68d5d876ace320d16ed2ba84e2afdcf1
MD5 | 6c0cf86b08becc987e97dac4b6df3f06
BLAKE2b-256 | 00856bb09d4012fb357844ec3bf42a900c5e32d1b4608ef0c6a1e54ef2242942
Built Distribution
Hashes for mydatapreprocessing-2.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c86b2161a3bac9d89231f004017d2e8235d8be25d05d2c8e279439440fe3b8ef
MD5 | 9434113dd4129ee01c49f74f2e25e89e
BLAKE2b-256 | c66c9d0ba08e582301df17a3b6ce360d7c0d2208b875adc95063b85656011cb7