
Library for loading and preprocessing data for machine-learning prediction models.

Project description

mydatapreprocessing


Load data from a web link or local file (JSON, CSV, Excel, parquet, HDF5...), consolidate it, and run configurable preprocessing such as resampling, standardization, string embedding, derivation of new columns, and feature extraction.

The library contains three modules.

The first, preprocessing, loads data and preprocesses it. It contains functions such as load_data, data_consolidation, preprocess_data, preprocess_data_inverse, add_frequency_columns, rolling_windows, and add_derived_columns.

Example

import mydatapreprocessing as mdp

data = "https://blockchain.info/unconfirmed-transactions?format=json"

# Load data from file or URL
data_loaded = mdp.load_data(data, request_datatype_suffix=".json", predicted_table='txs')

# Transform various data into a defined format - a pandas DataFrame - convert to numeric where
# possible, keep only numeric data, and resample if configured. Returns an array or DataFrame.
data_consolidated = mdp.data_consolidation(
    data_loaded, predicted_column="weight", data_orientation="index",
    remove_nans_threshold=0.9, remove_nans_or_replace='interpolate')

# Preprocess data. Returns the preprocessed data, but also the last undifferenced value and the
# scaler for inverse transformation, so unpack the extras with _
data_preprocessed, _, _ = mdp.preprocess_data(
    data_consolidated, remove_outliers=True, smoothit=False,
    correlation_threshold=False, data_transform=False, standardizeit='standardize')

Examples of allowed data formats for load_data:

# myarray_or_dataframe # NumPy array or pandas DataFrame
# r"/home/user/my.json" # Local file. The same with .parquet, .h5, .json or .xlsx. On Windows it's necessary to use a raw string - 'r' in front of the string - because of the escape symbol \
# "https://yoururl/your.csv" # Web URL (with suffix). Same with JSON.
# "https://blockchain.info/unconfirmed-transactions?format=json" # In this case you also have to specify 'request_datatype_suffix': "json", 'data_orientation': "index", 'predicted_table': 'txs'
# [{'col_1': 3, 'col_2': 'a'}, {'col_1': 0, 'col_2': 'd'}] # List of records
# {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']} # Dict with columns or rows (index) - necessary to set data_orientation!
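To illustrate why data_orientation matters for dict input, here is a plain-pandas sketch (showing the general parsing concept, not this library's internals) of how records, a dict of columns, and a dict of rows produce different DataFrames. The row-dict example data is hypothetical:

```python
import pandas as pd

# List of records: each dict is one row
records = [{'col_1': 3, 'col_2': 'a'}, {'col_1': 0, 'col_2': 'd'}]
df_records = pd.DataFrame(records)

# Dict of columns: each key becomes one column
columns = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
df_columns = pd.DataFrame(columns)

# Dict of rows: each key is one row index - needs orient="index"
rows = {'row_1': [3, 'a'], 'row_2': [0, 'd']}
df_rows = pd.DataFrame.from_dict(rows, orient="index", columns=['col_1', 'col_2'])
```

The same dict shape can mean columns or rows, which is why dict input requires an explicit data_orientation.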

The second module is inputs. It takes tabular time-series data and puts it into a format that can be fed into machine learning models, for example in sklearn or TensorFlow. It contains the functions make_sequences, create_inputs and create_tests_outputs.
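The core idea behind turning a time series into model inputs can be sketched with plain NumPy. This is an illustration of the general sliding-window technique, not the library's actual implementation or signatures:

```python
import numpy as np

def sliding_windows(series, n_steps_in, n_steps_out=1):
    """Split a 1-D series into (X, y) pairs: each X row holds n_steps_in
    consecutive values, each y row holds the following n_steps_out values."""
    X, y = [], []
    for i in range(len(series) - n_steps_in - n_steps_out + 1):
        X.append(series[i: i + n_steps_in])
        y.append(series[i + n_steps_in: i + n_steps_in + n_steps_out])
    return np.array(X), np.array(y)

series = np.arange(10)  # [0, 1, ..., 9]
X, y = sliding_windows(series, n_steps_in=3)
# X[0] is [0, 1, 2] and y[0] is [3]
```

The resulting X and y arrays can go straight into a sklearn regressor or, after reshaping, a recurrent TensorFlow model.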

The third module is generatedata. It generates some basic data such as sine, ramp, and random signals. In the future, it will also import some real datasets for model KPIs.
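Such toy signals take only a few lines of NumPy; the sketch below shows what sine, ramp, and random test data could look like (assumed lengths and parameters, not the module's actual output):

```python
import numpy as np

n = 100
t = np.arange(n)

sin_data = np.sin(2 * np.pi * t / 20)                   # periodic signal, period 20
ramp_data = t.astype(float)                             # linearly increasing signal
random_data = np.random.default_rng(0).normal(size=n)   # seeded Gaussian noise
```

Signals like these are handy for smoke-testing a model pipeline before real data is available.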

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mydatapreprocessing-1.0.8.tar.gz (18.9 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

mydatapreprocessing-1.0.8-py3.7.egg (40.1 kB view details)

Uploaded Egg

mydatapreprocessing-1.0.8-py3-none-any.whl (20.5 kB view details)

Uploaded Python 3

File details

Details for the file mydatapreprocessing-1.0.8.tar.gz.

File metadata

  • Download URL: mydatapreprocessing-1.0.8.tar.gz
  • Upload date:
  • Size: 18.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.7.1

File hashes

Hashes for mydatapreprocessing-1.0.8.tar.gz
Algorithm Hash digest
SHA256 57a663e87128583dfcd026b90fd283943b9dcb68f5e3976cd702d01048587d4c
MD5 e25ade5b0abadfb27aad15dd00612ff7
BLAKE2b-256 f69b6623397b8ce1a3f01080e75cfd28db73b181c3f0b36ca55088e8ac2e5377

See more details on using hashes here.
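To verify a downloaded archive against a published digest, the standard-library hashlib suffices. A minimal sketch; the expected value below is the SHA256 listed above for the sdist, and the file path is assumed:

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "57a663e87128583dfcd026b90fd283943b9dcb68f5e3976cd702d01048587d4c"
# ok = sha256_of_file("mydatapreprocessing-1.0.8.tar.gz") == expected
```

Compare the computed hexdigest to the published one before installing the file.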

File details

Details for the file mydatapreprocessing-1.0.8-py3.7.egg.

File metadata

  • Download URL: mydatapreprocessing-1.0.8-py3.7.egg
  • Upload date:
  • Size: 40.1 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.7.1

File hashes

Hashes for mydatapreprocessing-1.0.8-py3.7.egg
Algorithm Hash digest
SHA256 3b61f2e8a387988d6d1a5588fd12eaf346efedd49b9df6f95c862f3838602c88
MD5 7a11159df506cc56fbe5213e2b34c131
BLAKE2b-256 19a34ec0b62c7a613ef6c5d14a157fbcfd1e79f47e725f3c59ef24147b85d3c7


File details

Details for the file mydatapreprocessing-1.0.8-py3-none-any.whl.

File metadata

  • Download URL: mydatapreprocessing-1.0.8-py3-none-any.whl
  • Upload date:
  • Size: 20.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.7.1

File hashes

Hashes for mydatapreprocessing-1.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 0854970ab358b1830a2926a8f159918a57e021a6bee9454f58be5435dc1ceb95
MD5 90940c8f5c9d40be000330d69a4af88d
BLAKE2b-256 88e40371e9134b8a1edf7f1c8f81b4844a84450306c77f69b75ea4a6ea6cfa57

