Library/framework for making predictions.

Project description

Load data from a web link or a local file (JSON, CSV, Excel, Parquet, HDF5 and others), consolidate it, and preprocess it based on configuration: resampling, standardization, string embedding, derivation of new columns, feature extraction and so on.

The library contains 3 modules.

The first module, preprocessing, loads data and preprocesses it. It contains functions such as load_data, data_consolidation, preprocess_data, preprocess_data_inverse, add_frequency_columns, rolling_windows and add_derived_columns.

Examples:

>>> import mydatapreprocessing.preprocessing as mdpp

>>> data = "https://blockchain.info/unconfirmed-transactions?format=json"

>>> # Load data from file or URL
>>> data_loaded = mdpp.load_data(data, request_datatype_suffix=".json", predicted_table='txs')

>>> # Transform various data into a defined format - a pandas DataFrame - convert to numeric if possible, keep
>>> # only numeric data and resample if configured. It returns an array or a DataFrame
>>> data_consolidated = mdpp.data_consolidation(
>>>     data_loaded, predicted_column="weight", data_orientation="index", remove_nans_threshold=0.9, remove_nans_or_replace='interpolate')

>>> # Preprocess data. It returns the preprocessed data, but also the last undifferenced value and the scaler for the
>>> # inverse transformation, so unpack the extra values with _
>>> data_preprocessed, _, _ = mdpp.preprocess_data(data_consolidated, remove_outliers=True, smoothit=False,
>>>                                                correlation_threshold=False, data_transform=False, standardizeit='standardize')
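
The comment above mentions that preprocessing also returns the last undifferenced value and a scaler so that the transformation can be inverted. The following is a minimal NumPy sketch of that idea, not the library's implementation: the helper names (`standardize`, `standardize_inverse`, `difference`, `difference_inverse`) are hypothetical and only illustrate why those extra values are needed.

```python
import numpy as np


def standardize(values):
    """Scale each column to zero mean and unit variance; also return the parameters."""
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant columns: avoid division by zero
    return (values - mean) / std, (mean, std)


def standardize_inverse(scaled, params):
    """Undo standardize() with the stored mean and standard deviation."""
    mean, std = params
    return np.asarray(scaled, dtype=float) * std + mean


def difference(values):
    """Return first differences and the last original value (needed for the inverse)."""
    values = np.asarray(values, dtype=float)
    return np.diff(values, axis=0), values[-1]


def difference_inverse(future_differences, last_value):
    """Rebuild values that follow last_value from predicted differences."""
    return last_value + np.cumsum(future_differences, axis=0)


data = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]])
scaled, params = standardize(data)
restored = standardize_inverse(scaled, params)  # equals data up to rounding

diffs, last = difference(np.array([1.0, 2.0, 4.0]))
print(difference_inverse(np.array([1.0, 1.0]), last))  # [5. 6.]
```

Without the stored mean, standard deviation and last value, predictions made on the transformed data could not be mapped back to the original scale.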


>>> # Examples of allowed data formats for load_data:

>>> # myarray_or_dataframe # Numpy array or Pandas.DataFrame
>>> # r"/home/user/my.json" # Local file. The same with .parquet, .h5, .json or .xlsx. On Windows it's necessary to use a raw string - 'r' in front of the string - because of escape symbols
>>> # "https://yoururl/your.csv" # Web URL (with suffix). Same with json.
>>> # "https://blockchain.info/unconfirmed-transactions?format=json" # In this case you have to specify also 'request_datatype_suffix': "json", 'data_orientation': "index", 'predicted_table': 'txs',
>>> # [{'col_1': 3, 'col_2': 'a'}, {'col_1': 0, 'col_2': 'd'}] # List of records
>>> # {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']} # Dict with columns or rows (index) - it is necessary to set data_orientation!
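
To see why the orientation matters for dict input, here is a small sketch that uses plain pandas rather than this library. It only illustrates the two possible readings of the same dict; how load_data maps data_orientation onto them internally is not shown here.

```python
import pandas as pd

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}

# Keys are read as column names: 4 rows, 2 columns
by_columns = pd.DataFrame.from_dict(data, orient="columns")

# Keys are read as row labels (index): 2 rows, 4 columns
by_index = pd.DataFrame.from_dict(data, orient="index")

print(by_columns.shape)  # (4, 2)
print(by_index.shape)    # (2, 4)
```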

>>> # You can pass several sources in a list and the data will be concatenated. It can be a list of paths or a list of Python objects. Example:

>>> # [{'col_1': 3, 'col_2': 'a'}, {'col_1': 0, 'col_2': 'd'}]  # List of records
>>> # [np.random.randn(20, 3), np.random.randn(25, 3)]  # List of arrays; dataframes work the same way
>>> # ["https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv", "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv"]  # List of URLs
>>> # ["path/to/my1.csv", "path/to/my1.csv"]
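
As a rough illustration of what concatenating a list of sources means, the sketch below stacks two arrays row-wise with plain pandas. It is an assumption about the general idea, not the library's own code path.

```python
import numpy as np
import pandas as pd

# Two chunks with the same three columns but different lengths
parts = [np.random.randn(20, 3), np.random.randn(25, 3)]

# Convert each chunk to a DataFrame and stack them row-wise
frames = [pd.DataFrame(part) for part in parts]
combined = pd.concat(frames, ignore_index=True)

print(combined.shape)  # (45, 3)
```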

The second module is inputs. It takes tabular time series data (usually processed by the preprocessing module) and puts it into a format that can be fed into machine learning models, for example from sklearn or tensorflow. It contains the functions make_sequences, create_inputs and create_tests_outputs.

Examples:

>>> import numpy as np
>>> import mydatapreprocessing as mdp

>>> data = np.array([[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16], [17, 18, 19, 20, 21, 22, 23, 24]]).T
>>> X, y, x_input, _ = mdp.inputs.make_sequences(data, n_steps_in=3, n_steps_out=2)

>>> # This example starts from an array like this:

>>> # data = array([[1, 9, 17],
>>> #               [2, 10, 18],
>>> #               [3, 11, 19],
>>> #               [4, 12, 20],
>>> #               [5, 13, 21],
>>> #               [6, 14, 22],
>>> #               [7, 15, 23],
>>> #               [8, 16, 24]])

>>> # The results look like this (the data are serialized, i.e. flattened column by column):

>>> # X = array([[1, 2, 3, 9, 10, 11, 17, 18, 19],
>>> #            [2, 3, 4, 10, 11, 12, 18, 19, 20],
>>> #            [3, 4, 5, 11, 12, 13, 19, 20, 21],
>>> #            [4, 5, 6, 12, 13, 14, 20, 21, 22]])

>>> # y = array([[4, 5],
>>> #            [5, 6],
>>> #            [6, 7],
>>> #            [7, 8]])

>>> # x_input = array([[ 6,  7,  8, 14, 15, 16, 22, 23, 24]])
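
The arrays shown above correspond to windows of three input steps and two output steps, with the first column as the predicted one. The windowing can be sketched in plain NumPy as follows. This is an illustrative reimplementation under those assumptions, not the library's code: the name `make_sequences_sketch` and the `predicted_column_index` parameter are hypothetical, and it returns three values rather than the four returned by make_sequences.

```python
import numpy as np


def make_sequences_sketch(data, n_steps_in, n_steps_out=1, predicted_column_index=0):
    """Build flattened input windows X, targets y and the last window x_input."""
    data = np.asarray(data)
    n_samples = data.shape[0] - n_steps_in - n_steps_out + 1
    if n_samples < 1:
        raise ValueError("Not enough rows for the requested window sizes.")

    # Each X row holds the window of every column, one column after another
    X = np.array([data[i:i + n_steps_in].T.ravel() for i in range(n_samples)])

    # Each y row holds the next n_steps_out values of the predicted column
    y = np.array([
        data[i + n_steps_in:i + n_steps_in + n_steps_out, predicted_column_index]
        for i in range(n_samples)
    ])

    # The most recent window, used as model input for the actual prediction
    x_input = data[-n_steps_in:].T.ravel()[np.newaxis, :]
    return X, y, x_input


data = np.array([[1, 2, 3, 4, 5, 6, 7, 8],
                 [9, 10, 11, 12, 13, 14, 15, 16],
                 [17, 18, 19, 20, 21, 22, 23, 24]]).T
X, y, x_input = make_sequences_sketch(data, n_steps_in=3, n_steps_out=2)
print(X[0])     # [ 1  2  3  9 10 11 17 18 19]
print(y[0])     # [4 5]
print(x_input)  # [[ 6  7  8 14 15 16 22 23 24]]
```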

The third module is generatedata. It generates some basic data such as sin, ramp and random signals. In the future, it will also import some real datasets for model KPIs.

Examples:

>>> import mydatapreprocessing as mdp

>>> data = mdp.generatedata.gen_sin(1000)
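
For readers without the package installed, a comparable test signal can be produced with NumPy alone. The sketch below is only an approximation of what such a generator might do: the function name `gen_sin_sketch` and the `periods` parameter are assumptions, and the amplitude and frequency used by gen_sin may differ.

```python
import numpy as np


def gen_sin_sketch(n=1000, periods=10):
    """Return n samples of a sine wave spanning the given number of periods."""
    t = np.linspace(0, 2 * np.pi * periods, n, endpoint=False)
    return np.sin(t)


signal = gen_sin_sketch(1000)
print(signal.shape)  # (1000,)
```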

Download files

Source Distribution

mydatapreprocessing-1.1.24.tar.gz (24.0 kB)

Uploaded Source

Built Distributions

mydatapreprocessing-1.1.24-py3.7.egg (55.4 kB)

Uploaded Egg

mydatapreprocessing-1.1.24-py3-none-any.whl (27.6 kB)

Uploaded Python 3

File details

Details for the file mydatapreprocessing-1.1.24.tar.gz.

File metadata

  • Download URL: mydatapreprocessing-1.1.24.tar.gz
  • Upload date:
  • Size: 24.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/54.1.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.7.1

File hashes

Hashes for mydatapreprocessing-1.1.24.tar.gz
Algorithm Hash digest
SHA256 f12ad112b18646dc40c2081e8ebc7040a21c16ef2835cda3207b0e77841c9a71
MD5 cc8c2068a16875fa4a4f5d2367267993
BLAKE2b-256 e4339851938a0b8550b1ef855c33650991ed4864522225bd510b6264b8c033d8
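
A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch: the helper name `sha256_of_file` is hypothetical, and the local path in the commented lines is an assumption about where the archive was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


expected = "f12ad112b18646dc40c2081e8ebc7040a21c16ef2835cda3207b0e77841c9a71"

# Assumed local download location; adjust to where the archive actually is.
# archive_path = "mydatapreprocessing-1.1.24.tar.gz"
# print(sha256_of_file(archive_path) == expected)
```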

File details

Details for the file mydatapreprocessing-1.1.24-py3.7.egg.

File metadata

  • Download URL: mydatapreprocessing-1.1.24-py3.7.egg
  • Upload date:
  • Size: 55.4 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/54.1.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.7.1

File hashes

Hashes for mydatapreprocessing-1.1.24-py3.7.egg
Algorithm Hash digest
SHA256 a5afacdfdb23d18cd1556cc731e3e22f5f0a3231ed7bf27ceb7a9b362526c79b
MD5 f4ceec7416d3f096953e95a62b3cb415
BLAKE2b-256 021a7b8e2efcb8aee54b9327d087004320c769bd69e8591e3c017ed68d4b8486

File details

Details for the file mydatapreprocessing-1.1.24-py3-none-any.whl.

File metadata

  • Download URL: mydatapreprocessing-1.1.24-py3-none-any.whl
  • Upload date:
  • Size: 27.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/54.1.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.7.1

File hashes

Hashes for mydatapreprocessing-1.1.24-py3-none-any.whl
Algorithm Hash digest
SHA256 7e88e209082ba35e3b30cf83a0e606e7d50fec0df979a753a647d0e2bcb2108a
MD5 0bbf9480b4ba091be4143ecb1166a385
BLAKE2b-256 27b55f9571de5bf1b009ef506a30503f4c591193029125d3bb2f4d2c6b1ef882
