
This library contains AI code for training purposes.

License: AGPL v3

Project hosting the trainer for generic estimation models.

EASIER.AI Trainer Library

To estimate the next possible values from the context information, the trainer uses Keras, a Python-based framework that runs on top of TensorFlow, Google's deep learning framework.

Training the models

The trainer is defined as a microservice so that it is easy to run and deploy in any use case and infrastructure. In addition, it is fully configurable and can be adapted to each scenario.

From the data produced by the sensors every day, a dataset is collected for every entity/id so that a predictive model can be produced for each entity. This model learns the patterns in the data so that, given some context input data, it can either forecast (predict) the next value(s) of the time series or estimate the value of a specific target feature.

In order to train the model, an Elasticsearch database is used to obtain the data. Models are stored in MINIO so that they can be loaded from other microservices.

Configuration

There are several parameters that can be modified to change how the system learns. These parameters are enclosed in the file ./config/config_trainer_estimation.ini.sample, and are explained here:

Configuration parameters guide

INFERENCE section:

  • data_type: Type of the data used to train the model; can be timeseries or features. If data_type is features, the next two parameters are ignored.
  • num_forecasts: Number of values that the system outputs when a prediction is requested. Only used when data_type is timeseries.
  • num_previous_measures: Length of the time series, that is, the number of previous values considered for learning. This parameter affects the learning algorithm itself.
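
The relationship between num_previous_measures and num_forecasts can be illustrated with a sliding window over a series. This is only a minimal sketch: the function name make_windows and the sample values are hypothetical, and the trainer's own preprocessing may differ.

```python
def make_windows(series, num_previous_measures, num_forecasts):
    """Split a series into (context, target) pairs with a sliding window.

    Hypothetical helper for illustration only; not part of the trainer API.
    """
    samples = []
    # Last index at which a full context plus a full target still fits.
    last_start = len(series) - num_previous_measures - num_forecasts
    for start in range(last_start + 1):
        split = start + num_previous_measures
        context = series[start:split]                 # values the model sees
        target = series[split:split + num_forecasts]  # values it must forecast
        samples.append((context, target))
    return samples


values = [10, 11, 12, 13, 14, 15]  # made-up measurements
pairs = make_windows(values, num_previous_measures=3, num_forecasts=2)
for context, target in pairs:
    print(context, "->", target)
```

A series shorter than num_previous_measures + num_forecasts yields no samples in this sketch, which is one reason a parameter such as minimum_samples is useful.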

ML_INITIAL section:

  • initial_train: Whether to perform an initial training with all the available data if there is no previous model to load (true/false).
  • time_window_to_initial: How far back to look for data in Elasticsearch for the initial training. One year is 1y.

ML section:

  • algorithm: Name of the neural network algorithm to use. Currently the 'lstm' (default), 'phasedlstm' and 'dense' models are available.
  • learning_rate: Float number indicating the learning rate for the model. It is recommended to start with a small number such as 0.001.
  • epoch_internal: Number of times the full training set is passed through the neural network for a specific batch size. Default is 50
  • epoch_external: Number of times the batch_size is increased and the neural network is retrained (epoch_internal times). Default is 1
  • batch_size: Number of examples or size of the batches in which each epoch_internal is divided. Default is 200 (increases with each epoch_external)
  • initial_validation_split: Fraction of the examples held out for validation when training (for the optimizing function). Default is 0.05
  • validation_split_multiplayer: Multiplier of the validation split used in each epoch_external. Default is 1.75
  • batch_size_multiplier: Multiplier of the size of the batches for every epoch_external. Default is 1.5
  • minimum_samples: Minimum number of samples to train the system
  • training_size: Number of samples which will be used to train the model
  • time_window_to: How far back to look for data in Elasticsearch. One week is 1w; one month is 1M.
  • time_window_from: Point in time from which to start looking back for data in Elasticsearch. Default is now
  • resample: Whether to resample the time series (yes/no)
  • resample_time: Time in SECONDS between measures used to resample the time series; can be empty. After resampling there is one value every resample_time seconds (if there is no data, the previous value is used)
  • delta_max_std: Maximum time in SECONDS between measures to consider the time series as synchronous
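
How batch_size and the validation split could grow across epoch_external rounds is sketched below. The function training_schedule is hypothetical, and the sketch assumes that both multipliers are applied after each round, starting from the documented defaults; the trainer may apply them differently.

```python
def training_schedule(batch_size=200, initial_validation_split=0.05,
                      epoch_external=1, batch_size_multiplier=1.5,
                      validation_split_multiplayer=1.75):
    """Return one (batch_size, validation_split) pair per external epoch.

    Illustrative only: parameter names mirror the configuration keys,
    but the update order is an assumption.
    """
    rounds = []
    validation_split = initial_validation_split
    for _ in range(epoch_external):
        rounds.append((int(batch_size), round(validation_split, 4)))
        # Assumed update rule: scale both values for the next round.
        batch_size *= batch_size_multiplier
        validation_split *= validation_split_multiplayer
    return rounds


schedule = training_schedule(epoch_external=3)
for number, (size, split) in enumerate(schedule, start=1):
    print(f"round {number}: batch_size={size}, validation_split={split}")
```

Under these assumptions, each round would train for epoch_internal epochs with the listed batch size before moving to the next one.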

The format of the time_window parameters follows the Date Math syntax of the Elasticsearch API.
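
As an illustration of Date Math, the sketch below builds an Elasticsearch range query from the two time window parameters. The function build_time_range_query is hypothetical, and it assumes that time_window_from is the upper bound and time_window_to is the look-back length subtracted from it; the trainer's actual query is not shown in this document.

```python
import json


def build_time_range_query(time_index, time_window_to, time_window_from="now"):
    """Build a range query body using Elasticsearch Date Math.

    Hypothetical helper: how the trainer combines the two parameters
    is an assumption made for this example.
    """
    # "now" can be followed directly by math; absolute dates need "||".
    anchor = "now" if time_window_from == "now" else time_window_from + "||"
    lower_bound = anchor + "-" + time_window_to  # e.g. "now-1w"
    return {
        "query": {
            "range": {
                time_index: {"gte": lower_bound, "lte": time_window_from}
            }
        }
    }


query = build_time_range_query("timestamp", "1w")
print(json.dumps(query, indent=2))
```

With time_window_to set to 1w and the default time_window_from, the lower bound becomes now-1w, that is, one week before the current time.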

ELASTIC section:

  • index_entities: Name of the index that stores the entities
  • index_data: Name of the index that stores the data
  • index_scalers: Name of the index that stores the scalers
  • index_predictions: Name of the index that stores the predictions
  • index_models: Name of the index that stores the models. It is overwritten by the environment variable TRAINING_RESULTS_ID
  • mapping_data: Name of the mapping that defines the format in the index of data
  • mapping_entities: Name of the mapping that defines the format in the index of entities
  • mapping_models: Name of the mapping that defines the format in the index of models
  • mapping_predictions: Name of the mapping that defines the format in the index of predictions

DATA section:

  • time_index: Column label used for indexing data (timestamp column), typically timestamp
  • inference_features: Name of the feature(s) that is/are going to be forecasted
  • dataset_features: Name of the other features used only for training

ELK section:

  • elastic_host: Hostname of Elasticsearch. Example: localhost
  • elastic_port: Port of communication with Elasticsearch. Example: 9200 (default of Elasticsearch)

MINIO section:

  • minio_host: Hostname of MINIO. Example: minio
  • minio_port: Port of communication with MINIO (default is 9000)
  • minio_access: Access key (username) configured in MINIO
  • minio_secret: Secret key (password) configured in MINIO
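
A configuration file organised in these sections can be read with Python's standard configparser module. The fragment below is a sketch: it assumes the section headers in the file match the names in this guide, and the values are illustrative rather than copied from the real sample file.

```python
import configparser

# Illustrative fragment; values are assumptions, not the shipped defaults.
SAMPLE = """
[INFERENCE]
data_type = timeseries
num_forecasts = 1
num_previous_measures = 10

[ML]
algorithm = lstm
learning_rate = 0.001
epoch_internal = 50
batch_size = 200

[ELK]
elastic_host = localhost
elastic_port = 9200
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# Typed getters convert the raw strings stored in the file.
data_type = config.get("INFERENCE", "data_type")
learning_rate = config.getfloat("ML", "learning_rate")
batch_size = config.getint("ML", "batch_size")
elastic_port = config.getint("ELK", "elastic_port")
# fallback supplies a value when a key is missing from the file.
epoch_external = config.getint("ML", "epoch_external", fallback=1)

print(data_type, learning_rate, batch_size, elastic_port, epoch_external)
```

To load a file from disk instead, config.read("./config/config.ini") can replace the read_string call.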

Instructions

Use this command to launch the container:

docker run \
  -e ELASTIC_HOST=[$ELASTIC_HOST] \
  -e ELASTIC_PORT=[$ELASTIC_PORT] \
  -e MINIO_ACCESS_KEY=[$MINIO_ACCESS_KEY] \
  -e MINIO_SECRET_KEY=[$MINIO_SECRET_KEY] \
  -e MINIO_SERVICE_HOST=[$MINIO_HOST] \
  -e MINIO_SERVICE_PORT=[$MINIO_PORT] \
  -e TRAINING_RESULTS_ID=[$TRAINING_RESULTS_ID] \
  -e INPUT_FEATURES=[$INPUT_FEATURES] \
  -e PREDICTION_FEATURES=[$PREDICTION_FEATURES] \
  -v ./config/:/usr/app/src/config \
  --name trainer easierai/trainer:1.0

Apart from the basic environment variables, you can perform a more advanced configuration by overriding the configuration file inside the trainer. To do so, add the flag -v to the docker command and mount a volume that maps your local ./config/config.ini to /usr/app/src/config/config.ini in the container. Notice that the volume is a folder named config that contains a file named config.ini.

Notice:

  • Variables MINIO_ACCESS_KEY and MINIO_SECRET_KEY are, respectively, the username and password of the deployed MINIO service. Check the configuration of that service to know more.

  • Apart from those variables, you must specify at least the image tag (@tag, for example 1.1) and, as a volume, the folder that contains the configuration file. Make sure that the folder ./config contains a file called config.ini with the configuration explained above.

  • In addition, you should specify the environment variables for the Elasticsearch host, the Elasticsearch port and the MINIO host and port. If you do not provide them, the ones in the config.ini file will be used. You should also open the port used by the REST API, and make sure it is a different port from the one used by the inferencer if you plan to launch both on the same machine. As you can see, there are a few more variables you need to configure:

  • ELASTIC_HOST: elasticsearch host IP or hostname.

  • ELASTIC_PORT: elasticsearch port.

  • MINIO_SERVICE_HOST: MINIO host IP or hostname.

  • MINIO_SERVICE_PORT: MINIO port.

  • MINIO_ACCESS: MINIO access key (username for the MINIO repository)

  • MINIO_SECRET: MINIO secret key (password for the MINIO repository)
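
The precedence described above (an environment variable wins, otherwise the value from config.ini is used) can be sketched as follows. The function resolve_setting is hypothetical and only illustrates the fallback rule; it is not the trainer's actual code.

```python
import os


def resolve_setting(env_name, config_value, environ=None):
    """Return the environment value if set and non-empty, else the config value.

    Hypothetical helper; the environ argument exists only to make the
    example easy to run without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(env_name)
    return value if value else config_value


# Simulated environment: only the Elasticsearch host is overridden.
fake_env = {"ELASTIC_HOST": "10.0.0.5"}

elastic_host = resolve_setting("ELASTIC_HOST", "localhost", fake_env)
elastic_port = resolve_setting("ELASTIC_PORT", "9200", fake_env)
print(elastic_host, elastic_port)
```

Here the host comes from the simulated environment, while the port falls back to the value that would have been read from config.ini.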

You can also add this service definition to your docker-compose file:

  trainer:
    image: easierai/trainer:1.0
    container_name: trainer
    environment:
      NODE_ENV: development
      ELASTIC_HOST: 127.0.0.1
      ELASTIC_PORT: 9200
      MINIO_SERVICE_HOST: 127.0.0.1
      MINIO_SERVICE_PORT: 9000
      MINIO_ACCESS: username
      MINIO_SECRET: password
      TRAINING_RESULTS_ID: experiment-001
      INPUT_FEATURES: ratio,free
      INFERENCE_FEATURES: ratio
