Skip to main content

Sensor data acquisition library.

Project description

alt text

PyDSlog

PyDSlog claims to make the data acquisition for machine learning and AI applications on the RMG/941 fast and easy.

Installed by default on the RMG/941. It allows to read the sensor values coming from the serial RS485 interface or via MQTT and stores them in CSV files.

Installation

pip install pydslog

Access to stream

To use the stream it is necessary to initialize the class with the desired frequency, the desired channels, the serial port and the size of the block to be read.

For the MLS/160A:

x = PyDSlog.stream.MLS160A_stream(sz_block=500, 
        channels_to_use=["ACCX", "ACCY", "ACCZ", "GYRX", "GYRY", "GYRZ"],
        frequency=500, port="COM15", baudrate=115200)

and for the IO5640-DS:

x = PyDSlog.stream.IO5640_stream(sz_block=100, 
        channels_to_use=["AI4U", "AI3U", "AI2U", "AI1U", "AI1I", "AI2I"], 
        frequency=500, port="COM15", baudrate=115200)

Where:

  • sz_block: is the size of the block to be read at once when the stream is read.
  • channels_to_use: are the channels to use. Possible are ["ACCX", "ACCY", "ACCZ", "GYRX", "GYRY", "GYRZ"] for the MLS/160A and ["AI4U", "AI3U", "AI2U", "AI1U", "AI1I", "AI2I"] for the IO5640-DS.
  • frequency: is the desired frequency and port is the serial port. the baudrate is fixed at 115200 and can not be changed.

To access the stream directly it is necessary to connect the sensor to the serial port. The connect() method is used for this purpose. To start the stream the start() method is used. After calling the start() function, the sensor will start sending the values of the selected channels with the desired frequency. To read the values, use the read(transpose=False) method. Finally to stop the stream you use the stop() method and to release the serial port disconnect().

As an example, using MLS/160A:

# import PyDSlog
import PyDSlog.stream as stream

# size of the block to read is 500
# the channels that will be streamed are ["ACCX","ACCY","ACCZ","GYRX","GYRY","GYRZ"]
# a frequency of 500 Hz
# sensor is connected to port COM15
# the baudrate is fixed at 115200 and can not be changed.
x = stream.MLS160A_stream(sz_block=500, channels_to_use=["ACCX","ACCY","ACCZ","GYRX","GYRY","GYRZ"],
                                      frequency=500, port="COM15", baudrate=115200, n_frame=100)

try:

    # connect
    x.connect()

    # start
    x.start()

    # read stream. if you want to to transpose the values use transpose = True.
    r = x.read(transpose=False)


finally:

    # stop
    x.stop()
    # disconnect from port
    x.disconnect()


Generate CSV file from sensor values

PyDSlog was developed for the purpose of generating csv files to be used for training Machine Learning algorithms.

The following classes are available for this purpose:

For the MLS/160A:

x = PyDSlog.csv.MLS160A_csv_saver(port, channels_to_use, frequency, block_size, 
                filepath, filename=None, labeled=False, save_as_signal=False,
                header=True, custom_header=None, add_tmp=None, date_format="%d/%m/%Y,%H:%M:%S", 
                baudrate=115200, w_mode="a", delimiter=",")

and for the IO5640-DS:

x = PyDSlog.csv.IO5640_csv_saver(port, channels_to_use, frequency, block_size, 
                 filepath, filename=None, labeled=False, save_as_signal=False,
                 header=True, custom_header=None, add_tmp=None, date_format="%d/%m/%Y,%H:%M:%S", 
                 baudrate=115200, w_mode="a", delimiter=",")

Where:

sz_block is the size of the block to be read at once when the stream is read. channels_to_use are the channels to use. Possible are ["ACCX", "ACCY", "ACCZ", "GYRX", "GYRY", "GYRZ"] for the MLS/160A and ["AI4U", "AI3U", "AI2U", "AI1U", "AI1I", "AI2I"] for the IO5640-DS. frequency is the desired frequency and port is the serial port. the baudrate is fixed at 115200 and can not be changed.

  • Labeled: default False. boolean. If the sensor values have to be labelled (in order to be used with Supervised Learning), it is necessary to use labeled=True.
  • filepath: default None. string pointing to the location of the csv file. filename: default None. string with name of file.
  • save_as_signal: default False. boolean. If the values have to be interpreted as signals, it is necessary to use save_as_signal=True. This makes it possible to use an FFT function later. A separate csv file is generated for each channel. The values that represent a signal are separated by commas. A newline character separates the signals from each other. Each signal has a length of sz_block values.
  • header: default True. boolean. If true, the name of the channel is used as the header
  • custom_header: default None. string In case a different header than the name of the channels is needed.
  • add_tmp: default None. also possible: date, us or ms. In case you want to use a timestamp. for add_tmp="date" the format in date_format is used.
  • date_format: default "%d/%m/%Y,%H:%M:%S" The format for the date used. only necessary if add_tmp="date"
  • delimiter: "," The separator character used.

Once the class to generate the CSV file is initialized it is possible to use the start() method to start saving the sensor values into the CSV file. The pause() method can be used to pause. This method stops the sensor stream but does not close the CSV file or release the serial port. To reactivate the stream and save more values in the file, the start() method can be used again. Finally, the stop() method is used to end the recording.

If the values are labeled it is possible to use the set_label(label) method to configure the label to be used. The label parameter is a number that represents the label with which the values are stored in the file.

As an example, using MLS/160A:

# import libraries
import PyDSlog.csv as csv
import time


# initialize 
x = csv.MLS160A_csv_saver(port="COM3", channels_to_use=["ACCX","ACCY","ACCZ"], frequency=500,
                          block_size=500, filepath="test/", filename="file.csv",
                          labeled=False, save_as_signal=False, header=True,
                          add_tmp="ms", baudrate=115200, w_mode="a")

# start
x.start_csv()

# wait..
time.sleep(3)

#pause
x.pause_csv()

#wait again
time.sleep(3)

# restart
x.start_csv()

# wait again..
time.sleep(3)

# terminate
x.stop_csv()

The output will be something like:

alt text

Fourier Transformation

PyDSlog has a small class included to help transform the signals into the frequency spectrum.

Dependencies: you have to install numpy and scipy before you can use this module

import PyDSlog.transform as fft
import PyDSlog.stream as stream
import matplotlib.pyplot as plt
import numpy as np

chan = ["ACCX","ACCY","ACCZ"]
serial_port = "COM6"
frequency = 400
size_signal = 400
period = 1.0/frequency

# initialize
transform = fft.FFTGenerator(period, size_signal, frequency)
x = stream.MLS160A_stream(sz_block=size_signal, channels_to_use=chan,
                                      frequency=frequency, port=serial_port, baudrate=115200)

# read values as signal
sensor_values = x.read(transpose=False)
sensor_values = np.array(sensor_values)

# transform
ffts_val = transform.doFFT(sensor_values, delete_offset=True)  
# where ffts_val[0,c,:,0] are frequencies and ffts_val[0,c,:,1] are the amplitudes

# plot
plt.plot(ffts_val[0,c,:,0], ffts_val[0,c,:,1], linestyle='-', label="FFT")
plt.show()

PyDSlog simple signal classifier

PyDSlog includes a simple signal classifier. This classifier consists in forming groups where a master vector is stored according to its correlation. During each learning cycle the correlation of the new vector with the master vector of each group is compared. It is assumed that the master vector in the group with the highest correlation to the new vector is the group to which the new vector belongs. Once the group is found, a new master vector is calculated and stored in the group to be used in future operations.

Once the learning process has been completed, it is possible to predict which group a new vector belongs to by calculating the average distance to the master vector from the group with the highest correlation.

Dependencies: you have to install numpy and scipy before you can use this module. For advanced functionality it is also recommended to install scikit-learn

The class:

PyDSlog.classificator.SignalClassificator(min_pears_correlation=0.7, max_subgroups=5, outliers=False)

where:

  • min_pears_correlation: Default is 0.7. float. is the minimum pearson correlation to be used to group a new vector to a group.
  • max_subgroups: Default is 5. integer. is the maximum quantity of groups generated while learning.
  • outliers: Default False. Boolean. If the average distance found when predicting a class for a new vector is lower than the tolerance (see predict(signals, tolerance)), then return -1 as class indicating that it is an unknown class or an anomaly

To train the algorithm it is necessary to use the fit( x_train, y_train) method. Where x_train are the signals and y_train are the labels.

To make a prediction it is necessary to use the predict(signals, tolerance) method. Where signals is the vector to be classified and the tolerance is a number that indicates the tolerance to be used. If the average distance (or difference) of the new vector from the master vector within each group is greater than the tolerance, then the vector belongs to an unknown class or is an anomaly (-1).

As an example:

import PyDSlog.classificator as classificator
import PyDSlog.transform as transform
import pandas as pd
import numpy as np


##################   STREAM DATA PARAMETERS    #################

N = 5000
fs = 500
T = 1 / fs

PREFIX = "1478217877058"

X_FILES = [PREFIX+"_x_ACCX_.csv",PREFIX+"_x_ACCY_.csv",PREFIX+"_x_ACCZ_.csv",
           PREFIX+"_x_GYRX_.csv",PREFIX+"_x_GYRY_.csv",PREFIX+"_x_GYRZ_.csv"]

Y_FILE = PREFIX+"_y_.csv"


def read_signals(name):
    r = pd.read_csv(name, header=None, index_col=None)
    return r


signals = []
for file in X_FILES:
    s = np.array(read_signals("../test/test/"+file))
    signals.append(s)
signals = np.transpose(np.array(signals), (1, 0, 2))

labels = np.array(pd.read_csv("../test/test/"+Y_FILE, header=None, index_col=None))
labels = np.squeeze(labels)

t = transform.FFTGenerator(T, N, fs)
v_ffts = t.doFFT(signals, delete_offset=True)
print(v_ffts.shape)

##################   TRAIN TEST SPLIT    #################

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(v_ffts[:,:,:,1], labels, test_size=0.4)

cls = classificator.SignalClassificator()
cls.fit(x_train, y_train)
y_pred = cls.predict(x_test, 4.5, verbose=True)

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

PyDSlog-0.0.7.tar.gz (27.8 kB view hashes)

Uploaded Source

Built Distribution

PyDSlog-0.0.7-py3-none-any.whl (39.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page