A machine learning interface for isolated sequence classification algorithms in Python.
Introduction
Sequential data is observed in many different forms, from audio signals and stock prices to brain and heart signals. Such data is of particular interest in machine learning, as patterns that change over time provide many interesting opportunities and challenges for classification.
Sequentia is a Python package that implements various classification algorithms for sequential data.
Some examples of how Sequentia can be used in isolated sequence classification include:
- determining a spoken word based on its audio signal or some other representation such as MFCCs,
- identifying potential heart conditions such as arrhythmia from ECG signals,
- predicting motion intent for gesture control from electrical muscle activity,
- classifying hand-written characters according to their pen-tip trajectories,
- classifying hand or head gestures from rotation or movement signals,
- classifying the sentiment of a phrase or sentence in natural language from word embeddings.
Features
Sequentia provides the following algorithms, all supporting multivariate sequences with different durations.
Classification algorithms
- Hidden Markov Models (via hmmlearn)
  - Learning with the Baum-Welch algorithm [1]
  - Gaussian Mixture Model emissions
  - Linear, left-right and ergodic topologies
  - Multi-processed predictions
- Dynamic Time Warping k-Nearest Neighbors (via dtaidistance)
  - Sakoe–Chiba band global warping constraint
  - Dependent and independent feature warping (DTWD & DTWI)
  - Custom distance-weighted predictions
  - Multi-processed predictions
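To make the DTW terms in the list above concrete, here is a minimal NumPy sketch of exact DTW with a Sakoe–Chiba band, plus the dependent (DTWD) and independent (DTWI) variants. It is an illustration only: the function names `dtw_distance` and `dtw_independent`, the `window` fraction and the squared Euclidean frame cost are assumptions made for this example, and this is not how Sequentia or dtaidistance implement it.

```python
import numpy as np

def dtw_distance(a, b, window=1.0):
    """Exact DTW distance between sequences a (T1, D) and b (T2, D).

    `window` is the Sakoe-Chiba band width as a fraction of the longer
    sequence length (1.0 means unconstrained). All features are warped
    together, which corresponds to dependent warping (DTWD).
    Hypothetical helper for illustration, not part of Sequentia.
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)
    # Band half-width in frames; at least |n - m| so the final cell is reachable
    w = max(int(np.ceil(window * max(n, m))), abs(n - m))
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        # Only cells within the band around the diagonal are evaluated
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = np.sum((a[i - 1] - b[j - 1]) ** 2)  # squared Euclidean frame cost
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))

def dtw_independent(a, b, window=1.0):
    """DTWI: warp each feature separately and sum the per-feature distances."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    return sum(dtw_distance(a[:, k], b[:, k], window) for k in range(a.shape[1]))

a = np.array([0, 0, 0, 0, 1])
b = np.array([0, 1, 1, 1, 1])
unconstrained = dtw_distance(a, b, window=1.0)  # free warping absorbs the shift
banded = dtw_distance(a, b, window=0.2)         # narrow band forbids large warps
```

A narrower band evaluates fewer cells of the cost matrix, at the price of disallowing alignments that stray far from the diagonal.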
Example of a classification algorithm: a multi-class HMM sequence classifier
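As a rough sketch of the idea behind a multi-class HMM classifier: one HMM is typically trained per class, and a new sequence is assigned to the class whose model gives the highest likelihood, weighted by a class prior. The snippet below illustrates that decision rule together with transition masks for the three topologies named in the feature list. The helper names, the log-likelihood values and the topology definitions (in particular, reading "linear" as "stay or move to the next state only") are assumptions for illustration, not Sequentia's API.

```python
import numpy as np

def topology_mask(n_states, kind):
    """Boolean mask of permitted transitions (row = from state, column = to state)."""
    ones = np.ones((n_states, n_states), dtype=bool)
    if kind == "ergodic":      # every state may transition to every state
        return ones
    if kind == "left-right":   # stay, or move to any later state
        return np.triu(ones)
    if kind == "linear":       # stay, or move to the next state only
        return np.triu(ones) & np.tril(ones, k=1)
    raise ValueError(f"unknown topology: {kind}")

def uniform_transitions(n_states, kind):
    """Row-stochastic matrix spreading probability evenly over permitted transitions."""
    mask = topology_mask(n_states, kind).astype(float)
    return mask / mask.sum(axis=1, keepdims=True)

def classify(log_likelihoods, priors):
    """Pick the class c maximizing log p(O | model_c) + log p(c)."""
    classes = list(log_likelihoods)
    scores = [log_likelihoods[c] + np.log(priors[c]) for c in classes]
    return classes[int(np.argmax(scores))]

A = uniform_transitions(3, "linear")

# Made-up per-class log-likelihoods of one sequence under each class's HMM
label = classify({"good": -10.0, "bad": -12.0}, {"good": 0.5, "bad": 0.5})
```

A uniform matrix like `A` is only a plausible starting point; Baum-Welch training would then re-estimate the non-zero entries while leaving forbidden transitions at zero.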
Preprocessing methods
- Centering, standardization and min-max scaling
- Decimation and mean downsampling
- Mean and median filtering
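The preprocessing methods listed above can be sketched in plain NumPy as follows. This assumes each sequence is an array of shape (time steps, features) and that every transform is applied per sequence and per feature. The function names are hypothetical and do not mirror Sequentia's preprocessing API, and "decimation" is taken here to mean keeping every n-th frame without an anti-aliasing filter.

```python
import numpy as np

def center(x):
    """Subtract the per-feature mean."""
    return x - x.mean(axis=0)

def standardize(x):
    """Zero mean and unit variance per feature (undefined for constant features)."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def min_max_scale(x, low=0.0, high=1.0):
    """Rescale each feature linearly into [low, high]."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return low + (x - x_min) * (high - low) / (x_max - x_min)

def decimate(x, factor):
    """Keep every `factor`-th frame."""
    return x[::factor]

def mean_downsample(x, factor):
    """Average consecutive, non-overlapping groups of `factor` frames."""
    n = len(x) // factor * factor  # drop trailing frames that do not fill a group
    return x[:n].reshape(-1, factor, *x.shape[1:]).mean(axis=1)

def median_filter(x, width):
    """Replace each frame by the median of a window centered on it (truncated at the edges)."""
    half = width // 2
    return np.array([np.median(x[max(0, t - half):t + half + 1], axis=0)
                     for t in range(len(x))])

x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
downsampled = mean_downsample(x, 2)
smoothed = median_filter(np.array([1.0, 9.0, 1.0, 1.0, 1.0]), 3)
```

Scaling puts features on comparable ranges, downsampling shortens long sequences to reduce computation, and filtering suppresses isolated spikes such as the `9.0` in the last example.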
Installation
pip install sequentia
Documentation
Documentation for the package is available on Read The Docs.
Tutorials and examples
For detailed tutorials and examples on the usage of Sequentia, see the notebooks here.
Below are some basic examples of how univariate and multivariate sequences can be used in Sequentia.
Univariate sequences
import numpy as np, sequentia as seq
# Generate training observation sequences and labels
X, y = [
    np.array([1, 0, 5, 3, 7, 2, 2, 4, 9, 8, 7]),
    np.array([2, 1, 4, 6, 5, 8]),
    np.array([5, 8, 0, 3, 1, 0, 2, 7, 9])
], ['good', 'good', 'bad']
# Create and fit the classifier
clf = seq.KNNClassifier(k=1, classes=('good', 'bad'))
clf.fit(X, y)
# Make a prediction for a new observation sequence
x_new = np.array([0, 3, 2, 7, 9, 1, 1])
y_new = clf.predict(x_new)
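With k=1, as in the classifier above, the prediction is simply the label of the nearest training sequence. For larger k, neighbors can vote, optionally with weights that depend on their distance. The following sketch shows one way such a vote could work. `knn_vote` and its `weighting` argument are hypothetical names chosen for this example, the distances are made up, and none of this is taken from Sequentia's implementation.

```python
import numpy as np

def knn_vote(distances, labels, k, weighting=lambda d: np.ones(d.size)):
    """Return the label with the largest total weight among the k nearest neighbors."""
    distances = np.asarray(distances, dtype=float)
    nearest = np.argsort(distances)[:k]      # indices of the k smallest distances
    weights = weighting(distances[nearest])  # one weight per selected neighbor
    totals = {}
    for idx, w in zip(nearest, weights):
        totals[labels[idx]] = totals.get(labels[idx], 0.0) + float(w)
    return max(totals, key=totals.get)

# Made-up DTW distances from a new sequence to three training sequences
distances = [1.0, 1.2, 0.1]
labels = ['good', 'good', 'bad']

uniform = knn_vote(distances, labels, k=3)                           # majority vote
weighted = knn_vote(distances, labels, k=3,
                    weighting=lambda d: np.exp(-5 * d))              # close neighbors dominate
```

Here the uniform vote favors `'good'` (two neighbors against one), while the exponentially decaying weights let the single very close `'bad'` neighbor win.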
Multivariate sequences
import numpy as np, sequentia as seq
# Generate training observation sequences and labels
X, y = [
    np.array([[1, 0, 5, 3, 7, 2, 2, 4, 9, 8, 7],
              [3, 8, 4, 0, 7, 1, 1, 3, 4, 2, 9]]).T,
    np.array([[2, 1, 4, 6, 5, 8],
              [5, 3, 9, 0, 8, 2]]).T,
    np.array([[5, 8, 0, 3, 1, 0, 2, 7, 9],
              [0, 2, 7, 1, 2, 9, 5, 8, 1]]).T
], ['good', 'good', 'bad']
# Create and fit the classifier
clf = seq.KNNClassifier(k=1, classes=('good', 'bad'))
clf.fit(X, y)
# Make a prediction for a new observation sequence
x_new = np.array([[0, 3, 2, 7, 9, 1, 1],
                  [2, 5, 7, 4, 2, 0, 8]]).T
y_new = clf.predict(x_new)
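The `.T` in the example above is easy to overlook. As far as can be read from the example, each multivariate sequence is written one feature per row and then transposed, so that rows become time steps and columns become features. The NumPy-only snippet below checks the resulting shapes; the variable names are illustrative.

```python
import numpy as np

# Two features observed over 6 time steps, written one feature per row...
features_by_row = np.array([[2, 1, 4, 6, 5, 8],
                            [5, 3, 9, 0, 8, 2]])
# ...then transposed so that each row is one time step
sequence = features_by_row.T

print(features_by_row.shape)  # (2, 6)
print(sequence.shape)         # (6, 2)
print(sequence[0])            # first frame: [2 5]

# Sequences in one dataset may differ in duration but share the feature count
durations = [11, 6, 9]
X = [np.zeros((t, 2)) for t in durations]
feature_counts = {x.shape[1] for x in X}
```

Only the second dimension (the number of features) needs to agree across sequences; the first dimension (the duration) is free to vary, which matches the differing lengths 11, 6 and 9 in the training data above.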
Acknowledgments
In earlier versions of the package (<0.10.0), an approximate dynamic time warping implementation (fastdtw) was used in the hope of speeding up k-NN predictions, as the authors of the original FastDTW paper [2] claim that approximate DTW alignments can be computed in linear memory and time, compared to the O(N^2) runtime complexity of the usual exact DTW implementation.
However, I was recently contacted by Prof. Eamonn Keogh (at University of California, Riverside), whose recent work [3] makes the surprising revelation that FastDTW is generally slower than the exact DTW algorithm that it approximates. Upon switching from the fastdtw package to dtaidistance (a very solid implementation of exact DTW with fast pure C compiled functions), DTW k-NN prediction times were indeed reduced drastically.
I would like to thank Prof. Eamonn Keogh for directly reaching out to me regarding this finding!
References
Contributors
All contributions to this repository are greatly appreciated. Contribution guidelines can be found here.
- Edwin Onuonga
- Prhmma
Sequentia © 2019-2022, Edwin Onuonga - Released under the MIT License.
Authored and maintained by Edwin Onuonga.