Sequentia
HMM and DTW-based sequence machine learning algorithms in Python following an sklearn-like interface.
About · Build Status · Features · Documentation · Examples · Acknowledgments · References · Contributors
About
Sequentia is a Python package that provides various classification and regression algorithms for sequential data, including methods based on hidden Markov models and dynamic time warping.
Some examples of how Sequentia can be used on sequence data include:
- determining a spoken word based on its audio signal or alternative representations such as MFCCs,
- predicting motion intent for gesture control from sEMG signals,
- classifying hand-written characters according to their pen-tip trajectories,
- predicting the gene family that a DNA sequence belongs to.
Features
The following models provided by Sequentia all support variable length sequences.
- Dynamic Time Warping + k-Nearest Neighbors (via dtaidistance)
  - Classification
  - Regression
  - Multivariate real-valued observations
  - Sakoe–Chiba band global warping constraint
  - Dependent and independent feature warping (DTWD/DTWI)
  - Custom distance-weighted predictions
  - Multi-processed predictions
- Hidden Markov Models (via hmmlearn)
  Parameter estimation with the Baum-Welch algorithm and prediction with the forward algorithm [1]
  - Classification
  - Multivariate real-valued observations (Gaussian mixture model emissions)
  - Univariate categorical observations (discrete emissions)
  - Linear, left-right and ergodic topologies
  - Multi-processed predictions
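To make the DTW-based features listed above more concrete, here is a minimal NumPy sketch of exact DTW with a Sakoe–Chiba band, dependent versus independent feature warping, and a majority-vote k-NN prediction. It is an illustration only, not the implementation used by Sequentia or dtaidistance: the function names (dtw_distance, dtw_independent, knn_predict), the window parameter, and the choice of squared differences with a final square root are all assumptions made for this sketch.

```python
import numpy as np


def dtw_distance(a, b, window=None):
    """Exact DTW distance between two sequences of shape (T,) or (T, D).

    `window` is a hypothetical parameter for this sketch: the largest allowed
    offset |i - j| between aligned indices (a Sakoe-Chiba band).
    None means no constraint.
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide so the final cell stays reachable.
    window = max(window, abs(n - m))

    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = max(1, i - window), min(m, i + window)
        for j in range(lo, hi + 1):
            # All features are compared together: "dependent" warping.
            d = np.sum((a[i - 1] - b[j - 1]) ** 2)
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def dtw_independent(a, b, window=None):
    """"Independent" warping: each feature is warped separately, then summed."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return sum(dtw_distance(a[:, k], b[:, k], window) for k in range(a.shape[1]))


def knn_predict(train, labels, query, k=1, window=None):
    """Majority vote over the k training sequences closest to `query`."""
    dists = np.array([dtw_distance(x, query, window) for x in train])
    votes = {}
    for idx in np.argsort(dists)[:k]:
        votes[labels[idx]] = votes.get(labels[idx], 0) + 1
    return max(votes, key=votes.get)


# Sequences of different lengths can be compared directly.
train = [np.array([0., 0., 1., 2., 1.]), np.array([5., 6., 5.]), np.array([6., 5., 5., 6.])]
labels = [0, 1, 1]
print(knn_predict(train, labels, np.array([0., 1., 2., 2., 1., 0.]), k=1))
```

The nested loops make the quadratic cost of exact DTW explicit. A narrower band reduces the number of cells visited, which is the motivation for a global warping constraint.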
Figure: HMM Sequence Classifier
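The idea behind classifying with HMMs can be sketched in a few lines of NumPy: one model per class, a forward pass to score a sequence under each model, and a prediction for the best-scoring class. The sketch below is not Sequentia's or hmmlearn's code. The helper names (left_right_transitions, forward_log_likelihood, classify) are hypothetical, the emissions are categorical only, and the model parameters are hand-picked for illustration rather than estimated with Baum-Welch.

```python
import numpy as np


def left_right_transitions(n_states):
    """Upper-triangular transition matrix: a state may stay or move forward.

    This is the usual meaning of a "left-right" topology. An ergodic topology
    would allow every transition, and a linear one only self and next-state moves.
    """
    A = np.triu(np.ones((n_states, n_states)))
    return A / A.sum(axis=1, keepdims=True)


def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | model) for a categorical-emission HMM (scaled forward pass)."""
    alpha = start * emit[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ trans) * emit[:, obs[t]]
        scale = alpha.sum()          # P(obs[t] | obs[:t])
        log_lik += np.log(scale)
        alpha = alpha / scale        # rescale to avoid numerical underflow
    return log_lik


def classify(obs, models, priors=None):
    """Return the label whose HMM gives the highest prior-weighted likelihood."""
    scores = {}
    for label, (start, trans, emit) in models.items():
        prior = 1.0 if priors is None else priors[label]
        scores[label] = forward_log_likelihood(obs, start, trans, emit) + np.log(prior)
    return max(scores, key=scores.get)


# Two hand-picked two-state models over the symbols {0, 1} (illustrative values).
start = np.array([1.0, 0.0])
trans = left_right_transitions(2)
models = {
    0: (start, trans, np.array([[0.9, 0.1], [0.8, 0.2]])),  # mostly emits symbol 0
    1: (start, trans, np.array([[0.2, 0.8], [0.1, 0.9]])),  # mostly emits symbol 1
}
print(classify([0, 0, 1, 0], models))
print(classify([1, 1, 0, 1], models))
```

Because each class has its own model and the forward pass handles any sequence length, variable-length sequences need no padding or truncation.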
Installation
You can install Sequentia using pip.
Stable
The latest stable version of Sequentia can be installed with the following command.
pip install sequentia
C library compilation
For optimal performance when using any of the k-NN based models, it is important that the dtaidistance C libraries are compiled correctly.
Please see the dtaidistance installation guide for troubleshooting if you run into C compilation issues, or if setting use_c=True on k-NN based models results in a warning.
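Before setting use_c=True, it can help to check whether the compiled extension is importable at all. The snippet below is a rough sketch rather than an official diagnostic: the helper name dtaidistance_c_status is hypothetical, and it assumes that dtaidistance exposes its compiled extension as the module attribute dtaidistance.dtw.dtw_cc, which is None when the C library is unavailable. The dtaidistance installation guide remains the authoritative reference.

```python
import importlib


def dtaidistance_c_status():
    """Report whether dtaidistance and its compiled C extension can be imported."""
    try:
        dtw = importlib.import_module("dtaidistance.dtw")
    except ImportError:
        return "dtaidistance is not installed"
    # Assumption: the compiled extension is exposed as `dtw.dtw_cc`
    # and is None when the C library could not be loaded.
    if getattr(dtw, "dtw_cc", None) is None:
        return "pure-Python fallback only (C extension missing)"
    return "C extension available"


print(dtaidistance_c_status())
```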
Pre-release
Pre-release versions include new features which are in active development and may change unpredictably.
The latest pre-release version can be installed with the following command.
pip install --pre sequentia
Development
Please see the contribution guidelines for installation instructions if you intend to contribute to Sequentia.
Documentation
Documentation for the package is available on Read The Docs.
Examples
This example classifies multivariate sequences into classes 0/1 using the KNNClassifier.
import numpy as np
from sequentia.models import KNNClassifier

# Generate training sequences and labels
X = [
    np.array([[1., 0., 5., 3., 7., 2., 2., 4., 9., 8., 7.],
              [3., 8., 4., 0., 7., 1., 1., 3., 4., 2., 9.]]).T,
    np.array([[2., 1., 4., 6., 5., 8.],
              [5., 3., 9., 0., 8., 2.]]).T,
    np.array([[5., 8., 0., 3., 1., 0., 2., 7., 9.],
              [0., 2., 7., 1., 2., 9., 5., 8., 1.]]).T
]
y = [0, 1, 1]

# Sequentia expects a concatenated array of sequences (and their corresponding lengths)
X, lengths = np.vstack(X), [len(x) for x in X]

# Create and fit the classifier
clf = KNNClassifier(k=1).fit(X, y, lengths)

# Make a prediction for a new observation sequence
x_new = np.array([[0., 3., 2., 7., 9., 1., 1.],
                  [2., 5., 7., 4., 2., 0., 8.]]).T
y_new = clf.predict(x_new)
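The concatenated layout used in the example (one stacked array plus a list of lengths) can be illustrated with plain NumPy. The sketch below shows how the individual sequences can be recovered from that layout. split_sequences is a hypothetical helper written for this illustration and is not part of Sequentia, and the placeholder sequences only reuse the shapes from the example.

```python
import numpy as np


def split_sequences(X, lengths):
    """Recover the individual sequences from a concatenated array."""
    # Cumulative lengths give the row index at which each new sequence starts.
    boundaries = np.cumsum(lengths)[:-1]
    return np.split(X, boundaries)


# Three two-feature sequences with 11, 6 and 9 time steps (placeholder values).
sequences = [np.zeros((11, 2)), np.ones((6, 2)), np.full((9, 2), 2.0)]
X = np.vstack(sequences)                 # stacked along the time axis
lengths = [len(s) for s in sequences]    # needed to find the boundaries again

recovered = split_sequences(X, lengths)
print(X.shape, lengths, [r.shape for r in recovered])
```

Stacking along the time axis keeps all observations in a single two-dimensional array, and the lengths carry the information needed to tell the sequences apart.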
Acknowledgments
In earlier versions of the package, an approximate DTW implementation, fastdtw, was used in hopes of speeding up k-NN predictions, as the authors of the original FastDTW paper [2] claim that approximated DTW alignments can be computed in linear memory and time, compared to the O(N²) runtime complexity of the usual exact DTW implementation.
I was contacted by Prof. Eamonn Keogh, whose work [3] makes the surprising revelation that FastDTW is generally slower than the exact DTW algorithm that it approximates. Upon switching from the fastdtw package to dtaidistance (a very solid implementation of exact DTW with fast pure C compiled functions), DTW k-NN prediction times were indeed reduced drastically.
I would like to thank Prof. Eamonn Keogh for directly reaching out to me regarding this finding.
References
Contributors
All contributions to this repository are greatly appreciated. Contribution guidelines can be found here.
- eonu
- Prhmma
- manisci
- jonnor
Sequentia © 2019-2023, Edwin Onuonga - Released under the MIT License.
Authored and maintained by Edwin Onuonga.