Supervised linear transfer learning based on labelled Gaussian mixture models and expectation maximization, in scikit-learn compatible form.
Linear Supervised Transfer Learning
Copyright (C) 2019  Benjamin Paassen
Machine Learning Research Group
Center of Excellence Cognitive Interaction Technology (CITEC)
Bielefeld University
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, see http://www.gnu.org/licenses/.
Introduction
This Python3 library provides several algorithms to learn a linear mapping from an $m$-dimensional source space to an $n$-dimensional target space, such that a classification model trained in the source space becomes applicable in the target space. The source space model is assumed to be a labelled mixture of Gaussians. Note that this library assumes that the relation between the source and target space is (approximately) linear and will necessarily fail if the relationship is highly nonlinear. Further note that this library requires a few labelled target space data points to work (typically, as few as ~10 data points are enough). However, not all classes need to be represented among these data points, since the learned linear transformation generalizes across classes.
If you intend to use this library in academic work, please cite our paper.
Installation
This package is available on PyPI as `em_transfer_learning`. You can install it via

```
pip install --user em_transfer_learning
```
QuickStart Guide
For a quick start, we recommend taking a look at the demo in the notebook `demo.ipynb`. In this file, we demonstrate how to perform transfer learning on example data. For the actual transfer learning, we recommend initializing one of the following models, depending on your source space model:
1. `em_transfer_learning.transfer_learning.LGMM_transfer_model`: If you have a full labelled Gaussian mixture model.
2. `em_transfer_learning.transfer_learning.SLGMM_transfer_model`: If you have a labelled Gaussian mixture model with shared precision matrices.
3. `em_transfer_learning.transfer_learning.Local_LVQ_transfer_model`: If you have a learning vector quantization model with individual metric learning matrices.
4. `em_transfer_learning.transfer_learning.LVQ_transfer_model`: If you have a learning vector quantization model with a shared metric learning matrix or no metric learning at all.
Note that models 2 and 4 are much faster to train compared to models 1 and 3 (refer to the next section for more information on that).
All these models follow the scikit-learn convention, i.e. you need to call the `fit` function with target space data first and then the `predict` function to map new target space data to the source space according to the learned mapping.
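For instance, a typical session might look like the following minimal sketch. The constructor arguments shown here are assumptions for illustration only; only the `fit`/`predict` convention described above is fixed, so refer to `demo.ipynb` for the authoritative interface.

```python
import numpy as np
from em_transfer_learning.transfer_learning import LVQ_transfer_model

# Hypothetical source space model: two LVQ prototypes, one per class.
# The constructor arguments below are assumed for illustration;
# see demo.ipynb for the actual interface.
prototypes = np.array([[0., 0.], [3., 3.]])
prototype_labels = np.array([0, 1])
model = LVQ_transfer_model(prototypes, prototype_labels)

# A few labelled target space data points (~10 are typically enough).
X_target = np.vstack((np.random.randn(5, 2) + 1.,
                      np.random.randn(5, 2) + 4.))
y_target = np.array([0] * 5 + [1] * 5)

# fit learns the linear mapping H from the target to the source space ...
model.fit(X_target, y_target)
# ... and predict maps new target space data to the source space.
X_mapped = model.predict(np.random.randn(3, 2) + 1.)
```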
Background
The basic idea of our transfer learning approach is to maximize the likelihood of the target space data according to the source space data distribution after the learned transfer function $h$ has been applied. More precisely, assume we have a data set $(\vec x_1, y_1), \ldots, (\vec x_m, y_m)$ of target data points $\vec x_j \in \mathbb{R}^n$ and their labels $y_j \in \{1, \ldots, L\}$. Then, we wish to maximize the joint probability

\max_h \prod_{j=1}^m p\Big(h(\vec x_j), y_j\Big)
To make this optimization problem feasible, we introduce two assumptions: first, that $p(\vec x, y)$ can be modelled by a labelled Gaussian mixture model (lGMM) and, second, that $h$ can be approximated by a linear function. In more detail, that means the following.
Labelled Gaussian Mixture Models
A labelled Gaussian mixture model assumes that data is generated by a mixture of $K$ Gaussians, each of which has a prior $P(k)$, a data generating Gaussian density $p(\vec x|k)$, and a label generating distribution $P(y|k)$. Using these distributions, we can derive the joint probability density $p(\vec x, y)$ as follows.

p(\vec x, y) = \sum_{k=1}^K p(\vec x, y, k) = \sum_{k=1}^K p(\vec x, y|k) \cdot P(k)
Our model assumes that $\vec x$ and $y$ are conditionally independent given the component index $k$, such that we can rewrite:

p(\vec x, y) = \sum_{k=1}^K p(\vec x|k) \cdot P(y|k) \cdot P(k)
Note that $p(\vec x|k)$ is a multivariate Gaussian probability density with parameters for the mean $\vec \mu_k$ and the precision matrix $\Lambda_k$. Also note that this model is a proper generalization of standard Gaussian mixture models and that many of the GMM properties translate directly to lGMMs. More precisely, we obtain a standard GMM by setting the label distribution $P(y|k)$ to a uniform distribution and leaving it unchanged during training. Alternatively, we also obtain a standard GMM by assigning the same label to all data points.
Also note that lGMMs generalize over learning vector quantization models if we apply a scaling trick to the precision matrices (for more details on this, refer to our paper).
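To make this factorization concrete, consider the following minimal NumPy sketch, which evaluates $p(\vec x, y)$ for a toy lGMM. This is only an illustration with made-up parameter values, not the library's own implementation (which lives in `em_transfer_learning/lgmm.py`).

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy lGMM with K = 2 components in 2 dimensions (illustrative values only).
priors = np.array([0.5, 0.5])                  # P(k)
means = np.array([[0., 0.], [3., 3.]])         # mu_k
precisions = np.array([np.eye(2), np.eye(2)])  # Lambda_k
label_probs = np.array([[0.9, 0.1],            # P(y|k) for component k = 0
                        [0.2, 0.8]])           # P(y|k) for component k = 1

def lgmm_joint_density(x, y):
    """Evaluates p(x, y) = sum_k p(x|k) * P(y|k) * P(k)."""
    density = 0.
    for k in range(len(priors)):
        # the Gaussians are parametrized by precision, so invert for covariance
        cov_k = np.linalg.inv(precisions[k])
        density += (multivariate_normal.pdf(x, mean=means[k], cov=cov_k)
                    * label_probs[k, y] * priors[k])
    return density

print(lgmm_joint_density(np.array([0.5, -0.2]), y=0))
```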
Expectation Maximization transfer learning
Our assumption that the transfer function $h$ is approximately linear implies that $h$ can be rewritten as $h(\vec x) \approx H \cdot \vec x$ for some matrix $H$. Thus, our transfer learning problem becomes:

\max_H \prod_{j=1}^m \sum_{k=1}^K p(H \cdot \vec x_j|k) \cdot P(y_j|k) \cdot P(k)
Due to the product of sums, a direct optimization of this expression is infeasible. However, we can apply an expectation maximization scheme. In particular, we initialize $H$ with the identity matrix (padded with zeros wherever necessary) and then iteratively perform the following two steps:

1. Expectation: We compute the posterior $p(k|H \cdot \vec x_j, y_j)$ for the current transfer matrix $H$, all data points $j$, and all Gaussian components $k$, yielding a matrix $\Gamma \in \mathbb{R}^{K \times m}$ with entries $\gamma_{k,j} = p(k|H \cdot \vec x_j, y_j)$. The full expression for the posterior is given in our paper.
2. Maximization: We maximize the expected log likelihood under the fixed posterior, i.e.

\max_H \sum_{j=1}^m \sum_{k=1}^K \gamma_{k, j} \cdot \log\big[p(H \cdot \vec x_j, y_j|k)\big]

This optimization problem can be shown to be convex and thus lends itself to optimization techniques like L-BFGS. Even better, if the precision matrix $\Lambda_k$ is shared across all Gaussians $k$, the problem has a closed-form solution, namely

H = W \cdot \Gamma \cdot X^T \cdot (X \cdot X^T + \lambda \cdot I)^{-1}

where $W = (\vec \mu_1, \ldots, \vec \mu_K)$, $X = (\vec x_1, \ldots, \vec x_m)$, $\lambda$ is a (small) regularization constant, and $I$ is the identity matrix. Due to this closed-form solution, the `SLGMM_transfer_model` and the `LVQ_transfer_model` are much faster to train than the `LGMM_transfer_model` and the `Local_LVQ_transfer_model`, as illustrated in the sketch below.
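To make one EM iteration concrete, here is a minimal NumPy sketch for the shared precision matrix case. This is an illustration under the assumptions above, not the library's implementation; in particular, the posterior here is obtained via Bayes' rule on the lGMM factorization, whereas the full expression is given in our paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_iteration(H, W, X, y, Lambda, priors, label_probs, lam=1e-5):
    """One EM iteration for transfer learning with a shared precision matrix.

    H           : (n_src, n_tgt) current transfer matrix
    W           : (n_src, K) Gaussian means as columns
    X           : (n_tgt, m) target data points as columns
    y           : (m,) target labels in {0, ..., L-1}
    Lambda      : (n_src, n_src) shared precision matrix
    priors      : (K,) component priors P(k)
    label_probs : (K, L) label distributions P(y|k)
    """
    K, m = W.shape[1], X.shape[1]
    cov = np.linalg.inv(Lambda)  # Gaussians are parametrized by precision
    HX = H @ X                   # data points mapped to the source space

    # Expectation: gamma_{k,j} = p(k|H x_j, y_j), via Bayes' rule
    Gamma = np.zeros((K, m))
    for k in range(K):
        Gamma[k] = (multivariate_normal.pdf(HX.T, mean=W[:, k], cov=cov)
                    * label_probs[k, y] * priors[k])
    Gamma /= Gamma.sum(axis=0, keepdims=True)  # normalize over components

    # Maximization: closed form H = W Gamma X^T (X X^T + lam I)^{-1};
    # solving the symmetric linear system avoids an explicit inverse.
    A = X @ X.T + lam * np.eye(X.shape[0])
    return np.linalg.solve(A, (W @ Gamma @ X.T).T).T
```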
For more detailed background, please refer to our paper.
Contents
This library contains the following files.
- `demo.ipynb`: A demo script illustrating how to use this library.
- `LICENSE`: A copy of the GPLv3 license.
- `em_transfer_learning/lgmm.py`: A file to train labelled Gaussian mixture models with or without shared precision matrices.
- `em_transfer_learning/transfer_learning.py`: The actual transfer learning models.
- `lgmm_test.py`: A set of unit tests for `lgmm.py`.
- `README.md`: This file.
- `transfer_learning_test.py`: A set of unit tests for `transfer_learning.py`.
Licensing
This library is licensed under the GNU General Public License Version 3.
Dependencies
This library depends on NumPy for matrix operations, on scikit-learn for the base interfaces, and on SciPy for optimization.
Literature
- Paassen, B., Schulz, A., Hahne, J., and Hammer, B. (2018). Expectation maximization transfer learning and its application for bionic hand prostheses. Neurocomputing, 298, 122-133. doi:10.1016/j.neucom.2017.11.072