Subsampled Online Matrix Factorization in Python
# MODL: Massive Online Dictionary Learning
[![Travis](https://travis-ci.org/arthurmensch/modl.svg?branch=master)](https://travis-ci.org/arthurmensch/modl)
[![Coveralls](https://coveralls.io/repos/github/arthurmensch/modl/badge.svg?branch=master)](https://coveralls.io/github/arthurmensch/modl?branch=master)
This Python package ([webpage](https://github.com/arthurmensch/modl)) implements the following two papers:
>Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux.
[Stochastic Subsampling for Factorizing Huge Matrices](https://hal.archives-ouvertes.fr/hal-01431618v1). <hal-01431618> 2017.
>Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux.
[Dictionary Learning for Massive Matrix Factorization](https://hal.archives-ouvertes.fr/hal-01308934v2). International Conference
on Machine Learning, June 2016, New York, United States.
It performs sparse or dense matrix factorization on fully observed or partially missing data very efficiently, by combining random subsampling with online learning.
It can factorize terabyte-scale matrices with hundreds of latent components in a few hours.
This package allows the
experiments and figures from the papers to be reproduced.
More importantly, it provides [scikit-learn](https://github.com/scikit-learn/scikit-learn)-compatible
estimators that fully implement the proposed algorithms.
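The core idea behind the papers, updating a factorization while streaming over samples and looking at only a random subset of the features at each step, can be sketched in plain NumPy. This is a toy illustration of the principle, not the package's optimized online algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: X is (n_samples, n_features) and approximately rank k
n_samples, n_features, k = 200, 50, 5
X = rng.normal(size=(n_samples, k)) @ rng.normal(size=(k, n_features))

# Dictionary with k atoms, learned online from subsampled features
D = rng.normal(size=(k, n_features))
D0 = D.copy()       # keep the random initialization for comparison
subsample = 20      # features touched per step (out of 50)
lr = 0.01

for epoch in range(20):
    for i in rng.permutation(n_samples):
        x = X[i]
        # Draw a random subset of feature indices
        idx = rng.choice(n_features, size=subsample, replace=False)
        Ds = D[:, idx]
        # Ridge-regression code computed from the subsampled entries only
        code = np.linalg.solve(Ds @ Ds.T + 1e-3 * np.eye(k), Ds @ x[idx])
        # Gradient step on the touched dictionary columns only
        D[:, idx] -= lr * np.outer(code, code @ Ds - x[idx])

def rel_error(dictionary):
    # Relative reconstruction error with full least-squares codes
    codes = X @ np.linalg.pinv(dictionary)
    return np.linalg.norm(X - codes @ dictionary) / np.linalg.norm(X)
```

Although each step sees only 40% of the features, the learned dictionary reconstructs `X` far better than the random initialization, which is the efficiency lever the package exploits at terabyte scale.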
## Installing from source with pip
Installation from source is simple. In a command prompt:
```
git clone https://github.com/arthurmensch/modl.git
cd modl
pip install -r requirements.txt
pip install .
cd $HOME
py.test --pyargs modl
```
*This package is tested with Python 3.5+ only.*
## Core code
The package essentially provides four estimators:
- `DictFact`, which computes a matrix factorization from NumPy arrays
- `fMRIDictFact`, which computes sparse spatial maps from fMRI images
- `ImageDictFact`, which computes a patch dictionary from an image
- `RecsysDictFact`, which predicts unobserved ratings, following a collaborative filtering approach
## Examples
### fMRI decomposition
A fast-running example that decomposes a small resting-state fMRI dataset into 70 spatial components is provided:
```
python examples/decompose_fmri.py
```
It can be adapted to run on the 2 TB HCP dataset by changing the source parameter to 'hcp' (you will need to download the data first).
### Hyperspectral images
A fast-running example that extracts patches from an HD image can be run with:
```
python examples/decompose_image.py
```
It can be adapted to run on AVIRIS hyperspectral data by changing the image source to 'aviris' in the script.
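`ImageDictFact` learns its dictionary over small image patches. The patch-extraction preprocessing it relies on can be sketched in plain NumPy (a toy illustration using a random array as a stand-in for one channel of an image):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))  # stand-in for one grayscale channel
patch = 8

# Gather every overlapping 8x8 tile as a flat row vector
patches = np.array([
    image[r:r + patch, c:c + patch].ravel()
    for r in range(image.shape[0] - patch + 1)
    for c in range(image.shape[1] - patch + 1)
])

# Center each patch, a common preprocessing step before dictionary learning
patches -= patches.mean(axis=1, keepdims=True)
```

Each row of `patches` is then a sample for the online factorization; a 64x64 image already yields 57 x 57 = 3249 overlapping patches, which is why streaming over patches scales to HD images.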
### Recommender systems
Our core algorithm can be run to perform collaborative filtering very efficiently:
```
python examples/recsys_compare.py
```
You will need to download datasets beforehand:
```
make download-movielens1m
make download-movielens10m
```
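The collaborative-filtering setting factorizes a ratings matrix from its observed entries only, then uses the factors to predict the missing ones. A minimal NumPy sketch of that idea, using alternating ridge regressions rather than the package's online algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings: low-rank ground truth, with only half the entries observed
n_users, n_items, k = 30, 20, 3
R = rng.normal(size=(n_users, k)) @ rng.normal(size=(k, n_items))
mask = rng.random(R.shape) < 0.5  # True where a rating is observed

# Alternating ridge regressions restricted to observed entries
U = rng.normal(size=(n_users, k))
V = rng.normal(size=(k, n_items))
lam = 0.1
for _ in range(20):
    for u in range(n_users):
        obs = mask[u]
        Vo = V[:, obs]
        U[u] = np.linalg.solve(Vo @ Vo.T + lam * np.eye(k), Vo @ R[u, obs])
    for i in range(n_items):
        obs = mask[:, i]
        Uo = U[obs]
        V[:, i] = np.linalg.solve(Uo.T @ Uo + lam * np.eye(k), Uo.T @ R[obs, i])

# Score the predictions on the entries that were *not* observed
pred = U @ V
rmse_unobs = np.sqrt(np.mean((pred[~mask] - R[~mask]) ** 2))
```

The held-out RMSE ends up well below that of a trivial all-zeros predictor, showing that the observed entries constrain the low-rank factors enough to fill in the rest.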
## Future work
- Remove the `sacred` dependency
- Release a fetcher for HCP data from an S3 bucket
- Release examples with larger datasets and benchmarks
## Contributions
Please feel free to report any issue and propose improvements on GitHub.
## References
Related projects:
- [spira](https://github.com/mblondel/spira) is a Python library for collaborative filtering based on coordinate descent. It serves as the baseline for the recsys experiments; we have included a copy of it in this package for simplicity.
- [scikit-learn](https://github.com/scikit-learn/scikit-learn) is a Python library for machine learning. It serves as the basis of this project.
- [nilearn](https://github.com/nilearn/nilearn) is a neuroimaging library that we wrap in our fMRI-related estimators.
## Author
Licensed under simplified BSD.
Arthur Mensch, 2015 - present
## File details
Source distribution: `modl-0.6.1.1.tar.gz` (1.1 MB, uploaded via Python-urllib/3.7, Trusted Publishing: no)

Algorithm | Hash digest
---|---
SHA256 | 7df69946b1cb7232cd1688ed6cc5836fd1a5280a040362ae87ccd32de7e6b053
MD5 | 329fdf6d58856ece56d7bf34c33e0c6a
BLAKE2b-256 | c61ba80192216e6cb547c142aec6e72c2d26168d80f465f2eacd67bbde5991f4