creme

Incremental machine learning in Python

Project description

creme is a library for incremental learning. Incremental learning is a machine learning regime where the observations are made available one by one. It is also known as online learning, iterative learning, or sequential learning. This is in contrast to batch learning, where all the data is processed at once. Incremental learning is desirable when the data is too big to fit in memory, or simply when it isn't available all at once. creme's API is heavily inspired by that of scikit-learn, enough so that users who are familiar with scikit-learn should feel right at home.
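As a minimal illustration of the difference between the two regimes, the sketch below computes a mean one observation at a time and compares it with the batch result. It uses only Python's standard library; the RunningMean class is a hypothetical helper written for this example and is not part of creme's API.

```python
from statistics import mean


class RunningMean:
    """Keeps a mean up to date without storing past observations."""

    def __init__(self):
        self.n = 0
        self.value = 0.0

    def update(self, x):
        # Shift the current mean towards x by a 1/n step.
        self.n += 1
        self.value += (x - self.value) / self.n
        return self.value


observations = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]

# Incremental regime: each observation is seen once and then discarded.
running = RunningMean()
for x in observations:
    running.update(x)

# Batch regime: all the observations must be available at once.
batch = mean(observations)

print(running.value, batch)
```

Both approaches agree on the result, but the incremental version only ever holds two numbers in memory, whatever the length of the stream.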

Installation

Warning: creme requires Python 3.6 or above.

creme mostly relies on Python's standard library. However, it sometimes relies on numpy, scipy, and scikit-learn in order to avoid reinventing the wheel. creme can simply be installed with pip.

pip install creme

Quick example

In the following snippet we fit an online logistic regression whose weights are optimized with the AdaGrad algorithm. The data is scaled so that each variable has a mean of 0 and a standard deviation of 1. The standard scaling and the logistic regression are combined using a compose.Pipeline. The stream.iter_sklearn_dataset function streams over the Wisconsin breast cancer dataset. The ROC AUC is measured using progressive validation, where each observation is first used to evaluate the model and only then to update it.

>>> from creme import compose
>>> from creme import linear_model
>>> from creme import model_selection
>>> from creme import optim
>>> from creme import preprocessing
>>> from creme import stream
>>> from sklearn import datasets
>>> from sklearn import metrics

>>> X_y = stream.iter_sklearn_dataset(
...     load_dataset=datasets.load_breast_cancer,
...     shuffle=True,
...     random_state=42
... )
>>> optimizer = optim.AdaGrad()
>>> model = compose.Pipeline([
...     ('scale', preprocessing.StandardScaler()),
...     ('learn', linear_model.LogisticRegression(optimizer))
... ])
>>> metric = metrics.roc_auc_score

>>> model_selection.online_score(X_y, model, metric)
0.993030...
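To make the moving parts of the snippet above more concrete, here is a rough standard-library sketch of the same ideas: running standard scaling, a logistic regression updated with AdaGrad, and a progressive validation loop. It is not creme's implementation. Every class and function name below is hypothetical, the data is synthetic, every observation is assumed to carry the same features, and plain accuracy is used in place of ROC AUC to keep the code short.

```python
import math
import random


class RunningScaler:
    """Standard scaling from running means and variances (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = {}
        self.m2 = {}  # running sum of squared deviations per feature

    def fit_one(self, x):
        self.n += 1
        for name, value in x.items():
            mean = self.mean.get(name, 0.0)
            delta = value - mean
            mean += delta / self.n
            self.mean[name] = mean
            self.m2[name] = self.m2.get(name, 0.0) + delta * (value - mean)

    def transform_one(self, x):
        scaled = {}
        for name, value in x.items():
            var = self.m2.get(name, 0.0) / self.n if self.n else 0.0
            std = math.sqrt(var)
            scaled[name] = (value - self.mean.get(name, 0.0)) / std if std > 0 else 0.0
        return scaled


class AdaGradLogisticRegression:
    """Logistic regression with per-weight AdaGrad step sizes."""

    def __init__(self, lr=0.1, eps=1e-8):
        self.lr = lr
        self.eps = eps
        self.weights = {}
        self.grad_sq = {}  # accumulated squared gradients per weight
        self.bias = 0.0
        self.bias_grad_sq = 0.0

    def predict_proba_one(self, x):
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def fit_one(self, x, y):
        error = self.predict_proba_one(x) - y  # log loss gradient w.r.t. z
        for k, v in x.items():
            g = error * v
            self.grad_sq[k] = self.grad_sq.get(k, 0.0) + g * g
            step = self.lr * g / (math.sqrt(self.grad_sq[k]) + self.eps)
            self.weights[k] = self.weights.get(k, 0.0) - step
        self.bias_grad_sq += error * error
        self.bias -= self.lr * error / (math.sqrt(self.bias_grad_sq) + self.eps)


def progressive_accuracy(stream, scaler, model):
    """Predict on each observation before learning from it."""
    correct = total = 0
    for x, y in stream:
        scaler.fit_one(x)  # unsupervised, so the label is not leaked
        x_scaled = scaler.transform_one(x)
        y_pred = model.predict_proba_one(x_scaled) >= 0.5
        correct += int(y_pred == bool(y))
        total += 1
        model.fit_one(x_scaled, y)  # only now is the label revealed
    return correct / total if total else 0.0


def synthetic_stream(n, seed=42):
    """Yields (features, label) pairs; 'a' is informative, 'b' is noise."""
    rng = random.Random(seed)
    for _ in range(n):
        y = rng.random() < 0.5
        centre = 2.0 if y else -2.0
        yield {'a': rng.gauss(centre, 1.0) * 10 + 50, 'b': rng.gauss(0.0, 1.0)}, int(y)


accuracy = progressive_accuracy(
    synthetic_stream(2000), RunningScaler(), AdaGradLogisticRegression()
)
print(f"progressive accuracy: {accuracy:.3f}")
```

Because every prediction is made before the model has seen the corresponding label, the score reflects performance on unseen data without requiring a separate hold-out set.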

Comparison with other solutions

scikit-learn: Some of its estimators have a partial_fit method which allows them to update themselves with new observations. However, online learning isn't a first-class citizen, which can make it a bit awkward to put a streaming pipeline in place. You should definitely use scikit-learn if your data fits in memory and you can afford to retrain your model from scratch when you have new data to train on.
Vowpal Wabbit: VW is probably the fastest out-of-core learning system available. At its core it implements a state-of-the-art adaptive gradient descent algorithm with many tricks. It also has some mechanisms for doing active learning and using bandits. However, it isn't a "true" online learning system, as it assumes the data is available in a file and can be looped over multiple times. It is also somewhat difficult for newcomers to grok.
LIBOL: This is a very good library written by academics, with some great documentation. It's written in C++ and seems to be pretty fast. However, it only focuses on the learning aspect of online learning, not on other mundane yet useful tasks such as feature extraction and preprocessing. Moreover, it hasn't been updated for a few years.
Spark Streaming: This is an extension of Apache Spark which caters to big data practitioners. It provides a lot of practical tools for manipulating streaming data in its true sense. It also has some compatibility with MLlib for implementing online learning algorithms, such as streaming linear regression and streaming k-means. However, it is a somewhat overwhelming solution which might be overkill for certain use cases.
TensorFlow: Deep learning systems are in some sense online learning systems. Indeed, it is possible to put in place a DL pipeline for learning from incoming observations. Because frameworks such as Keras and PyTorch are popular and well-backed, there is no real point in implementing neural networks in creme. However, for a lot of problems neural networks might not be the right tool, and you might want to use a simple logistic regression or a decision tree (for which online algorithms exist).
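To illustrate the scikit-learn point in the comparison above, the sketch below streams the breast cancer dataset in mini-batches through partial_fit. It assumes scikit-learn and numpy are installed; the batch size of 50 and the choice of SGDClassifier with its default loss are arbitrary choices for this example, not recommendations from the creme authors.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

scaler = StandardScaler()
clf = SGDClassifier(random_state=42)
classes = np.unique(y)  # partial_fit needs the full set of labels up front
batch_size = 50         # arbitrary mini-batch size for this sketch

correct = 0
seen = 0
for start in range(0, len(X), batch_size):
    X_batch = X[start:start + batch_size]
    y_batch = y[start:start + batch_size]

    # Each step of the pipeline has to be updated and applied by hand.
    scaler.partial_fit(X_batch)
    X_scaled = scaler.transform(X_batch)

    # Score on the batch before learning from it, except for the very
    # first batch, where the classifier has not been fitted yet.
    if start > 0:
        correct += int((clf.predict(X_scaled) == y_batch).sum())
        seen += len(y_batch)

    clf.partial_fit(X_scaled, y_batch, classes=classes)

print(f"accuracy on batches seen before training: {correct / seen:.3f}")
```

The scaler and the classifier are chained manually here, which is the kind of bookkeeping a dedicated streaming pipeline is meant to remove.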

Feel free to open an issue if you feel like other solutions are worth mentioning.

Development

creme is very young so there is a lot to do. The broad goals for the near future are to:

implement simple but useful algorithms
identify bottlenecks and use Cython when possible
write good documentation and write example notebooks
make life easier for those who want to put a streaming pipeline in production

License

See the license file.

Release history

0.6.1: Jun 10, 2020
0.6.0: Jun 9, 2020
0.5.1: Mar 29, 2020
0.5.0: Mar 13, 2020
0.4.4: Nov 11, 2019
0.4.3: Oct 27, 2019
0.4.2: Oct 23, 2019
0.4.1: Oct 23, 2019
0.4.0: Oct 23, 2019
0.3.0: Jun 23, 2019
0.2.0: May 27, 2019
0.1.0: May 8, 2019
0.0.3: Mar 21, 2019
0.0.2: Feb 13, 2019 (this version)
0.0.1: Jan 25, 2019

Download files

Source Distribution

creme-0.0.2.tar.gz (38.3 kB)

Uploaded Feb 13, 2019 Source

File details

Details for the file creme-0.0.2.tar.gz.

File metadata

Download URL: creme-0.0.2.tar.gz
Upload date: Feb 13, 2019
Size: 38.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.3

File hashes

Hashes for creme-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`e26598a65f7d6bf9831ddbe41d34655c40f20a5e88190fe284d1f4ad3a95af96`
MD5	`ca9a34a521c389aca0bb27063018ec7d`
BLAKE2b-256	`822b3f7491aad2e71057d0dd129277c5ba62321dfba967f337dd1201dd181161`

