Incremental machine learning in Python

creme is a library for incremental learning. Incremental learning is a machine learning regime where the observations are made available one by one. It is also known as online learning, iterative learning, or sequential learning. This is in contrast to batch learning, where all the data is processed at once. Incremental learning is desirable when the data is too big to fit in memory, or simply when it isn't available all at once. creme's API is heavily inspired by that of scikit-learn, enough so that users who are familiar with scikit-learn should feel right at home.
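To make the contrast between the two regimes concrete, here is a minimal pure-Python sketch. It is not part of creme: the `RunningStats` class and the sample values are made up for illustration. It updates a mean and a variance one observation at a time using Welford's method, and arrives at the same result as a batch computation over the full list.

```python
import math


class RunningStats:
    """Hypothetical helper: running mean and variance (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        # Only the new observation and three numbers of state are needed
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        return self

    @property
    def variance(self):
        # Population variance; 0.0 before any observation has been seen
        return self._m2 / self.n if self.n else 0.0


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Incremental: observations arrive one by one and are never stored
stats = RunningStats()
for x in values:
    stats.update(x)

# Batch: the whole dataset has to be available at once
batch_mean = sum(values) / len(values)
batch_var = sum((x - batch_mean) ** 2 for x in values) / len(values)

print(stats.mean, stats.variance)
print(batch_mean, batch_var)
```

The incremental version keeps a constant amount of state no matter how many observations it has seen, which is what makes this style of computation usable when the data does not fit in memory.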

Installation

creme is tested with Python 3.6 and above.

creme relies mostly on Python's standard library. It occasionally uses numpy, scipy, and scikit-learn so as not to reinvent the wheel. creme can be installed with pip:

pip install creme

Quick example

In the following snippet we'll be fitting an online logistic regression. The weights of the model will be optimized with the AdaGrad algorithm. We'll scale the data so that each variable has a mean of 0 and a standard deviation of 1. The standard scaling and the logistic regression are combined into a pipeline using the | operator. We'll be using the stream.iter_sklearn_dataset function for streaming over the Wisconsin breast cancer dataset. We'll measure the F1-score using progressive validation.

>>> from creme import compose
>>> from creme import linear_model
>>> from creme import metrics
>>> from creme import model_selection
>>> from creme import optim
>>> from creme import preprocessing
>>> from creme import stream
>>> from sklearn import datasets

>>> X_y = stream.iter_sklearn_dataset(
...     load_dataset=datasets.load_breast_cancer,
...     shuffle=True,
...     random_state=42
... )

>>> scaler = preprocessing.StandardScaler()
>>> log_reg = linear_model.LogisticRegression(optimizer=optim.AdaGrad())
>>> model = scaler | log_reg

>>> metric = metrics.F1Score()

>>> for x, y in X_y:
...     y_pred = model.predict_one(x)
...     model = model.fit_one(x, y)
...     metric = metric.update(y, y_pred)

>>> metric
F1Score: 0.97191
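The loop above is what progressive validation looks like in practice: each observation is first used to test the model and only then to train it. The following pure-Python sketch shows the same idea without creme. Everything in it is an assumption made for illustration: the `OnlineLogisticRegression` class is a toy stand-in rather than creme's implementation, the data comes from a synthetic `make_stream` generator, and accuracy is tracked instead of the F1-score to keep the code short.

```python
import math
import random


class OnlineLogisticRegression:
    """Toy logistic regression trained with AdaGrad (not creme's code)."""

    def __init__(self, lr=0.1, eps=1e-8):
        self.lr = lr
        self.eps = eps
        self.weights = {}
        self.grad_sq = {}  # per-feature sum of squared gradients

    def predict_proba_one(self, x):
        z = sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict_one(self, x):
        return self.predict_proba_one(x) >= 0.5

    def fit_one(self, x, y):
        # Gradient of the log loss with respect to the linear output
        error = self.predict_proba_one(x) - float(y)
        for k, v in x.items():
            g = error * v
            self.grad_sq[k] = self.grad_sq.get(k, 0.0) + g * g
            # AdaGrad: each feature gets its own shrinking step size
            step = self.lr / (math.sqrt(self.grad_sq[k]) + self.eps)
            self.weights[k] = self.weights.get(k, 0.0) - step * g
        return self


def make_stream(n, seed):
    """Hypothetical data source: yields (features, label) pairs one by one."""
    rng = random.Random(seed)
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        yield {'a': a, 'b': b, 'bias': 1.0}, a + b > 0


model = OnlineLogisticRegression()
n_correct = 0
n_seen = 0

for x, y in make_stream(n=2000, seed=42):
    y_pred = model.predict_one(x)  # 1. predict before seeing the label
    n_correct += int(y_pred == y)  # 2. score the prediction
    n_seen += 1
    model.fit_one(x, y)            # 3. only then learn from the observation

accuracy = n_correct / n_seen
print(f'Progressive accuracy: {accuracy:.3f}')
```

Because every prediction is made on an observation the model has not yet learned from, the resulting score behaves like an out-of-sample estimate without requiring a separate held-out set.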

Comparison with other solutions

  • scikit-learn: Some of its estimators have a partial_fit method which allows them to update themselves with new observations. However, online learning isn't treated as a first-class citizen, which can make things awkward. You should definitely use scikit-learn if your data fits in memory and you can afford to retrain your model from scratch every time new data is available.
  • Vowpal Wabbit: VW is probably the fastest out-of-core learning system available. At its core it implements a state-of-the-art adaptive gradient descent algorithm with many tricks. It also has some mechanisms for doing active learning and using bandits. However, it isn't a "true" online learning system, as it assumes the data is available in a file and can be looped over multiple times. It is also somewhat difficult for newcomers to grok.
  • LIBOL: This is a good library written by academics with some great documentation. It's written in C++ and seems to be pretty fast. However, it focuses only on the learning aspect of online learning, not on other mundane yet useful tasks such as feature extraction and preprocessing. Moreover, it hasn't been updated for a few years.
  • Spark Streaming: This is an extension of Apache Spark which caters to big data practitioners. It processes data in mini-batches instead of actually doing real streaming operations. It also has some compatibility with MLlib for implementing online learning algorithms, such as streaming linear regression and streaming k-means. However, it is a somewhat overwhelming solution which might be overkill for certain use cases.
  • TensorFlow: Deep learning systems are in some sense online learning systems because they use online gradient descent. However, popular libraries are mostly attuned to batch situations. Because frameworks such as Keras and PyTorch are so popular and very well backed, there is no real point in implementing neural networks in creme. Additionally, for a lot of problems neural networks might not be the right tool, and you might want to use a simple logistic regression or a decision tree (for which online algorithms exist).

Feel free to open an issue if you feel like other solutions are worth mentioning.

Contributing

Like many subfields of machine learning, online learning is far from being an exact science, so there is still a lot to do. Feel free to contribute in any way you like; we're always open to new ideas and approaches. Also take a look at the issue tracker and see if anything takes your fancy.

Last but not least you are more than welcome to share with us how you're using creme or online learning in general! We believe that online learning solves a lot of pain points in practice and we would love to share experiences.

License

See the license file.
