
Incremental machine learning in Python

creme is a library for online machine learning, also known as incremental learning. Online learning is a machine learning regime where a model learns one observation at a time. This is in contrast to batch learning where all the data is processed in one go. Incremental learning is desirable when the data is too big to fit in memory, or simply when you want to handle streaming data. In addition to many online machine learning algorithms, creme provides utilities for extracting features from a stream of data.
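To make the contrast with batch learning concrete, here is a minimal sketch in plain Python that does not use creme at all. It fits a one-feature linear model with stochastic gradient descent and updates the weights after every single observation, so the stream never has to be held in memory. The names `fit_online` and `make_stream`, the learning rate, and the synthetic data are assumptions made for illustration only.

```python
def fit_online(stream, lr=0.1):
    """Fit y ~ w * x + b by looking at one (x, y) pair at a time."""
    w, b = 0.0, 0.0
    for x, y in stream:
        # Gradient of the squared loss 0.5 * (prediction - y) ** 2
        error = (w * x + b) - y
        w -= lr * error * x
        b -= lr * error
    return w, b


def make_stream(n):
    """Hypothetical noiseless stream following y = 2 * x + 1."""
    for i in range(n):
        x = (i % 10) / 10
        yield x, 2.0 * x + 1.0


# The generator is consumed lazily: only one observation exists at a time.
w, b = fit_online(make_stream(5000))
print(round(w, 3), round(b, 3))  # w and b should end up close to 2 and 1
```

A batch learner would need the whole dataset before producing a model, whereas here a usable `(w, b)` pair is available after every step.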

Installation

Note: creme is intended to work with Python 3.6 and above.

creme can simply be installed with pip.

pip install creme

You can also install the bleeding edge version like so:

pip install git+https://github.com/creme-ml/creme
# Or through SSH:
pip install git+ssh://git@github.com/creme-ml/creme.git

If you're looking to contribute to creme and want to have a development setup, then please check out the contribution guidelines.

Example

In the following example we'll use a linear regression to forecast the number of available bikes in bike stations in the city of Toulouse. Each observation looks like this:

>>> import pprint
>>> from creme import datasets

>>> X_y = datasets.fetch_bikes()
>>> x, y = next(X_y)

>>> pprint.pprint(x)
{'clouds': 75,
 'description': 'light rain',
 'humidity': 81,
 'moment': datetime.datetime(2016, 4, 1, 0, 0, 7),
 'pressure': 1017.0,
 'station': 'metro-canal-du-midi',
 'temperature': 6.54,
 'wind': 9.3}

>>> print(f'Number of bikes: {y}')
Number of bikes: 1

We will include all the available numeric features in our model. We will also use target encoding by computing a running average of the target per station and hour, along with an exponentially weighted average of the target per station. Before being fed to the linear regression, the features will be scaled using a StandardScaler. Note that each of these steps works in a streaming fashion, including the feature extraction. We'll evaluate the model by asking it to forecast 30 minutes ahead while delaying the true answers, which simulates a production scenario. Finally, we will print the current score every 30,000 predictions.

>>> import datetime as dt
>>> from creme import compose
>>> from creme import datasets
>>> from creme import feature_extraction
>>> from creme import linear_model
>>> from creme import metrics
>>> from creme import model_selection
>>> from creme import preprocessing
>>> from creme import stats

>>> X_y = datasets.fetch_bikes()

>>> def add_hour(x):
...     x['hour'] = x['moment'].hour
...     return x

>>> model = compose.Whitelister('clouds', 'humidity', 'pressure', 'temperature', 'wind')
>>> model += (
...     add_hour |
...     feature_extraction.TargetAgg(by=['station', 'hour'], how=stats.Mean())
... )
>>> model += feature_extraction.TargetAgg(by='station', how=stats.EWMean(0.5))
>>> model |= preprocessing.StandardScaler()
>>> model |= linear_model.LinearRegression()

>>> model_selection.online_qa_score(
...     X_y=X_y,
...     model=model,
...     metric=metrics.MAE(),
...     on='moment',
...     lag=dt.timedelta(minutes=30),
...     print_every=30_000
... )
[30,000] MAE: 2.193069
[60,000] MAE: 2.249345
[90,000] MAE: 2.288321
[120,000] MAE: 2.265257
[150,000] MAE: 2.2674
[180,000] MAE: 2.282485
MAE: 2.285921
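
The idea of delaying the true answers can be sketched in plain Python. This is not creme's implementation of `online_qa_score`, only an illustration under stated assumptions: the stream is sorted by time, the "model" is a running mean of the target per key, and a prediction is scored (and its label learned from) only once `lag` has elapsed. The function name `delayed_mae` and the toy data are hypothetical.

```python
import collections
import datetime as dt


def delayed_mae(stream, lag):
    """Score a running-mean-per-key model while revealing labels late.

    `stream` yields (moment, key, y) tuples sorted by moment.
    """
    sums = collections.defaultdict(float)
    counts = collections.defaultdict(int)
    pending = collections.deque()  # (reveal_time, key, y_pred, y_true)
    abs_error, n_scored = 0.0, 0

    def reveal(key, y_pred, y_true):
        nonlocal abs_error, n_scored
        abs_error += abs(y_true - y_pred)
        n_scored += 1
        # The model only learns from a label once it has been revealed.
        sums[key] += y_true
        counts[key] += 1

    for moment, key, y in stream:
        # Reveal every label whose delay has elapsed by now.
        while pending and pending[0][0] <= moment:
            _, k, y_pred, y_true = pending.popleft()
            reveal(k, y_pred, y_true)
        # Predict using only what has been revealed so far.
        y_pred = sums[key] / counts[key] if counts[key] else 0.0
        pending.append((moment + lag, key, y_pred, y))

    # Assumption: labels still pending at the end are scored as well.
    for _, k, y_pred, y_true in pending:
        reveal(k, y_pred, y_true)

    return abs_error / n_scored if n_scored else 0.0


start = dt.datetime(2016, 4, 1)
toy_stream = [
    (start + dt.timedelta(minutes=m), 'station-a', y)
    for m, y in [(0, 4), (10, 6), (40, 8), (80, 2)]
]
print(delayed_mae(toy_stream, lag=dt.timedelta(minutes=30)))  # 4.25
```

Note how the second observation is predicted without knowledge of the first label, because that label has not been revealed yet. Scoring this way avoids the optimistic bias of learning from each label immediately.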

You can visualize the pipeline like so:

>>> model
Pipeline (
    TransformerUnion (
        Whitelister (
            whitelist=['clouds', 'humidity', 'pressure', 'temperature', 'wind']
        ),
        Pipeline (
            FuncTransformer (
                func=add_hour
            ),
            TargetAgg (
                by=['station', 'hour']
                how=Mean: 0.
                target_name='target'
            )
        ),
        TargetAgg (
            by=['station']
            how=EWMean: 0.
            target_name='target'
        )
    ),
    StandardScaler (),
    LinearRegression (
        optimizer=SGD
        loss=Squared
        l2=0.0001
        intercept=0.0
        intercept_lr=0.01
    )
)
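
The StandardScaler step listed in the pipeline above has to standardize features without ever seeing the full dataset. One common way to do this is to maintain a running mean and variance with Welford's algorithm. The sketch below is plain Python and is not creme's code; the class name `RunningScaler` and the choice of the population variance (dividing by the count) are assumptions.

```python
import math


class RunningScaler:
    """Standardize a single feature using running statistics."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        # Welford's update: numerically stable, one value at a time.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def scale(self, value):
        # Assumption: return 0.0 while the variance is still undefined or zero.
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / self.n)
        return (value - self.mean) / std if std > 0 else 0.0


scaler = RunningScaler()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    scaler.update(value)
print(scaler.mean, scaler.scale(9))
```

In a streaming pipeline, one such object would be kept per feature, updated with each incoming observation before (or after) scaling it, depending on the convention chosen.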

We can also draw the pipeline.

>>> dot = model.draw()
[Figure: bikes_pipeline]

With only a few lines of code, we've built a robust model and evaluated it by simulating a production scenario. You can find a more detailed version of this example here. creme has a lot more to offer, so please refer to the documentation if you want to know more.

Contributing

Like many subfields of machine learning, online learning is far from being an exact science, so there is still a lot to do. Feel free to contribute in any way you like; we're always open to new ideas and approaches. If you want to contribute to the code base, please check out the CONTRIBUTING.md file. Also take a look at the issue tracker and see if anything takes your fancy.

Last but not least, you are more than welcome to tell us how you're using creme or online learning in general! We believe that online learning solves a lot of pain points in practice, and we would love to share experiences.

This project follows the all-contributors specification. Contributions of any kind are welcome!

  • Max Halford 📆 💻
  • AdilZouitine 💻
  • Raphael Sourty 💻
  • Geoffrey Bolmier 💻
  • vincent d warmerdam 💻
  • VaysseRobin 💻
  • Lygon Bowen-West 💻
  • Florent Le Gac 💻
  • Adrian Rosebrock 📝

License

See the license file.

Download files

Source Distribution

  • creme-0.4.3.tar.gz (320.4 kB), Source

Built Distributions

  • creme-0.4.3-cp37-cp37m-manylinux1_x86_64.whl (1.1 MB), CPython 3.7m
  • creme-0.4.3-cp37-cp37m-manylinux1_i686.whl (1.0 MB), CPython 3.7m
  • creme-0.4.3-cp37-cp37m-macosx_10_6_intel.whl (730.3 kB), CPython 3.7m, macOS 10.6+ Intel (x86-64, i386)
  • creme-0.4.3-cp36-cp36m-manylinux1_x86_64.whl (1.1 MB), CPython 3.6m
  • creme-0.4.3-cp36-cp36m-manylinux1_i686.whl (1.0 MB), CPython 3.6m
  • creme-0.4.3-cp36-cp36m-macosx_10_6_intel.whl (735.7 kB), CPython 3.6m, macOS 10.6+ Intel (x86-64, i386)

