Online machine learning in Python

River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition is to be the go-to library for doing machine learning on streaming data.

⚡️ Quickstart

As a quick example, we'll train a logistic regression to classify the website phishing dataset. Here's a look at the first observation in the dataset.

>>> from pprint import pprint
>>> from river import datasets

>>> dataset = datasets.Phishing()

>>> for x, y in dataset:
...     pprint(x)
...     print(y)
...     break
{'age_of_domain': 1,
 'anchor_from_other_domain': 0.0,
 'empty_server_form_handler': 0.0,
 'https': 0.0,
 'ip_in_url': 1,
 'is_popular': 0.5,
 'long_url': 1.0,
 'popup_window': 0.0,
 'request_from_other_domain': 0.0}
True

Now let's run the model on the dataset in a streaming fashion. We sequentially interleave predictions and model updates. Meanwhile, we update a performance metric to see how well the model is doing.

>>> from river import compose
>>> from river import linear_model
>>> from river import metrics
>>> from river import preprocessing

>>> model = compose.Pipeline(
...     preprocessing.StandardScaler(),
...     linear_model.LogisticRegression()
... )

>>> metric = metrics.Accuracy()

>>> for x, y in dataset:
...     y_pred = model.predict_one(x)      # make a prediction
...     metric = metric.update(y, y_pred)  # update the metric
...     model = model.learn_one(x, y)      # make the model learn

>>> metric
Accuracy: 89.20%
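
To make the predict-then-learn ordering concrete without depending on River itself, here is a minimal pure-Python sketch. Everything in it is an assumption made for illustration: `TinyLogisticRegression` and `synthetic_stream` are hypothetical names, the data is synthetic, and the update rule is plain stochastic gradient descent rather than River's actual implementation.

```python
import math
import random


class TinyLogisticRegression:
    """Hypothetical stand-in for an online classifier; not River's implementation."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = {}  # one weight per feature name, created lazily
        self.bias = 0.0

    def predict_proba_one(self, x):
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        z = max(-30.0, min(30.0, z))  # clamp so that exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict_one(self, x):
        return self.predict_proba_one(x) >= 0.5

    def learn_one(self, x, y):
        # One step of stochastic gradient descent on the log loss.
        error = self.predict_proba_one(x) - float(y)
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
        self.bias -= self.lr * error


def synthetic_stream(n, seed=42):
    """Yield (features, label) pairs; the label is True when a + b > 1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = {"a": rng.random(), "b": rng.random()}
        yield x, x["a"] + x["b"] > 1.0


model = TinyLogisticRegression()
n_seen = n_correct = 0

for x, y in synthetic_stream(2000):
    y_pred = model.predict_one(x)  # predict first, on an observation not yet learned
    n_correct += int(y_pred == y)  # score that prediction
    n_seen += 1
    model.learn_one(x, y)          # only then let the model learn from it

accuracy = n_correct / n_seen
print(f"Progressive accuracy: {accuracy:.2%}")
```

Because every observation is scored before the model learns from it, the resulting accuracy is an estimate of performance on unseen data, even though no separate test set is held out.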

🛠 Installation

River is intended to work with Python 3.6 or above. Installation can be done with pip:

pip install river

There are wheels available for Linux, macOS, and Windows, which means that you most likely won't have to build River from source.

You can install the latest development version from GitHub like so:

pip install git+https://github.com/online-ml/river --upgrade

Or, through SSH:

pip install git+ssh://git@github.com/online-ml/river.git --upgrade

🧠 Philosophy

Machine learning is often done in a batch setting, whereby a model is fitted to a dataset in one go. This results in a static model which has to be retrained in order to learn from new data. In many cases, this is neither elegant nor efficient, and usually incurs a fair amount of technical debt. Indeed, if you're using a batch model, then you need to think about maintaining a training set, monitoring real-time performance, model retraining, etc.

With River, we encourage a different approach, which is to continuously learn from a stream of data. This means that the model processes one observation at a time, and can therefore be updated on the fly. This makes it possible to learn from massive datasets that don't fit in main memory. Online machine learning also integrates nicely in cases where new data is constantly arriving. It shines in many use cases, such as time series forecasting, spam filtering, recommender systems, CTR prediction, and IoT applications. If you're bored with retraining models and want to instead build dynamic models, then online machine learning (and therefore River!) might be what you're looking for.
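
As a small illustration of processing one observation at a time, the sketch below keeps a running mean and variance in constant memory using Welford's algorithm. `RunningStats` is a hypothetical helper written for this example, not a River class, and the short tuple of numbers merely stands in for a stream that could be arbitrarily long.

```python
class RunningStats:
    """Hypothetical helper (not part of River): mean and variance in O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Welford's algorithm: fold in one value, after which it can be discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; reported as 0.0 until two observations have been seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):  # stands in for a long stream
    stats.update(value)

print(stats.n, stats.mean, stats.variance)
```

The state held by `stats` is three numbers regardless of how many observations have gone through it, which is what makes this style of computation suitable for data that doesn't fit in memory.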

Here are some benefits of using River (and online machine learning in general):

  • Incremental: models can update themselves in real time.
  • Adaptive: models can adapt to concept drift.
  • Production-ready: working with data streams makes it simple to replicate production scenarios during model development.
  • Efficient: models don't have to be retrained and require little compute power, which lowers their carbon footprint.
  • Fast: when the goal is to learn and predict with a single instance at a time, then River is an order of magnitude faster than PyTorch, TensorFlow, and scikit-learn.
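
The "Adaptive" point can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not how River handles drift: it feeds a stream whose mean jumps from 0 to 10 to two incremental estimators, a plain running mean and an exponentially weighted one with a smoothing factor `alpha` chosen arbitrarily.

```python
stream = [0.0] * 200 + [10.0] * 200  # the underlying mean jumps from 0 to 10

alpha = 0.1            # assumed smoothing factor; larger values forget faster
n = 0
plain_mean = 0.0       # weighs every past observation equally
ewm_mean = stream[0]   # exponentially weighted: recent observations count more

for x in stream:
    n += 1
    plain_mean += (x - plain_mean) / n   # incremental arithmetic mean
    ewm_mean += alpha * (x - ewm_mean)   # incremental exponentially weighted mean

print(f"plain mean: {plain_mean:.3f}")
print(f"exponentially weighted mean: {ewm_mean:.3f}")
```

The plain mean ends up halfway between the two regimes, whereas the exponentially weighted mean has moved almost entirely to the new level. Both are updated one observation at a time; the difference lies only in how quickly old data is forgotten.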

🔥 Features

  • Linear models with a wide array of optimizers
  • Nearest neighbors, decision trees, naïve Bayes
  • Progressive model validation
  • Model pipelines as a first-class citizen
  • Anomaly detection
  • Recommender systems
  • Time series forecasting
  • Imbalanced learning
  • Clustering
  • Feature extraction and selection
  • Online statistics and metrics
  • Built-in datasets
  • And much more

👍 Contributing

Feel free to contribute in any way you like. We're always open to new ideas and approaches.

There are three ways for users to get involved:

  • Issue tracker: this is the place to report bugs and to request minor features or small improvements. Issues should be short-lived and solved as fast as possible.
  • Discussions: you can ask for new features, submit your questions and get help, propose new ideas, or even show the community what you are achieving with River! If you have a new technique or want to port new functionality to River, this is the place to discuss it.
  • Roadmap: you can check what we are doing, see what the next planned milestones for River are, and look for cool ideas that still need someone to make them a reality!

Please check out the contribution guidelines if you want to bring modifications to the code base. You can view the list of people who have contributed here.

❤️ They've used us

These are companies that we know have been using River, be it in production or for prototyping.

Feel welcome to get in touch if you want us to add your company logo!

💬 Citation

If River has been useful for your research and you would like to cite it in a scientific publication, please refer to this paper:

@misc{2020river,
      title={River: machine learning for streaming data in Python},
      author={Jacob Montiel and Max Halford and Saulo Martiello Mastelini
              and Geoffrey Bolmier and Raphael Sourty and Robin Vaysse
              and Adil Zouitine and Heitor Murilo Gomes and Jesse Read
              and Talel Abdessalem and Albert Bifet},
      year={2020},
      eprint={2012.04740},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

📝 License

River is free and open-source software licensed under the 3-clause BSD license.
