Online machine learning in Python

River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition is to be the go-to library for doing machine learning on streaming data.

⚡️ Quickstart

As a quick example, we'll train a logistic regression to classify the website phishing dataset. Here's a look at the first observation in the dataset.

>>> from pprint import pprint
>>> from river import datasets

>>> dataset = datasets.Phishing()

>>> for x, y in dataset:
...     pprint(x)
...     print(y)
...     break
{'age_of_domain': 1,
 'anchor_from_other_domain': 0.0,
 'empty_server_form_handler': 0.0,
 'https': 0.0,
 'ip_in_url': 1,
 'is_popular': 0.5,
 'long_url': 1.0,
 'popup_window': 0.0,
 'request_from_other_domain': 0.0}
True

Now let's run the model on the dataset in a streaming fashion. We sequentially interleave predictions and model updates. Meanwhile, we update a performance metric to see how well the model is doing.

>>> from river import compose
>>> from river import linear_model
>>> from river import metrics
>>> from river import preprocessing

>>> model = compose.Pipeline(
...     preprocessing.StandardScaler(),
...     linear_model.LogisticRegression()
... )

>>> metric = metrics.Accuracy()

>>> for x, y in dataset:
...     y_pred = model.predict_one(x)      # make a prediction
...     metric = metric.update(y, y_pred)  # update the metric
...     model = model.learn_one(x, y)      # make the model learn

>>> metric
Accuracy: 89.20%
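The loop above follows a test-then-train pattern: each observation is first used to check the model, and only afterwards to update it. As a rough, River-independent illustration of that pattern, here is a minimal sketch in plain Python. The names `MajorityClass` and `progressive_accuracy` are made up for this example and are not part of River's API.

```python
from collections import Counter


class MajorityClass:
    """Toy model: always predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        # Before any label has been seen there is nothing to predict.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        self.counts[y] += 1
        return self


def progressive_accuracy(stream, model):
    """Predict on each observation before learning from it."""
    n_correct = n_seen = 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # test first ...
        n_correct += int(y_pred == y)
        n_seen += 1
        model.learn_one(x, y)          # ... then train
    return n_correct / n_seen if n_seen else 0.0


# Hypothetical four-observation stream, only here to exercise the helper.
stream = [({"f": 1}, True), ({"f": 0}, True), ({"f": 1}, False), ({"f": 0}, True)]
accuracy = progressive_accuracy(stream, MajorityClass())
```

Because every prediction is made before the model has seen the corresponding label, the resulting score is measured on unseen data without needing a separate hold-out set.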

🛠 Installation

River is intended to work with Python 3.6 or above. Installation can be done with pip:

pip install river

There are wheels available for Linux, macOS, and Windows, which means that you most probably won't have to build River from source.

You can install the latest development version from GitHub like so:

pip install git+https://github.com/online-ml/river --upgrade

Or, through SSH:

pip install git+ssh://git@github.com/online-ml/river.git --upgrade

🧠 Philosophy

Machine learning is often done in a batch setting, whereby a model is fitted to a dataset in one go. This results in a static model which has to be retrained in order to learn from new data. In many cases, this is neither elegant nor efficient, and it usually incurs a fair amount of technical debt. Indeed, if you're using a batch model, then you need to think about maintaining a training set, monitoring real-time performance, retraining the model, etc.

With River, we encourage a different approach, which is to learn continuously from a stream of data. This means that the model processes one observation at a time, and can therefore be updated on the fly. This makes it possible to learn from massive datasets that don't fit in main memory. Online machine learning also integrates nicely in cases where new data is constantly arriving. It shines in many use cases, such as time series forecasting, spam filtering, recommender systems, CTR prediction, and IoT applications. If you're bored with retraining models and want to instead build dynamic models, then online machine learning (and therefore River!) might be what you're looking for.
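To make "one observation at a time" concrete, here is a minimal sketch of a logistic regression trained by stochastic gradient descent in plain Python. It is not River's implementation; the class name `OnlineLogisticRegression`, the learning rate, and the toy stream are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


class OnlineLogisticRegression:
    """Logistic regression updated by SGD, one observation at a time."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = defaultdict(float)  # unseen features start at 0
        self.bias = 0.0

    def predict_proba_one(self, x):
        z = self.bias + sum(self.weights[k] * v for k, v in x.items())
        z = max(-30.0, min(30.0, z))  # clip to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # Gradient of the log loss w.r.t. the linear output is (p - y).
        error = self.predict_proba_one(x) - float(y)
        for k, v in x.items():
            self.weights[k] -= self.lr * error * v
        self.bias -= self.lr * error
        return self


model = OnlineLogisticRegression()
# Hypothetical stream: the label is True when feature "a" is positive.
for _ in range(200):
    for x, y in [({"a": 1.0}, True), ({"a": -1.0}, False)]:
        model.learn_one(x, y)
```

The model only ever holds its weights in memory, never the data, which is why this style of learning scales to streams that are larger than main memory.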

Here are some benefits of using River (and online machine learning in general):

  • Incremental: models can update themselves in real time.
  • Adaptive: models can adapt to concept drift.
  • Production-ready: working with data streams makes it simple to replicate production scenarios during model development.
  • Efficient: models don't have to be retrained and require little compute power, which lowers their carbon footprint.
  • Fast: when the goal is to learn and predict with a single instance at a time, then River is an order of magnitude faster than PyTorch, TensorFlow, and scikit-learn.

🔥 Features

  • Linear models with a wide array of optimizers
  • Nearest neighbors, decision trees, naïve Bayes
  • Progressive model validation
  • Model pipelines as a first-class citizen
  • Anomaly detection
  • Recommender systems
  • Time series forecasting
  • Imbalanced learning
  • Clustering
  • Feature extraction and selection
  • Online statistics and metrics
  • Built-in datasets
  • And much more
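As an illustration of what an online statistic looks like, the sketch below maintains a running mean and variance with Welford's algorithm, using constant memory. It is written in plain Python and is not taken from River; the class name `RunningVariance` is hypothetical.

```python
class RunningVariance:
    """Welford's algorithm: mean and sample variance in O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        return self

    def get(self):
        # Sample variance (ddof=1); returns 0.0 until two values are seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


var = RunningVariance()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    var.update(value)
```

The same update-then-query shape applies to other streaming statistics and metrics: each new value adjusts a small internal state, and the current estimate can be read at any point.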

🔗 Useful links

👁️ Media

👍 Contributing

Feel free to contribute in any way you like. We're always open to new ideas and approaches.

There are three ways for users to get involved:

  • Issue tracker: this is the place to report bugs and to request minor features or small improvements. Issues should be short-lived and solved as fast as possible.
  • Discussions: you can ask for new features, submit your questions and get help, propose new ideas, or even show the community what you are achieving with River! If you have a new technique or want to port new functionality to River, this is the place to discuss it.
  • Roadmap: you can check what we are doing, see what the next planned milestones for River are, and look for cool ideas that still need someone to make them a reality!

Please check out the contribution guidelines if you want to make changes to the code base. You can view the list of people who have contributed here.

❤️ They've used us

These are companies that we know have been using River, be it in production or for prototyping.

Feel free to get in touch if you want us to add your company logo!

🤝 Affiliations

Sponsors

Collaborating institutions and groups

💬 Citation

If River has been useful for your research and you would like to cite it in a scientific publication, please refer to this paper:

@misc{2020river,
      title={River: machine learning for streaming data in Python},
      author={Jacob Montiel and Max Halford and Saulo Martiello Mastelini
              and Geoffrey Bolmier and Raphael Sourty and Robin Vaysse
              and Adil Zouitine and Heitor Murilo Gomes and Jesse Read
              and Talel Abdessalem and Albert Bifet},
      year={2020},
      eprint={2012.04740},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

📝 License

River is free and open-source software licensed under the 3-clause BSD license.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

river-0.8.0.tar.gz (938.5 kB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

river-0.8.0-cp38-cp38-win_amd64.whl (1.4 MB)

Uploaded CPython 3.8, Windows x86-64

river-0.8.0-cp38-cp38-win32.whl (1.3 MB)

Uploaded CPython 3.8, Windows x86

river-0.8.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.7 MB)

Uploaded CPython 3.8, manylinux: glibc 2.12+ x86-64, manylinux: glibc 2.5+ x86-64

river-0.8.0-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_12_i686.manylinux2010_i686.whl (2.6 MB)

Uploaded CPython 3.8, manylinux: glibc 2.12+ i686, manylinux: glibc 2.5+ i686

river-0.8.0-cp38-cp38-macosx_10_9_x86_64.whl (1.4 MB)

Uploaded CPython 3.8, macOS 10.9+ x86-64

river-0.8.0-cp37-cp37m-win_amd64.whl (1.4 MB)

Uploaded CPython 3.7m, Windows x86-64

river-0.8.0-cp37-cp37m-win32.whl (1.3 MB)

Uploaded CPython 3.7m, Windows x86

river-0.8.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.4 MB)

Uploaded CPython 3.7m, manylinux: glibc 2.12+ x86-64, manylinux: glibc 2.5+ x86-64

river-0.8.0-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_12_i686.manylinux2010_i686.whl (2.3 MB)

Uploaded CPython 3.7m, manylinux: glibc 2.12+ i686, manylinux: glibc 2.5+ i686

river-0.8.0-cp37-cp37m-macosx_10_9_x86_64.whl (1.4 MB)

Uploaded CPython 3.7m, macOS 10.9+ x86-64

river-0.8.0-cp36-cp36m-win_amd64.whl (1.4 MB)

Uploaded CPython 3.6m, Windows x86-64

river-0.8.0-cp36-cp36m-win32.whl (1.3 MB)

Uploaded CPython 3.6m, Windows x86

river-0.8.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.4 MB)

Uploaded CPython 3.6m, manylinux: glibc 2.12+ x86-64, manylinux: glibc 2.5+ x86-64

river-0.8.0-cp36-cp36m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_12_i686.manylinux2010_i686.whl (2.3 MB)

Uploaded CPython 3.6m, manylinux: glibc 2.12+ i686, manylinux: glibc 2.5+ i686

river-0.8.0-cp36-cp36m-macosx_10_9_x86_64.whl (1.4 MB)

Uploaded CPython 3.6m, macOS 10.9+ x86-64

File details

Details for the file river-0.8.0.tar.gz.

File metadata

  • Download URL: river-0.8.0.tar.gz
  • Upload date:
  • Size: 938.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.11

File hashes

Hashes for river-0.8.0.tar.gz
Algorithm Hash digest
SHA256 f53b00c89cee2528229990d9c24ff43db82567e8b35a541177809a304184d608
MD5 90a54a0de4e75986e7863c705c7c064f
BLAKE2b-256 726ce2164b57802af0b7e60a4ac271a1277876a720eb88cde88c2aa4a7aa24c1

See more details on using hashes here.
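One way to use these digests is to recompute the SHA256 of a downloaded file and compare it with the published value. The sketch below does this with Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are made up for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For instance, calling `matches_published_hash` with the path to a downloaded `river-0.8.0.tar.gz` and the SHA256 value listed above should return `True` for an intact download.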

File details

Details for the file river-0.8.0-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: river-0.8.0-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for river-0.8.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 a2fac8c71f32021a0795f50cf1f88bc7291169bf3e1dd400d3f7201be8a96b17
MD5 c273f369ac5a7b7105af3d3473ed3344
BLAKE2b-256 39ed82db0342b9756cc378f9f483f35758999854bb09106d3e84cc0929078d4f

File details

Details for the file river-0.8.0-cp38-cp38-win32.whl.

File metadata

  • Download URL: river-0.8.0-cp38-cp38-win32.whl
  • Upload date:
  • Size: 1.3 MB
  • Tags: CPython 3.8, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for river-0.8.0-cp38-cp38-win32.whl
Algorithm Hash digest
SHA256 b460480a433f889b95eed0109eb3b59ae92f98da77f6c986242a17a5280be127
MD5 dac708ee7bcc74fc7d288736d4387a90
BLAKE2b-256 e300ba4f0c587dacae7bc2501bac682d9f340a95ce3c8db25f44ce7931cfc80a

File details

Details for the file river-0.8.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File hashes

Hashes for river-0.8.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 80f1f10c7df66ac7fefa5ff8a530346db873ef08262fe5f74d05dd64690b5bbe
MD5 9473313fd777f4b17aa616c49b5ecd33
BLAKE2b-256 f047f28ada4c4cbbcc8011fa124afafb5ce9d154ff64096f2e20a35122143e4d

File details

Details for the file river-0.8.0-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_12_i686.manylinux2010_i686.whl.

File hashes

Hashes for river-0.8.0-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_12_i686.manylinux2010_i686.whl
Algorithm Hash digest
SHA256 05df80965dcd268e0b30f5e02f21c5bedde5e34cfaa78170cc6160241de98cd7
MD5 f3bb60b11ae86b7dccfebb367b430e44
BLAKE2b-256 ce61274235a20eb5c8f6043269f6b989340abc662007c1c6f4e7ef78bd4c16fb

File details

Details for the file river-0.8.0-cp38-cp38-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: river-0.8.0-cp38-cp38-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: CPython 3.8, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.11

File hashes

Hashes for river-0.8.0-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 e5dfdae77b73e0bf5167a25c3e181e3353634bf34471b67cd216fbc6503da5cf
MD5 de2297b2de18a9c78d5e25acc57cfb79
BLAKE2b-256 0985b8dc99937125b099c4796be1d38ed0cfec96be654164758f2a25c5cf5d24

File details

Details for the file river-0.8.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: river-0.8.0-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for river-0.8.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 f34952b5b254f99fbbb7f8d939debba7f533d6584b50d197818965059c6708cc
MD5 d7f06b2aaaf0fe6428352deaa9f9c8e8
BLAKE2b-256 c08d47f1ed6d82b8313c35ce7f3bc56beeb3aaf3129640e65c2eb9ad11f5cdcb

File details

Details for the file river-0.8.0-cp37-cp37m-win32.whl.

File metadata

  • Download URL: river-0.8.0-cp37-cp37m-win32.whl
  • Upload date:
  • Size: 1.3 MB
  • Tags: CPython 3.7m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for river-0.8.0-cp37-cp37m-win32.whl
Algorithm Hash digest
SHA256 8d9b812f2b991a43b33b0ef48a39cc421a8775b1656b9fb5aeb440efb634be7d
MD5 9e978d938f1625df7024553230a9949d
BLAKE2b-256 c8745ae2ce6edafb5410a23751f5ce839d7540e2789d770f90bcc456bb269b6c

File details

Details for the file river-0.8.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File hashes

Hashes for river-0.8.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 90c7d2d491f6c106c027039f7581efc22518152d7b35a61c11a9d0375431cdd9
MD5 710082cf3585e3692af207c131100e04
BLAKE2b-256 896dfde57adfaa33dc73ad17b75d5ace4964cbca54f863780e80e32fab5e100c

File details

Details for the file river-0.8.0-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_12_i686.manylinux2010_i686.whl.

File hashes

Hashes for river-0.8.0-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_12_i686.manylinux2010_i686.whl
Algorithm Hash digest
SHA256 5b859492d9ee74cdfbbcac4ed6c086736fb45fe34c6f53133e549850d223a910
MD5 57b0f783fbe0aa1fd08b3b0abcec8143
BLAKE2b-256 7aeb688836706b050587b75bbb20e389a1d51350726f582a2a69139a71325684

File details

Details for the file river-0.8.0-cp37-cp37m-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: river-0.8.0-cp37-cp37m-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: CPython 3.7m, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.11

File hashes

Hashes for river-0.8.0-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 4b0395d875e1af14a9935d40865a8664ce0ce718dcc3b1ba19e2a29497ff5ce7
MD5 03bd16f7f7b6e8b848c17a8e505b0b0d
BLAKE2b-256 feefb3d9e792e68cbeb93ea88b08e86bd5c44dad0e4e2fdbe4c68e4c4a5f4cf6

File details

Details for the file river-0.8.0-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: river-0.8.0-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for river-0.8.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 5eb032e51cda22843e03cdd68fde6866a35da243498a3b43d09ba090e3d9df6c
MD5 d888e97352603a0b99f7e2eaa9fb7ad2
BLAKE2b-256 95ebc7ffcd0da9e9dd73838f389946bb4956e5ad98c5fcf7e20ce0f56835f6ed

File details

Details for the file river-0.8.0-cp36-cp36m-win32.whl.

File metadata

  • Download URL: river-0.8.0-cp36-cp36m-win32.whl
  • Upload date:
  • Size: 1.3 MB
  • Tags: CPython 3.6m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for river-0.8.0-cp36-cp36m-win32.whl
Algorithm Hash digest
SHA256 6037bafa5548d1576f54d5ac7403df1e085de44e29f5cc65c1ee534eedefbef1
MD5 59cb4ba0440d5aaf8749f18b06be8524
BLAKE2b-256 cf5b72265e1c0d46235a6555f84397eba780e22d33cddca10e6930b1530c019d

File details

Details for the file river-0.8.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File hashes

Hashes for river-0.8.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 a461fe3e3a7c137b6461439104675bea4f19604a1598808a86161cf74e14d8e9
MD5 63baee91e799fa9380663c797d32e710
BLAKE2b-256 147e9122045a4ff02108a8f7c85144d5a45b2bb79e639471bd794bff1f2a5811

File details

Details for the file river-0.8.0-cp36-cp36m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_12_i686.manylinux2010_i686.whl.

File hashes

Hashes for river-0.8.0-cp36-cp36m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_12_i686.manylinux2010_i686.whl
Algorithm Hash digest
SHA256 5f6c0e12f654671c423f02906671a8b6df80d4b60d2d05c8e67b603837d1c03e
MD5 ae80fdf5d4ba7bcffb5351da84a9ee43
BLAKE2b-256 f44e7ca9ac5e2408f12730b2196e0496942a87f78fa531425a4774012a410675

File details

Details for the file river-0.8.0-cp36-cp36m-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: river-0.8.0-cp36-cp36m-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: CPython 3.6m, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.11

File hashes

Hashes for river-0.8.0-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 844ecd9d55512c8c6984cc4d2306e486cbad8263ae1d916fdba4d70905243997
MD5 7899d734efab47b77e1fdd012c9debd5
BLAKE2b-256 b4ec4ba826cc30cd399edfedaa94fb5bf10cf223ed412ea8fa7ab3942d1caf9c
