Python Wrapper for Lolo
=======================

``lolopy`` implements a Python interface to the `Lolo machine learning
library <https://github.com/CitrineInformatics/lolo>`__.

Lolo is a Scala library that contains a variety of machine learning
algorithms, with a particular focus on algorithms that provide robust
uncertainty estimates. ``lolopy`` gives access to these algorithms as
scikit-learn compatible interfaces and automatically manages the
interface between Python and the JVM (i.e., you can use ``lolopy``
without knowing that it is running on the JVM).

Installation
------------

``lolopy`` is available on PyPI. Install it by calling:

``pip install lolopy``

To use ``lolopy``, you will also need a Java JRE >= 1.8 installed on your
system. The ``lolopy`` PyPI package contains the compiled ``lolo``
library, so it is ready to use after installation.

Development
~~~~~~~~~~~

Developing ``lolopy`` requires Python >= 3.6, Java JDK >= 1.8, and Maven
to be installed on your system.

Before developing ``lolopy``, compile ``lolo`` on your system using
Maven. We have provided a ``Makefile`` that contains the needed
operations. To build and install ``lolopy``, call ``make`` in this
directory.

Use
---

The ``RandomForestRegressor`` class most clearly demonstrates the use of
``lolopy``. This class is based on the `Random Forest with
Jackknife-based uncertainty estimates of Wager et
al. <http://jmlr.org/papers/volume15/wager14a/wager14a.pdf>`__, which,
in effect, uses the variance between different trees in the forest to
estimate the uncertainty of each prediction. Using this
algorithm is as simple as using the `RandomForestRegressor from
scikit-learn <https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html>`__:

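.. code:: python

    from lolopy.learners import RandomForestRegressor

    # X is the feature matrix and y holds the target values
    rf = RandomForestRegressor()
    rf.fit(X, y)

    # return_std=True also returns an uncertainty for each prediction
    y_pred, y_std = rf.predict(X, return_std=True)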

This code produces the predicted values (``y_pred``) and their
uncertainties (``y_std``).
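
One way to use these two outputs is to turn them into approximate
prediction intervals. The sketch below is illustrative only: it assumes
``y_std`` can be read as a standard deviation with roughly Gaussian
errors, and ``prediction_intervals`` is a hypothetical helper, not part of
``lolopy``. Plain lists stand in for the arrays returned by ``predict``.

```python
from typing import List, Sequence, Tuple


def prediction_intervals(
    y_pred: Sequence[float], y_std: Sequence[float], z: float = 1.96
) -> List[Tuple[float, float]]:
    """Return (lower, upper) bounds of y_pred +/- z * y_std for each prediction.

    Assumes y_std behaves like a standard deviation; z = 1.96 corresponds
    to roughly 95% coverage under a Gaussian error model.
    """
    return [(mean - z * std, mean + z * std) for mean, std in zip(y_pred, y_std)]


# Stand-in values; in practice these would come from rf.predict(X, return_std=True)
y_pred = [1.0, 2.0]
y_std = [0.5, 0.0]
intervals = prediction_intervals(y_pred, y_std, z=2.0)
```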

See the `examples <./examples>`__ folder for more examples and
details.

You may need to increase the amount of memory available to ``lolopy``
when using it on larger datasets. To set the maximum memory
footprint for the JVM running the machine learning calculations,
set the ``LOLOPY_JVM_MEMORY`` environment variable. The
value of ``LOLOPY_JVM_MEMORY`` is used to set the maximum heap size for
the JVM (see `Oracle's documentation for
details <https://docs.oracle.com/cd/E21764_01/web.1111/e13814/jvm_tuning.htm#PERFM164>`__).
For example, "4g" allows ``lolo`` to use 4GB of memory.
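
The variable can be set in the shell or from Python. The sketch below
sets it from Python; it assumes the variable has to be in place before
``lolopy`` starts its JVM, so it should run before any ``lolopy`` learner is
used. ``set_lolopy_jvm_memory`` is a hypothetical helper, and its format
check is a conservative illustration rather than the JVM's full syntax.

```python
import os
import re


def set_lolopy_jvm_memory(size: str) -> None:
    """Set LOLOPY_JVM_MEMORY to a heap size such as "4g" or "512m"."""
    # Illustrative check: digits followed by an optional k/m/g unit suffix
    if not re.fullmatch(r"\d+[kKmMgG]?", size):
        raise ValueError(f"Unexpected heap size format: {size!r}")
    os.environ["LOLOPY_JVM_MEMORY"] = size


# Assumption: call this before importing and using lolopy learners,
# so the setting is visible when the JVM is launched.
set_lolopy_jvm_memory("4g")
```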

Implementation and Performance
------------------------------

``lolopy`` is built using the `Py4J <https://www.py4j.org/>`__ library
to interface with the Lolo Scala library. Py4J makes it easy to
manage a JVM server, create Java objects in that JVM, and call
Java methods from Python. However, Py4J `is slow at
transferring large
arrays <https://github.com/bartdag/py4j/issues/159>`__. To transfer
arrays of features (e.g., training data) to the JVM before model
training or evaluation, we convert the data to and from byte arrays on
both the Java and Python sides. Transferring data as byte arrays moves
data quickly between the JVM and Python but requires holding three
copies of the data in memory at once (the Python array, the Java byte
array, and the Java numerical array). We could reduce memory usage by
passing the byte array in chunks, but this is not currently implemented.
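
The idea of moving numbers as raw bytes can be sketched on the Python
side with the standard library alone. This is an illustration, not
``lolopy``'s actual transfer code: the function names, the little-endian
8-byte layout, and the chunking helper are all assumptions made for the
example.

```python
import struct
from typing import Iterator, List


def floats_to_bytes(values: List[float]) -> bytes:
    """Pack floats into one buffer of little-endian 8-byte doubles."""
    return struct.pack(f"<{len(values)}d", *values)


def bytes_to_floats(data: bytes) -> List[float]:
    """Unpack a buffer of little-endian 8-byte doubles back into floats."""
    count = len(data) // 8
    return list(struct.unpack(f"<{count}d", data))


def iter_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of the buffer, each at most chunk_size bytes."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


row = [1.0, 2.5, -3.25]
payload = floats_to_bytes(row)  # 3 values * 8 bytes each

# Send the payload in 8-byte chunks, then reassemble and decode it
restored = bytes_to_floats(b"".join(iter_chunks(payload, 8)))
```

Sending a large buffer in chunks, as ``iter_chunks`` does here, is the kind
of approach that would avoid holding a full extra copy on the receiving
side.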

Our performance for model training is comparable to scikit-learn, as
shown in the figure below. The blue-shaded region in the figure
represents the time required to pass training data to the JVM. We note
that training times are equivalent between using the Scala interface to
Lolo and ``lolopy`` for training set sizes above 100.

.. figure:: ./examples/profile/training-performance.png
   :alt: training performance

   training performance

``lolopy`` and Lolo are currently slower than scikit-learn for model
evaluation, as shown in the figure below. The model timings are
evaluated on a dataset of 1000 entries with 145 features. Evaluation
slows as the training set grows because the number of
trees in the forest is equal to the training set size. ``lolopy`` and
Lolo have similar performance for models with training set sizes
above 100. Below a training set size of 100, the cost of sending data
limits the performance of ``lolopy``.

.. figure:: ./examples/profile/evaluation-performance.png
   :alt: evaluation performance

   evaluation performance

For more details, see the `benchmarking
notebook <./examples/profile/scaling-test.ipynb>`__.

