Python wrapper for the Lolo machine learning library

Project description

lolopy implements a Python interface to the Lolo machine learning library.

Lolo is a Scala library that contains a variety of machine learning algorithms, with a particular focus on algorithms that provide robust uncertainty estimates. lolopy exposes these algorithms through scikit-learn-compatible interfaces and automatically manages the interface between Python and the JVM (i.e., you can use lolopy without knowing that it runs on the JVM).

Installation

lolopy is available on PyPI. Install it by calling:

pip install lolopy

To use lolopy, you will also need to install Java JRE >= 1.8 on your system. The lolopy PyPI package contains the compiled lolo library, so it is ready to use after installation.
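Because lolopy needs a Java runtime at least as new as 1.8, it can help to check what is on your PATH before installing. The sketch below is not part of lolopy, and the helper names `parse_java_major` and `java_meets_requirement` are hypothetical. It relies on the fact that `java -version` reports Java 8 and earlier using the "1.x" scheme (e.g., "1.8.0_292") and Java 9 and later by their major version (e.g., "17.0.1"), so the "1.8" requirement corresponds to major version 8.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Java 8 and earlier report "1.x" (e.g., "1.8.0_292" -> 8);
    Java 9+ report the major version directly (e.g., "17.0.1" -> 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a Java version string")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy "1.x" numbering
    return int(first)


def java_meets_requirement(min_major=8):
    """Return True if a `java` executable on PATH has major version >= min_major."""
    java = shutil.which("java")
    if java is None:
        return False  # no Java runtime found on PATH
    # `java -version` writes its report to stderr, so check both streams
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout) >= min_major
```

Calling `java_meets_requirement()` returns `False` when no `java` executable is found, which is a quick signal that a JRE still needs to be installed.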

Development

Developing lolopy requires Python >= 3.7, Java JDK >= 1.8, and sbt to be installed on your system.

Before developing lolopy, compile lolo on your system using sbt. We have provided a Makefile that contains the needed operations. To build and install lolopy, call make in this directory.

Use

The RandomForestRegressor class most clearly demonstrates the use of lolopy. This class is based on the Random Forest with Jackknife-based uncertainty estimates of Wager et al., which, in effect, uses the variance between different trees in the forest to estimate the uncertainty of each prediction. Using this algorithm is as simple as using the RandomForestRegressor from scikit-learn:

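import numpy as np
from lolopy.learners import RandomForestRegressor

# Toy data for illustration: 100 samples with 4 features
X = np.random.uniform(size=(100, 4))
y = X.sum(axis=1)

rf = RandomForestRegressor()
rf.fit(X, y)
y_pred, y_std = rf.predict(X, return_std=True)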

This code produces the predicted values (y_pred) and their uncertainties (y_std).
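The idea of turning variation between trees into an uncertainty estimate can be made concrete with the jackknife-after-bootstrap variance estimator described by Wager et al. For a query point, V = (n - 1)/n * sum over i of (t_(-i) - t_all)^2, where n is the number of training points, t_all is the average prediction over all trees, and t_(-i) is the average prediction over the trees whose bootstrap sample did not contain training point i. The NumPy sketch below computes this uncorrected form for a single query point. The function name and input layout are assumptions for illustration, and lolo's actual implementation may differ (for example, in how it applies bias corrections).

```python
import numpy as np


def jackknife_after_bootstrap_variance(tree_preds, inbag):
    """Uncorrected jackknife-after-bootstrap variance at one query point.

    tree_preds: shape (B,), prediction of each of the B trees at the point.
    inbag: shape (B, n), boolean; inbag[b, i] is True if training point i
        was in the bootstrap sample of tree b.
    """
    tree_preds = np.asarray(tree_preds, dtype=float)
    inbag = np.asarray(inbag, dtype=bool)
    n = inbag.shape[1]
    t_all = tree_preds.mean()  # average over all trees

    out_of_bag = ~inbag
    counts = out_of_bag.sum(axis=0)  # trees that left out each point i
    if np.any(counts == 0):
        raise ValueError("every training point must be left out of at least one tree")

    # Average prediction of the trees that did not see point i, for each i
    t_minus_i = (out_of_bag * tree_preds[:, None]).sum(axis=0) / counts
    return (n - 1) / n * np.sum((t_minus_i - t_all) ** 2)
```

Taking the square root of this variance gives a standard-deviation-like uncertainty, analogous in spirit to the y_std values returned above.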

See the examples folder for more examples and details.

You may need to increase the amount of memory available to lolopy when using it on larger datasets. You can set the maximum memory footprint of the JVM that runs the machine learning calculations with the LOLOPY_JVM_MEMORY environment variable, whose value is used as the JVM's maximum heap size (see Oracle’s documentation for details). For example, “4g” allows lolo to use 4 GB of memory.
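A minimal sketch of setting this variable from Python, assuming lolopy reads LOLOPY_JVM_MEMORY when it launches its JVM, so the variable has to be set before lolopy starts the JVM (for example, before importing lolopy). The helper `set_lolopy_jvm_memory` is hypothetical; its format check mirrors the usual JVM heap-size syntax (a whole number with an optional k, m, or g suffix), but lolopy itself may accept or reject other values.

```python
import os
import re


def set_lolopy_jvm_memory(size):
    """Hypothetical helper: set LOLOPY_JVM_MEMORY after a basic format check."""
    # Typical JVM heap sizes look like "512m", "4g", or a plain byte count
    if not re.fullmatch(r"\d+[kKmMgG]?", size):
        raise ValueError(f"unexpected heap size format: {size!r}")
    os.environ["LOLOPY_JVM_MEMORY"] = size


# Assumption: this must run before lolopy starts its JVM
set_lolopy_jvm_memory("4g")
# from lolopy.learners import RandomForestRegressor  # import lolopy afterwards
```

Equivalently, the variable can be exported in the shell before starting Python, e.g. `export LOLOPY_JVM_MEMORY=4g`.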

Implementation and Performance

lolopy is built using the Py4J library to interface with the Lolo Scala library. Py4J makes it easy to manage a JVM server, create Java objects in that JVM, and call Java methods from Python. However, Py4J is slow at transferring large arrays. To transfer arrays of features (e.g., training data) to the JVM before model training or evaluation, we convert the data to and from byte arrays on the Python and Java sides. Transferring data as byte arrays moves data quickly between Python and the JVM, but requires holding three copies of the data in memory at once (the Python array, the Java byte array, and the Java numerical array). We could reduce memory usage by passing the byte array in chunks, but this is not currently implemented.
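To illustrate the byte-array approach, here is a NumPy-only sketch. It is not lolopy's code: the decoding step that happens in Java is stood in for by `np.frombuffer`, the little-endian float64 layout is an assumption (the byte order must match whatever the JVM side reads), and `iter_chunks` is a hypothetical helper showing what the unimplemented chunked transfer could look like.

```python
import numpy as np


def to_bytes(X):
    """Serialize a 2D array as contiguous little-endian float64 bytes."""
    X = np.ascontiguousarray(X, dtype="<f8")  # fix dtype and byte order (assumed)
    return X.tobytes(), X.shape


def from_bytes(buf, shape):
    """Rebuild the numerical array from raw bytes (stand-in for the Java side)."""
    return np.frombuffer(buf, dtype="<f8").reshape(shape)


def iter_chunks(buf, chunk_size):
    """Hypothetical chunked transfer: yield successive slices of the bytes."""
    view = memoryview(buf)  # slicing a memoryview does not copy the data
    for start in range(0, len(buf), chunk_size):
        yield view[start:start + chunk_size]
```

The original array `X` and the bytes object are two of the copies described above. In this Python stand-in, `np.frombuffer` returns a view of the bytes rather than a copy; on the JVM, decoding into a numerical array creates the third copy. Sending chunks could let the receiver decode each piece as it arrives, so the complete byte array would not need to be held in memory at once.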

lolopy's model training performance is comparable to that of scikit-learn, as shown in the figure below. The blue-shaded region in the figure represents the time required to pass training data to the JVM. Training times with lolopy are equivalent to those with the Scala interface to Lolo for training set sizes above 100.

[Figure: training performance]

Lolopy and lolo are currently slower than scikit-learn for model evaluation, as shown in the figure below. The model timings are evaluated on a dataset of 1000 entries with 145 features. Evaluation slows as the training set grows because the number of trees in the forest is set equal to the training set size. Lolopy and lolo have similar performance for models with training set sizes above 100; below a training set size of 100, the cost of sending data limits the performance of lolopy.

[Figure: evaluation performance]
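The benchmarks above time training and evaluation as a function of training set size. A rough harness for that kind of measurement is sketched below; it is not the benchmarking notebook's code. The synthetic data, the `benchmark` function, and its parameters are assumptions, with defaults matching the evaluation set described above (1000 entries, 145 features). `make_model` receives the training set size so that, as in the benchmark, the number of trees can be tied to it.

```python
import time

import numpy as np


def benchmark(make_model, train_sizes, n_features=145, n_eval=1000, seed=0):
    """Time fit and predict for models trained on increasing amounts of data.

    make_model: callable taking the training set size and returning an
        unfitted estimator with scikit-learn-style fit/predict methods.
    Returns a list of (train_size, fit_seconds, predict_seconds) tuples.
    """
    rng = np.random.default_rng(seed)
    X_eval = rng.uniform(size=(n_eval, n_features))  # fixed evaluation set
    results = []
    for n in train_sizes:
        X = rng.uniform(size=(n, n_features))
        y = X.sum(axis=1)  # synthetic target (an assumption)
        model = make_model(n)

        start = time.perf_counter()
        model.fit(X, y)
        fit_seconds = time.perf_counter() - start

        start = time.perf_counter()
        model.predict(X_eval)
        predict_seconds = time.perf_counter() - start

        results.append((n, fit_seconds, predict_seconds))
    return results
```

For example, `benchmark(lambda n: RandomForestRegressor(n_estimators=n), [10, 100, 1000])` with scikit-learn's RandomForestRegressor would set the number of trees equal to the training set size. Single-run timings are noisy, so repeating each measurement gives more reliable comparisons.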

For more details, see the benchmarking notebook.

Download files

Source Distribution

lolopy-1.3.0.tar.gz (56.0 MB)

Built Distribution

lolopy-1.3.0-py2.py3-none-any.whl (56.0 MB)

File details

Details for the file lolopy-1.3.0.tar.gz.

File metadata

  • Download URL: lolopy-1.3.0.tar.gz
  • Upload date:
  • Size: 56.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.11

File hashes

Hashes for lolopy-1.3.0.tar.gz
Algorithm Hash digest
SHA256 1becb33f1653e4f1e42624edba8c0d7bce6d6fc25b69720e7ab45f7d37980fe9
MD5 dd806cc17299c9343483a4f1b69b94e2
BLAKE2b-256 d82b9a85c56a3ba6b5317247a8014bf2e47f065eb2849586ddd839dd447c28d3

File details

Details for the file lolopy-1.3.0-py2.py3-none-any.whl.

File metadata

  • Download URL: lolopy-1.3.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 56.0 MB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.11

File hashes

Hashes for lolopy-1.3.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 371105b597b2c46072f0cc8716c39091bf39802e25544b4e7b89243faaf4352d
MD5 67b8c37132cf8bdcbc238f9cb247a749
BLAKE2b-256 ecfa268c93e9747eb6196a3e55d26e36b69997b4be9758a7c03b2aa1c7e83416
