
# Mixed Effects Random Forest

[![Build Status](https://semaphoreci.com/api/v1/manifoldai/merf/branches/master/badge.svg)](https://semaphoreci.com/manifoldai/merf)

This repository contains a pure Python implementation of a mixed effects random forest (MERF) algorithm. It can be used, out of the box, to fit a MERF model and predict with it.

## MERF Model

The MERF model is:

y_i = f(X_i) + Z_i * b_i + e_i

b_i ~ N(0, D)

e_i ~ N(0, R_i)

for each cluster i out of n total clusters.

In the above:

  • y_i – the (n_i x 1) vector of responses for cluster i. These are given at training.

  • X_i – the (n_i x p) fixed effects covariates that are associated with the y_i. These are given at training.

  • Z_i – the (n_i x q) random effects covariates that are associated with the y_i. These are given at training.

  • b_i – the (q x 1) vector of random effects for cluster i. This is unknown; MERF estimates it for each cluster seen in training.

  • e_i – the (n_i x 1) vector of errors for cluster i. This is unknown.

  • i is the cluster_id. This is given at training.

The learned parameters in MERF are:

  • f() – a random forest that models the (potentially nonlinear) mapping from the fixed effects covariates to the response. It is common across all clusters.

  • D – the (q x q) covariance of the normal distribution from which each b_i is drawn. It is common across all clusters.

  • sigma^2 – the variance of e_i, which is assumed to be white, so that R_i = sigma^2 I (the n_i x n_i identity scaled by sigma^2). It is common across all clusters.
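Because e_i is assumed white, R_i = sigma^2 I_{n_i}. Assuming, as is standard for mixed models, that the b_i are independent across clusters and independent of the e_i (the description above does not state this explicitly), integrating out b_i gives the marginal distribution of each cluster's responses:

$$
\begin{aligned}
y_i &= f(X_i) + Z_i b_i + e_i, \qquad b_i \sim N(0, D), \qquad e_i \sim N(0, \sigma^2 I_{n_i}), \\
y_i &\sim N\big(f(X_i),\, V_i\big), \qquad V_i = \operatorname{Cov}(y_i) = Z_i D Z_i^\top + \sigma^2 I_{n_i}.
\end{aligned}
$$

Observations in the same cluster are therefore correlated through the Z_i D Z_i^T term, while observations in different clusters are independent.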

Note that one key assumption of the MERF model is that the random effect enters linearly, through Z_i * b_i. Though this is limiting in some regards, it is still broadly useful for many problems, and it is better than not modelling the random effect at all.
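To make the generative model concrete, including the linear random-effect term Z_i * b_i, the sketch below draws synthetic data from it with NumPy. Every setting here is an illustrative assumption: the nonlinear f(), the values in D, sigma^2, the cluster sizes, and the choice of a random intercept plus a random slope on the first covariate. It is not the process implemented in utils.py or used in the paper, and `simulate_merf_data` is a hypothetical helper.

```python
import numpy as np


def simulate_merf_data(n_clusters=50, n_per_cluster=20, sigma2=1.0, seed=0):
    """Draw synthetic data from y_i = f(X_i) + Z_i b_i + e_i (illustrative settings only)."""
    rng = np.random.default_rng(seed)
    # Hypothetical D: covariance of a random intercept and a random slope (must be positive definite)
    D = np.array([[4.0, 0.5],
                  [0.5, 1.0]])
    X_parts, Z_parts, cluster_parts, y_parts = [], [], [], []
    for i in range(n_clusters):
        X_i = rng.uniform(-2.0, 2.0, size=(n_per_cluster, 3))          # fixed effects covariates (p = 3)
        Z_i = np.column_stack([np.ones(n_per_cluster), X_i[:, 0]])   # random effects covariates (q = 2)
        b_i = rng.multivariate_normal(np.zeros(2), D)                # b_i ~ N(0, D)
        e_i = rng.normal(0.0, np.sqrt(sigma2), size=n_per_cluster)   # e_i ~ N(0, sigma^2 I)
        f_X = np.sin(X_i[:, 0]) + X_i[:, 1] ** 2 - X_i[:, 2]         # an arbitrary nonlinear f()
        y_parts.append(f_X + Z_i @ b_i + e_i)                         # random effect enters linearly via Z_i b_i
        X_parts.append(X_i)
        Z_parts.append(Z_i)
        cluster_parts.append(np.full(n_per_cluster, i))
    return (np.vstack(X_parts), np.vstack(Z_parts),
            np.concatenate(cluster_parts), np.concatenate(y_parts))
```

For example, `X, Z, clusters, y = simulate_merf_data()` returns 1000 rows split evenly across 50 clusters, in the same four-part shape (X, Z, clusters, y) that MERF's fit method takes.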

The algorithms implemented in this repo were developed by Ahlem Hajjem, Francois Bellavance, and Denis Larocque and published in a paper [here](http://www.tandfonline.com/doi/abs/10.1080/00949655.2012.741599). Many thanks to Ahlem and Denis for providing an R reference and aiding in the debugging of this code. Note that the published paper has a small typo in the update equation for sigma^2; it is corrected in the source code here.

## Using the Code

The MERF code is modelled after scikit-learn estimators. To use it, instantiate a MERF object (with or without specifying parameters; the defaults are sensible), then fit the model to training data. After fitting, you can predict responses for data from either known clusters (clusters in the training set) or new clusters (clusters not in the training set). For a new cluster there is no estimate of b_i, so the prediction is the fixed effects part f(X) alone.
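The known-versus-new distinction amounts to the prediction rule sketched below. This is not the merf.py code: `predict_mixed` and the dict layout of `b_hat` are hypothetical, and any fitted regressor with a `predict` method stands in for the learned f().

```python
import numpy as np


def predict_mixed(f_hat, X, Z, clusters, b_hat):
    """Hypothetical helper: prediction rule for known vs. new clusters.

    f_hat: any fitted regressor with predict(X), standing in for the learned f().
    b_hat: dict mapping each training cluster id to its estimated b_i (a q-vector).
    """
    y_hat = np.array(f_hat.predict(X), dtype=float)   # fixed effects part f(X), as a fresh array
    for cluster_id in np.unique(clusters):
        if cluster_id in b_hat:
            # Known cluster: add its estimated random effect Z_i b_i
            mask = clusters == cluster_id
            y_hat[mask] += Z[mask] @ b_hat[cluster_id]
        # New cluster: no estimate of b_i exists and its prior mean is 0, so f(X) is used alone
    return y_hat
```

Falling back to f(X) for a new cluster is the natural choice here because b_i ~ N(0, D) has mean zero.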

For example:

```python
from merf import MERF

merf = MERF()
merf.fit(X_train, Z_train, clusters_train, y_train)
y_hat = merf.predict(X_test, Z_test, clusters_test)
```

Note that training is slow because the underlying expectation-maximization (EM) algorithm requires many calls to the random forest fit method. That being said, this implementation has early stopping, which aborts the EM algorithm once the generalized log-likelihood (GLL) stops improving significantly.
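For orientation, one EM iteration for this model can be written as below. This is our sketch of the standard EM steps for a mixed model with a nonparametric fixed part, in the notation above (N = sum of the n_i observations, n clusters, R_i = sigma^2 I_{n_i}); it is not a transcription of the paper, and merf.py remains the authoritative version, including the corrected sigma^2 update. Hats denote current estimates, and the right-hand sides of the last two updates use the values from the previous iteration.

$$
\begin{aligned}
y_i^* &= y_i - Z_i \hat b_i, \qquad \hat f \leftarrow \text{random forest fit on } \{(X_i, y_i^*)\}_{i=1}^{n} \\
\hat V_i &= Z_i \hat D Z_i^\top + \hat\sigma^2 I_{n_i} \\
\hat b_i &= \hat D Z_i^\top \hat V_i^{-1}\big(y_i - \hat f(X_i)\big) \\
\hat\epsilon_i &= y_i - \hat f(X_i) - Z_i \hat b_i \\
\hat\sigma^2 &\leftarrow \frac{1}{N}\sum_{i=1}^{n}\Big[\hat\epsilon_i^\top \hat\epsilon_i + \hat\sigma^2\big(n_i - \hat\sigma^2 \operatorname{tr}(\hat V_i^{-1})\big)\Big] \\
\hat D &\leftarrow \frac{1}{n}\sum_{i=1}^{n}\Big[\hat b_i \hat b_i^\top + \hat D - \hat D Z_i^\top \hat V_i^{-1} Z_i \hat D\Big]
\end{aligned}
$$

The extra terms in the last two updates are the conditional variances tr Var(e_i | y_i) and Var(b_i | y_i), which is what makes these EM updates rather than plug-in estimates. Progress is monitored with the GLL (lower is better):

$$
\mathrm{GLL} = \sum_{i=1}^{n}\Big[\hat\epsilon_i^\top \hat R_i^{-1}\hat\epsilon_i + \hat b_i^\top \hat D^{-1}\hat b_i + \log|\hat D| + \log|\hat R_i|\Big], \qquad \hat R_i = \hat\sigma^2 I_{n_i}.
$$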

In its current implementation the fixed effects learner is a random forest, but in theory the EM algorithm can be used with any learner. We hope that future releases will also support gradient boosted trees and even deep neural networks as the fixed effects learner.
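The EM loop with GLL-based early stopping can be sketched in NumPy with the fixed effects learner passed in as a parameter, which also shows why, in principle, any learner works: the loop only calls `fit(X, y)` and `predict(X)`. `fit_mixed_em`, its arguments, the starting values, and the relative-improvement stopping rule with threshold `tol` are assumptions for illustration, not MERF's API or its exact early-stopping criterion.

```python
import numpy as np


def fit_mixed_em(learner, X, Z, clusters, y, max_iter=20, tol=1e-3):
    """EM sketch for y_i = f(X_i) + Z_i b_i + e_i with a pluggable fixed effects learner.

    learner: any object with fit(X, y) and predict(X), e.g. a scikit-learn regressor.
    Returns (learner, b_hat, D_hat, sigma2_hat, gll_history); b_hat maps cluster id -> b_i.
    """
    y = np.asarray(y, dtype=float)
    ids = np.unique(clusters)
    masks = {c: clusters == c for c in ids}
    q = Z.shape[1]
    b_hat = {c: np.zeros(q) for c in ids}
    D_hat, sigma2_hat = np.eye(q), 1.0                  # arbitrary starting values
    gll_history = []
    for _ in range(max_iter):
        # Remove the current random effects and refit f() on the adjusted response y*
        y_star = y.copy()
        for c in ids:
            y_star[masks[c]] -= Z[masks[c]] @ b_hat[c]
        learner.fit(X, y_star)
        f_hat = np.asarray(learner.predict(X), dtype=float)

        sigma2_sum, D_sum = 0.0, np.zeros((q, q))
        for c in ids:
            m = masks[c]
            y_i, Z_i, f_i = y[m], Z[m], f_hat[m]
            n_i = len(y_i)
            # E-step: b_i = D Z_i^T V_i^{-1} (y_i - f(X_i)), using the previous D_hat and sigma2_hat
            V_inv = np.linalg.pinv(Z_i @ D_hat @ Z_i.T + sigma2_hat * np.eye(n_i))
            b_i = D_hat @ Z_i.T @ V_inv @ (y_i - f_i)
            b_hat[c] = b_i
            # M-step accumulators, including the conditional-variance corrections
            eps_i = y_i - f_i - Z_i @ b_i
            sigma2_sum += eps_i @ eps_i + sigma2_hat * (n_i - sigma2_hat * np.trace(V_inv))
            D_sum += np.outer(b_i, b_i) + D_hat - D_hat @ Z_i.T @ V_inv @ Z_i @ D_hat
        sigma2_hat = sigma2_sum / len(y)
        D_hat = D_sum / len(ids)

        # Generalized log-likelihood with R_i = sigma2_hat * I (lower is better)
        D_inv = np.linalg.pinv(D_hat)
        _, logdet_D = np.linalg.slogdet(D_hat)
        gll = 0.0
        for c in ids:
            m = masks[c]
            eps_i = y[m] - f_hat[m] - Z[m] @ b_hat[c]
            gll += (eps_i @ eps_i / sigma2_hat + b_hat[c] @ D_inv @ b_hat[c]
                    + logdet_D + m.sum() * np.log(sigma2_hat))
        gll_history.append(gll)

        # Early stopping: quit once the relative GLL improvement drops below tol
        if len(gll_history) > 1:
            prev = gll_history[-2]
            if (prev - gll) / max(abs(prev), 1e-12) < tol:
                break
    return learner, b_hat, D_hat, sigma2_hat, gll_history
```

With scikit-learn installed, passing `RandomForestRegressor(n_estimators=300)` as `learner` would mirror the random forest case. One caveat for flexible learners: in-sample predictions from a random forest track the training responses closely, which can make the residuals, and hence sigma^2, too small; out-of-bag predictions (`oob_score=True` and the `oob_prediction_` attribute of `RandomForestRegressor`) are one way to avoid this.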

## Tour of the Source Code

The src directory contains all the source code:

  • merf.py is the key module that contains the MERF class. It is imported at the package level.

  • tests.py contains some simple unit tests.

  • utils.py contains a class for generating synthetic data that can be used to test the accuracy of MERF. The process implemented is the same as that in this [paper](http://www.tandfonline.com/doi/abs/10.1080/00949655.2012.741599).

The notebooks directory contains some useful notebooks that show you how to use the code and evaluate MERF performance. Most of the techniques implemented are the same as those in this [paper](http://www.tandfonline.com/doi/abs/10.1080/00949655.2012.741599).

Download files

Source Distribution

merf-0.2.tar.gz (8.8 kB)

Built Distribution

merf-0.2-py3-none-any.whl (12.4 kB)

File details

Details for the file merf-0.2.tar.gz.

File metadata

  • Download URL: merf-0.2.tar.gz
  • Upload date:
  • Size: 8.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for merf-0.2.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | fc1107ec5990ebba1f64f0c6a6462ea90f1f3650f867d13268c12297e4beb121 |
| MD5 | 19400a9271c2065a1807a014d47848de |
| BLAKE2b-256 | 31f4ae744e2b5866c8777b384f3bc41d3cce089667107dc2aec9163377e0e36a |


File details

Details for the file merf-0.2-py3-none-any.whl.

File metadata

  • Download URL: merf-0.2-py3-none-any.whl
  • Upload date:
  • Size: 12.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for merf-0.2-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 8311d113cde3209816a285aaae1a75d72287b67accaae58bcc8fe394f4ed790a |
| MD5 | 66b8d07d0c78668797a52f12445716b0 |
| BLAKE2b-256 | 3484c4817aa6d32b84c2afc3f5288178c6f651a2154e434256e939b4457bc756 |
