lrtree: logistic regression trees

Logistic regression trees

Motivation

The goal of lrtree is to build decision trees with logistic regressions at their leaves. The resulting model combines a non-parametric, stepwise approach (the tree) with a parametric, linear one (the logistic regressions), aiming for good predictive performance while remaining interpretable.

This is the implementation of glmtree as described in Formalization and study of statistical problems in Credit Scoring, Ehrhardt A. (see manuscript or web article)

Getting started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

This code is supported on Python 3.7, 3.8, 3.9.

Installing the package

Installing the development version

If git is installed on your machine, you can use:

pipenv install git+https://github.com/adimajo/lrtree.git

If git is not installed, you can also use:

pipenv install --upgrade https://github.com/adimajo/lrtree/archive/master.tar.gz

Installing through the pip command

You can install a stable version from PyPI by using:

pip install lrtree

Installation guide for Anaconda

The installation with the pip or pipenv command should work. If not, please raise an issue.

For people behind proxy(ies)...

A lot of people, including myself, work behind a proxy at work...

A simple solution to get the package is to use the --proxy option of pip:

pip --proxy=http://username:password@server:port install lrtree

where username, password, server and port should be replaced by your own values.

If the environment variables http_proxy and/or https_proxy (or, unfortunately depending on the application, their upper-case forms HTTP_PROXY and HTTPS_PROXY) are set, pip should pick up the proxy settings.
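To see which of these variables are actually defined in the current shell, a small check along the following lines can help. This is only a sketch: the helper name proxy_variables_set is hypothetical, and it uses nothing beyond the standard library.

```python
import os

# The four variable names mentioned above; tools differ on which case they read.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def proxy_variables_set(environ=None):
    """Return the names of the proxy variables that are defined and non-empty."""
    environ = os.environ if environ is None else environ
    return [name for name in PROXY_VARS if environ.get(name)]


if __name__ == "__main__":
    names = proxy_variables_set()
    # Only the names are printed: the values may embed a username and password.
    print("Proxy variables set:", ", ".join(names) if names else "none")
```

If the list is empty, pip has no environment proxy to pick up and the --proxy option is the simpler route.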

Over the years, I've found CNTLM to be a great tool in this regard.

Authors

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This research has been financed by Crédit Agricole Consumer Finance through a CIFRE PhD.

This research was supported by Inria Lille - Nord-Europe and Lille University as part of a PhD.

References

Ehrhardt, A. (2019), Formalization and study of statistical problems in Credit Scoring: Reject inference, discretization and pairwise interactions, logistic regression trees (PhD thesis).

Contribute

You can clone this project using:

git clone https://github.com/adimajo/lrtree.git

You can install all dependencies, including development dependencies, using (note that this command requires pipenv which can be installed by typing pip install pipenv):

pipenv install -d

You can build the documentation by going into the docs directory and typing make html.

You can run the tests by typing coverage run -m pytest, which relies on packages coverage and pytest.

To run the tests in different environments (one per Python version), install pyenv (see the instructions here), then install every version you want to test (see tox.ini), e.g. with pyenv install 3.7.0. Next, run pipenv run pyenv local 3.7.0 [...] (listing all the other versions), followed by pipenv run tox.

Python Environment

The project uses pipenv.

To export all the project dependencies so that they can be ported to a machine with limited internet access, use the command pipenv lock -r > requirements.txt, which converts the Pipfile into a requirements.txt. Recent pipenv releases have replaced this option with pipenv requirements > requirements.txt.

Installation

To create a virtual environment and install all the necessary dependencies, use the pipenv install command for production use, or pipenv install -d for development use.

Tests

The tests are based on pytest and are stored in the tests folder. They can all be launched with the command pytest at the root of the project. Test coverage can be measured with the coverage package, which also takes care of launching the tests. The command to use is coverage run -m pytest. A graphical summary can then be generated as an HTML page with the coverage html command, which creates or updates the htmlcov folder; open its index.html file to view the report.

Usage

The package provides a scikit-learn-like interface.

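Loading sample data for a binary classification task:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# load_boston has a continuous target and was removed in scikit-learn 1.2;
# a binary target matches the logistic regressions (class_num=2 below).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)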

The trained model consists of a fitted sklearn.tree.DecisionTreeClassifier, which segments the data, and a Python list of sklearn.linear_model.LogisticRegression models, one for each leaf of the tree.
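To make this structure concrete, here is a minimal sketch built directly with scikit-learn. It is not the lrtree fitting algorithm: it simply fits a fixed shallow tree and then one logistic regression per leaf. The synthetic data, the hyperparameters and the predict helper are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=600, n_features=5, random_state=0)

# 1. A shallow tree segments the data; min_samples_leaf keeps each leaf
#    large enough to fit a regression on it.
tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=50, random_state=0)
tree.fit(X, y)
leaf_ids = tree.apply(X)  # index of the leaf each sample falls into
leaves = np.unique(leaf_ids)

# 2. One logistic regression per leaf, stored in a plain Python list.
models = []
for leaf in leaves:
    mask = leaf_ids == leaf
    if len(np.unique(y[mask])) < 2:
        # Pure leaf: a logistic regression cannot be fitted on a single
        # class, so the constant label is stored instead.
        models.append(int(y[mask][0]))
    else:
        models.append(LogisticRegression(max_iter=1000).fit(X[mask], y[mask]))


def predict(X_new):
    """Route each sample to its leaf, then apply that leaf's model."""
    new_leaf_ids = tree.apply(X_new)
    out = np.empty(len(X_new), dtype=int)
    for leaf, model in zip(leaves, models):
        mask = new_leaf_ids == leaf
        if mask.any():
            out[mask] = model if isinstance(model, int) else model.predict(X_new[mask])
    return out


print(len(models), "leaf models; training accuracy:", (predict(X) == y).mean())
```

In lrtree itself, the segmentation and the regressions are chosen jointly by the package, so the tree obtained there will generally differ from the one fitted in this sketch.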

The snippet to train the model and make a prediction:

from lrtree import Lrtree

model = Lrtree(criterion="bic", ratios=(0.7,), class_num=2, max_iter=100)

# Fitting the model
model.fit(X_train, y_train)

# Make a prediction on a fitted model
model.predict(X_test)
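Once predictions are available, they can be scored with the usual scikit-learn metrics. The sketch below assumes that predict returns class labels, which is an assumption about lrtree rather than something stated here. The evaluate helper is hypothetical, and a plain LogisticRegression on synthetic data stands in for the fitted model so that the example runs without lrtree installed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Score any fitted estimator whose predict() returns class labels."""
    y_pred = model.predict(X_test)
    return accuracy_score(y_test, y_pred), confusion_matrix(y_test, y_pred)


# Synthetic binary data (illustrative only).
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Stand-in estimator; replace it with a fitted Lrtree model if its
# predict() output is indeed made of class labels (assumption).
stand_in = LogisticRegression(max_iter=1000).fit(X_train, y_train)

accuracy, matrix = evaluate(stand_in, X_test, y_test)
print(f"accuracy: {accuracy:.3f}")
print(matrix)
```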
