
lrtree: logistic regression trees


Logistic regression trees


Motivation

The goal of lrtree is to build decision trees with logistic regressions at their leaves. The resulting model combines non-parametric and parametric, stepwise and linear approaches to achieve good predictive performance while remaining interpretable.

This is the implementation of glmtree as described in Formalization and study of statistical problems in Credit Scoring, Ehrhardt A. (see References).

Getting started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

This code is supported on Python 3.8, 3.9, 3.10.

Installing the package

Installing the development version

If git is installed on your machine, you can use:

pipenv install git+https://github.com/adimajo/lrtree.git

If git is not installed, you can also use:

pipenv install --upgrade https://github.com/adimajo/lrtree/archive/master.tar.gz

Installing through the pip command

You can install a stable version from PyPI by using:

pip install lrtree

To run the provided scripts, lrtree-consistency and lrtree-realdata, you need a few additional dependencies:

pip install lrtree[scripts]

Installation guide for Anaconda

The installation with the pip or pipenv command should work. If not, please raise an issue.

For people behind proxy(ies)...

A lot of people, including myself, work behind a proxy at work...

A simple solution to get the package is to use the --proxy option of pip:

pip --proxy=http://username:password@server:port install lrtree

where username, password, server and port should be replaced by your own values.

If the environment variables http_proxy and/or https_proxy are set (or, depending on the application, their uppercase variants HTTP_PROXY and HTTPS_PROXY), pip should pick up the proxy settings from them.
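To check which proxy environment variables are visible to a Python process, the standard library can be queried directly. This is only a minimal sketch: the proxy address below is a hypothetical placeholder, and the check shows what urllib reads from the environment, not pip's own resolution logic.

```python
import os
import urllib.request

# Hypothetical proxy address, for illustration only; replace with your own.
os.environ["https_proxy"] = "http://proxy.example.com:3128"

# Reads the *_proxy variables from the environment (lowercase or uppercase).
proxies = urllib.request.getproxies_environment()

for scheme, url in sorted(proxies.items()):
    print(f"{scheme}: {url}")
```

If the dictionary is empty, no proxy variable is set in the current shell session.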

Over the years, I've found CNTLM to be a great tool in this regard.

Authors

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This research has been financed by Crédit Agricole Consumer Finance through a CIFRE PhD.

This research was supported by Inria Lille - Nord-Europe and Lille University as part of a PhD.

References

Ehrhardt, A. (2019), Formalization and study of statistical problems in Credit Scoring: Reject inference, discretization and pairwise interactions, logistic regression trees (PhD thesis).

Contribute

You can clone this project using:

git clone https://github.com/adimajo/lrtree.git

You can install all dependencies, including development dependencies, using (note that this command requires pipenv which can be installed by typing pip install pipenv):

pipenv install -d

You can build the documentation by going into the docs directory and typing make html.

You can run the tests by typing coverage run -m pytest, which relies on packages coverage and pytest.

To run the tests in different environments (one for each version of Python), install pyenv (see its installation instructions), then install every version you want to test (see tox.ini), e.g. with pyenv install 3.7.0. Next, run pipenv run pyenv local 3.7.0 [...] (listing all the other versions), followed by pipenv run tox.

Python Environment

The project uses pipenv to manage its dependencies.

To download all the project dependencies so that they can be ported to a machine with limited internet access, use the command pipenv lock -r > requirements.txt, which converts the Pipfile into a requirements.txt. On recent pipenv releases, where the -r option of pipenv lock is no longer available, pipenv requirements > requirements.txt should serve the same purpose.

Installation

To create a virtual environment and install all the necessary dependencies, use the pipenv install command for production use or pipenv install -d for development use.

Tests

The tests are based on pytest and are stored in the tests folder. They can all be launched with the command pytest at the root of the project. Test coverage is measured with the coverage package, which also launches the tests: the command to use is coverage run -m pytest. A graphical summary can then be generated as an HTML page with the coverage html command, which creates or updates the htmlcov folder; open its index.html file to view the report.

Utilization

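The package provides a scikit-learn-like interface.

Loading sample data for a binary classification task:

# load_boston was removed in scikit-learn 1.2 and has a continuous target;
# load_breast_cancer is used here as a binary classification dataset instead.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)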

The trained model consists of a fitted sklearn.tree.DecisionTreeClassifier, which segments the data, and a Python list of sklearn.linear_model.LogisticRegression models, one for each leaf of the tree.
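The structure described above (one segmentation tree plus a list of per-leaf logistic regressions) can be illustrated with scikit-learn alone. This is a minimal sketch under stated assumptions, not lrtree's fitting algorithm: the tree is fitted once and never revised, the data is synthetic, and the names leaf_models and predict_by_leaf are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, for illustration only.
X, y = make_classification(n_samples=600, n_features=5, random_state=0)

# Step 1: a shallow tree segments the feature space.
tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=50, random_state=0)
tree.fit(X, y)
leaf_ids = tree.apply(X)  # index of the leaf each training sample falls into

# Step 2: one logistic regression per leaf, stored in a plain Python list.
leaves = sorted(set(leaf_ids))
leaf_models = []
for leaf in leaves:
    mask = leaf_ids == leaf
    if len(np.unique(y[mask])) < 2:
        # A pure leaf has a single class: store that label as a constant.
        leaf_models.append(int(y[mask][0]))
    else:
        leaf_models.append(LogisticRegression(max_iter=1000).fit(X[mask], y[mask]))


def predict_by_leaf(X_new):
    """Route each sample to its leaf, then apply that leaf's model."""
    new_leaf_ids = tree.apply(X_new)
    y_pred = np.zeros(len(X_new), dtype=int)
    for leaf, model in zip(leaves, leaf_models):
        mask = new_leaf_ids == leaf
        if not mask.any():
            continue
        if isinstance(model, LogisticRegression):
            y_pred[mask] = model.predict(X_new[mask])
        else:
            y_pred[mask] = model  # constant label of a pure leaf
    return y_pred


predictions = predict_by_leaf(X)
```

Keeping the tree shallow means each segment is described by a few split rules and each leaf by a single set of regression coefficients, which is where the interpretability of this kind of model comes from.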

The snippet to train the model and make a prediction:

from lrtree import Lrtree

model = Lrtree(criterion="bic", ratios=(0.7,), class_num=2, max_iter=100)

# Fitting the model
model.fit(X_train, y_train)

# Make a prediction on a fitted model
model.predict(X_test)

If you installed the additional dependencies for scripts, you can also run directly from the command line:

LOGURU_LEVEL="ERROR" DEBUG="True" lrtree-consistency

or

LOGURU_LEVEL="ERROR" TQDM_DISABLE="1" lrtree-realdata

Beware: if you don't set LOGURU_LEVEL, it defaults to DEBUG, which produces a lot of output. Also, both scripts take a very long time to complete: lrtree-consistency tests the consistency of the method for various hyperparameters, and lrtree-realdata runs cross-validation on 3 real datasets.
