lrtree: logistic regression trees
Logistic regression trees
Motivation
The goal of lrtree is to build decision trees with logistic regressions at their leaves, so that the resulting model combines non-parametric with parametric and stepwise with linear approaches, aiming for the best predictive performance while remaining interpretable.
This is the implementation of glmtree as described in Formalization and study of statistical problems in Credit Scoring, Ehrhardt A. (see manuscript or web article)
Getting started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Prerequisites
This code is supported on Python 3.8, 3.9, 3.10.
Installing the package
Installing the development version
If git is installed on your machine, you can use:
pipenv install git+https://github.com/adimajo/lrtree.git
If git is not installed, you can also use:
pipenv install --upgrade https://github.com/adimajo/lrtree/archive/master.tar.gz
Installing through the pip command
You can install a stable version from PyPI by using:
pip install lrtree
To run the provided scripts, lrtree-consistency and lrtree-realdata, you need a few additional dependencies:
pip install lrtree[scripts]
Installation guide for Anaconda
The installation with the pip or pipenv command should work. If not, please raise an issue.
For people behind proxy(ies)...
A lot of people, including myself, work behind a proxy at work...
A simple solution to get the package is to use the --proxy option of pip:
pip --proxy=http://username:password@server:port install lrtree
where username, password, server and port should be replaced by your own values.
If the environment variables http_proxy and/or https_proxy and/or (unfortunately, depending on the application) HTTP_PROXY and HTTPS_PROXY are set, pip should pick up the proxy settings.
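Before retrying an installation, it can help to check which of these variables are actually visible to a Python process. The following is a minimal sketch using only the standard library; the helper name proxy_settings is hypothetical and is not part of lrtree or pip. It only reports what is set and does not reproduce how pip itself resolves proxies.

```python
import os

# Variable names mentioned above; applications differ on which case they read.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def proxy_settings(environ=None):
    """Return the proxy variables that are set and non-empty, keyed by name.

    `environ` defaults to the current process environment; passing a plain
    dict makes the function easy to try out without touching os.environ.
    """
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in PROXY_VARS if environ.get(name)}


if __name__ == "__main__":
    found = proxy_settings()
    if found:
        # Print names only: the values may embed a username and password.
        print("Proxy variables set:", ", ".join(sorted(found)))
    else:
        print("No proxy variables set; consider the --proxy option of pip.")
```

Printing only the variable names is deliberate, since proxy URLs of the form shown above contain credentials.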
Over the years, I've found CNTLM to be a great tool in this regard.
Authors
- Adrien Ehrhardt
- Vincent Vandewalle
- Philippe Heinrich
- Christophe Biernacki
- Dmitri Gaynullin
- Elise Bayraktar
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
This research has been financed by Crédit Agricole Consumer Finance through a CIFRE PhD.
This research was supported by Inria Lille - Nord-Europe and Lille University as part of a PhD.
References
Ehrhardt, A. (2019), Formalization and study of statistical problems in Credit Scoring: Reject inference, discretization and pairwise interactions, logistic regression trees (PhD thesis).
Contribute
You can clone this project using:
git clone https://github.com/adimajo/lrtree.git
You can install all dependencies, including development dependencies, using (note that this command requires pipenv, which can be installed by typing pip install pipenv):
pipenv install -d
You can build the documentation by going into the docs directory and typing make html.
You can run the tests by typing coverage run -m pytest, which relies on the coverage and pytest packages.
To run the tests in different environments (one for each version of Python), install pyenv (see the instructions here), install all the versions you want to test (see tox.ini), e.g. with pyenv install 3.7.0, and run
pipenv run pyenv local 3.7.0 [...]
(and all other versions) followed by pipenv run tox.
Python Environment
The project uses pipenv.
To export the list of all project dependencies so that they can be ported to a machine with limited internet access, use the command
pipenv lock -r > requirements.txt
which turns the Pipfile into a requirements.txt file.
Installation
To install a virtual environment as well as all the necessary dependencies, use the pipenv install command for production use, or the pipenv install -d command for development use.
Tests
The tests are based on pytest and are stored in the tests folder. They can all be launched with the command
pytest
at the root of the project.
The test coverage can be computed with the coverage package, which also takes care of launching the tests. The command to use is coverage run -m pytest. A graphical summary can then be generated as an HTML page with the coverage html command, which creates or updates the htmlcov folder; open its index.html file to view the report.
Usage
The package provides a scikit-learn-like interface.
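Loading sample data for a binary classification task:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
# Binary classification dataset (two classes, as expected by a logistic regression model)
X, y = load_breast_cancer(return_X_y=True)
# Hold out 25% of the samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)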
The trained model consists of a fitted sklearn.tree.DecisionTreeClassifier, which segments the data, and a Python list of fitted sklearn.linear_model.LogisticRegression models, one for each leaf (terminal node) of the tree.
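To make this structure concrete, here is a minimal pure-Python sketch of how a prediction can combine the two parts: a sample is first routed to a leaf, then scored by that leaf's logistic regression. The one-split tree, the threshold, and the coefficients are hypothetical values chosen for illustration; this is not lrtree's code and says nothing about how lrtree learns the tree or the regressions.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def leaf_index(x, feature=0, threshold=0.5):
    """Hypothetical one-split tree: route a sample to leaf 0 or leaf 1."""
    return 0 if x[feature] <= threshold else 1


# One hypothetical logistic regression per leaf, stored in a list indexed
# by leaf: (intercept, coefficients).
LEAF_MODELS = [
    (-1.0, [2.0, 0.0]),   # leaf 0
    (0.5, [0.0, -1.5]),   # leaf 1
]


def predict_proba(x):
    """P(y = 1 | x) from the logistic regression of the leaf x falls into."""
    intercept, coefs = LEAF_MODELS[leaf_index(x)]
    score = intercept + sum(c * v for c, v in zip(coefs, x))
    return sigmoid(score)


def predict(x):
    """Predicted class, using a 0.5 probability cut-off."""
    return int(predict_proba(x) >= 0.5)
```

Each leaf keeps its own coefficients, so the same feature can have a different effect in different segments of the data while each segment remains an ordinary, readable logistic regression.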
The snippet to train the model and make a prediction:
from lrtree import Lrtree
model = Lrtree(criterion="bic", ratios=(0.7,), class_num=2, max_iter=100)
# Fitting the model
model.fit(X_train, y_train)
# Make a prediction on a fitted model
model.predict(X_test)
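The criterion="bic" argument above refers to the Bayesian Information Criterion. As a rough illustration, the textbook definition for a binary model is BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of samples and L the likelihood. The sketch below computes it from predicted probabilities; the function names are hypothetical, and whether lrtree evaluates its criterion in exactly this way is an assumption.

```python
import math


def log_likelihood(y_true, proba):
    """Bernoulli log-likelihood of 0/1 labels under predicted P(y = 1)."""
    eps = 1e-12  # clip probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, proba):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def bic(y_true, proba, n_params):
    """Textbook BIC: n_params * ln(n) - 2 * log-likelihood (lower is better)."""
    n = len(y_true)
    return n_params * math.log(n) - 2.0 * log_likelihood(y_true, proba)


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0]
    # Hypothetical predicted probabilities from two candidate models.
    sharp = [0.9, 0.2, 0.8, 0.7, 0.1]
    flat = [0.5, 0.5, 0.5, 0.5, 0.5]
    print("BIC (sharper model, 3 parameters):", bic(labels, sharp, n_params=3))
    print("BIC (constant model, 1 parameter):", bic(labels, flat, n_params=1))
```

The k ln(n) term penalizes extra parameters, which is why such a criterion is a natural way to compare a tree with many leaf regressions against a simpler one.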
If you installed the additional dependencies for scripts, you can also run directly from the command line:
LOGURU_LEVEL="ERROR" DEBUG="True" lrtree-consistency
or
LOGURU_LEVEL="ERROR" TQDM_DISABLE="1" lrtree-realdata
Beware: if you don't set LOGURU_LEVEL, it implicitly defaults to DEBUG, which produces a lot of output. Also, both scripts take a very long time to complete: lrtree-consistency tests the consistency of the method for various hyperparameters, and lrtree-realdata runs cross-validation on 3 real datasets.