
abess Python Package



Overview

abess (Adaptive BEst Subset Selection) is a library for solving the general best subset selection problem, i.e., finding a small subset of predictors such that the resulting model is expected to have the highest accuracy. Best subset selection is valuable in both scientific research and practical applications. For example, clinicians want to know whether a patient is healthy or not based on the expression levels of a few important genes.

This library implements a generic algorithmic framework that finds the optimal solution very quickly [1]. The framework currently supports best subset detection under linear regression, (multi-class) classification, censored-response modeling [2], multi-response modeling (a.k.a. multi-task learning), etc. It also supports variants of best subset selection such as group best subset selection [3] and nuisance best subset selection [4]. In particular, the time complexity of (group) best subset selection for linear regression is certifiably polynomial [1] [3].
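To make the problem statement concrete, here is a minimal sketch of exhaustive best subset selection for least squares on a tiny, made-up dataset. It is not the abess algorithm. It enumerates every subset of a given size and keeps the one with the smallest residual sum of squares. All function names and the data are invented for illustration.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_sum_of_squares(x_rows, y, subset):
    """Fit least squares (no intercept) on the chosen columns; return the RSS."""
    cols = [[row[j] for row in x_rows] for j in subset]
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve_linear_system(gram, xty)
    fitted = [sum(b * row[j] for b, j in zip(beta, subset)) for row in x_rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def best_subset(x_rows, y, k):
    """Return the size-k column subset with the smallest RSS (exhaustive search)."""
    p = len(x_rows[0])
    return min(combinations(range(p), k),
               key=lambda s: residual_sum_of_squares(x_rows, y, s))


# Toy data: 6 observations, 4 predictors; y depends only on columns 0 and 2.
X = [
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 3.0],
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 3.0, 1.0, 0.0],
    [3.0, 0.0, 1.0, 2.0],
    [0.0, 2.0, 3.0, 1.0],
]
y = [2.0 * row[0] - 3.0 * row[2] for row in X]

chosen = best_subset(X, y, 2)
print(chosen)  # (0, 2)
```

Exhaustive search has to examine every size-k subset of the p predictors, so its cost grows combinatorially with p. This is why the polynomial-time guarantee cited above matters.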

Quick start

Install the stable abess Python package from PyPI:

$ pip install abess

Best subset selection for linear regression on a simulated dataset in Python:

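from abess.linear import abessLm
from abess.datasets import make_glm_data

# Simulate n observations of p predictors with k nonzero coefficients.
sim_dat = make_glm_data(n=300, p=1000, k=10, family="gaussian")

# Fit best subset selection for linear regression.
model = abessLm()
model.fit(sim_dat.x, sim_dat.y)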

See more examples analyzed with Python in the tutorials; the notebooks are available here.

Runtime Performance

To demonstrate the computational efficiency of abess, we measure its CPU execution time (in seconds) on synthetic datasets and compare it with state-of-the-art variable selection methods. The variable selection and estimation results are reported separately in performance.

We compare the abess Python package with scikit-learn on linear and logistic regression. The results are presented in the figure below and can be reproduced by running the following command in a shell:

$ python ./simulation/Python/timings.py

We obtain the runtime comparison figure:

[Figure: runtime comparison of abess and scikit-learn]

abess is highly efficient, especially in linear regression, where it is the fastest of the methods compared.
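The timings above are reported in CPU seconds. As a rough illustration of how such a measurement could be taken with the Python standard library, consider the sketch below. The helper names `cpu_seconds` and `toy_fit` are hypothetical, and this is not the code in ./simulation/Python/timings.py.

```python
import time


def cpu_seconds(fit, repeats=3):
    """Return the smallest CPU time (in seconds) over several calls to `fit`."""
    timings = []
    for _ in range(repeats):
        start = time.process_time()  # CPU time of the current process
        fit()
        timings.append(time.process_time() - start)
    return min(timings)


def toy_fit():
    # Hypothetical stand-in for a call such as model.fit(x, y).
    return sum(i * i for i in range(100_000))


elapsed = cpu_seconds(toy_fit)
print(f"best of 3 runs: {elapsed:.4f} CPU seconds")
```

Taking the minimum over a few repeats is one common way to reduce noise from other activity on the machine. Whether the benchmark script does the same is not stated here.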

Open source software

abess is free software and its source code is publicly available on GitHub. The core framework is written in C++, with user-friendly R and Python interfaces. You can redistribute it and/or modify it under the terms of the GPL-v3 License. We welcome contributions to abess, especially those extending it to other best subset selection problems.

References


Download files


Source Distribution

abess-0.3.4.tar.gz (1.6 MB), Source

Built Distributions

abess-0.3.4-cp39-cp39-win_amd64.whl (435.7 kB), CPython 3.9, Windows x86-64
abess-0.3.4-cp38-cp38-win_amd64.whl (435.1 kB), CPython 3.8, Windows x86-64
abess-0.3.4-cp37-cp37m-win_amd64.whl (434.9 kB), CPython 3.7m, Windows x86-64
abess-0.3.4-cp36-cp36m-win_amd64.whl (434.9 kB), CPython 3.6m, Windows x86-64
abess-0.3.4-cp35-cp35m-win_amd64.whl (434.8 kB), CPython 3.5m, Windows x86-64
