bess Python Package

bess: A Python Package for Best Subset Selection

Introduction

One of the main tasks of statistical modeling is to exploit the association between a response variable and multiple predictors. The linear model (LM), a simple parametric regression model, is often used to capture linear dependence between the response and the predictors. The generalized linear model (GLM) extends the linear model to other types of responses. Parameter estimation in these models can be computationally intensive when the number of predictors is large. Meanwhile, Occam's razor is widely accepted as a heuristic rule for statistical modeling: it balances goodness of fit against model complexity, which leads to a relatively small subset of important predictors.

The bess package provides solutions to the best subset selection problem for sparse LMs and GLMs.
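
In symbols, best subset selection can be stated as the following constrained problem. The notation is ours rather than the package's: \ell is a convex loss (for the LM, the scaled residual sum of squares), p is the number of predictors, n the number of observations, and k the largest number of predictors allowed in the model.

```latex
\min_{\beta \in \mathbb{R}^{p}} \; \ell(\beta)
\quad \text{subject to} \quad \|\beta\|_{0} \le k,
\qquad \|\beta\|_{0} = \#\{\, j : \beta_{j} \neq 0 \,\}

% Linear model example of the loss:
\ell(\beta) = \frac{1}{2n} \, \| y - X\beta \|_{2}^{2}
```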

We consider a primal-dual active set (PDAS) approach to exactly solve the best subset selection problem for sparse LMs and GLMs. The PDAS algorithm for linear least squares problems was first introduced by Ito and Kunisch (2013) and later discussed by Jiao, Jin, and Lu (2015) and Huang, Jiao, Liu, and Lu (2017). It uses an active set updating strategy and fits the sub-models using complementary primal and dual variables. We generalize the PDAS algorithm to general convex loss functions with the best subset constraint, and extend it to support both sequential and golden section search strategies for determining the optimal subset size k.
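
The active set update described above can be sketched in NumPy for the least squares case with a fixed subset size k. This is a simplified illustration written for this description, not the code shipped in bess: the function name `pdas_lm`, the iteration cap, and the zero starting point are our assumptions, and the selection score follows the usual PDAS form (squared primal coefficient on the active set, squared dual variable off it, each scaled by the column norm).

```python
import numpy as np


def pdas_lm(X, y, k, max_iter=20):
    """Illustrative PDAS iteration for least squares with at most k nonzeros.

    Assumes every column of X is nonzero. Hypothetical helper, not bess API.
    """
    n, p = X.shape
    h = np.sum(X ** 2, axis=0) / n        # column scales x_j'x_j / n
    beta = np.zeros(p)                    # primal variable (coefficients)
    d = X.T @ y / n                       # dual variable at beta = 0
    active = np.array([], dtype=int)
    for _ in range(max_iter):
        # Score each predictor: loss incurred by dropping an active one,
        # or loss reduction expected from adding an inactive one.
        score = 0.5 * h * (beta + d / h) ** 2
        new_active = np.sort(np.argsort(-score)[:k])
        if np.array_equal(new_active, active):
            break                         # active set is stable
        active = new_active
        # Primal update: least squares fit restricted to the active set.
        beta = np.zeros(p)
        beta[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
        # Dual update: scaled correlation of the residual with each column;
        # it vanishes on the active set, so set it to exactly zero there.
        d = X.T @ (y - X @ beta) / n
        d[active] = 0.0
    return beta, active


# Small demonstration on data shaped like the linear example in this README.
np.random.seed(12345)
X = np.random.normal(0, 1, (100, 150))
beta_true = np.hstack((np.array([1, 1, -1, -1, -1]), np.zeros(145)))
y = X @ beta_true + np.random.normal(0, 1, 100)
beta_hat, support = pdas_lm(X, y, k=5)
print("selected predictors:", support)
```

The sequential strategy mentioned above amounts to repeating such a fit for each candidate k and comparing the fits with a model selection criterion; the criterion is left out here because this text does not specify one.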

Install

Python Version

python >= 3.5

Modules needed

numpy

The package has been published on PyPI. You can install it with:

$ pip install bess
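
To confirm that the requirements listed above are met after installing, a check along these lines may help. It relies only on the standard library and on the import names `numpy` and `bess`; the helper name `check_environment` is ours and is not part of the package.

```python
import importlib.util
import sys


def check_environment(required=("numpy", "bess"), min_python=(3, 5)):
    """Return a dict mapping each requirement to True (met) or False."""
    status = {"python": sys.version_info[:2] >= min_python}
    for name in required:
        # find_spec returns None when a top-level module cannot be located
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(name, "OK" if ok else "missing")
```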

Example

### PdasLm sample
import numpy as np
from bess.linear import PdasLm, PdasLogistic, PdasPoisson, PdasCox

np.random.seed(12345)   # fix seed to get the same result
x = np.random.normal(0, 1, 100 * 150).reshape((100, 150))
beta = np.hstack((np.array([1, 1, -1, -1, -1]), np.zeros(145)))
noise = np.random.normal(0, 1, 100)
y = np.matmul(x, beta) + noise

### Sparsity known
model = PdasLm(path_type="seq", sequence=[5])
model.fit(X=x, y=y)
model.predict(x)

### Sparsity unknown
# path_type="seq", Default:sequence=[1,2,...,min(x.shape[0], x.shape[1])]
model = PdasLm(path_type="seq", sequence=range(1,10))
model.fit(X=x, y=y)
model.predict(x)

# path_type="pgs", Default:s_min=1, s_max=X.shape[1], K_max = int(math.log(p, 2/(math.sqrt(5) - 1)))
model = PdasLm(path_type="pgs", s_max=20)
model.fit(X=x, y=y)
model.predict(x)


### PdasLogistic sample
np.random.seed(12345)
x = np.random.normal(0, 1, 100 * 150).reshape((100, 150))
beta = np.hstack((np.array([1, 1, -1, -1, -1]), np.zeros(145)))
xbeta = np.matmul(x, beta)
p = np.exp(xbeta)/(1+np.exp(xbeta))
y = np.random.binomial(1, p)

### Sparsity known
model = PdasLogistic(path_type="seq", sequence=[5])
model.fit(X=x, y=y)
model.predict(x)

### Sparsity unknown
# path_type="seq", Default:sequence=[1,2,...,min(x.shape[0], x.shape[1])]
model = PdasLogistic(path_type="seq", sequence=range(1,10))
model.fit(X=x, y=y)
model.predict(x)

# path_type="pgs", Default:s_min=1, s_max=X.shape[1], K_max = int(math.log(p, 2/(math.sqrt(5) - 1)))
model = PdasLogistic(path_type="pgs")
model.fit(X=x, y=y)
model.predict(x)


### PdasPoisson sample
np.random.seed(12345)
x = np.random.normal(0, 1, 100 * 150).reshape((100, 150))
beta = np.hstack((np.array([1, 1, -1, -1, -1]), np.zeros(145)))
lam = np.exp(np.matmul(x, beta))
y = np.random.poisson(lam=lam)

### Sparsity known
model = PdasPoisson(path_type="seq", sequence=[5])
model.fit(X=x, y=y)
model.predict(x)

### Sparsity unknown
# path_type="seq", Default:sequence=[1,2,...,min(x.shape[0], x.shape[1])]
model = PdasPoisson(path_type="seq", sequence=range(1,10))
model.fit(X=x, y=y)
model.predict(x)

# path_type="pgs", Default:s_min=1, s_max=X.shape[1], K_max = int(math.log(p, 2/(math.sqrt(5) - 1)))
model = PdasPoisson(path_type="pgs")
model.fit(X=x, y=y)
model.predict(x)


### PdasCox sample
from bess.gen_data import gen_data
np.random.seed(12345)
data = gen_data(100, 200, family="cox", k=5, rho=0, sigma=1, c=10)

### Sparsity known
model = PdasCox(path_type="seq", sequence=[5])
model.fit(data.x, data.y, is_normal=True)
model.predict(data.x)

### Sparsity unknown
# path_type="seq", Default:sequence=[1,2,...,min(x.shape[0], x.shape[1])]
model = PdasCox(path_type="seq", sequence=range(1,10))
model.fit(data.x, data.y)
model.predict(data.x)

# path_type="pgs", Default:s_min=1, s_max=X.shape[1], K_max = int(math.log(p, 2/(math.sqrt(5) - 1)))
model = PdasCox(path_type="pgs")
model.fit(data.x, data.y)
model.predict(data.x)
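
The comments in the examples give the default iteration cap for the golden section path as `K_max = int(math.log(p, 2/(math.sqrt(5) - 1)))`. The base `2/(sqrt(5) - 1)` is the golden ratio, so the cap grows only logarithmically. A small sketch of that arithmetic follows, assuming `p` denotes the number of predictors (the comments do not say so explicitly) and using a helper name of our own:

```python
import math


def default_k_max(p):
    """Evaluate the K_max formula quoted in the example comments."""
    golden_ratio = 2 / (math.sqrt(5) - 1)   # about 1.618
    return int(math.log(p, golden_ratio))


for p in (150, 200):
    print(p, default_k_max(p))
```

Under that assumption, the 150-predictor and 200-predictor designs used in the examples give caps of 10 and 11 respectively.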

Reference

Wen, C., Zhang, A., Quan, S., & Wang, X. (2017). BeSS: An R package for best subset selection in linear, logistic and CoxPH models.

Bug report

Contact @Jiang-Kangkang, or send an email to Jiang Kangkang (jiangkk3@mail2.sysu.edu.cn).

Release history

0.0.13 (this version): Mar 16, 2021
0.0.12: Jun 13, 2020
0.0.11: Jun 2, 2020
0.0.10: Apr 19, 2020
0.0.9: Apr 17, 2020
0.0.8: Mar 23, 2020
0.0.7: Mar 19, 2020
0.0.6: Mar 19, 2020
0.0.5: Mar 17, 2020
0.0.4: Mar 16, 2020
0.0.3: Mar 16, 2020
0.0.2: Mar 16, 2020

Download files

Source Distribution

bess-0.0.13.tar.gz (1.4 MB)

Uploaded Mar 16, 2021 Source

File details

Details for the file bess-0.0.13.tar.gz.

File metadata

Download URL: bess-0.0.13.tar.gz
Upload date: Mar 16, 2021
Size: 1.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/19.2 pkginfo/1.5.0.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.39.0 CPython/3.7.4

File hashes

Hashes for bess-0.0.13.tar.gz
Algorithm	Hash digest
SHA256	`11e4b413ec1a925311e469872ee5a491643f7aabf7ac132c54878bb28c78d9c2`
MD5	`0dbfb705e3dd644393d8f9821c0fc9b3`
BLAKE2b-256	`89cf2b88045d7a5a211159d8370132a9375787b266035920883a8ea555312a5e`
