Bayesian Linear Programming (BLiP) in Python
Project description
pyblip: A python implementation of BLiP
Introduction
In many applications, we can tell that a signal of interest exists but cannot perfectly "localize" it. For example, when regressing an outcome Y on highly correlated covariates (X1, X2), the data may suggest that at least one of (X1, X2) influences Y, but it may be challenging to tell which of (X1, X2) is important. Likewise, in genetic fine-mapping, biologists may have high confidence that a gene influences a disease without knowing precisely which genetic variants cause the disease. Similar problems arise in many settings with spatial or temporal structure, including change-point detection and astronomical point-source detection.
Bayesian Linear Programming (BLiP) is a method which jointly detects as many signals as possible while localizing them as precisely as possible. BLiP can wrap on top of nearly any Bayesian model or algorithm, and it will return a set of regions which each contain at least one signal with high confidence. For example, in regression problems, BLiP might return the region (X1, X2), which suggests that at least one of (X1, X2) is an important variable. BLiP controls the false discovery rate while also making these regions as narrow as possible, meaning that (roughly speaking) it will perfectly localize signals whenever this is possible!
pyblip
is an efficient python implementation of BLiP, which is designed so that BLiP can wrap on top of the Bayesian model in only one or two lines of code. For convenience, it also includes fast Bayesian samplers for linear regression, probit regresion, and change point detection, since BLiP can wrap on top of any of these methods.
Installation
You can install pyblip
using pip:
``pip install pyblip``
Performance improvement
BLiP uses cvxpy to efficiently solve linear programs. pyblip
will install and run correctly using the default cvxpy
solvers, but to improve performance, we also recommend installing the Cbc solver. See this link for system-specific instructions to install Cbc.
Installation issues
pip
should automatically install pyblip
's dependencies. However, if installation fails, it is likely for one of three reasons:
-
You are having trouble installing
cvxpy
By default, installing
cvxpy
also installs several heavy-duty convex solvers, and installation of these solvers can fail. However,pyblip
only requires a few solvers. As a result, it can be easier to install a lightweight version ofcvxpy
with the following command:pip install cvxpy-base
and then install theSCS
orCBC
solvers. Please see the cvxpy installation instructions for more details. -
You are having trouble installing
cython
.The Bayesian samplers in
pyblip
are written in cython to improve performance. If your system is having trouble installing cython, see the cython website for instructions. -
You are having trouble installing
cvxopt
.pyblip
requirescvxopt
because installingcvxopt
is often the easiest way to get access to a solver for mixed-integer linear programs. If you are having trouble installingcvxopt
, you can avoid this problem by specifyingdeterministic=False
whenever running BLiP, and you will not need a mixed-integer solver. Alternatively, you can follow the instructions on the cvxpy website to install any other mixed-integer solver. For example, if you installCBC
as described earlier, you do not needcvxopt
.
Quick start
Here, we apply BLiP to perform variable selection in a sparse linear regression problem. The first step is to generate synthetic data and fit the Bayesian model.
# Synthetic regression data with AR1 design matrix
import numpy as np
import scipy.linalg
n, p, nsignals, rho = 100, 500, 20, 0.95
c = np.cumsum(np.zeros(p) + np.log(rho)) - np.log(rho)
cov = scipy.linalg.toeplitz(np.exp(c))
X = np.dot(np.random.randn(n, p), np.linalg.cholesky(cov).T)
# Sparse coefficients for linear model
beta = np.zeros(p)
signal_locations = np.random.choice(np.arange(p), nsignals)
beta[signal_locations] = np.random.randn(nsignals)
y = np.dot(X, beta) + np.random.randn(n)
# Spike-and-slab bayesian sampling
import pyblip
lm = pyblip.linear.LinearSpikeSlab(
X=X, y=y,
)
lm.sample(N=1000, chains=10)
The second step is to apply BLiP directly on top of the posterior samples of the linear coefficients:
detections = pyblip.blip.BLiP(
samples=lm.betas,
q=0.1,
error='fdr'
)
for x in detections:
print("BLiP has detected a signal among {x.group}!")
Please see amspector100.github.io/pyblip/usage.html for examples ranging from variable selection to change-point detection to astronomical point-source detection.
Documentation
Documentation is available at amspector100.github.io/pyblip.
Reference
If you use pyblip
or BLiP in an academic publication, please consider citing Spector and Janson (2022). The bibtex entry is below:
@article{AS-LJ:2022,
title={Controlled Discovery and Localization of Signals via Bayesian Linear Programming},
author={Spector, Asher and Janson, Lucas},
journal={arXiv preprint arXiv:2203.17208},
url={https://arxiv.org/abs/2203.17208},
year={2022}
}
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file pyblip-0.4.2.tar.gz
.
File metadata
- Download URL: pyblip-0.4.2.tar.gz
- Upload date:
- Size: 3.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.23.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 24124899d8358c7d99154f611e4163afbf8370303bd7996d9f57852a566c901e |
|
MD5 | 7001985a8e7bbdab5f8f8e0bc1f18162 |
|
BLAKE2b-256 | 3949faefd7fc26a3c7b4002ee45a2dfb4f673d0d6c22d1ba113395d1fe64c769 |