A Gaussian deconvolution package
Project description
scBayesDeconv
Package that allows the deconvolution of the sum of two random variables using Bayesian mixture approaches.
Z = X + Y
where X is what we call the autofluorescence, Y the deconvolution and Z the convolution; all three are random variables. Given samples of values from the distributions of X and Z, the package tries to deconvolve the signal to obtain the distribution of Y. The following kinds of Bayesian mixtures are implemented:
- Gaussian
- Gamma
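To make the model Z = X + Y concrete, here is a small numpy sketch that generates synthetic data for the three variables; the distributions, parameters and array names are invented for illustration and are not part of the package:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic example of the model Z = X + Y:
# X (autofluorescence) is pure noise measured on control cells,
# Z (convolution) is the signal observed on stained cells.
X = rng.normal(loc=1.0, scale=0.5, size=5000)      # noise sample
Y = rng.gamma(shape=4.0, scale=0.5, size=5000)     # true signal, unknown in practice
Z = rng.normal(loc=1.0, scale=0.5, size=5000) + Y  # observed convolved sample

# The deconvolution problem: given only samples of X and Z,
# recover the distribution of Y.
print(Z.mean() - X.mean())  # roughly 2.0 (= shape * scale, the mean of Y)
```

In a real flow cytometry setting only `X` and `Z` would be observed, and the package estimates the distribution of `Y` from them.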
Installation
The package can be installed from the PyPI repository with the command:
pip install scBayesDeconv
Problems with installation from PyPI
In case of problems, you can always compile the package from the git repository. The requirements for installation are:
- CMake
- A C++ compiler supporting at least the C++11 standard (g++, Visual Studio, Clang...)
- The scikit-build library for Python (if missing, install it with
pip install scikit-build)
In the gaussian deconvolution folder, build the binary wheel:
python setup.py bdist_wheel
This will generate a wheel in the automatically created dist folder. Now we can install it:
pip install ./dist/*
If everything is okay, you should be happily running the code after a few seconds of compilation ;)
Small tutorial
The package behaves very similarly to the scikit-learn package.
Consider that we have two arrays of data: one with some noise, dataNoise, and a second with the convolved data, dataConvolved.
Import the package
import scBayesDeconv as gd
Declare one of the two models. By default, the models consider one Gaussian for the noise and one Gaussian for the convolved data. Suppose we want to fit the noise with one Gaussian and the convolved data with three:
model = gd.mcmcsampler(K=1, Kc=3)
or
model = gd.nestedsampler(K=1, Kc=3)
Once declared, fit the model:
model.fit(dataNoise,dataConvolved)
Once the model is fit, we can sample from it:
model.sample_autofluorescence(size=100)
model.sample_deconvolution(size=100)
model.sample_convolution(size=100)
or evaluate it at certain positions. This returns the mean value as well as any specified percentiles (by default 0.05 and 0.95).
import numpy as np
x = np.arange(0,1,0.1)
model.score_autofluorescence(x, percentiles=[0.05,0.5,0.95])
model.score_deconvolution(x, percentiles=[0.05,0.5,0.95])
model.score_convolution(x, percentiles=[0.05,0.5,0.95])
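The kind of pointwise summary that these score methods return can be sketched with numpy: evaluate the fitted density at each position for many posterior parameter draws, then take the mean and percentiles across draws. The densities and numbers below are invented stand-ins, not the package's actual internals:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(0, 1, 0.1)

# Stand-in for posterior draws of the fitted density: each row is a
# Gaussian density over x whose mean comes from one posterior draw.
mus = rng.normal(0.5, 0.05, size=(200, 1))
density_draws = np.exp(-((x[None, :] - mus) ** 2) / (2 * 0.1**2)) / (0.1 * np.sqrt(2 * np.pi))

mean = density_draws.mean(axis=0)                      # posterior mean at each x
lo, hi = np.percentile(density_draws, [5, 95], axis=0) # credible band at each x
print(mean.shape, lo.shape, hi.shape)                  # one value per position in x
```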
In addition, for the mcmcsampler, it is possible to obtain some summary statistics of the sampler in order to check whether the sampling process has converged to the posterior.
model.statistics()
An rhat close to 1 indicates that the posterior chains have mixed appropriately. neff is an estimate of the effective number of independent samples drawn from the model. For more information, have a look at the Stan documentation and its associated Bayesian statistics book, Bayesian Data Analysis, Chapter 11.
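The idea behind rhat can be sketched in a few lines of numpy: compare the variance between chains to the variance within chains, following the potential scale reduction factor of Gelman et al. (this is a simplified illustration, not the package's own implementation):

```python
import numpy as np

def rhat(chains):
    """Potential scale reduction factor for an (m_chains, n_draws) array,
    in the spirit of Gelman et al., Bayesian Data Analysis, Ch. 11."""
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_plus = (n - 1) / n * W + B / n        # pooled posterior variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(2)
mixed = rng.normal(size=(4, 1000))                  # four well-mixed chains
stuck = mixed + np.array([[0.], [0.], [0.], [3.]])  # one chain stuck in another mode
print(round(rhat(mixed), 2))  # close to 1: chains agree
print(round(rhat(stuck), 2))  # well above 1: chains have not mixed
```

A chain trapped in one mode of a multimodal posterior, as discussed below for the mcmc sampler, is exactly the situation this diagnostic flags.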
Which model should I use?
Both models correspond to the same posterior likelihood; the only difference is how samples from this posterior are drawn.
The mcmc sampler is based on Gibbs and other Markov chain Monte Carlo steps with the help of indicator variables. These are extensively explained in the book by Gelman. Such samplers have the benefit of converging fast to a mode of the posterior and the nice property of concentrating around solutions with a sparse number of components. However, the posterior distribution of the Bayesian deconvolution model is multimodal and, for large noise, can lead to high degeneracies of the system. In such cases, samplers based on Markov chains have severe difficulties converging. For these reasons, this sampler should be used mainly for exploratory purposes, to get a general idea of the deconvolution as well as the number of components required to describe the posterior appropriately.
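The Gibbs-with-indicator-variables idea can be illustrated on a plain one-dimensional Gaussian mixture. This simplified sketch (fixed weights and variance, flat prior on the means; all numbers invented) alternates between sampling the component indicator of each point and sampling each component mean from its conditional posterior:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data from a two-component Gaussian mixture with known truth.
data = np.concatenate([rng.normal(-2, 0.7, 300), rng.normal(2, 0.7, 300)])
K, n_iter = 2, 500

mu = rng.normal(0, 1, K)        # initial component means
sigma2 = 0.7**2                 # variance held fixed for simplicity
weights = np.full(K, 1.0 / K)   # weights held fixed for simplicity

for _ in range(n_iter):
    # 1) Sample indicator variables: which component generated each point.
    logp = -(data[:, None] - mu[None, :])**2 / (2 * sigma2) + np.log(weights)
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = (rng.random(len(data))[:, None] < p.cumsum(axis=1)).argmax(axis=1)

    # 2) Sample each mean from its conditional posterior given the indicators.
    for k in range(K):
        members = data[z == k]
        if len(members):
            mu[k] = rng.normal(members.mean(), np.sqrt(sigma2 / len(members)))

print(np.sort(mu))  # the two component means, close to -2 and 2
```

With well-separated data like this the chain quickly locks onto the two modes; with heavy noise and overlapping components, the same scheme can get trapped, which is the convergence difficulty described above.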
The nested sampler is based on the ideas of nested sampling introduced by Skilling. Such sampling methods are better suited to exploring complex distributions with multimodalities and complex degeneracies. The counterpart is that the sampler does not select component-sparse regions of the space, and the exploration quickly becomes computationally expensive as the number of components grows. To speed up the computation, we wrapped the well-documented and recently published library for dynamic nested sampling, Dynesty, in C++ in order to obtain reasonable sampling times for datasets of the order of magnitude typically encountered in flow cytometry. The posteriors obtained through this sampling method better capture the complexity of the Gaussian deconvolution in a non-prohibitive amount of time, in contrast to the mcmc sampler.
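Skilling's core idea can be sketched on a toy one-dimensional problem: keep a set of "live" points drawn from the prior, repeatedly discard the worst one while shrinking the prior volume geometrically, and accumulate the evidence. This is a bare-bones illustration with an invented likelihood and prior, not how Dynesty proposes points:

```python
import numpy as np

rng = np.random.default_rng(4)

def loglike(theta):
    # Toy likelihood: log of a standard normal density at theta.
    return -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)

n_live, n_iter = 200, 1000
live = rng.uniform(-5, 5, n_live)   # live points from a Uniform(-5, 5) prior
live_ll = loglike(live)
logZ, logX = -np.inf, 0.0           # log evidence, log remaining prior volume

for i in range(n_iter):
    worst = live_ll.argmin()
    logX_new = -(i + 1) / n_live    # expected geometric shrinkage per step
    logw = np.log(np.exp(logX) - np.exp(logX_new))
    logZ = np.logaddexp(logZ, logw + live_ll[worst])
    # Replace the worst point with a prior draw above the likelihood
    # threshold (naive rejection; Dynesty uses smarter bounded proposals).
    while True:
        cand = rng.uniform(-5, 5)
        if loglike(cand) > live_ll[worst]:
            live[worst], live_ll[worst] = cand, loglike(cand)
            break
    logX = logX_new

# Add the contribution of the remaining live points.
logZ = np.logaddexp(logZ, logX + np.log(np.mean(np.exp(live_ll))))
print(np.exp(logZ))  # close to 0.1: unit-mass likelihood over a prior of width 10
```

The shrinking-volume construction is what lets nested sampling handle multiple modes gracefully: points from every surviving mode stay in the live set until the likelihood threshold excludes them.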
Overall, one should use the mcmc sampler for exploration and selection of the number of components, then feed that information to a nested sampler model in order to obtain the most reliable results within a reasonable computational time.
Project details
Download files
Source Distribution
File details
Details for the file scBayesDeconv-0.1.tar.gz.
File metadata
- Download URL: scBayesDeconv-0.1.tar.gz
- Upload date:
- Size: 47.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f50c9515202f4ae495bc2b73df0a9f8b91734e9e91357a97fe505f3af4d0a9d5 |
| MD5 | e92fb5157855f4d5fdb689ccd9a22bd6 |
| BLAKE2b-256 | b1fc4cecf0990a00ef148f40934628b686d97d92e09a568aa5696d2e982bb254 |