Boolean matrix factorization on RNA expression data
Project description
EM_BMF
Robust Boolean matrix factorization via EM_BMF The code is completely process-oriented. Sorry for contaminating your name space.
Dependency: (I think it will work as long as Annaconda on Python3 is installed)
numpy -- 1.11.3
scipy -- 1.1.0
numba -- 0.40.0
Example usage:
import numpy as np from boolem import boolem def synthesis(shape, latent_size, P, noise_p=0.0): ''' In this synthesis, the probability of X was sampled from the joint probability of the latent factors. P is the parameter as Beta(1/(1-p),2) for generating the probability in latent factors. ''' a = np.zeros((shape[0], latent_size)) b = np.zeros((latent_size, shape[1])) X = np.zeros(shape) for l in range(latent_size): a[:,l] = np.random.binomial(1, P[l], shape[0]) b[l,:] = np.random.binomial(1, P[l], shape[1]) X += np.outer(a[:,l],b[l,:]) X[X>1] = 1 flip = np.random.binomial(1, noise_p, X.shape) X_noisy = np.abs(X-flip) return X_noisy, X, a, b # Generate a Boolean matrice with heterogeneous Boolean factors and uniform noise. X_noisy, X, a, b = synthesis((1000, 1000), 4, np.random.uniform(0.2,0.5,4), noise_p=0.2) # Feed the model with noisy matrix. # Latent_size: the dimension of latent Boolean factors. # alpha: the alpha for the beta prior. Default is recommended. # beta: the beta for the beta prior. Default is recommended. # mask: the matrix with the same shape as X. 0 means the correponding element in X is missing. # max_iter: the maximum iteration for gradient-based optimization model = boolem(np.int8(X_noisy), latent_size=5, alpha=0.95, beta=0.95, mask=np.ones(X.shape, dtype=np.int8), max_iter=200) model.run() # After running factorization, the model will contain several new attributes as the output: # model.U: the latent factor with the shape (X.shape[0], latent_size) # model.Z: the latent facotr with the shape (latent_size, X.shape[1]) # model.X_hat: reconstructed Boolean matrix from U and Z. Note that values in X_hat is continuous within [0,1] print('Reconstruction error:', np.abs((model.X_hat>0.5)-X).mean())
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
boolem-0.0.5.tar.gz
(4.0 kB
view hashes)
Built Distribution
boolem-0.0.5-py3-none-any.whl
(6.5 kB
view hashes)