Boolean matrix factorization on RNA expression data

# EM_BMF

Robust Boolean matrix factorization via EM_BMF. The code is entirely procedural; apologies for polluting your namespace.
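For readers new to the term, here is a minimal sketch of the Boolean matrix product that a Boolean factorization tries to invert. It is plain NumPy for illustration only; `boolean_product` is a hypothetical helper and not part of `boolem`.

```python
import numpy as np

def boolean_product(U, Z):
    """Boolean matrix product: entry (i, j) is 1 if some k has U[i, k] = Z[k, j] = 1."""
    U = np.asarray(U, dtype=bool)
    Z = np.asarray(Z, dtype=bool)
    # Integer matmul counts the shared active factors; any positive count means 1 (logical OR).
    return (U.astype(np.int64) @ Z.astype(np.int64) > 0).astype(np.int8)

# Three rows, two latent factors, three columns.
U = np.array([[1, 0],
              [1, 1],
              [0, 0]])
Z = np.array([[1, 0, 1],
              [0, 1, 1]])
X = boolean_product(U, Z)
print(X)
```

Factorization goes the other way: given a (possibly noisy) `X`, find binary `U` and `Z` whose Boolean product approximates it.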

Dependencies (I think it will work as long as Anaconda with Python 3 is installed):

numpy -- 1.11.3

scipy -- 1.1.0

numba -- 0.40.0
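A quick way to compare your environment with the versions listed above is to query the installed package metadata. This is a small sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper, not part of `boolem`.

```python
from importlib import metadata

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(["numpy", "scipy", "numba"]).items():
    print(f"{name}: {version or 'not installed'}")
```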

Example usage:

```python
import numpy as np
from boolem import boolem

def synthesis(shape, latent_size, P, noise_p=0.0):
    '''
    Synthesize a Boolean matrix X as the Boolean product of random latent factors.
    shape is (n_rows, n_cols). P[l] is the Bernoulli probability used to sample
    the entries of the l-th latent factor. noise_p is the probability of flipping
    each entry of X.
    '''
    n_rows, n_cols = shape
    a = np.zeros((n_rows, latent_size))
    b = np.zeros((latent_size, n_cols))
    X = np.zeros(shape)
    for l in range(latent_size):
        a[:, l] = np.random.binomial(1, P[l], n_rows)
        b[l, :] = np.random.binomial(1, P[l], n_cols)
        X += np.outer(a[:, l], b[l, :])
    X[X > 1] = 1  # Boolean OR: overlapping factors still give 1
    flip = np.random.binomial(1, noise_p, X.shape)
    X_noisy = np.abs(X - flip)  # flip the selected entries (XOR)
    return X_noisy, X, a, b

# Generate a Boolean matrix with heterogeneous Boolean factors and uniform noise.
X_noisy, X, a, b = synthesis((1000, 1000), 4, np.random.uniform(0.2,0.5,4), noise_p=0.2)

# Feed the model with the noisy matrix.
# latent_size: the number of latent Boolean factors.
# alpha: the alpha parameter of the beta prior. The default is recommended.
# beta: the beta parameter of the beta prior. The default is recommended.
# mask: a matrix with the same shape as X. 0 means the corresponding element in X is missing.
# max_iter: the maximum number of iterations for the gradient-based optimization.
model = boolem(np.int8(X_noisy), latent_size=5, alpha=0.95, beta=0.95, mask=np.ones(X.shape, dtype=np.int8), max_iter=200)
model.run()

# After running the factorization, the model contains several new attributes as output:
# model.U: the latent factor with shape (X.shape[0], latent_size)
# model.Z: the latent factor with shape (latent_size, X.shape[1])
# model.X_hat: the matrix reconstructed from U and Z. Note that values in X_hat are continuous within [0,1]
print('Reconstruction error:', np.abs((model.X_hat>0.5)-X).mean())
```
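The `mask` argument described above suggests a simple hold-out check: hide a fraction of entries, fit on the rest, and score only the hidden entries. The sketch below shows the bookkeeping in plain NumPy. `random_mask` and `heldout_error` are hypothetical helpers, not part of `boolem`, and a stand-in array takes the place of `model.X_hat` so the snippet runs without the package. It assumes, per the comments in the example, that a 0 in the mask marks an entry as missing.

```python
import numpy as np

def random_mask(shape, missing_frac, rng):
    """Return an int8 mask: 1 = observed, 0 = treated as missing."""
    return (rng.random(shape) >= missing_frac).astype(np.int8)

def heldout_error(X_hat, X_true, mask, threshold=0.5):
    """Fraction of held-out (mask == 0) entries that are wrong after thresholding X_hat."""
    held_out = np.asarray(mask) == 0
    if not held_out.any():
        raise ValueError("mask has no held-out entries")
    wrong = (np.asarray(X_hat) > threshold) != (np.asarray(X_true) > 0)
    return float(wrong[held_out].mean())

rng = np.random.default_rng(0)
X_true = rng.integers(0, 2, size=(20, 30))
mask = random_mask(X_true.shape, missing_frac=0.2, rng=rng)

# Stand-in for model.X_hat: continuous values in [0, 1] that threshold back to X_true.
X_hat_demo = 0.1 + 0.8 * X_true
print("Held-out error:", heldout_error(X_hat_demo, X_true, mask))
```

With the example above, one would pass `mask=mask` to `boolem(...)` in place of the all-ones mask and then call `heldout_error(model.X_hat, X, mask)`.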

## Release history

0.0.5 (this version), 0.0.4, 0.0.3, 0.0.2

## Download files

Files for boolem, version 0.0.5

| Filename | Size | File type | Python version |
|---|---|---|---|
| boolem-0.0.5-py3-none-any.whl | 6.5 kB | Wheel | py3 |
| boolem-0.0.5.tar.gz | 4.0 kB | Source | None |