Compute Mixed Membership Stochastic Block Models.

Mixed Membership Stochastic Block Models

This repo builds a recommender system based on the work on Mixed Membership Stochastic Block Models by Godoy-Lorite et al. [1].


pip install mmsbm


Input data

You'll need a pandas dataframe with exactly 3 columns: users, items and ratings, e.g.:

import pandas as pd
from random import choice

# Random toy data: 5 users, 10 items, integer ratings from 1 to 5.
train = pd.DataFrame({
    "users": [f"user{choice(list(range(5)))}" for _ in range(100)],
    "items": [f"item{choice(list(range(10)))}" for _ in range(100)],
    "ratings": [choice(list(range(1, 6))) for _ in range(100)],
})

test = pd.DataFrame({
    "users": [f"user{choice(list(range(5)))}" for _ in range(50)],
    "items": [f"item{choice(list(range(10)))}" for _ in range(50)],
    "ratings": [choice(list(range(1, 6))) for _ in range(50)],
})
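Before passing a dataframe to the model, it can help to check that it has the expected shape. The helper below is a minimal sketch and not part of mmsbm: the name check_ratings_frame is hypothetical, and it assumes that the three columns must be named exactly users, items and ratings and that missing values are not wanted.

```python
import pandas as pd

# Column names taken from the description above; treating them as mandatory is an assumption.
REQUIRED_COLUMNS = ["users", "items", "ratings"]


def check_ratings_frame(df):
    """Hypothetical helper: raise ValueError unless df has exactly the three expected columns."""
    if len(df.columns) != 3 or set(df.columns) != set(REQUIRED_COLUMNS):
        raise ValueError(
            f"expected columns {REQUIRED_COLUMNS}, got {list(df.columns)}"
        )
    # Assumption: rows with missing values should be rejected up front.
    if df[REQUIRED_COLUMNS].isna().any().any():
        raise ValueError("users, items and ratings must not contain missing values")
    return df


sample = pd.DataFrame(
    {"users": ["user0", "user1"], "items": ["item3", "item7"], "ratings": [4, 2]}
)
checked = check_ratings_frame(sample)
```

The function returns the dataframe unchanged, so it can wrap the train and test frames in place.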

from mmsbm import MMSBM

# Initialize the MMSBM class:
mmsbm = MMSBM(

Fit models

There are two options. The first is a simple fit, which runs the fitting algorithm "sampling" times and returns the results of all runs; you are then in charge of choosing the best one.

The other option is cv_fit, which splits the input data into "folds" folds, runs the fitting on each split and tests on the excluded fold. It then returns all the samplings of the best performing model.

mmsbm.cv_fit(train, folds=5)
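To make the role of the folds argument concrete, here is a generic k-fold split written with the standard library only. It is an illustration of the general technique, not the code mmsbm runs internally; the function names and the seed parameter are hypothetical.

```python
import random


def k_fold_indices(n_rows, folds, seed=0):
    """Split row positions 0..n_rows-1 into `folds` disjoint held-out sets."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)
    # Strided slicing deals rows round-robin, so fold sizes differ by at most one.
    return [positions[i::folds] for i in range(folds)]


def train_test_pairs(n_rows, folds, seed=0):
    """Return one (train positions, test positions) pair per fold."""
    held_out = k_fold_indices(n_rows, folds, seed)
    pairs = []
    for i, test_idx in enumerate(held_out):
        # Train on every fold except the one held out for testing.
        train_idx = [p for j, fold in enumerate(held_out) if j != i for p in fold]
        pairs.append((train_idx, test_idx))
    return pairs


pairs = train_test_pairs(n_rows=100, folds=5)
```

With 100 rows and 5 folds, each pair trains on 80 rows and tests on the remaining 20, and every row is held out exactly once.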


Once the model is fitted, we can predict on test data. The predict function returns the prediction matrix (the probability that each user belongs to each group) as a numpy array.

pred_matrix = mmsbm.predict(test)


Finally, you can get statistics about the goodness of fit and other parameters of the model, as well as the computed objects: the theta matrix, the eta matrix and the probability distributions.

The function score returns a dictionary with two sub-dictionaries, one for statistics about the model (called "stats") and the other one with the computed objects (called "objects").

results = mmsbm.score()


Each iteration takes about half a second on an Intel i7, so a 500-iteration run takes around 4 minutes. The computation is vectorized, so, as long as the number of observations stays reasonable, the time should be approximately the same regardless of training set size. It is also parallelized over samplings, so, as long as you choose fewer samplings than the number of cores, you should get approximately the same performance regardless of the number of samplings.
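The per-run figure can be reproduced with a quick back-of-the-envelope calculation. This is a rough sketch: the function name is hypothetical, the half-second cost is the figure quoted above, and the model of parallelism (samplings run in batches of at most one per core) is an assumption rather than a description of the library's scheduler.

```python
import math


def estimated_minutes(iterations, sampling=1, cores=1, seconds_per_iteration=0.5):
    """Rough wall-clock estimate for one fit, in minutes."""
    # Assumption: samplings run in parallel, one per core, so only the
    # number of sequential batches multiplies the running time.
    batches = math.ceil(sampling / cores)
    return batches * iterations * seconds_per_iteration / 60


single_run = estimated_minutes(500)
parallel_run = estimated_minutes(500, sampling=6, cores=8)
```

Under these assumptions single_run is 250 seconds, a little over 4 minutes, and parallel_run is the same because 6 samplings fit on 8 cores in a single batch.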

A complete study could be something like 100 hyperparameter optimization runs of 6 samples of 400 iterations, which will take about 10 hours.


To run tests do the usual:

python -m pytest tests/*


  • Progress bars are not working in Jupyter notebooks.
  • Include user_groups and item_groups optimization procedure.


Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.


[1]: Godoy-Lorite, Antonia, et al. "Accurate and scalable social recommendation using mixed-membership stochastic block models." Proceedings of the National Academy of Sciences 113.50 (2016): 14207-14212.
