A Python library for Boolean Matrix Factorization

PyBMF

A Python library for Boolean Matrix Factorization, developed under Preferred.ai.

PyBMF is under active development. We welcome authors of BMF papers and anyone interested in BMF to try it out and contribute. Please contact us if you have any questions or suggestions.

Perspectives

Boolean matrix factorization (BMF) is a well-known problem in pattern mining. Over years of active research, its methods have grown from greedy heuristics to a wide range of more advanced techniques. We believe that developing such algorithms calls for a fair and adaptable playground in which they can be compared.
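For readers new to the problem, the sketch below illustrates what a Boolean factorization is: a binary matrix X is approximated by the Boolean product of two binary factors U and V, where the sum in the ordinary matrix product is replaced by a logical OR. It is written in plain Python and does not use the PyBMF API; the helper names `boolean_product` and `reconstruction_error` are hypothetical and chosen only for this illustration.

```python
def boolean_product(U, V):
    """Boolean product of U (m x k) and V (k x n): OR over element-wise AND."""
    m, k, n = len(U), len(V), len(V[0])
    return [
        [int(any(U[i][l] and V[l][j] for l in range(k))) for j in range(n)]
        for i in range(m)
    ]


def reconstruction_error(X, X_hat):
    """Number of cells in which X and X_hat disagree (Hamming distance)."""
    return sum(
        x != y
        for row, row_hat in zip(X, X_hat)
        for x, y in zip(row, row_hat)
    )


# A 3 x 3 binary matrix covered by two overlapping patterns.
X = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
# Rank-2 Boolean factors: U assigns rows to patterns, V assigns columns.
U = [
    [1, 0],
    [1, 1],
    [0, 1],
]
V = [
    [1, 1, 0],
    [0, 1, 1],
]

X_hat = boolean_product(U, V)
error = reconstruction_error(X, X_hat)
```

In this toy case the two patterns overlap in the middle cell. An ordinary matrix product would give 2 there, while the Boolean product gives 1, so X is reconstructed with zero error.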

PyBMF aims to provide a unified framework with:

  1. generators for various types of synthetic data
  2. unified ways of importing real-world data
  3. data splitting and cross-validation utilities
  4. negative sampling utilities for continuous methods
  5. the ability to utilize sparse matrices for heuristics
  6. evaluation tools for binary and continuous metrics
  7. visualization tools for single or multi-matrix data
  8. tools for saving and loading models and logs
  9. the ability to incorporate Boolean matrix simplification and visualization models
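To make items 3 and 4 more concrete, here is a minimal sketch of the general idea behind splitting a binary matrix and drawing negative samples: the 1-entries are divided into train and test sets, and an equal number of 0-entries is sampled so that a continuous model can be evaluated on both classes. This is not the PyBMF implementation; the function names `split_ones` and `sample_negatives` and their parameters are assumptions made for this example.

```python
import random


def split_ones(X, test_ratio=0.2, seed=0):
    """Randomly split the coordinates of the 1-entries of X into train and test."""
    rng = random.Random(seed)
    ones = [(i, j) for i, row in enumerate(X) for j, v in enumerate(row) if v]
    rng.shuffle(ones)
    n_test = int(len(ones) * test_ratio)
    return ones[n_test:], ones[:n_test]


def sample_negatives(X, n_samples, seed=0):
    """Uniformly sample distinct coordinates of 0-entries of X."""
    rng = random.Random(seed)
    zeros = [(i, j) for i, row in enumerate(X) for j, v in enumerate(row) if not v]
    return rng.sample(zeros, min(n_samples, len(zeros)))


X = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

train, test = split_ones(X, test_ratio=0.2)
negatives = sample_negatives(X, n_samples=len(test))
```

Fixing the seed keeps the split reproducible across runs, which matters when several models are compared on the same data.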

Models

| Category | Model | Paper | Original Implementation | In PyBMF |
| --- | --- | --- | --- | --- |
| Heuristics | Asso | PKDD2006, TKDE2008 | C | |
| Heuristics | Hyper/Hyper+ | SIGKDD2011 | | |
| Heuristics | GreConD | JCSS2010 | MATLAB | |
| Heuristics | Panda | ICDM2010 | | |
| Heuristics | Panda+ | TKDE2013 | | |
| Heuristics | NASSAU | SDM2015 | link | |
| Heuristics | GreConD+ | DAM2018 | MATLAB | |
| Heuristics | MEBF | AAAI2020 | R | |
| Continuous | NMFSklearn | | | 🛞 Wrapper of sklearn.decomposition.NMF |
| Continuous | WNMF | | | ✅ Multiplicative update |
| Continuous | BinaryMF-Penalty | ICDM2007 | MATLAB | ✅ Multiplicative update |
| Continuous | BinaryMF-Thresholding | ICDM2007 | MATLAB | ✅ Line search |
| Continuous | FastStep | PAKDD2016 | C++ | ✅ Line search |
| Continuous | PRIMP | DMKD2017 | CUDA C++ | ✅ PALM |
| Continuous | PNL-PF | SP2021 | | ✅ Multiplicative update |
| Continuous | ELBMF | NIPS2022 | Julia, Python | ✅ PALM |
| Probabilistic | MessagePassing | ICML2016 | Python | 🛞 Wrapper of original implementation |
| Probabilistic | OrMachine | ICML2017 | Cython | 🛞 Wrapper of original implementation |
| Linear Optimization | ColumnGeneration | AAAI2021 | Python | 🛞 Wrapper of original implementation |
| Satisfiability | UndercoverBMF | AAAI2021 | C++ | 🛞 Wrapper of original implementation |
| Simplification | IterEss | IS2019 | | |
| Simplification | DelegationBMF | AAAI2024 | C++ | |
| Visualization | OrderedBMF | SIAM2019 | C++ | |
| Visualization | BiclusterVisualization | PKDD2023 | Python | |

Compatibility

Currently built and tested on Python 3.9.18.

TODO

  • Add mask parameter W to PRIMP and ELBMF
  • Fix DataFrame display utils in dataframe_utils.py
  • Include BMF visualization models
  • Diagnosis of thresholding models
