PCMF is a Python package for Positive Collective Matrix Factorization (PCMF), a model that combines the interpretability of NMF with the extensibility of CMF.

Positive Collective Matrix Factorization (PCMF)

We propose Positive Collective Matrix Factorization (PCMF). PCMF is a model that combines the interpretability of NMF and the extensibility of CMF.

Description of PCMF

Problem setting

PCMF targets the setting in which two relational datasets (two matrices) share one set, and you want to predict the relational data (both matrices) and extract feature representations (one matrix for each set) simultaneously.

Example

  • Two relational data (two matrices)

    Patient-disease matrix
    Patient-patient attribute matrix

    Here, the patient set is shared by both matrices.

  • Feature representations (three matrices)

    Patient matrix
    Disease matrix
    Patient attribute matrix

Detailed description of PCMF

PCMF combines the main advantage of NMF (interpretability) with the main advantage of CMF (extensibility). Specifically, interpretability is achieved by passing each feature matrix through a softplus function, so that all of its elements become positive. The model is trained by backpropagation.

(Figure: illustration of the PCMF model.)

Example

The model is described here using the previous example.

  • Applying the softplus function to the unconstrained patient matrix yields the positive patient matrix.
  • Applying the softplus function to the unconstrained disease matrix yields the positive disease matrix.
  • Applying the softplus function to the unconstrained patient attribute matrix yields the positive patient attribute matrix.
  • Applying the link function to the product of the positive patient matrix and the positive disease matrix yields the predicted patient-disease matrix.
  • Applying the link function to the product of the positive patient matrix and the positive patient attribute matrix yields the predicted patient-patient attribute matrix.
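
The steps listed above can be sketched in plain Python. This is only an illustration of the data flow, not the package's implementation: the toy values, the helper names (`softplus`, `positive`, `predict`, `binary_link`), the latent dimension of 2, and the offset of 2.0 are all assumptions made for the example.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x)); positive for moderate inputs."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def positive(matrix):
    """Apply softplus element-wise; the result has the same shape."""
    return [[softplus(v) for v in row] for row in matrix]

def predict(left, right, link):
    """Apply `link` to every inner product between a row of `left` and a row of `right`."""
    return [[link(sum(a * b for a, b in zip(l_row, r_row))) for r_row in right]
            for l_row in left]

def binary_link(s, offset=2.0):
    """Sigmoid of (s - offset); the offset value is an assumption."""
    return 1.0 / (1.0 + math.exp(-(s - offset)))

# Hypothetical unconstrained parameters, one row per entity, 2 latent dimensions.
patients = [[0.5, -1.0], [-2.0, 1.5], [0.0, 0.3]]                 # 3 patients
diseases = [[1.0, -0.5], [-1.5, 2.0]]                             # 2 diseases
attributes = [[0.2, 0.1], [-0.7, 0.9], [1.1, -1.2], [0.0, 0.0]]   # 4 attributes

# Step 1: softplus turns every feature matrix into a positive one.
pos_patients = positive(patients)
pos_diseases = positive(diseases)
pos_attributes = positive(attributes)

# Step 2: the link function maps the matrix products to predictions.
pred_patient_disease = predict(pos_patients, pos_diseases, binary_link)
pred_patient_attribute = predict(pos_patients, pos_attributes, binary_link)
```

Because the patient matrix appears in both products, information from the patient-attribute relation can influence the patient-disease prediction, and vice versa.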

Softplus function

The softplus function is a strictly monotonically increasing function that takes a positive value for every real input. It is applied element-wise to a matrix and outputs a matrix of the same size.
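
For reference, the standard definition of softplus and its derivative are given below. The formula is not spelled out in this description, so treat it as the usual textbook definition rather than a statement about the package's exact code.

```latex
\operatorname{softplus}(x) = \log\left(1 + e^{x}\right) > 0,
\qquad
\frac{d}{dx}\operatorname{softplus}(x) = \frac{1}{1 + e^{-x}} \in (0, 1)
\quad \text{for all } x \in \mathbb{R}.
```

The derivative is the sigmoid function, which is positive everywhere; this is why softplus is strictly increasing.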

Link function

Because every feature matrix passes through the softplus function, the input to a PCMF link function is always positive. Choose the link function according to the nature of the matrix you are predicting and the purpose of the prediction.

  • When the values of the matrix to be predicted lie in (-∞, ∞)
    Log function.

  • When the values of the matrix to be predicted lie in (0, ∞)
    Linear function.

  • When the values of the matrix to be predicted lie in {0,1}
    Sigmoid function. (Because the sigmoid function outputs at least 0.5 for any non-negative input, a common positive constant is subtracted from every input before the sigmoid is applied.)
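
The three choices above can be illustrated with a small sketch. The function names and the offset of 3.0 are hypothetical and chosen only for the example; the value of the constant that the package subtracts is not stated here.

```python
import math

def log_link(s):
    """Maps a positive input onto the whole real line."""
    return math.log(s)

def linear_link(s):
    """Identity: a positive input stays in (0, inf)."""
    return s

def shifted_sigmoid_link(s, offset=3.0):
    """Sigmoid of (s - offset), so small positive inputs can fall below 0.5."""
    return 1.0 / (1.0 + math.exp(-(s - offset)))

# The inputs are positive, as they would be after softplus.
for s in (0.1, 1.0, 10.0):
    print(s, log_link(s), linear_link(s), shifted_sigmoid_link(s))
```

Without the offset, a sigmoid applied to a positive input could never produce a value below 0.5, so the model could not predict the class 0 with confidence.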

Feature representations analysis

The feature representations extracted by PCMF (the three feature matrices) can be analyzed directly. (Note that the final output of PCMF is the feature matrices after the softplus function has been applied.)
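
One simple way to read positive feature matrices is to rank the items that load most strongly on each latent factor. The sketch below assumes a hypothetical patient attribute matrix with one row per attribute and one column per latent factor; the attribute names, the values, and the helper `top_items` are invented for illustration and do not come from the package.

```python
# Hypothetical softplus-transformed features: rows = attributes, columns = latent factors.
attribute_names = ["age_over_65", "smoker", "high_bmi", "male"]
attribute_features = [
    [2.1, 0.1],
    [0.2, 1.8],
    [1.5, 0.3],
    [0.4, 0.5],
]

def top_items(names, features, factor, k=2):
    """Return the k names with the largest value in the given latent factor."""
    ranked = sorted(zip(names, features), key=lambda pair: pair[1][factor], reverse=True)
    return [name for name, _ in ranked[:k]]

for factor in range(2):
    print(factor, top_items(attribute_names, attribute_features, factor))
```

Because all values are positive, a larger entry always means a stronger association with that factor, which is what makes this kind of ranking straightforward to interpret.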

CMF and NMF (reference)

Non-Negative Matrix Factorization (NMF) and Collective Matrix Factorization (CMF) are existing matrix factorization methods. The features of each are as follows.

Non-Negative Matrix Factorization (NMF) [1][2]

Approximates the original matrix by the product of two non-negative matrices.

  • Advantages
    Because the factors are non-negative, a highly interpretable feature representation can be obtained.

  • Disadvantages
    Low extensibility because multiple relationships cannot be considered.

Collective Matrix Factorization (CMF) [3]

Factorizes two or more relational datasets (matrices) simultaneously when a set participates in multiple relations.

  • Advantages
    Highly extensible: multiple relations can be considered, and link functions allow flexible outputs.

  • Disadvantages
    Low interpretability, because both positive and negative values appear in the elements of the factor matrices.

Installation

You can install PCMF from PyPI:

pip install pcmf

Usage

For more details, please read examples/How_to_use_PCMF.ipynb. If it doesn't render on GitHub, please click here.

Training

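# X and Y are the two relational matrices that share one set, and wY holds
# the weights for Y; all three must be defined before this call.
cmf = Positive_Collective_Matrix_Factorization(X, Y, alpha=0.5, d_hidden=12, lamda=0.1)
cmf.train(link_X='sigmoid', link_Y='sigmoid',
          weight_X=None, weight_Y=wY,
          optim_steps=501, verbose=50, lr=0.05)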

License

MIT License

Citation

You may use our package (PCMF) under the MIT License. If you use this program in your research, please cite:

PCMF Package

@misc{sumiya2021pcmf,
  author = {Sumiya, Yuki and Matsui, Ryo and Kondo, Kensho and Nakata, Kazuhide},
  title = {PCMF},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {https://github.com/N-YS-KK/PCMF}
}

PCMF Paper (Japanese)

@article{sumiya2021pcmf,
  title={Patient Disease Prediction and Medical Feature Extraction using Matrix Factorization},
  author={Sumiya, Yuki and Matsuda, Atsuyoshi and Araki, Kenji and Nakata, Kazuhide},
  journal={The Japanese Society for Artificial Intelligence},
  year={2021}
}

Reference

References [5], [6], and [7] are used in the code.

[1] Daniel D. Lee and H. Sebastian Seung. “Learning the parts of objects by non-negative matrix factorization.” Nature 401.6755 (1999): 788-791.

[2] Daniel D. Lee and H. Sebastian Seung. “Algorithms for non-negative matrix factorization.” Advances in neural information processing systems 13 (2001): 556-562.

[3] Ajit P. Singh and Geoffrey J. Gordon. “Relational learning via collective matrix factorization.” Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2008): 650-658.

[4] Yuki Sumiya, Kazuhide Nakata, Atsuyoshi Matsuda and Kenji Araki. “Patient Disease Prediction and Relational Data Mining using Matrix Factorization.” The 40th Joint Conference on Medical Informatics (2020).

[5] David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams. “Learning representations by back-propagating errors.” Nature 323.6088 (1986): 533-536.

[6] Diederik P. Kingma and Jimmy Ba. “Adam: A method for stochastic optimization.” arXiv preprint arXiv:1412.6980 (2014).

[7] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu and Xiaoqiang Zheng. “TensorFlow: Large-scale machine learning on heterogeneous distributed systems.” arXiv preprint arXiv:1603.04467 (2016).

