

Project description

SLSVD

Sparse Logistic Singular Value Decomposition (SLSVD) for Binary Matrix Data

Badges: CI/CD, codecov, Documentation Status, License: MIT, version, Python 3.9, release, Project Status: Active (the project has reached a stable, usable state and is being actively developed).

Project Summary

This Python package implements Sparse Logistic Singular Value Decomposition (SLSVD) for binary matrix data using the Majorization-Minimization (MM) and Coordinate Descent (CD) algorithms.

Our package consists of three major components:

  1. Simulated binary data generation
  2. Sparse logistic SVD
  3. Metrics for evaluating estimations
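
For context on component 2, the sketch below is a hedged reading of the objective that sparse logistic SVD methods typically optimize, inferred from the function signatures later in this README (binary data matrix, intercept vector mu, score matrix A, sparse loading matrix B, regularization weights lambdas); it is not quoted from the package's source.

% Hedged sketch of the assumed objective: X is an n x d binary matrix, mu a d-vector of
% intercepts, A (n x k) scores, B (d x k) sparse loadings, lambda >= 0 a penalty weight.
\[
\Theta = \mathbf{1}\boldsymbol{\mu}^{\top} + A B^{\top}, \qquad
\min_{\boldsymbol{\mu},\,A,\,B}\;
-\sum_{i=1}^{n}\sum_{j=1}^{d}\Bigl[x_{ij}\,\theta_{ij} - \log\bigl(1 + e^{\theta_{ij}}\bigr)\Bigr]
\;+\; \lambda \sum_{j,k}\lvert b_{jk}\rvert
\]

Under this reading, Majorization-Minimization replaces the logistic log-likelihood with a quadratic surrogate at each iteration, and Coordinate Descent then solves the resulting lasso-type problem for the loadings; the BIC values returned by sparse_logistic_svd_coord are presumably used to select among the supplied lambdas.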

Functions

There are two major functions in this package:

generate_data(n, d, rank, random_seed=123): This function generates random binary data points. It takes four parameters: n for the number of data points, d for the number of features, rank for the number of components (the rank of the underlying structure), and random_seed for ensuring reproducibility.

sparse_logistic_svd_coord(dat, lambdas=np.logspace(-2, 2, num=10), k=2, quiet=True, max_iters=100, conv_crit=1e-5, randstart=False, normalize=False, start_A=None, start_B=None, start_mu=None): This function performs Sparse Logistic Singular Value Decomposition (SLSVD) using Majorization-Minimization and Coordinate Descent algorithms.
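
Taken together, a minimal end-to-end call looks like the sketch below; the calls and shapes mirror the full walkthrough in the Usage section, and the return names (mu, A, B, zeros, BICs) are taken from that example rather than from the package's documentation.

from slsvd.data_generation import generate_data
from slsvd.slsvd import sparse_logistic_svd_coord
import numpy as np

# Simulate a 200 x 100 binary matrix with a rank-2 latent structure
bin_mat, loadings, scores, diagonal = generate_data(n=200, d=100, rank=2, random_seed=123)

# Fit the sparse logistic SVD: mu holds the intercepts, A the scores (n x k),
# B the sparse loadings (d x k); zeros and BICs appear to be per-lambda diagnostics
mu, A, B, zeros, BICs = sparse_logistic_svd_coord(bin_mat, lambdas=np.logspace(-2, 1, num=10), k=2)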

Common Parameters

  • n (integer): Number of data points.
  • d (integer): Number of features.
  • rank: Number of components.
  • random_seed (integer): Random seed to ensure reproducibility.
  • dat: Input data matrix.
  • lambdas: Array of regularization parameters.
  • k: Number of components.
  • quiet: Boolean to suppress iteration printouts.
  • max_iters: Maximum number of iterations.
  • conv_crit: Convergence criterion.
  • randstart: Boolean to use random initialization.
  • normalize: Boolean to normalize the components.
  • start_A: Initial value for matrix A.
  • start_B: Initial value for matrix B.
  • start_mu: Initial value for the mean vector.

Python Ecosystem Context

SLSVD is a valuable addition to the Python ecosystem: no function in the scikit-learn package offers similar functionality, and our implementation uses the Majorization-Minimization and Coordinate Descent algorithms.

Installation

Prerequisites

Make sure Miniconda or Anaconda is installed on your system.

Step 1: Clone the Repository

git clone git@github.com:andyzhangstat/SLSVD.git
cd SLSVD  # Navigate to the cloned repository directory

Step 2: Create and Activate the Conda Environment

# Method 1: create Conda Environment from the environment.yml file
conda env create -f environment.yml  # Create Conda environment
conda activate SLSVD  # Activate the Conda environment

# Method 2: create Conda Environment 
conda create --name SLSVD python=3.9 -y
conda activate SLSVD

Step 3: Install the Package Using Poetry

Ensure the Conda environment is activated (you should see (SLSVD) in the terminal prompt).

poetry install  # Install the package using Poetry
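
As a quick sanity check that the install worked, you can try importing the package's main function (import path taken from the Usage examples below):

python -c "from slsvd.slsvd import sparse_logistic_svd_coord; print('SLSVD import OK')"  # should print without errors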

Step 4: Check test coverage

# Check line coverage
pytest --cov=SLSVD

# Check branch coverage
pytest --cov-branch --cov=SLSVD
poetry run pytest --cov-branch --cov=src
poetry run pytest --cov-branch --cov=SLSVD --cov-report html

Troubleshooting

  1. Environment Creation Issues: Ensure environment.yml is in the correct directory and that you have the correct Conda version.

  2. Poetry Installation Issues: Verify that Poetry is correctly installed in the Conda environment and that your pyproject.toml file is properly configured.

Usage

Use this package to find the optimized score and loading matrices of a sparse logistic Singular Value Decomposition. In the following example, we first generate a simulated data set of a given size. Using the Majorization-Minimization and Coordinate Descent algorithms, we then obtain the optimized score and loading matrices. Finally, we visualize both the simulated data and the fitted loadings in one figure.

Example usage:

>>> from slsvd.data_generation import generate_data
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> bin_mat, loadings, scores, diagonal = generate_data(n=200, d=100, rank=2, random_seed=123)

# Check shapes
>>> print("Binary Matrix Shape:", bin_mat.shape)
>>> print("Loadings Shape:", loadings.shape)
>>> print("Scores Shape:", scores.shape)

# Calculate dot product of scores
>>> scores_dot_product = np.dot(scores.T, scores)
>>> print("Dot Product of Scores:\n", scores_dot_product)

# Calculate dot product of loadings
>>> loadings_dot_product = np.dot(loadings.T, loadings)
>>> print("Dot Product of Loadings:\n", loadings_dot_product)
Binary Matrix Shape: (200, 100)

Loadings Shape: (100, 2)

Scores Shape: (200, 2)

Dot Product of Scores:
array([[195.4146256 ,   2.67535881],
       [  2.67535881, 200.14653178]])

Dot Product of Loadings:
array([[1., 0.],
       [0., 1.]])
>>> plt.figure(figsize=(8, 12))
>>> cmap = plt.cm.get_cmap('viridis', 2)

>>> plt.imshow(bin_mat, cmap=cmap, interpolation='nearest')

>>> cbar = plt.colorbar(ticks=[0.25, 0.75])
>>> cbar.ax.set_yticklabels(['0', '1'])

>>> plt.title('Heatmap of Binary Matrix')
>>> plt.xlabel('Feature')
>>> plt.ylabel('Sample')

>>> plt.show()
>>> from slsvd.slsvd import sparse_logistic_svd_coord
>>> import numpy as np

>>> # Perform Sparse Logistic SVD
>>> mu, A, B, zeros, BICs = sparse_logistic_svd_coord(bin_mat, lambdas=np.logspace(-2, 1, num=10), k=2)

>>> # Calculate mean of mu
>>> print("Mean of mu:", np.mean(mu))

>>> # Calculate dot product of Scores
>>> print("Dot Product of Scores:\n", np.dot(A.T, A))

>>> # Calculate dot product of Loadings
>>> print("Dot Product of Loadings:\n", np.dot(B.T, B))
Mean of mu: 0.052624279581212116

Dot Product of Scores:
array([[7672.61634966,  277.23466856],
       [ 277.23466856, 3986.24113586]])

Dot Product of Loadings:
array([[1.        , 0.00111067],
       [0.00111067, 1.        ]])
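
The third component listed above, metrics for evaluating estimations, is not demonstrated in this README; as a stand-in, the following numpy-only sketch compares the estimated loadings B with the true loadings returned by generate_data via the cosines of the principal angles between the two column spaces (values near 1 indicate good recovery). All variables come from the examples above; this is not the package's own metric API.

>>> # Hedged evaluation sketch using numpy only (not the package's metric functions)
>>> Q_true, _ = np.linalg.qr(loadings)  # orthonormal basis of the true loading subspace
>>> Q_est, _ = np.linalg.qr(B)          # orthonormal basis of the estimated loading subspace
>>> principal_cosines = np.linalg.svd(Q_true.T @ Q_est, compute_uv=False)
>>> print("Principal-angle cosines (true vs. estimated loadings):", principal_cosines)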

Documentation

Online documentation is available on Read the Docs.

The package is published on TestPyPI and PyPI.

Contributors

Andy Zhang

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

SLSVD was created by Andy Zhang. It is licensed under the terms of the MIT license.

Credits

SLSVD was created with cookiecutter and the py-pkgs-cookiecutter template.



Download files

Download the file for your platform.

Source Distribution

slsvd-0.1.1.tar.gz (11.9 kB)


Built Distribution

slsvd-0.1.1-py3-none-any.whl (10.7 kB)


File details

Details for the file slsvd-0.1.1.tar.gz.

File metadata

  • Download URL: slsvd-0.1.1.tar.gz
  • Upload date:
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.9.18 Darwin/21.6.0

File hashes

Hashes for slsvd-0.1.1.tar.gz
  • SHA256: 615a9ff23c798544065fa86172c477a16b09ea461d32e3ac73062bc73808c061
  • MD5: 73b05a4fc2a16b45ece412001d5d59ce
  • BLAKE2b-256: 63b39f8771a43a1e75b571953fd8961d0c4a02104f000175ed7e471d714bf62d


File details

Details for the file slsvd-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: slsvd-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 10.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.9.18 Darwin/21.6.0

File hashes

Hashes for slsvd-0.1.1-py3-none-any.whl
  • SHA256: 90cb3e42489051bfaee13c6429f92cbf1c69fef63cf139decf99bb1bda245b1e
  • MD5: 767d1e65edea9fe1c97a65bc93260b51
  • BLAKE2b-256: b9d80f0b01ef154c97feee4e522ec3ab30955670207204c4247e16b6fc4ea78d

