Skip to main content

A package for two-way sparse logistic SVD!

Project description

SLSVD2

Two-way Sparse Logistic Singular Value Decomposition (SLSVD) for Binary Matrix Data

License: MIT version Python 3.9.0

Project Summary

We implement the Two-way Sparse Logistic Singular Value Decomposition (SLSVD2) using the Majorization-Minimization (MM) and coordinate descent (CD) algorithms in this Python package.

Our package consists of three major components:

  1. Simulated two-way binary data generation
  2. Two-way sparse logistic SVD
  3. Metrics for evaluating estimations

Functions

There are two major functions in this package:

generate_data_2_way(n, d, rank, random_seed=123): This function generates random binary data points. It takes four parameters: n for the number of data points, d for the number of features, rank for the number of rank, and random_seed for ensuring reproducibility.

sparse_logistic_svd_coord_2_way(dat, lambdas=np.logspace(-2, 2, num=10), etas=np.logspace(-2, 2, num=10), k=2, quiet=True, max_iters=100, conv_crit=1e-5, randstart=False, normalize=False, start_A=None, start_B=None, start_mu=None): This function performs Two-way Sparse Logistic Singular Value Decomposition (SLSVD) using Majorization-Minimization and Coordinate Descent algorithms.

Common Parameters

  • n (integer): Number of data points.
  • d (integer): Number of features.
  • rank: Number of components.
  • random_seed (integer): Random seed to ensure reproducibility.
  • dat: Input data matrix.
  • lambdas: Array of regularization parameters.
  • etas: Array of regularization parameters.
  • k: Number of components.
  • quiet: Boolean to suppress iteration printouts.
  • max_iters: Maximum number of iterations.
  • conv_crit: Convergence criterion.
  • randstart: Boolean to use random initialization.
  • normalize: Boolean to normalize the components.
  • start_A: Initial value for matrix A.
  • start_B: Initial value for matrix B.
  • start_mu: Initial value for the mean vector.

Python Ecosystem Context

SLSVD2 establishes itself as a valuable enhancement to the Python ecosystem. There is no function in the Python package scikit-learn has similar functionality, our implementation uses Majorization-Minimization and Coordinate Descent algorithms.

Installation

Prerequisites

Make sure Miniconda or Anaconda is installed on your system

Step 1: Clone the Repository

git clone git@github.com:andyzhangstat/SLSVD2.git
cd SLSVD2  # Navigate to the cloned repository directory

Step 2: Create and Activate the Conda Environment

# Method 1: create Conda Environment from the environment.yml file
conda env create -f environment.yml  # Create Conda environment
conda activate SLSVD2  # Activate the Conda environment

# Method 2: create Conda Environment 
conda create --name SLSVD2 python=3.9 -y
conda activate SLSVD2

Step 3: Install the Package Using Poetry

Ensure the Conda environment is activated (you should see (SLSVD2) in the terminal prompt)

poetry install  # Install the package using Poetry

Step 4: Get the coverage

# Check line coverage
pytest --cov=SLSVD2

# Check branch coverage
pytest --cov-branch --cov=SLSVD2
poetry run pytest --cov-branch --cov=src
poetry run pytest --cov-branch --cov=SLSVD2 --cov-report html

Troubleshooting

  1. Environment Creation Issues: Ensure environment.yml is in the correct directory and you have the correct Conda version

  2. Poetry Installation Issues: Verify Poetry is correctly installed in the Conda environment and your pyproject.toml file is properly configured

Usage

Use this package to find the optimized score and loading matrices of two-way sparse logistic Singular Value Decomposition. In the following example, we generate a simulated data set with defined size first. By the Majorization-Minimization and Coordinate Descent algorithms, we obtain the optimized score and loading matrices. Finally, we visualize both the simulated data and fitted loadings in one figure.

Example usage:

>>> from slsvd.data_generation import generate_data
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> bin_mat, loadings, scores, diagonal=generate_data_2_way(n=200, d=100, rank=2, random_seed=123)

# Check shapes
>>> print("Binary Matrix Shape:", bin_mat.shape)
>>> print("Loadings Shape:", loadings.shape)
>>> print("Scores Shape:", scores.shape)

# Calculate dot product of scores
>>> scores_dot_product = np.dot(scores.T, scores)
>>> print("Dot Product of Scores:\n", scores_dot_product)

# Calculate dot product of loadings
>>> loadings_dot_product = np.dot(loadings.T, loadings)
>>> print("Dot Product of Loadings:\n", loadings_dot_product)
Binary Matrix Shape: (200, 100)

Loadings Shape: (100, 2)

Scores Shape: (200, 2)

Dot Product of Scores:
array([[1., 0.],
       [0., 1.]])

Dot Product of Loadings:
array([[1., 0.],
       [0., 1.]])
>>> plt.figure(figsize=(6, 9)) 
>>> colors = ['cyan', 'magenta']
>>> cmap = plt.matplotlib.colors.ListedColormap(colors, name='custom_cmap', N=2)
>>> plt.imshow(bin_mat, cmap=cmap, interpolation='nearest')
>>> cbar = plt.colorbar(ticks=[0.25, 0.75])
>>> cbar.ax.set_yticklabels(['0', '1'])
>>> plt.title('Heatmap of Simulated Binary Matrix')
>>> plt.xlabel('Feature')
>>> plt.ylabel('Sample')

>>> plt.tight_layout()

>>> plt.show()
>>> from slsvd.slsvd import sparse_logistic_svd_coord
>>> import numpy as np

>>> # Perform Sparse Logistic SVD
>>> mu, A, B, S, zeros, BICs = sparse_logistic_svd_coord_2_way(bin_mat, lambdas=np.logspace(-2, 1, num=10), etas=np.logspace(-2, 1, num=10), k=2)

>>> # Calculate mean of mu
>>> print("Mean of mu:", np.mean(mu))

>>> # Calculate dot product of Scores
>>> print("Dot Product of Scores:\n", np.dot(A.T, A))

>>> # Calculate dot product of Loadings
>>> print("Dot Product of Loadings:\n", np.dot(B.T, B))
Mean of mu: 0.07933574417007386

Dot Product of Scores:
array([[1.        , 0.02601576],
       [0.02601576, 1.        ]])

Dot Product of Loadings:
array([[1.        , 0.03334437],
       [0.03334437, 1.        ]])

Documentations

Online documentation is available readthedocs. Publishing on TestPyPi and PyPi.

Contributors

Andy Zhang

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

slsvd2 was created by Andy Zhang. It is licensed under the terms of the MIT license.

Credits

slsvd2 was created with cookiecutter and the py-pkgs-cookiecutter template.

References

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

slsvd2-0.1.0.tar.gz (7.4 kB view details)

Uploaded Source

Built Distribution

slsvd2-0.1.0-py3-none-any.whl (8.2 kB view details)

Uploaded Python 3

File details

Details for the file slsvd2-0.1.0.tar.gz.

File metadata

  • Download URL: slsvd2-0.1.0.tar.gz
  • Upload date:
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.9.19 Darwin/21.6.0

File hashes

Hashes for slsvd2-0.1.0.tar.gz
Algorithm Hash digest
SHA256 39c8036252fd4f753b8c69fd9c09287e46f56306e5526f700119b90de3412dd8
MD5 38b1538286baf1bc63a00432d083df86
BLAKE2b-256 b8187d4c89b74f8f55413d2eba31cb31d71a842bbcd163cb307d37fa98c44fe2

See more details on using hashes here.

File details

Details for the file slsvd2-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: slsvd2-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.9.19 Darwin/21.6.0

File hashes

Hashes for slsvd2-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e20dbb1e1562447720a32334057813b41c7ae60c8fdba64c261685cfd11c5354
MD5 09a6aa12aa105c5a0b9a001ca9e94572
BLAKE2b-256 954f1dbbf34fdefa4e8ba9ec144a63db5c8ce79bcd48095e1f844e9657f41d12

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page