Matrix completion with column subset selection.

CSMC

CSMC is a Python library for column subset selection in matrix completion tasks. It implements the CSMC method, which completes the missing entries of a matrix using a subset of its columns.

Columns Selected Matrix Completion (CSMC) is a two-stage approach for low-rank matrix recovery. In the first stage, CSMC samples columns of the input matrix and recovers a smaller column submatrix. In the second stage, it solves a least squares problem to reconstruct the whole matrix.
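The two stages can be illustrated with a minimal NumPy sketch. This is not the package's implementation: the helper names `complete_submatrix` and `two_stage_complete` are hypothetical, the columns are assumed to be sampled uniformly, and the first stage uses a simple iterative truncated-SVD imputation as a stand-in for whichever completion solver is actually applied to the column submatrix.

```python
import numpy as np


def complete_submatrix(C, rank, n_iter=200):
    """Stand-in first-stage solver (assumption): iterative truncated-SVD imputation."""
    mask = ~np.isnan(C)
    filled = np.where(mask, C, 0.0)  # start with zeros in the missing positions
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed entries, replace missing ones with the current estimate.
        filled = np.where(mask, C, low_rank)
    return low_rank


def two_stage_complete(M_incomplete, n_selected, rank, seed=0):
    """Hypothetical sketch of a two-stage, column-sampling completion."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = M_incomplete.shape

    # Stage 1: sample columns and recover the smaller column submatrix.
    selected = rng.choice(n_cols, size=n_selected, replace=False)
    C_hat = complete_submatrix(M_incomplete[:, selected], rank)

    # Orthonormal basis for the column space of the recovered submatrix.
    U, _, _ = np.linalg.svd(C_hat, full_matrices=False)
    basis = U[:, :rank]

    # Stage 2: least squares on the observed entries of every column.
    M_hat = np.empty((n_rows, n_cols))
    for j in range(n_cols):
        observed = ~np.isnan(M_incomplete[:, j])
        coef, *_ = np.linalg.lstsq(
            basis[observed], M_incomplete[observed, j], rcond=None
        )
        M_hat[:, j] = basis @ coef
    return M_hat
```

The second stage only needs the column space of the recovered submatrix, which is why each full column can be reconstructed from a small least squares fit on its observed rows.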

CSMC supports NumPy arrays and PyTorch tensors.

Installation

You can install CSMC using pip:

pip install csmc

Usage

  1. Generate random data
import numpy as np

n_rows = 50
n_cols = 250
rank = 3

# Build a rank-3 matrix as the product of two Gaussian factors.
rng = np.random.default_rng()
x = rng.normal(size=(n_rows, rank))
y = rng.normal(size=(rank, n_cols))
M = x @ y

# Hide 70% of the entries, chosen uniformly at random, by setting them to NaN.
M_incomplete = np.copy(M)
num_missing_elements = int(0.7 * M.size)
indices_to_hide = rng.choice(M.size, size=num_missing_elements, replace=False)
rows, cols = np.unravel_index(indices_to_hide, M.shape)
M_incomplete[rows, cols] = np.nan
  2. Fill with CSNN algorithm
from csmc import CSMC
solver = CSMC(M_incomplete, col_number=100)
M_filled = solver.fit_transform(M_incomplete)
  3. Fill with Nuclear Norm Minimization with SDP (NN algorithm)
from csmc import NuclearNormMin
solver = NuclearNormMin(M_incomplete)
M_filled = solver.fit_transform(M_incomplete, np.isnan(M_incomplete))
  4. Fill with Frank-Wolfe (Conditional Gradient Method)
from csmc import CGM
solver = CGM(M_incomplete)
M_filled = solver.fit_transform(M_incomplete, np.isnan(M_incomplete))
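Because the usage example keeps the ground-truth matrix `M`, the quality of any `M_filled` can be checked against it. The helper below is a sketch and not part of the csmc package; `relative_error` is a hypothetical name. It computes the relative Frobenius error, optionally restricted to the entries that were hidden.

```python
import numpy as np


def relative_error(M_true, M_filled, missing_mask=None):
    """Relative Frobenius error; hypothetical helper, not part of csmc."""
    diff = np.asarray(M_filled) - np.asarray(M_true)
    ref = np.asarray(M_true)
    if missing_mask is not None:
        # Score only the entries that the solver had to reconstruct.
        diff = diff[missing_mask]
        ref = ref[missing_mask]
    return np.linalg.norm(diff) / np.linalg.norm(ref)
```

With the variables from the steps above, `relative_error(M, M_filled, np.isnan(M_incomplete))` would score only the reconstructed entries, which makes the solvers directly comparable on the same masked data.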

Algorithms

Examples

Configuration

To adjust the number of threads used for intra-op parallelism on the CPU, modify the NUM_THREADS variable in settings.py:

NUM_THREADS = 8

Citation

Krajewska, A., Niewiadomska-Szynkiewicz, E. (2024). Randomized Approach to Matrix Completion: Applications in Recommendation Systems and Image Inpainting.

Krajewska, A., Niewiadomska-Szynkiewicz, E. (2024). Efficient Data Completion and Augmentation. 2024 IEEE 11th International Conference on Data Science and Advanced Analytics (DSAA). IEEE.
