A Python wrapper for IVSparse

PyVSparse

Python wrapper for IVSparse
Paper on arXiv: https://arxiv.org/abs/2309.04355

Documentation

Link to GitHub Pages: Docs

IVSparse is a library for Index-and-Value Compressed Sparse Column (IVCSC) and Value Compressed Sparse Column (VCSC).
Both formats losslessly compress redundant sparse matrices while keeping them usable (iterable).

For a vector:

[1]
[2]
[1]
[3]
[1]

VCSC

VCSC will be formatted as:

Value = [1 2 3]
Index = [0 2 4 1 3]
Count = [3 1 1]

The run for the value 1 is made up of:

  • the first entry of Value (1)
  • the first entry of Count (3)
  • the first 3 entries of Index (0, 2, and 4), which are the indices where 1 appears.

This process is repeated for each vector, so a 1 in another vector will have its own "run".

The 2 and the 3 each appear only once, so each has a single index. Their runs follow in order: the 2 is at index 1 and the 3 is at index 3.
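
The layout above can be sketched in a few lines of plain Python. This is only an illustration of the format for a single column, not the IVSparse implementation: the function names are hypothetical, and sorting the unique values is an assumption made so that the output matches the example.

```python
def vcsc_compress_column(column):
    """Compress one column into (values, counts, indices), skipping zeros."""
    rows_by_value = {}
    for row, value in enumerate(column):
        if value != 0:
            rows_by_value.setdefault(value, []).append(row)

    # Assumption: unique values are stored in sorted order.
    values = sorted(rows_by_value)
    counts = [len(rows_by_value[v]) for v in values]
    indices = [row for v in values for row in rows_by_value[v]]
    return values, counts, indices


def vcsc_iterate_column(values, counts, indices):
    """Yield (row, value) pairs by walking the runs in storage order."""
    start = 0
    for value, count in zip(values, counts):
        for row in indices[start:start + count]:
            yield row, value
        start += count


values, counts, indices = vcsc_compress_column([1, 2, 1, 3, 1])
print("Value =", values)   # Value = [1, 2, 3]
print("Index =", indices)  # Index = [0, 2, 4, 1, 3]
print("Count =", counts)   # Count = [3, 1, 1]
```

The iterator shows why the format stays usable without decompression: each Count entry says how many Index entries belong to the current value.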

VCSC is typically faster than IVCSC for smaller matrices. It never takes more space than COO, but it can take more space than CSR/CSC if the data is not redundant.
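
The size comparison can be checked with some rough arithmetic. The sketch below counts stored numbers for a single column under each format. It is a simplification under stated assumptions: every stored number is treated as the same size, any extra per-column metadata is ignored, and the helper name stored_elements is hypothetical.

```python
def stored_elements(column):
    """Count stored numbers for one column under COO, CSC, and VCSC."""
    nonzeros = [v for v in column if v != 0]
    nnz = len(nonzeros)
    unique = len(set(nonzeros))
    return {
        "COO": 3 * nnz,            # row, column, and value per nonzero
        "CSC": 2 * nnz + 2,        # value and row per nonzero, plus 2 column pointers
        "VCSC": nnz + 2 * unique,  # row per nonzero, plus value and count per unique value
    }


# 100 nonzeros drawn from only 3 unique values
redundant = stored_elements([1, 2, 1, 3, 1] * 20)
# 100 nonzeros that are all different
distinct = stored_elements(list(range(1, 101)))

print(redundant)  # {'COO': 300, 'CSC': 202, 'VCSC': 106}
print(distinct)   # {'COO': 300, 'CSC': 202, 'VCSC': 300}
```

With redundant data VCSC stores far fewer numbers than CSC. With no repeated values it degrades to the COO count, which is the worst case because a column can never have more unique values than nonzeros.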

IVCSC

For the same vector:

[1]
[2]
[1]
[3]
[1]

IVCSC will format it as:

[1] [1] [0 2 2] [0] [2] [1] [1] [0] [3] [1] [3] [0]
 V   W   I       D   V   W   I   D   V   W   I   D

Key:

  • V - Value
  • W - Width
  • I - Indices
  • D - Delimiter

IVCSC uses byte packing and positive delta encoding to achieve further compression, typically at the cost of some performance. For the first set of indices, we see the string [0 2 2]. Each entry is a delta from the previous index, so an index is the running sum of the deltas up to and including its own: 0, 0 + 2 = 2, and 0 + 2 + 2 = 4, giving the indices 0, 2, and 4.

W is the byte width of the indices. A width of 1 means each index delta takes only 1 byte to store, and the width is set dynamically by the compression algorithm. It is the smallest number of bytes that can store the largest index delta, so the indices [1 1,000,000] will be stored as 3-byte deltas. IVCSC works best when the occurrences of each unique value are close together in a vector.
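
The delta encoding and the width rule can be sketched in plain Python. This is an illustration only, not the IVSparse implementation: the function names are hypothetical, each run is kept as a (value, width, deltas) tuple instead of packed bytes with a delimiter, and the set of byte widths the real library supports is not assumed here.

```python
def byte_width(deltas):
    """Smallest number of bytes that can hold the largest delta (at least 1)."""
    return max(1, (max(deltas).bit_length() + 7) // 8)


def ivcsc_encode_column(column):
    """Return one (value, width, deltas) run per unique nonzero value."""
    rows_by_value = {}
    for row, value in enumerate(column):
        if value != 0:
            rows_by_value.setdefault(value, []).append(row)

    runs = []
    for value in sorted(rows_by_value):  # assumption: values in sorted order
        rows = rows_by_value[value]
        # First entry is the row itself; each later entry is the gap to the previous row.
        deltas = [rows[0]] + [b - a for a, b in zip(rows, rows[1:])]
        runs.append((value, byte_width(deltas), deltas))
    return runs


def ivcsc_decode_run(deltas):
    """Recover absolute row indices with a running sum of the deltas."""
    rows, total = [], 0
    for delta in deltas:
        total += delta
        rows.append(total)
    return rows


for value, width, deltas in ivcsc_encode_column([1, 2, 1, 3, 1]):
    print(value, width, deltas, "->", ivcsc_decode_run(deltas))

# Indices 1 and 1,000,000 are stored as the deltas 1 and 999,999
print(byte_width([1, 999_999]))  # 3
```

Because the width is chosen from the largest delta rather than the largest index, a value whose occurrences sit close together keeps a 1-byte width even deep into a long column.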

Dependencies

  • numpy
  • scipy
  • matplotlib
  • Python 3.9 or higher

Install by:

pip install PyVSparse

This should also download the dependencies. Alternatively, install from this repo:

git clone https://github.com/Seth-Wolfgang/PyVSparse.git
cd PyVSparse
git submodule update --init --recursive
cd ..
pip install ./PyVSparse

However, building from source requires pybind11 and scikit-build-core.

Sample Program

from PyVSparse.vcsc import VCSC
from PyVSparse.ivcsc import IVCSC
import scipy as sp
import scipy.sparse  # make sure the sparse submodule is loaded on older SciPy versions
import numpy as np

# Also works for CSR!
CSC_Mat = sp.sparse.random(5, 5, format='csc', dtype = np.int8, density=1)

# Only SpMM or SpMV works for now
Dense_Vec = np.ones((5, 1), dtype = np.int8)

# Convert from CSC
VCSC_Mat = VCSC(CSC_Mat) # Will soon support VCSC_Mat = VCSC(CSC_Mat, indexType = np.int8)
IVCSC_Mat = IVCSC(CSC_Mat, major = "row") # the storage order can be set to "col" or "row"

# SpMV (will return np.ndarray)
VCSC_Result = VCSC_Mat * Dense_Vec
IVCSC_Result = IVCSC_Mat * Dense_Vec
CSC_Result = CSC_Mat * Dense_Vec

# Output
print("CSC: \n", CSC_Result)
print("VCSC: \n", VCSC_Result)
print("IVCSC: \n", IVCSC_Result) 

Example output (values vary between runs because the matrix is random):

CSC: 
 [[-38]
 [-99]
 [ 14]
 [ 22]
 [ 81]]
VCSC: 
 [[-38]
 [-99]
 [ 14]
 [ 22]
 [ 81]]
IVCSC: 
 [[-38]
 [-99]
 [ 14]
 [ 22]
 [ 81]]

Todo

  1. Parallelize compilation
  2. Compatibility with Windows and macOS

To cite IVSparse

@misc{ivsparse,
  doi = {10.48550/ARXIV.2309.04355},
  url = {https://arxiv.org/abs/2309.04355},
  author = {Ruiter, Skyler and Wolfgang, Seth and Tunnell, Marc and Triche, Timothy and Carrier, Erin and DeBruine, Zachary},
  keywords = {Data Structures and Algorithms (cs.DS), Machine Learning (cs.LG), FOS: Computer and information sciences},
  title = {Value-Compressed Sparse Column (VCSC): Sparse Matrix Storage for Redundant Data},
  publisher = {arXiv},
  year = {2023},
  copyright = {Creative Commons Attribution 4.0 International}
}
