A Python wrapper for IVSparse
PyVSparse
Python wrapper for IVSparse
Link to paper on ArXiv
Documentation
Link to GitHub Pages: Docs
IVSparse is a library for Index-and-Value Compressed Sparse Column (IVCSC) and Value Compressed Sparse Column (VCSC).
Both are methods for losslessly compressing redundant sparse matrices while keeping them usable (iterable).
For a vector:
[1]
[2]
[1]
[3]
[1]
VCSC
VCSC will be formatted as:
Value = [1 2 3]
Index = [0 2 4 1 3]
Count = [3 1 1]
Where the indices of the 1 are the first 3 values in Index (0, 2, and 4), matching the first entry of Value (1) and the first entry of Count (3).
This process is repeated for each vector, so a 1 in another vector will have its own "run".
For the 2 and 3, there is only 1 index each, and they are listed in order, with the 2 at index 1 and the 3 at index 3.
VCSC is typically faster for smaller matrices than IVCSC. This compression format will do no worse than COO, but can be worse than CSR/CSC if the data is not redundant.
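The grouping described above can be sketched in plain Python. This is only an illustration of the layout for a single vector, not the library's implementation: the function name `vcsc_encode` is hypothetical, and ordering the unique values in ascending order is an assumption (the example vector gives the same result either way).

```python
from collections import defaultdict


def vcsc_encode(column):
    """Group the nonzeros of one vector by value (illustrative sketch only)."""
    rows_by_value = defaultdict(list)
    for row, value in enumerate(column):
        if value != 0:  # zeros are not stored in a sparse format
            rows_by_value[value].append(row)

    values, indices, counts = [], [], []
    for value in sorted(rows_by_value):  # assumption: ascending value order
        rows = rows_by_value[value]
        values.append(value)       # each unique value is stored once
        counts.append(len(rows))   # how many times it occurs
        indices.extend(rows)       # where it occurs, one run per value
    return values, indices, counts


values, indices, counts = vcsc_encode([1, 2, 1, 3, 1])
print("Value =", values)  # Value = [1, 2, 3]
print("Index =", indices)  # Index = [0, 2, 4, 1, 3]
print("Count =", counts)  # Count = [3, 1, 1]
```

Each unique value is stored once, so the saving grows with the number of repeats; when every value is unique, Value and Count add overhead instead.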
IVCSC
For the same vector:
[1]
[2]
[1]
[3]
[1]
IVCSC will format it as
[1] [1] [0 2 2] [0] [2] [1] [1] [0] [3] [1] [3] [0]
 V   W     I     D   V   W   I   D   V   W   I   D
Key:
- V - Value
- W - Width
- I - indices
- D - Delimiter
IVCSC uses bytepacking and positive delta encoding to achieve further compression, typically at the cost of some performance. For the first set of indices, we see the string [0 2 2]. Each entry is a delta from the previous index, so the running sum gives the actual indices: 0, 0 + 2 = 2, and 2 + 2 = 4.
W is the byte width of the indices. A width of 1 means each index takes only 1 byte to store, and this is set dynamically by the compression algorithm. The width is the smallest number of bytes that can store the largest index delta, so the indices [1 1,000,000] will be stored as 3-byte deltas. IVCSC works best when the occurrences of each unique value are close together in a vector.
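As a rough illustration of the delta encoding and width selection, the sketch below encodes one value's run as a flat list of integers in the V, W, I, D order shown above, then decodes it again. The helper names are hypothetical, and the real library packs these fields into bytes rather than Python integers.

```python
from itertools import accumulate


def byte_width(n):
    """Smallest number of bytes that can hold the non-negative integer n."""
    return max(1, (n.bit_length() + 7) // 8)


def ivcsc_encode_run(value, rows):
    """Encode sorted row indices as [value, width, deltas..., delimiter]."""
    # The first entry is the index itself; the rest are gaps between neighbours.
    deltas = [rows[0]] + [b - a for a, b in zip(rows, rows[1:])]
    width = byte_width(max(deltas))
    return [value, width, *deltas, 0]  # trailing 0 is the delimiter


def ivcsc_decode_run(run):
    """Recover (value, row indices) from a run produced by ivcsc_encode_run."""
    value, _width, *deltas = run[:-1]  # drop the delimiter
    return value, list(accumulate(deltas))  # running sum restores the indices


run = ivcsc_encode_run(1, [0, 2, 4])
print(run)                    # [1, 1, 0, 2, 2, 0]
print(ivcsc_decode_run(run))  # (1, [0, 2, 4])
print(ivcsc_encode_run(7, [1, 1_000_000])[1])  # 3
```

The last line shows why clustering matters: a single large gap forces every delta in that run to use the wider 3-byte encoding.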
Dependencies
- numpy
- scipy
- matplotlib
- python 3.9 or higher
Install by:
pip install PyVSparse
This should also download the dependencies. Alternatively, install from this repo:
git clone https://github.com/Seth-Wolfgang/PyVSparse.git
cd PyVSparse
git submodule update --init --recursive
cd ..
pip install ./PyVSparse
Note that building from source requires pybind11 and scikit-build-core.
Sample Program
from PyVSparse.vcsc import VCSC
from PyVSparse.ivcsc import IVCSC
import scipy as sp
import scipy.sparse  # make sure the sparse submodule is loaded
import numpy as np
# Also works for CSR!
CSC_Mat = sp.sparse.random(5, 5, format='csc', dtype = np.int8, density=1)
# Only SpMM or SpMV works for now
Dense_Vec = np.ones((5, 1), dtype = np.int8)
# Convert from CSC
VCSC_Mat = VCSC(CSC_Mat) # Will soon support VCSC_Mat = VCSC(CSC_Mat, indexType = np.int8)
IVCSC_Mat = IVCSC(CSC_Mat, major = "row") # the storage order can be set to "col" or "row"
# SpMV (will return np.ndarray)
VCSC_Result = VCSC_Mat * Dense_Vec
IVCSC_Result = IVCSC_Mat * Dense_Vec
CSC_Result = CSC_Mat * Dense_Vec
# Output
print("CSC: \n", CSC_Result)
print("VCSC: \n", VCSC_Result)
print("IVCSC: \n", IVCSC_Result)
Returns (exact values vary between runs because the matrix is random)
CSC:
[[-38]
[-99]
[ 14]
[ 22]
[ 81]]
VCSC:
[[-38]
[-99]
[ 14]
[ 22]
[ 81]]
IVCSC:
[[-38]
[-99]
[ 14]
[ 22]
[ 81]]
Todo
- parallelize compilation
- Compatibility for Windows and Mac
To cite IVSparse
@misc{ivsparse,
doi = {10.48550/ARXIV.2309.04355},
url = {https://arxiv.org/abs/2309.04355},
author = {Ruiter, Skyler and Wolfgang, Seth and Tunnell, Marc and Triche, Timothy and Carrier, Erin and DeBruine, Zachary},
keywords = {Data Structures and Algorithms (cs.DS), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Value-Compressed Sparse Column (VCSC): Sparse Matrix Storage for Redundant Data},
publisher = {arXiv},
year = {2023},
copyright = {Creative Commons Attribution 4.0 International}
}