Converter matrix and type determination for a range of array formats, focusing on sparse arrays
sparseconverter
Format detection, identifiers and converter matrix for a range of numerical array formats (backends) in Python, focusing on sparse arrays.
Usage
Basic usage:
    import numpy as np
    import sparseconverter as spc

    a1 = np.array([
        (1, 0, 3),
        (0, 0, 6)
    ])

    # array conversion
    a2 = spc.for_backend(a1, spc.SPARSE_GCXS)

    # format determination
    print("a1 is", spc.get_backend(a1), "and a2 is", spc.get_backend(a2))

a1 is numpy and a2 is sparse.GCXS
See the examples/ directory for more!
Description
This library can help to implement algorithms that support a wide range of array formats as input, output or for internal calculations. Dense and sparse array libraries generally already support format detection, creation, and export from and to various formats, but with different APIs, different sets of formats and different sets of supported features, such as dtypes, shapes and device classes.
This project creates a unified API for all conversions between the supported formats and takes care of details such as reshaping, dtype conversion, and using an efficient intermediate format for multi-step conversions.
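The idea of routing a conversion through an intermediate format can be illustrated with a minimal sketch that uses only the standard library. It does not show how sparseconverter is implemented: the toy formats (`dense`, `coo`, `dok`), the `CONVERTERS` table, `INTERMEDIATE` and the `convert()` helper are all hypothetical names chosen for illustration.

```python
# Toy illustration only: hypothetical formats and helpers, not sparseconverter's code.
# "dense": list of rows
# "coo":   (shape, [(row, col, value), ...])
# "dok":   (shape, {(row, col): value})


def dense_to_coo(a):
    shape = (len(a), len(a[0]) if a else 0)
    entries = [
        (i, j, v)
        for i, row in enumerate(a)
        for j, v in enumerate(row)
        if v != 0
    ]
    return shape, entries


def coo_to_dense(a):
    (rows, cols), entries = a
    out = [[0] * cols for _ in range(rows)]
    for i, j, v in entries:
        out[i][j] += v  # duplicate coordinates are summed
    return out


def coo_to_dok(a):
    shape, entries = a
    d = {}
    for i, j, v in entries:
        d[(i, j)] = d.get((i, j), 0) + v
    return shape, d


def dok_to_coo(a):
    shape, d = a
    return shape, [(i, j, v) for (i, j), v in sorted(d.items())]


# Only some pairs have a direct converter.
CONVERTERS = {
    ("dense", "coo"): dense_to_coo,
    ("coo", "dense"): coo_to_dense,
    ("coo", "dok"): coo_to_dok,
    ("dok", "coo"): dok_to_coo,
}
INTERMEDIATE = "coo"


def convert(arr, source, target):
    """Convert directly if possible, otherwise via the intermediate format."""
    if source == target:
        return arr
    direct = CONVERTERS.get((source, target))
    if direct is not None:
        return direct(arr)
    to_intermediate = CONVERTERS[(source, INTERMEDIATE)]
    from_intermediate = CONVERTERS[(INTERMEDIATE, target)]
    return from_intermediate(to_intermediate(arr))


a1 = [[1, 0, 3], [0, 0, 6]]
a2 = convert(a1, "dense", "dok")          # two steps: dense -> coo -> dok
round_trip = convert(a2, "dok", "dense")  # two steps: dok -> coo -> dense
```

The caller only names a source and a target; whether one or two steps are needed is decided inside `convert()`, which is the kind of detail a unified conversion API can hide.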
Features
- Supports Python 3.8 through (at least) 3.12
- Defines constants for format identifiers
- Various sets to group formats into categories:
  - Dense vs sparse
  - CPU vs CuPy-based
  - nD vs 2D backends
- Efficiently detect format of arrays, including support for subclasses
- Get converter function for a pair of formats
- Convert to a target format
- Find most efficient conversion pair for a range of possible inputs and/or outputs
That way it can help to implement format-specific optimized versions of an algorithm, to specify which formats are supported by a specific routine, to adapt to availability of CuPy on a target machine, and to perform efficient conversion to supported formats as needed.
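Selecting the cheapest conversion pair can be sketched with a plain cost table. This is an illustration under stated assumptions, not the library's implementation: the cost values are made up, and `COST` and `pick_cheapest_pair()` are hypothetical names. Only the format identifier strings follow the style of the `get_backend()` output shown in the usage example.

```python
from itertools import product

# Made-up relative conversion costs for (source, target) pairs; illustrative only.
COST = {
    ("numpy", "numpy"): 0.0,
    ("numpy", "scipy.sparse.csr_matrix"): 5.0,
    ("numpy", "sparse.GCXS"): 8.0,
    ("scipy.sparse.csr_matrix", "numpy"): 3.0,
    ("scipy.sparse.csr_matrix", "scipy.sparse.csr_matrix"): 0.0,
    ("scipy.sparse.csr_matrix", "sparse.GCXS"): 2.0,
    ("sparse.GCXS", "numpy"): 4.0,
    ("sparse.GCXS", "scipy.sparse.csr_matrix"): 2.0,
    ("sparse.GCXS", "sparse.GCXS"): 0.0,
}


def pick_cheapest_pair(sources, targets):
    """Return the (source, target) pair with the lowest tabulated cost."""
    return min(product(sources, targets), key=lambda pair: COST[pair])


# A data source can deliver either of two formats,
# while the processing routine only accepts one.
available = ["numpy", "scipy.sparse.csr_matrix"]
supported = ["sparse.GCXS"]
best = pick_cheapest_pair(available, supported)
print(best)
```

With these invented costs, the CSR input is preferred over the dense one because its conversion to the supported format is tabulated as cheaper.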
Supported array formats
numpy.ndarray
numpy.matrix (to support the result of aggregation operations on scipy.sparse matrices)
cupy.ndarray
sparse.COO
sparse.GCXS
sparse.DOK
scipy.sparse.coo_matrix
scipy.sparse.csr_matrix
scipy.sparse.csc_matrix
scipy.sparse.coo_array
scipy.sparse.csr_array
scipy.sparse.csc_array
cupyx.scipy.sparse.coo_matrix
cupyx.scipy.sparse.csr_matrix
cupyx.scipy.sparse.csc_matrix
Still TODO
- PyTorch arrays
- More detailed cost metric based on more real-world use cases and parameters.
Changelog
0.5.0 (in development)
- No changes yet
0.4.0
- Better error message in case of unknown array type: https://github.com/LiberTEM/sparseconverter/pull/37
- Support for SciPy sparse arrays: https://github.com/LiberTEM/sparseconverter/pull/52
- Drop support for Python 3.7: https://github.com/LiberTEM/sparseconverter/pull/51
0.3.4
- Support for Python 3.12: https://github.com/LiberTEM/sparseconverter/pull/26
- Packaging update: Tests for conda-forge: https://github.com/LiberTEM/sparseconverter/pull/27
0.3.3
- Perform feature checks lazily: https://github.com/LiberTEM/sparseconverter/issues/15
0.3.2
- Detection and workaround for https://github.com/pydata/sparse/issues/602.
- Detection and workaround for https://github.com/cupy/cupy/issues/7713.
- Test with duplicates and scrambled indices.
- Test correctness of basic array operations.
0.3.1
- Include version constraint for sparse.
0.3.0
- Introduce conversion_cost() to obtain a value roughly proportional to the conversion cost between two backends.
0.2.0
- Introduce result_type() to find the smallest NumPy dtype that accommodates all parameters. Allowed as parameters are all valid arguments to numpy.result_type(...) plus backend specifiers.
- Support cupyx.scipy.sparse.csr_matrix with dtype=bool.
0.1.1
Initial release
Known issues
conda install -c conda-forge cupy on Python 3.7 and Windows 11 may install cudatoolkit 10.1 and cupy 8.3, which have sporadically produced invalid data structures for cupyx.scipy.sparse.csc_matrix for unknown reasons. This doesn't happen with current versions. Running the benchmark function benchmark_conversions() can help to debug such issues since it performs all pairwise conversions and checks for correctness.
Notes
This project is developed primarily for sparse data support in LiberTEM. For that reason it includes the backend CUDA, which indicates a NumPy array, but targeting execution on a CUDA device.