Sparse Tools for Analysis
spartans - SPARse Tools for ANalysiS
When working with sparse matrices, it is desirable to work with them as if they were regular numpy arrays. Yet many popular array methods don't exist for sparse matrices. spartans aims to help by providing many operations for working with them.
Full example notebook
Free software: GNU General Public License v3
Documentation: https://spartans.readthedocs.io.
Features
- Mathematical Operations
A rich set of operations that sparse matrices do not support out of the box, such as variance, cov (covariance matrix) and corrcoef (correlation matrix).
- Easy Indexing
Convenient methods for indexing “extra” sparse features, either by variance or by quantity.
- Masking
Depending on the use case, many algorithms treat the zeros in a sparse matrix as missing data, or treat missing data as zeros. spartans offers masking for these cases.
- FeatureMatrix
FeatureMatrix is a first-class citizen in spartans. It is a wrapper around scipy.sparse.csr_matrix, built with data analysis and data science in mind.
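To make the variance and masking features above more concrete, here is a minimal sketch of a per-column variance computed directly on a scipy sparse matrix. It uses only numpy and scipy; the helper name `column_variance` and its `zeros_as_missing` flag are hypothetical and are not part of the spartans API, whose actual implementation may differ.

```python
import numpy as np
from scipy.sparse import csr_matrix


def column_variance(c, zeros_as_missing=False):
    """Population variance of each column of a sparse matrix.

    Hypothetical helper for illustration only; not the spartans API.
    """
    # Work on a float copy so the caller's matrix is never modified.
    c = csr_matrix(c, dtype=float, copy=True)
    c.eliminate_zeros()  # drop explicitly stored zeros so counts are exact

    if zeros_as_missing:
        # Only the stored (non-zero) entries count as observations.
        counts = c.getnnz(axis=0).astype(float)
    else:
        # Every row counts as an observation, zeros included.
        counts = np.full(c.shape[1], float(c.shape[0]))

    col_sum = np.asarray(c.sum(axis=0)).ravel()
    col_sum_sq = np.asarray(c.multiply(c).sum(axis=0)).ravel()

    # var = E[x^2] - E[x]^2; columns with no observations become nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        mean = col_sum / counts
        return col_sum_sq / counts - mean ** 2


m = np.array([[1, -2, 0, 50],
              [0, 0, 0, 100],
              [1, 0, 0, 80],
              [1, 4, 0, 0],
              [0, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 0, -50]])
c = csr_matrix(m)

var_all = column_variance(c)                            # zeros are real values
var_masked = column_variance(c, zeros_as_missing=True)  # zeros are missing
```

With zeros treated as values, the result agrees with `m.var(axis=0)` on the dense array. With zeros treated as missing, a column that has no stored entries yields nan, because there is nothing to average.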
Examples
>>> import spartans as st
>>> from scipy.sparse import csr_matrix
>>> import numpy as np
>>> m = np.array([[1, -2, 0, 50],
...               [0, 0, 0, 100],
...               [1, 0, 0, 80],
...               [1, 4, 0, 0],
...               [0, 0, 0, 0],
...               [0, 4, 0, 0],
...               [0, 0, 0, -50]])
>>> c = csr_matrix(m)
We can get the correlation matrix of m using numpy.
>>> np.corrcoef(m, rowvar=False)
Out[]: array([[ 1. , -0.08, nan, 0.31],
[-0.08, 1. , nan, -0.35],
[ nan, nan, nan, nan],
[ 0.31, -0.35, nan, 1. ]])
This won’t work with the sparse matrix c:
>>> np.corrcoef(c, rowvar=False)
AttributeError: 'float' object has no attribute 'shape'
But with spartans this can be done.
>>> st.corr(c)
Out[]: array([[ 1. , -0.08, nan, 0.31],
[-0.08, 1. , nan, -0.35],
[ nan, nan, nan, nan],
[ 0.31, -0.35, nan, 1. ]])
The row and column of nan values appear because the original matrix has a column (feature) that is zero in every row. spartans can handle this with st.non_zero_index(c, axis=0, as_bool=False), which returns array([0, 1, 3]). Much more functionality is shown in the notebook.
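For readers curious how a correlation matrix can be obtained without first converting the sparse input to a dense array, the following is a rough sketch using only numpy and scipy. It is not the spartans implementation. The function name `sparse_corrcoef` is an assumption made for illustration, and the column filter built from `getnnz` is a plain scipy stand-in for the idea behind `st.non_zero_index`.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_corrcoef(c):
    """Pearson correlation between the columns of a sparse matrix.

    Illustrative sketch only; not how spartans necessarily does it.
    """
    c = csr_matrix(c, dtype=float)
    n = c.shape[0]
    mean = np.asarray(c.mean(axis=0)).ravel()
    # Only the small (n_cols x n_cols) Gram matrix is made dense.
    gram = (c.T @ c).toarray()
    # Sample covariance: (X^T X - n * mu mu^T) / (n - 1)
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)
    std = np.sqrt(np.diag(cov))
    # Constant columns have zero std, so their entries come out as nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        return cov / np.outer(std, std)


m = np.array([[1, -2, 0, 50],
              [0, 0, 0, 100],
              [1, 0, 0, 80],
              [1, 4, 0, 0],
              [0, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 0, -50]])
c = csr_matrix(m)

# Indices of columns that contain at least one stored non-zero value.
keep = np.flatnonzero(c.getnnz(axis=0))

# Correlation restricted to the informative columns: no nan entries.
corr = sparse_corrcoef(c[:, keep])
```

Here `keep` holds the indices 0, 1 and 3, so `corr` is a 3 x 3 matrix. The design choice is to densify only the Gram matrix, whose size depends on the number of features rather than the number of rows. This stays cheap while the feature count is moderate.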
Credits
This open-source project is backed by SentinelOne
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.1.0 (2020-02-20)
First release on PyPI.